The world generates data at a staggering rate.
According to ManageEngine, at least 30% of an organization's unstructured data is redundant, obsolete or trivial (ROT), and some estimates put the figure much higher.
The rise of 5G and IoT devices accelerates this trend further. Along with the massive amount of information comes an enormous amount of data waste, which taxes storage and network systems more than ever. Data waste takes many forms, such as duplicate emails, outdated documents, bloated web content and unnecessary communications, and poor data management compounds it. All this data requires more storage capacity, adds data management complexity and drives up costs.
Retaining ROT data can also increase security, compliance and legal risks, and it can hurt productivity and decision-making. The more data there is to manage and sort through, the greater the challenges for everyone.
More than ever, organizations must get a handle on their ROT data, but this is no small task. All key stakeholders must participate. Organizations must come up with a way to eliminate ROT data and minimize data storage waste going forward, which requires a shift in thinking across the entire organization. IT teams that are ready to address ROT data should consider the following guidelines when planning data waste-reducing strategies.
1. Invest the necessary time and resources
Rooting out ROT data takes enough time and resources to eliminate a substantial share of the waste without deleting data the organization still requires. IT teams that rush through the process might remove the wrong data, which exposes the organization to compliance or legal violations. Or they might leave much of the ROT data in place and waste both time and resources.
An organization must be willing to dedicate enough personnel to make this effort effective. It must also invest in tools that streamline and simplify data management operations. For example, IT teams often benefit from tools that help them discover and catalog their data and determine how employees access and use it. The right tools let IT teams automate many of their data management operations, making it easier to identify and delete ROT data. Automation can also prevent ROT data from seeping back into storage systems.
2. Inventory and catalog existing data
To start managing ROT data, take inventory of existing data to determine how much data the organization has, where it's located, who owns it, who can access it and how long it's been there. Consider other important factors, such as whether the data is needed for compliance or business purposes and whether any retention policies apply when deciding whether to keep it. The inventory should cover data in cloud and edge environments, as well as data on servers, desktops and on-premises storage systems, such as NAS or SAN.
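Part of that inventory can be automated. The sketch below assumes the data lives on a file system reachable from the machine running it (a local disk or a mounted NAS share, for example) and uses only Python's standard library to walk a directory tree, recording each file's size, numeric owner ID and age. It does not capture access rights, cloud object stores or compliance attributes, which would have to come from other tools; `inventory` and `InventoryRecord` are hypothetical names.

```python
import os
import stat
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InventoryRecord:
    path: str
    size_bytes: int
    owner_uid: int   # numeric owner ID from the file system (meaningful on POSIX systems)
    modified: float  # last-modified time as a Unix timestamp
    age_days: float  # days since the file was last modified


def inventory(root: str, now: Optional[float] = None) -> List[InventoryRecord]:
    """Walk a directory tree and record basic metadata for every regular file."""
    now = time.time() if now is None else now
    records: List[InventoryRecord] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full, follow_symlinks=False)
            except OSError:
                continue  # skip files that vanish or cannot be read
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, devices and the like
            records.append(
                InventoryRecord(
                    path=full,
                    size_bytes=st.st_size,
                    owner_uid=st.st_uid,
                    modified=st.st_mtime,
                    age_days=max(0.0, (now - st.st_mtime) / 86400),
                )
            )
    return records
```

For example, `sum(r.size_bytes for r in inventory("/mnt/share") if r.age_days > 365)` reports how much data has gone untouched for more than a year; `/mnt/share` is a placeholder path.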
In conjunction with the discovery process, catalog the data, and use a taxonomy to define, label and group it. Determine which data is ROT. Consider whether the data is necessary, still relevant, outdated, duplicated elsewhere, needed for legal or compliance reasons or in any other way valuable to your organization. However, don't assume that an old document no longer offers value.
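A taxonomy like this can be applied as a first-pass labeling rule. The following sketch is hypothetical throughout: the label names, the suffixes treated as trivial and the three-year obsolescence threshold are placeholders an organization would replace with its own criteria. It treats a file as redundant only when an identical copy is already being kept, lets legal-hold and business-critical flags override every other rule, and produces provisional labels for review rather than deleting anything, because an old document can still have value.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Dict, List, Set


class Label(Enum):
    KEEP = "keep"
    REDUNDANT = "redundant"
    OBSOLETE = "obsolete"
    TRIVIAL = "trivial"


# Hypothetical examples of file types treated as trivial; adjust to your taxonomy.
TRIVIAL_SUFFIXES = {".tmp", ".bak"}


@dataclass
class Item:
    path: str
    age_days: float
    content_hash: str
    legal_hold: bool = False         # needed for legal or compliance reasons
    business_critical: bool = False  # flagged as valuable by its owner


def classify(items: List[Item], obsolete_after_days: float = 3 * 365) -> Dict[str, Label]:
    """Assign a provisional ROT label to each item for human review."""
    labels: Dict[str, Label] = {}
    kept_hashes: Set[str] = set()
    # Sort by path so the same copy of a duplicate is always the one kept.
    for item in sorted(items, key=lambda i: i.path):
        if item.legal_hold or item.business_critical:
            label = Label.KEEP
        elif item.content_hash in kept_hashes:
            label = Label.REDUNDANT  # an identical copy is already being kept
        elif Path(item.path).suffix.lower() in TRIVIAL_SUFFIXES:
            label = Label.TRIVIAL
        elif item.age_days > obsolete_after_days:
            label = Label.OBSOLETE  # a candidate only: review before deleting
        else:
            label = Label.KEEP
        if label is Label.KEEP:
            kept_hashes.add(item.content_hash)
        labels[item.path] = label
    return labels
```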
3. Delete existing ROT data
After identifying ROT data, begin the deletion process. Start by removing duplicate data, which can make up a good portion of ROT data. A data deduplication tool is useful here, as long as it meets the organization's needs. Although many storage systems now include deduplication capabilities, those features might not be enough for larger organizations whose data is distributed across multiple storage platforms. Such organizations might need a global deduplication tool.
But duplicate data is only one type of data storage waste. Eliminate all ROT data that has accumulated across storage systems. This requires a careful, systematic approach that deletes only the correct data, without putting good data at risk or violating legal obligations or regulatory requirements. If there is any question about whether to delete certain data, consider copying it to a cheaper storage platform and then deleting it from primary storage systems.
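Deduplication features in storage systems often work at the block or chunk level, but the basic idea is easiest to see at the file level. The sketch below is a simple file-level technique, not a description of how any particular product works: it groups files by size first, because files of different sizes cannot be identical, and then compares SHA-256 digests within each size group. `file_digest` and `find_duplicates` are hypothetical names.

```python
import hashlib
import os
from collections import defaultdict
from typing import List


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths: List[str]) -> List[List[str]]:
    """Group files with identical content; only groups of two or more are returned."""
    # Files of different sizes cannot be identical, so bucket by size first
    # and hash only the files that share a size with at least one other file.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)
    groups: List[List[str]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Each returned group lists files with identical content; keeping one file per group and flagging the rest for review is one possible policy.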
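The copy-then-delete approach is safer when the copy is verified before the original is removed. This sketch assumes the cheaper storage platform is mounted as an ordinary directory (an archive NAS share, for example); `archive_then_delete` and the two root directories are hypothetical. It keeps each file's path relative to the primary root so the data can be found again later.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_then_delete(src: Path, primary_root: Path, archive_root: Path) -> Path:
    """Copy src to archive_root under the same relative path, verify the copy,
    and delete the original only if the contents match."""
    dest = archive_root / src.relative_to(primary_root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # copy2 also preserves timestamps where possible
    if sha256_of(src) != sha256_of(dest):
        dest.unlink()  # discard the bad copy and leave the original in place
        raise OSError(f"archive copy of {src} failed verification; original kept")
    src.unlink()
    return dest
```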
4. Implement data retention policies
One of the biggest reasons ROT data builds up is that organizations hold on to files long after they're needed. An organization should implement a comprehensive data retention policy that determines what data to retain, how long to retain it and when to delete it. A retention policy might also specify how to organize the data so the organization can search and access it later. The policy's goal is to ensure that data is retained only for as long as it's needed, whether that is eight weeks, eight months or eight years.
A comprehensive retention policy can help organizations automate compliance, reduce legal and regulatory risks and lower storage costs. It can also make the remaining data more relevant for advanced analytics and decision-making. To be effective, however, the retention policy must account for business requirements as well as legal and compliance issues. In addition, it should address the needs of different types of data, recognizing that some types are more valuable than others.
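At its core, a retention schedule maps each data category to a retention period, plus rules for exceptions. The sketch below is hypothetical: the category names and periods echo the eight-week, eight-month and eight-year examples above but are not recommendations, and real periods must come from business, legal and compliance requirements. Legal holds override the schedule, and data with no assigned category is routed to a person instead of being deleted.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention schedule: category -> how long to keep records.
RETENTION = {
    "marketing-drafts": timedelta(weeks=8),
    "project-files": timedelta(days=8 * 30),        # ~8 months, approximated as 240 days
    "financial-records": timedelta(days=8 * 365),   # ~8 years, ignoring leap days
}


@dataclass
class Record:
    name: str
    category: str
    created: date
    legal_hold: bool = False


def retention_action(rec: Record, today: date) -> str:
    """Return 'retain', 'delete' or 'review' for a record under the schedule."""
    if rec.legal_hold:
        return "retain"  # legal holds always override the schedule
    period = RETENTION.get(rec.category)
    if period is None:
        return "review"  # uncategorized data needs a human decision
    return "delete" if today >= rec.created + period else "retain"
```

Running a check like this on a schedule is one way a retention policy can help automate compliance, as long as the "delete" results feed a reviewed deletion process rather than removing data directly.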
5. Create a single source of truth, such as a centralized repository
One of the most frequently recommended strategies for dealing with ROT data, and for managing data in general, is to consolidate data into a centralized repository. This might seem to contradict trends such as hybrid cloud and edge computing, but a central repository can help reduce data redundancy, simplify data management and make the data easier to secure. It also reduces the need for users to store data on their own systems.
That said, moving all data into a central repository is not going to work for every organization. What's more important is to create a single source of truth for each category of data. A single source of truth eliminates versioning uncertainty and helps to standardize operations, while improving data quality. It also makes it easier for users to work with the data. They know where to go for the correct version, and they all rely on the same content.
6. Implement plans for handling data waste
Carefully craft a plan that details how to handle ROT data on an ongoing basis. Such a plan works together with data retention and deduplication policies, as well as overall data governance and management strategies. The ROT plan should define what data to keep, with the goal of reducing the amount of data stored.
Look for ways to reduce the creation of unnecessary data, such as discouraging web bloat and data hoarding. Assign content owners to the data to ensure quality and reduce the potential for ROT data. In addition, analyze current data workflows to identify where employees generate ROT data and what steps the organization can take to eliminate it. The plan should also include documentation that describes the mechanisms and processes for handling ROT data.
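Once files carry an owner and a provisional ROT label, a simple report can show where ROT data accumulates and which content owners to involve first. In this hypothetical sketch, the input rows are (owner, label, size in bytes) tuples, and any label other than "keep" counts as ROT.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rot_bytes_by_owner(rows: Iterable[Tuple[str, str, int]]) -> List[Tuple[str, int]]:
    """Sum the size of ROT-labeled data per owner, largest total first."""
    totals: Counter = Counter()
    for owner, label, size_bytes in rows:
        if label != "keep":  # redundant, obsolete and trivial all count as ROT
            totals[owner] += size_bytes
    return totals.most_common()
```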
7. Educate and train personnel
When it comes to combating data waste, an organization relies heavily on the people who work with the data daily. They are, in fact, instrumental in curbing the amount of ROT data the organization generates and stores. For this reason, they should be involved in the planning process wherever possible and practical, and they should be fully informed about any policy changes and waste reduction strategies that could affect how they work with the data.
Educate personnel about the issues surrounding ROT data, and train them in ways to reduce data waste, with an emphasis on data reduction best practices. In addition, clearly communicate any changes to data storage and access processes, such as moving data to a centralized repository, so no one is caught off guard or unable to do their job. In some cases, employees might need training in a specific area; for example, developers might need details on how to reduce web bloat. However, everyone who works with data should be educated in how to eliminate data storage waste.