Next time you're meeting with your storage vendor sales rep, try bringing up the topic of archiving. You'll likely see an immediate reaction. The sales rep should be delighted that you recognize the need for archiving (so you do not need to be "educated") and that you are open to discussing a prefabricated "solution." Unfortunately, while it may be tempting to envision a department store aisle filled with pre-integrated, ready-to-use archiving technologies from competing vendors, it's beyond unlikely.
Is there some very good data archiving technology on the market? Yes. NEC Corp.'s HYDRAstor is a top-notch disk-based platform for data archiving -- if you like disk as an archive platform -- while products from tape vendors like Spectra Logic Corp., with its BlackPearl server and its DS3 protocol (an extension of Amazon Web Services' S3), seem to be on the bleeding edge of object-oriented approaches.
There are also content-addressable storage platforms, deduplication appliances for squeezing more archive data into fewer spindles and even cloud services that invite you to park your nonchanging, rarely accessed data with reliable third parties that will do all of that maintenance and preservation work for you for a fee.
A lot of IT planners, however, don't really want to learn the details of archive design or the competitive differences between products -- because all they really want is to make the data go away. But doing archiving right requires the same discipline that planners would bring to choosing a platform for any other business application and its data. You need to consider the business process that the application serves and its functional requirements. Then you need to envision the application that is being developed to automate certain manual activities in the business process; the data it creates, uses and stores; and key metrics such as data change rates, access profiles and maintenance requirements. Only after these requirements are identified, analyzed and documented should you begin to tackle the design work for the underlying platform.
Starting with a predefined archive platform forces you to fit your archive practice into someone else's workflow, usually with poor outcomes.
To define your data archiving technology requirements, here are five questions to answer:
- How much data will I be archiving and what is the rate of data growth year over year? This fundamental question affects the suitability of disk versus tape media in terms of capacity and data ingestion speed, as well as the bandwidth requirements for cloud services.
- How frequently will data be accessed once it is archived? Will reports run that include archive data, and are other kinds of uses possible such as e-discovery processes and litigation holds?
- Can data be abstracted away from hardware to protect against platform obsolescence and vendor lock-in? A commonly cited problem with content-addressable storage platforms is that they are virtual "data roach motels": they make it easy to ingest data but erect many vendor-imposed barriers that impede migrating that data to a competitor's kit.
- Does the platform support the kind of archive container we plan to use? Are you planning to store your data in a native file system format, in a specialized object container or in a proprietary wrapper such as Adobe's PDF format?
- The last question is an umbrella category related to staffing costs. What are the labor or administrative requirements of the platform? Do we need full-time staff to administer the archive system or is the operation automated?
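The capacity and growth question at the top of the list lends itself to a quick back-of-the-envelope calculation. The sketch below is illustrative only: the baseline size, growth rate and daily ingest window are assumptions, not figures from any vendor or from this article.

```python
# Hypothetical capacity-planning sketch for the first question above.
# The sample figures (100 TB baseline, 30% annual growth, 8-hour ingest
# window) are assumptions chosen for illustration.

def projected_archive_tb(initial_tb: float, annual_growth: float,
                         years: int) -> list[float]:
    """Year-end archive size (TB) assuming compound annual data growth."""
    return [initial_tb * (1 + annual_growth) ** y for y in range(1, years + 1)]

def required_ingest_mbps(yearly_new_tb: float,
                         window_hours_per_day: float = 8.0) -> float:
    """Sustained ingest rate (MB/s) needed to absorb a year's worth of
    new data within a daily ingest window, using decimal units
    (1 TB = 1,000,000 MB)."""
    seconds_per_year = 365 * window_hours_per_day * 3600
    return yearly_new_tb * 1_000_000 / seconds_per_year

sizes = projected_archive_tb(100.0, 0.30, 5)
new_in_year_one = sizes[0] - 100.0
print([round(s, 1) for s in sizes])                    # 5-year growth curve
print(round(required_ingest_mbps(new_in_year_one), 2)) # MB/s to keep pace
```

Even this crude model makes the disk-versus-tape-versus-cloud trade-off concrete: the final-year figure drives media capacity, while the ingest rate tells you whether a nightly window or a WAN link can actually keep up.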
The five questions above will suggest others. Pursuing answers to these general queries will likely send the planner down the proverbial rabbit hole of archive platforms, exposing the many issues that must be tackled to do a good job of building an efficient archive.
There are exceptions, of course. In certain industries, such as entertainment media, workflows are fairly well defined across all member businesses. Everyone uses the same core set of production and post-production processes and tools, so a data archiving technology that "plugs into" the workflow directly is a good fit for a set of generally settled-upon requirements. Other industries with well-defined and widely shared processes and workflows -- like medical imaging, oil and gas wellhead data collection and storage, surveillance video and drone telemetry collection -- may also be candidates for archive platform plug-ins in the future.
For the broadest swath of IT decision makers, however, archive technologies are not readily pre-packaged; planners need to build them to specifications derived from their own analysis. There is no legitimate basis for preferring disk-based, tape-based or cloud-based platforms before you understand how your archive needs to function.