Organizing the tsunami of data that often comes with scientific research is a tall order for any IT or data science professional.
This challenge fell to Jay Smestad, senior director of information technology at PacBio, whose job was to rein in the spiraling size and cost of the sequencing data generated in building the data-intensive tools used to map genomic code.
Pacific Biosciences, or PacBio, based in Menlo Park, Calif., creates petabytes of potentially valuable data that doctors, researchers and other Ph.D.-bearing employees might need to access at a moment's notice.
The company, which develops genomic sequencing systems and associated testing accessories, keeps data from sequencer development and other product testing for years. It runs a variety of on-premises systems, both legacy and more modern, from vendors including NetApp, Vast Data and Spectra Logic.
"At PacBio, storage is our largest single IT cost," Smestad said. "It's the one we wrangle and try to stay on top of."
Smestad and his team chose the Komprise Intelligent Data Management software to automate tiering of less-used data, shifting it away from the pricey flash arrays needed for immediate, high-intensity workloads and onto cheaper tape storage for general archiving and deeper cold storage.
The team also uses Komprise automation tools to weigh better storage practices that keep data in the cheapest possible locations without sacrificing accessibility.
"I think our media cost is about $0.08 per gig," he said. "It's super cheap. We just back up everything. We don't ever delete unless the users delete [data] on their own."
Sinking in a data ocean
Before the IT team opted for the Komprise software, it managed a handful of different storage arrays and shifted data among them using scripts and tools developed by administrators. This often made data management across brands, operating systems and locations an arduous process, and stability was not guaranteed.
"You basically had administrators writing scripts or moving data from one file server to another," Smestad said. "It's a manual process. ... If [IT admins] fat-fingered it, it's gone. It's really kind of scary stuff. People don't like to do that job, [and] I really wanted to see [our storage] more as a commodity."
Further complicating matters was the lack of a unified namespace for organizing user storage spaces and data silos -- organization that Smestad and his team handled manually before adding the automation tools.
The software eases the movement of data across the newly christened namespaces, Smestad noted, and also helps attach budget figures and estimates to data usage. Those figures were helpful when his department advocated for additional technologies and upgrades.
"It was something I could present to senior management," he said. "Putting something in a framework where you can share it with C-level folks and they get it is a really good part of the tool as well."
Competition among data management software companies continues to grow as demand rises for multi-cloud, nonproprietary management tools. Direct competitors to Komprise include StrongBox Data Solutions, Data Dynamics and Aparavi, as well as platform-specific software from storage hardware vendors, such as Dell EMC's ClarityNow, and the open source iRODS.
Mapping the future
PacBio uses Spectra Logic's BlackPearl storage system to help store older and infrequently used data, with a majority of it backed up to tape, Smestad said. Data that isn't accessed by staff shifts to lower and lower tiers every six months.
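The six-month demotion rule described above can be sketched in a few lines of Python. This is only an illustration of age-based tiering in general, not Komprise's or Spectra Logic's actual policy engine or API: the tier names, the `tier_for` function and the 182-day approximation of "six months" are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical tier names, ordered from fastest/most expensive to cheapest.
# The article names flash and tape; "nearline" is an assumed middle tier.
TIERS = ["flash", "nearline", "tape"]

# Assumption: "six months" is approximated as 182 days.
DEMOTION_INTERVAL = timedelta(days=182)


def tier_for(last_access: datetime, now: datetime) -> str:
    """Return the tier a file would occupy, given when it was last accessed."""
    idle = now - last_access
    # Drop one tier for each full interval without access.
    # max() guards against last-access timestamps that lie in the future.
    steps = max(0, idle // DEMOTION_INTERVAL)
    # Cap at the lowest (cheapest) tier.
    return TIERS[min(steps, len(TIERS) - 1)]


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    for days_idle in (30, 200, 400, 2000):
        tier = tier_for(now - timedelta(days=days_idle), now)
        print(f"idle {days_idle:>4} days -> {tier}")
```

Keying the rule on last access rather than creation date matches the article's description: data that staff keep touching stays on the fast tier, while untouched data keeps sinking toward tape.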
"We get about two restore requests a year," he said. "[Our employees] generate [data] and they run their analysis. If it's garbage, they never look at it again."
A Vast Data flash array handles PacBio's more pressing storage and data needs. The IT team maintains this system and purges older data to preserve performance and fast access speeds.
"We have to move off those primary tiers and take that data to those lower-cost tiers quickly," he said. "That's what we do with Komprise."
Smestad expects to consider future Komprise tools to further automate policy creation and data classification.
Tim McCarthy is a journalist living on the North Shore of Massachusetts. He covers cloud and data storage news.