To keep up with data growth that outpaces its storage budget, Pacific Biosciences mixes cloud and on-premises storage from NetApp and data movement software from startup Komprise.
Pacific Biosciences, or PacBio, deploys NetApp Data Fabric technologies to scale capacity and performance between science teams in a shared storage environment. It turns to software from NetApp partner Komprise to streamline data migration.
Flexible data movement helps PacBio dynamically assess the power and performance of its genomic sequencing equipment. The company, headquartered in Menlo Park, Calif., makes sequencers used in a variety of fields, including wildlife conservation, food supply improvement and COVID-19 vaccine research.
NetApp Data Fabric enables enterprises to implement hybrid cloud data management. The Data Fabric technology is a software component inside the NetApp Cloud Volumes ONTAP operating system. Customers use Data Fabric to create a virtual NetApp platform in the public cloud that mirrors how data is stored on premises.
Genomics is a data-intensive discipline that requires dense, high-performance storage. Adam Knight, PacBio's senior IT manager, said the NetApp-Komprise combination better enables him to keep pace with data growth, currently at nearly 5 petabytes (PB) and growing.
"We use NetApp Data Fabric to shift performance around as needed to support different areas. Our storage budget isn't increasing at the rate of our data growth. We had to get creative and manage as many items we could under one umbrella. If we had storage islands, we wouldn't be able to leverage the combined effort of our scientists," Knight said.
Storage management to boost manufacturing
The combined scientific effort is key to PacBio's product development. Knight said much of the data generated internally stems from PacBio engineering teams striving to build product improvements into the equipment.
PacBio manufactures long-read sequencers, which allow DNA strands to be compiled in a manner that generates an effective reference of potential gene variations. The large machines have tight manufacturing tolerances and cost hundreds of thousands of dollars.
"The goal for manufacturing our instrument is to support more bandwidth and high-quality data throughput in less time for less cost. We generate a lot of new data each year," Knight said.
PacBio standardized its storage environment on NetApp FAS hybrid systems, using NetApp E-Series arrays as a multi-petabyte archive. Using predefined user policies, Komprise transparently moves data between the NetApp storage tiers, keeping active data on the primary FAS arrays.
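To illustrate the general idea of policy-based tiering, here is a minimal Python sketch. It is not Komprise's software or API, and it does not reflect PacBio's actual policies. The `FileRecord` type, the `choose_tier` and `plan_moves` functions, the tier names and the 180-day threshold are all hypothetical, chosen only to show how a rule based on last access time could keep active data on primary storage and send cold data to an archive.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical description of one file and when it was last read."""
    path: str
    last_accessed: datetime


def choose_tier(record, now, cold_after=timedelta(days=180)):
    """Return 'archive' if the file has been idle longer than cold_after,
    otherwise 'primary'. The 180-day default is an illustrative assumption."""
    if now - record.last_accessed > cold_after:
        return "archive"
    return "primary"


def plan_moves(records, now, cold_after=timedelta(days=180)):
    """Group file paths by the tier the policy assigns them to."""
    plan = {"primary": [], "archive": []}
    for record in records:
        plan[choose_tier(record, now, cold_after)].append(record.path)
    return plan


if __name__ == "__main__":
    now = datetime(2020, 6, 1)
    records = [
        FileRecord("/runs/recent.bam", datetime(2020, 5, 20)),
        FileRecord("/runs/old.bam", datetime(2019, 1, 15)),
    ]
    print(plan_moves(records, now))
```

In this sketch the policy only produces a plan; a real data management product would also move the files and keep them reachable at their original paths, which is what makes the movement transparent to users.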
In addition, the genomics equipment maker added NetApp StorageGrid Webscale software-defined object storage to streamline tagging and searching of metadata.