Blockchain technology has revolutionized how organizations store data and carry out distributed transactions. Even on public networks, blockchain can maintain secure, reliable and verifiable records accessible to all participants. But blockchain comes with a significant limitation: scalability. As the number of transactions grows, the system becomes slower, more expensive and less sustainable over the long term.
One way to address scalability is sharding, a process that breaks the data into manageable chunks distributed across different nodes. Blockchain sharding is already being used for private blockchain networks. But for public networks, sharding comes with several challenges that must be addressed before public systems can be effectively scaled.
The blockchain dilemma
Blockchain is a distributed ledger technology for recording transactions among multiple participants. In a traditional configuration, the ledger is distributed across multiple nodes, with each node maintaining a complete copy. Blockchain logs each transaction into the ledger in chronological order, and then synchronizes and validates the transactions across all nodes, so the record is transparent and verifiable everywhere.
As the number of transactions grows, so, too, does the ledger's size, resulting in more data being processed and stored on each node. Deploying additional nodes makes the problem worse because more time is needed for verification. Because each node must process every transaction, it's inevitable that users will be confronted with performance and reliability issues as latency increases, throughput decreases and storage costs rise.
Clearly, a better approach is needed for scaling public blockchain systems. Blockchain sharding is one of the most popular approaches. It offers a methodology for spreading out workload processing and data storage so no one node is forced to handle the entire transactional load. Instead, data is partitioned into separate buckets, with each node assigned to a particular partition. In this way, a node processes and stores only the transactions associated with the partition, or shard, to which the node belongs.
A blockchain sharding strategy
The concept of sharding has its roots in database systems that partition data across multiple servers to improve transactional processing. In a similar way, blockchain processing can be partitioned across multiple nodes to enable a parallel execution model that increases performance while reducing the amount of data that each node must process and store. Although the methods used to validate transactional data blocks must be modified, the result is greater throughput and lower latency.
The exact approach used to shard data varies from one application to the next, with no clear consensus on the best strategy. Even so, the underlying concepts are the same. Each node is assigned to an individual shard and is responsible for verifying the transactions within that shard, rather than verifying every transaction across the entire blockchain network.
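As a minimal sketch of that shared concept, the Python below maps each transaction to a shard by hashing the sender's account identifier, so a node checks only the transactions that land in its own shard. The hash-based mapping, the shard count and the names shard_for and should_verify are assumptions made for illustration, not the scheme of any particular platform.

```python
import hashlib

NUM_SHARDS = 10  # hypothetical shard count


def shard_for(account_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an account to a shard by hashing its identifier (illustrative scheme)."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce to a shard index.
    return int.from_bytes(digest[:8], "big") % num_shards


def should_verify(node_shard: int, sender: str) -> bool:
    """A node verifies only transactions whose sender maps to the node's own shard."""
    return shard_for(sender) == node_shard
```

Because the mapping is deterministic, every node reaches the same answer about which shard owns a given account without consulting the others.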
At the same time, sharding incorporates transactional redundancy to ensure the validity and reliability of the data. After the data is partitioned into multiple shards, each shard is distributed across multiple nodes. For example, if a blockchain network supports 1,000 nodes, the data might be partitioned into 10 shards, with each shard assigned to 100 nodes. In this way, each node processes and stores only one-tenth of the data, but the data is still verified across 100 nodes.
The advantage of blockchain sharding is quickly apparent. Because shards process transactions in parallel, the network can handle more transactions per second. In this example, that's up to 10 times the rate of a traditional blockchain approach, assuming the work divides evenly across shards. At the same time, processing and storage costs are much lower because each node is handling only a tenth of the data.
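The arithmetic behind the 1,000-node example can be sketched in a few lines of Python. The round-robin assignment and the per-shard rate of 15 transactions per second are hypothetical values chosen only to make the numbers concrete, and the throughput figure is an idealized upper bound that ignores any traffic between shards.

```python
from typing import Dict, List


def assign_nodes(num_nodes: int, num_shards: int) -> Dict[int, List[int]]:
    """Spread node IDs across shards round-robin (one simple, illustrative policy)."""
    shards: Dict[int, List[int]] = {s: [] for s in range(num_shards)}
    for node in range(num_nodes):
        shards[node % num_shards].append(node)
    return shards


shards = assign_nodes(1000, 10)
nodes_per_shard = len(shards[0])          # 100 nodes still verify each shard's data
data_fraction_per_node = 1 / len(shards)  # each node holds one-tenth of the data

PER_SHARD_TPS = 15  # hypothetical rate for a single shard (or an unsharded chain)
ideal_network_tps = PER_SHARD_TPS * len(shards)  # 150 if shards never interact
```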
Four challenges of sharding
Sharding can be an effective strategy for private enterprise blockchain deployments, but using blockchain sharding for a public blockchain network isn't so easy. One of the biggest challenges is inter-shard communication.
When nodes are assigned to a shard, users and applications associated with that node see the shard as an independent blockchain system, rather than as a segment of a larger system. Communication between shards can be difficult to establish and requires a special development effort to implement a communication mechanism. Even with such a mechanism, inter-shard communication can lead to greater overhead, decreasing some of the advantages of sharding.
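To see where the extra overhead comes from, consider the following Python sketch of a transfer between accounts. It assumes a receipt-style mechanism, in which the source shard records a receipt that the destination shard then applies. That mechanism, along with the Shard class and the transfer function, is a hypothetical illustration rather than a description of how any specific network communicates between shards.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Shard:
    shard_id: int
    balances: Dict[str, int] = field(default_factory=dict)
    outbox: List[dict] = field(default_factory=list)  # receipts bound for other shards


def transfer(shards: Dict[int, Shard], home: Dict[str, int],
             sender: str, receiver: str, amount: int) -> int:
    """Move funds and return how many shard-level steps the transfer needed."""
    src = shards[home[sender]]
    dst = shards[home[receiver]]
    if src.balances.get(sender, 0) < amount:
        raise ValueError("insufficient funds")
    src.balances[sender] -= amount
    if src is dst:
        # Same shard: debit and credit happen in one local step.
        dst.balances[receiver] = dst.balances.get(receiver, 0) + amount
        return 1
    # Different shards: the source records a receipt, then the destination applies it.
    receipt = {"to": receiver, "amount": amount, "from_shard": src.shard_id}
    src.outbox.append(receipt)
    dst.balances[receiver] = dst.balances.get(receiver, 0) + receipt["amount"]
    return 2


shards = {0: Shard(0, {"alice": 50, "bob": 0}), 1: Shard(1, {"carol": 0})}
home = {"alice": 0, "bob": 0, "carol": 1}
same_shard_steps = transfer(shards, home, "alice", "bob", 10)
cross_shard_steps = transfer(shards, home, "alice", "carol", 10)
```

In this toy model, the cross-shard transfer takes two steps and leaves a receipt behind, while the same-shard transfer takes one. A real system would also have to prove to the destination shard that the receipt is genuine, which adds further cost.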
Sharding can also undermine some of the checks and balances that come with a more traditional blockchain approach. With sharding, users no longer download and validate the entire transactional history, so they can't be certain of the data's reliability and immutability, as determined by the chained sequence of transactional blocks. Without these safeguards, it's easier for an attacker to manipulate or control a shard, a situation known as a single-shard takeover, which can lead to lost or compromised data.
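A rough calculation shows why a single shard is a softer target. The Python sketch below reuses the earlier figures of 1,000 nodes and 10 shards and assumes, purely for illustration, that control requires a simple majority of nodes. Real consensus protocols set different thresholds and usually weight participants by computing power or stake rather than by node count.

```python
def nodes_to_control(group_size: int) -> int:
    """Smallest strict majority of a group (assumes simple majority voting)."""
    return group_size // 2 + 1


TOTAL_NODES = 1000
SHARD_COUNT = 10
shard_size = TOTAL_NODES // SHARD_COUNT       # 100 nodes per shard

unsharded = nodes_to_control(TOTAL_NODES)     # 501 nodes to dominate the whole network
single_shard = nodes_to_control(shard_size)   # 51 nodes to dominate one shard
share_of_network = single_shard / TOTAL_NODES # about 5% of all nodes
```

Under these assumptions, an attacker who can place 51 nodes in the same shard controls that shard while holding only about 5% of the network.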
Another challenge with blockchain sharding is consensus and verification. Different blockchain approaches rely on different algorithms for reaching consensus across nodes. Two common algorithms are proof of work (PoW) and proof of stake (PoS). Both determine how transactions are verified across a distributed network, but they do so in different ways.
Although a comparison of these algorithms is beyond the scope of this article, the important point is that both can affect how sharding is implemented. In general, PoS is considered a better fit for sharding than PoW, which some consider unsuitable because of how it validates transactions. Unfortunately, many blockchain platforms rely on PoW for delivering services.
The differences in algorithms point to another challenge: lack of standardization for how to implement sharding. There are several different approaches to sharding, and many methodologies are still being researched, developed or tested as stakeholders address the various challenges. Each approach to sharding comes with its own pros and cons, making it more difficult for an industry standard to take hold.
The future of sharding
Scalability remains a significant challenge for public blockchain implementations, and sharding is emerging as one of the primary methods for addressing this issue. But sharding must be approached with caution to ensure it doesn't negatively affect blockchain processes or put data at risk.
It may turn out that blockchain sharding will have to be implemented in conjunction with other technologies -- such as new protocols for communicating across shard borders -- to deliver the necessary scalability. Until then, public blockchain storage will likely remain the monolith it is today, with performance degrading as it gets bigger.