Is BASE a more scalable alternative to ACID transactions?
Phil Bernstein and I did a lot of research in this area for the second edition of our TP book, Principles of Transaction Processing, which came out last month. The short answer is no.
BASE stands for Basically Available, Soft state, Eventually consistent. The term was formally introduced about a year ago by Dan Pritchett in an article for ACMqueue, although the concepts behind it have been around for several years.
ACID stands for Atomicity, Consistency, Isolation, and Durability. ACID is used to summarize the basic properties of a transaction in the database sense of the word, not the logical "business" transaction sense. BASE, on the other hand, is used to summarize the properties many large-scale Web sites follow to relax the strict interpretation of ACID as implemented in most commercial database products.
If you read the article carefully, Dan does not actually suggest using BASE as a general alternative to ACID (although the title seems to imply this). Rather, Dan proposes BASE as an alternative specifically where ACID is used in the two-phase commit protocol for maintaining consistency across replicated data sets. Large Web sites, such as eBay (where Dan works), typically use a lot of replicated data (especially in caches) to improve performance, specifically to reduce latency for user interactions. And eBay does reply quickly, as anyone who has used it knows.
It's worth considering BASE seriously for this type of application, although most Web sites use custom code, and products such as GigaSpaces, Oracle Coherence, and IBM's ExtremeScale are only just starting to gain adoption. These products all offer some aspects of BASE-oriented systems to manage replicated data sets in memory, and provide greater application-level control over when memory updates are flushed to disk.
The essence of what many large Web sites are doing differently, which Dan calls BASE, is that they are breaking the long-held assumption that the first order of business for the database management system is to reliably persist data updates either entirely or not at all. Typically, control is not returned to the application until the update is either completely done or, if it was only partially done at the time of a crash, completely undone (i.e., ACID). The benefit to the application is that in the event of a failure (and computers, being sensitive and complex electronic devices, tend to fail at just the wrong time) the application does not have to worry about partial updates and knows where to restart when the system recovers.
This assumption, however, translates into relatively expensive disk operations (expensive, that is, relative to memory operations). The cost is especially high when multiple databases or database replicas are involved, because they must exchange two-phase commit messages and perform additional disk operations for the added logging.
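To make that cost concrete, here is a minimal Python sketch of a two-phase commit that simply counts messages and forced log writes. The `Participant` class and `two_phase_commit` function are hypothetical names invented for this illustration, the "log" is just a list standing in for a disk write, and the counts reflect this simplified model only. Real products apply optimizations that change the numbers.

```python
class Participant:
    """One database or replica in the commit (hypothetical, for illustration)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []            # stand-in for log records forced to disk
        self.state = "active"

    def prepare(self):
        # Phase 1: force a log record to disk, then vote.
        if not self.can_commit:
            self.log.append("abort")
            self.state = "aborted"
            return False
        self.log.append("prepared")
        self.state = "prepared"
        return True

    def finish(self, commit):
        # Phase 2: record the coordinator's decision.
        if self.state == "aborted":
            return               # already voted no and logged the abort
        self.log.append("commit" if commit else "abort")
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants):
    """Return (decision, message_count, log_write_count) for one transaction."""
    coordinator_log = []
    messages = 0

    votes = []
    for p in participants:
        messages += 2            # prepare request + vote reply
        votes.append(p.prepare())

    decision = all(votes)
    coordinator_log.append("commit" if decision else "abort")

    for p in participants:
        messages += 2            # decision + acknowledgement
        p.finish(decision)

    log_writes = len(coordinator_log) + sum(len(p.log) for p in participants)
    return decision, messages, log_writes


replicas = [Participant("a"), Participant("b"), Participant("c")]
decision, messages, log_writes = two_phase_commit(replicas)
print(decision, messages, log_writes)
```

In this model, a single update across three replicas costs twelve messages and seven log writes before the application gets control back, compared with one local log write for a single database.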
The key insight behind BASE-style systems is that updating memory, especially replicated memory, is much faster, and allows replies to be returned much more quickly to users. However, this introduces potential inconsistencies, especially for replicated data sets, since the data is no longer immediately written to disk when it's updated. But because large Web sites place a higher priority on achieving a good user experience, they are willing to change the classic assumptions. The added risk for a few users is worth the benefit for the large majority of users.
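The trade-off can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Replica` and `WriteBehindStore` are hypothetical names, the "disk" is a plain dictionary, and replication is modeled as a queue that each replica drains later. It is not how eBay or any of the products mentioned above actually work.

```python
from collections import deque


class Replica:
    """An in-memory copy that applies replicated updates later (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.pending = deque()   # updates received but not yet applied

    def apply_pending(self):
        while self.pending:
            key, value = self.pending.popleft()
            self.data[key] = value


class WriteBehindStore:
    """Updates memory first; disk and replicas catch up afterwards."""

    def __init__(self, replicas):
        self.memory = {}
        self.replicas = replicas
        self.dirty = {}          # updates not yet flushed to disk
        self.disk = {}           # stand-in for durable storage

    def put(self, key, value):
        self.memory[key] = value
        self.dirty[key] = value
        for replica in self.replicas:
            replica.pending.append((key, value))
        # The reply goes back to the user here, before any disk write.

    def flush(self):
        self.disk.update(self.dirty)
        self.dirty.clear()


replica = Replica()
store = WriteBehindStore([replica])
store.put("bid:42", 100)

stale_read = replica.data.get("bid:42")   # replica has not caught up yet
replica.apply_pending()
fresh_read = replica.data.get("bid:42")   # now consistent with the primary

on_disk_before = dict(store.disk)         # nothing durable yet
store.flush()
on_disk_after = dict(store.disk)
```

Between `put` and `apply_pending` a reader of the replica sees stale data, and between `put` and `flush` a crash would lose the update. Those two windows are the "added risk for a few users."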
When the data is written to disk, however, ACID is typically used, just as before. So even when BASE-style systems are in use, ACID is typically there too, although not immediately, as in the past, and not for two-phase commit across replicated data sets.
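As a rough sketch of that deferred step, the following uses Python's standard `sqlite3` module to flush a batch of pending updates inside one local transaction. The `flush_batch` function, the `items` table, and the keys are assumptions made up for this example, and an in-memory SQLite database stands in for the real on-disk store.

```python
import sqlite3


def flush_batch(conn, dirty):
    """Write all pending in-memory updates in a single local transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        for key, value in dirty.items():
            conn.execute(
                "INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)",
                (key, value),
            )


conn = sqlite3.connect(":memory:")   # stand-in for an on-disk database
conn.execute("CREATE TABLE items (key TEXT PRIMARY KEY, value INTEGER NOT NULL)")

# A clean flush: both updates become durable together.
flush_batch(conn, {"bid:1": 100, "bid:2": 250})

# A flush that fails partway: the NULL value violates the NOT NULL constraint,
# so the whole batch is rolled back, including the earlier update to bid:1.
try:
    flush_batch(conn, {"bid:1": 999, "bid:3": None})
except sqlite3.IntegrityError:
    pass

rows = dict(conn.execute("SELECT key, value FROM items"))
print(rows)
```

The flush is all-or-nothing, just as before; what has changed is that it happens some time after the user received a reply, and it involves only the local database.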
Finally, it's important to point out that we are dealing with a couple of entertaining but somewhat imprecise acronyms here, and it takes a bit of background study to get a solid perspective on this debate.