ChaosSearch looks to bring order to data lakes
Data lakes are like junk drawers in the sky, but new tech from ChaosSearch organizes the mess and makes it searchable. Here, CEO Ed Walsh shares the details and what's next in 2021.
Storing data in the cloud is a relatively inexpensive and easy exercise. However, it's not always as easy for organizations to actually derive value from their cloud data lakes.
Among the vendors trying to make cloud data lakes more useful is Boston-based ChaosSearch, which has developed a data platform that enables users to access and query data using common APIs. ChaosSearch's patented technology does not require users to transform data to a structure that log analysis or data analytics tools need.
On April 29, the company launched its ChaosSearch 2.0 platform, giving organizations the ability to use data lakes for log analysis through the Elasticsearch API. On Aug. 11, Ed Walsh took over as CEO, replacing co-founder Les Yetton, who remains on the board. Walsh had previously been the general manager of IBM's storage division. On Dec. 16, ChaosSearch raised a $40 million Series B round of funding led by Stripes and Moore Strategic Ventures.
In this Q&A, Walsh discusses the need to make data lakes more usable and provides insight into where his company is headed.
Why did you join ChaosSearch?
Ed Walsh: It was hard to leave IBM. I was running this little $6 billion division, they were taking care of me and we were growing a business that hadn't grown in a while, so it was a good place to be.
That said, it was a no-brainer to go to ChaosSearch. I knew the technical founder [Thomas Hazel], and had talked to him about this particular idea seven years ago. At that time, it was a little too early. I caught up with him again in February of 2020. They spent years with 20-some-odd developers heads-down, developing this new platform which solves a really hard problem with patented technology. What they solved is the data lake problem.
Basically, you put your data into a data lake, which is cloud object storage. ChaosSearch provides a thin data layer that allows you to index it and then provides open APIs for the tools that your data analysts use today, without the need to restructure or transform that data.
Imagine, instead of the data lake being a junk drawer in the sky, we index it fully; it's fully searchable and ready for analytics at scale using your tools. What we do is give you views of your data that can be accessed via an open API. So, you can use all the tools you're used to using and don't have to wrestle with getting all your data into a particular structure before you can actually use your tools.
Why raise money now for ChaosSearch?
Walsh: The platform was in development for five years, with the first release in 2019. We're hitting the market with the right product-market fit, as it's a huge problem we're helping to solve. We're looking to expand the company and take advantage of this opportunity to keep up with demand.
We weren't looking for money. But basically, we were getting really good enterprise plan traction and investors noticed the traction.
Are you concerned that Amazon Web Services, or another cloud vendor, could simply copy your service and compete with ChaosSearch?
Walsh: What we have built is hard to do. I think there are also particular situations where people have open source projects and a rival can just come along with a service and beat them on cost and performance. So that's not our issue.
This is patented technology, and it's not something simple that we came up with like a new GUI or workflow. This is actually our technology as a data layer. We can partner with the likes of Amazon, Google and Microsoft.
What's next for ChaosSearch in 2021?
Walsh: We currently support Amazon, and we'll have Google support in the second quarter of 2021. By mid-year, we'll also publish support for Presto, and by the end of the year we'll also have Microsoft Azure support.
By the end of the year, you'll also see some use cases for machine learning as we see demand for predictive analysis.