Many different types of databases are on the market, each serving different needs. Time series databases are deployed for a variety of uses, including server monitoring and the analysis of financial and sensor data, among others.
One of the most popular time series technologies is the open source InfluxDB time series database, developed by commercial open source database vendor InfluxData. The company also sells enterprise-grade products, including InfluxDB Cloud, which reached the 2.0 milestone in September 2019.
Beyond the InfluxDB time series database, a key technology that InfluxData is working on is the Flux data scripting and query language, which takes a different approach than SQL to enable users to analyze time series data.
In this Q&A, Paul Dix, co-founder and CTO of InfluxData, discusses the origins of InfluxDB, why SQL isn't always enough as a query language and where his company is headed in 2020.
What are the origins of the InfluxDB time series database and InfluxData? How did the company and the technology get started?
Paul Dix: So in mid-2012, I started this company and basically we wanted to build a SaaS [software as a service] product for doing real-time metrics and monitoring. Initially my idea was I wanted to do anomaly detection and machine learning on data sets, but to build that we first had to build all the infrastructure, so we could collect time series data at scale and query it.
Fast forward basically another year: we went to Y Combinator, we did the winter 2013 batch, and this product wasn't really taking off. But I could see that there was something from an infrastructure perspective. We did have some customers paying us, so I talked to them and asked why they were paying us. They told us that they were using our product as a time series platform.
So we pivoted and the goal was initially to build a database, but that later morphed into being an entire platform for working with time series data. My goal was to build something that was generally useful for developers to create their applications with.
Why did you make InfluxDB open source?
Dix: Something that is definitely very true today, and it was also true in 2013, is that developers want to work with open source tools.
The tools developers choose to build their applications with are essentially how they're building their career. Those tools are going to be line items on their resume that say, I know this technology, I can build with these tools. So realistically, they want to adopt open source because they can take that from job to job, company to company and from cloud provider to cloud provider.
Why do cloud native and Kubernetes matter for the InfluxDB time series database?
Dix: As a technology vendor, we have to go to where our customers are. Sometimes our customers are in Azure, and sometimes they're in AWS and sometimes they are in Google, and sometimes they're in their own data center. So basically, we said Kubernetes is the base layer, it's the common layer, it's the lingua franca we can count on to exist out there in the world.
The first version of our cloud product was essentially our enterprise database deployed on AWS and we only ever offered it in AWS, because it was specifically designed for it. Our Cloud 2.0 product was designed from the ground up to run as a stateful application inside of Kubernetes.
We're on all three of the big cloud providers, and we plan on offering InfluxDB as a managed service inside a customer's own Kubernetes installation. The only thing that we have outside of Kubernetes is the object storage that we use for long-term storage, but it just so happens that that's something you can count on existing wherever you go.
Why did you create Flux and how does it relate to SQL?
Dix: When it comes to working with time series data, I think SQL as a language is kind of weird. You have to do these kinds of contortions to make it elegant to work with time series data.
I think the most elegant way to think about working with time series data is as functions that do transformations of data. Reasonable engineers can disagree about this: some people love SQL, some people prefer a functional style, and that's fine.
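As an illustration of the functional style Dix describes, the following Python sketch expresses a query as a chain of small transformation functions: select a time range, keep one measurement, then average into fixed windows. It is not Flux and does not use any InfluxDB API; the `Point` record, the stage functions and the `pipe` helper are hypothetical names chosen for this example. Flux itself chains functions with its pipe-forward operator, `|>`, while a SQL version would typically express the windowing as a `GROUP BY` over a computed time bucket.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Point:
    """Hypothetical time series point: one value for one measurement."""
    measurement: str
    ts: int       # Unix timestamp in seconds
    value: float


# A stage takes a stream of points and returns a transformed stream.
Stage = Callable[[Iterable[Point]], Iterable[Point]]


def time_range(start: int, stop: int) -> Stage:
    """Keep points whose timestamp falls in [start, stop)."""
    def stage(points: Iterable[Point]) -> Iterable[Point]:
        return (p for p in points if start <= p.ts < stop)
    return stage


def filter_measurement(name: str) -> Stage:
    """Keep points belonging to a single measurement."""
    def stage(points: Iterable[Point]) -> Iterable[Point]:
        return (p for p in points if p.measurement == name)
    return stage


def window_mean(every: int) -> Stage:
    """Bucket points into fixed windows of `every` seconds; emit one mean per window."""
    def stage(points: Iterable[Point]) -> Iterable[Point]:
        buckets = {}
        for p in points:
            window_start = p.ts - p.ts % every
            buckets.setdefault((p.measurement, window_start), []).append(p.value)
        # Sort by (measurement, window start) so output is in time order.
        return [Point(m, start, mean(vals)) for (m, start), vals in sorted(buckets.items())]
    return stage


def pipe(source: Iterable[Point], *stages: Stage) -> List[Point]:
    """Apply each stage to the output of the previous one, left to right."""
    data = source
    for stage in stages:
        data = stage(data)
    return list(data)


points = [
    Point("cpu", 0, 10.0),
    Point("cpu", 30, 20.0),
    Point("mem", 45, 70.0),
    Point("cpu", 70, 40.0),
    Point("cpu", 200, 99.0),
]

result = pipe(points, time_range(0, 120), filter_measurement("cpu"), window_mean(60))
print(result)
# → [Point(measurement='cpu', ts=0, value=15.0), Point(measurement='cpu', ts=60, value=40.0)]
```

Because each stage is an ordinary function, stages can be reordered, reused or swapped for other transformations without rewriting the whole query, which is the property the functional framing is meant to capture.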
But the other thing is, I didn't want to just have a query language. In the time series use case, it's not enough to just be able to write data and query data. You need to be able to process that data in real time, or periodically, so that you can do data enrichment, monitoring and alerting, and you need to be able to do it at scale and distributed wherever you need. So what I wanted is essentially a programming language that will be embeddable into every spot where we run our data storage.
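The requirement Dix describes, running the same logic periodically rather than only when someone issues a query, can be sketched in Python as a check that runs once per interval over the most recent window of data. This is a minimal, hypothetical example: `Reading`, `Alert`, `run_check` and the 80 and 95 percent thresholds are assumptions for illustration, not InfluxDB or Flux features or defaults, and the loop at the end stands in for a real scheduler. (InfluxDB 2.0 handles this kind of work by running scheduled Flux scripts as tasks.)

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reading:
    host: str
    ts: int      # Unix timestamp in seconds
    cpu: float   # CPU utilization, percent


@dataclass(frozen=True)
class Alert:
    host: str
    window_start: int
    avg_cpu: float
    level: str   # "warn" or "crit"


def run_check(readings, now, every=60, warn=80.0, crit=95.0):
    """One run of a hypothetical periodic monitoring check.

    Looks only at readings in the window [now - every, now), averages CPU
    per host and returns an Alert for each host whose average crosses a
    threshold. Hosts are reported in sorted order.
    """
    window_start = now - every
    per_host = defaultdict(list)
    for r in readings:
        if window_start <= r.ts < now:
            per_host[r.host].append(r.cpu)

    alerts = []
    for host in sorted(per_host):
        avg = mean(per_host[host])
        if avg >= crit:
            alerts.append(Alert(host, window_start, avg, "crit"))
        elif avg >= warn:
            alerts.append(Alert(host, window_start, avg, "warn"))
    return alerts


readings = [
    Reading("web-1", 5, 85.0),
    Reading("web-1", 35, 90.0),
    Reading("web-2", 10, 40.0),
    Reading("db-1", 20, 99.0),
    Reading("web-2", 70, 97.0),
]

# Simulate a scheduler firing at the end of each 60-second window.
for now in (60, 120):
    for alert in run_check(readings, now):
        print(alert)
# Alert(host='db-1', window_start=0, avg_cpu=99.0, level='crit')
# Alert(host='web-1', window_start=0, avg_cpu=87.5, level='warn')
# Alert(host='web-2', window_start=60, avg_cpu=97.0, level='crit')
```

Keeping the check a plain function of its input window means the same code could be called from a batch job or from a streaming path, which loosely reflects the embeddability Dix is after.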
One of the other design goals of Flux is that it's not going to be tied to InfluxDB specifically. We want it to be useful to query data from other data sources.
What's coming in 2020 from InfluxData?
Dix: So we're going to get to the full release of InfluxDB 2.0 as open source. We're going to get to the 1.0 release of Flux.
We also have a user packaging system being defined so that people will be able to find bits of Flux code, dashboards, and monitoring and alerting rules, and package them up into one thing that can be executed by the open source InfluxDB or the cloud products.