A time series database (TSDB) is a software system optimized to store, organize and query data indexed by time. A time series is a collection of data points gathered at successive intervals and recorded in time order. Some examples of time series data (TSD) include trading activity in a financial market, statistics collected from microservices, memory alerts, statuses, event data and dynamic assets.
Time series databases are especially useful for monitoring access metrics, failure metrics, process behavior and workloads. TSDBs can sift through large and complex volumes of data, making the information more accessible than if it were stored in a traditional database.
What is time series data?
Time series data differs in several ways from regular data that merely includes a time field. For example, in time series data, changes are inserted rather than overwritten, which preserves a history of information. That history lets users perform more in-depth analysis than static data allows, including real-time analytics on the most recent points. Time series data also reveals a full picture of a system over time and enables analysis of historical trends.
Common characteristics of TSD include:
- Time series data is always collected over a specified time period.
- New data from workloads is written as inserts, rather than as updates that replace existing data.
- Because data usually arrives in time order, newly written points typically fall into the most recent time interval.
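The insert-versus-overwrite distinction can be illustrated with a minimal sketch. It assumes a hypothetical in-memory series named `AppendOnlySeries` with integer timestamps in milliseconds; it is not how any particular TSDB stores data, only a contrast between overwriting a value and appending a point.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class AppendOnlySeries:
    """Hypothetical in-memory series: every write appends a (timestamp, value) point."""
    points: List[Tuple[int, float]] = field(default_factory=list)

    def write(self, ts: int, value: float) -> None:
        # Inserts never replace earlier points, so the history is preserved.
        self.points.append((ts, value))

    def latest(self) -> float:
        # The "current" state is simply the most recent point.
        return self.points[-1][1]

current_status = {}  # a plain key-value record keeps only the current value
series = AppendOnlySeries()
for ts, cpu in [(1000, 42.0), (2000, 57.5), (3000, 38.1)]:
    current_status["cpu"] = cpu  # overwrite: earlier readings are lost
    series.write(ts, cpu)        # insert: earlier readings are kept

print(current_status["cpu"])  # 38.1
print(series.points)          # [(1000, 42.0), (2000, 57.5), (3000, 38.1)]
```

Both structures report the same latest value, but only the appended series can answer questions about what happened earlier.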
Other examples of time series data include:
- server metrics,
- application performance monitoring (APM) data,
- sensor data,
- network data and
- click rates.
Time series data may also be referred to as traces, trends, profiles or curves.
Why is a TSDB important?
Time series databases can help businesses monitor information in real time and address problems as they occur. They can also be used to predict future problems and prevent them before they happen.
For time-stamped data, TSDBs are often easier to work with and provide higher write rates and faster queries, even as the volume of stored data grows. In some ways, time series databases perform the same functions as other databases. However, using a general-purpose relational or NoSQL database for time series workloads can result in slower writes, slower queries and less efficient storage.
Modern applications increasingly need to query, stream and analyze information in real time, at higher volumes, at higher velocities and with more specific searches. In recent years, these demands have led to a strong and steady increase in the use of TSDBs.
Querying a time series database is similar to querying other kinds of databases, but instead of searching only by value, developers using a TSDB can also search by an elapsed period of time, a date range or the particular point in time when an event happened.
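Because points are stored in time order, time-based queries can use binary search instead of scanning every value. The sketch below is a simplified illustration, assuming a single hypothetical series held in two parallel sorted lists with timestamps in epoch seconds; real TSDBs use their own on-disk indexes and query languages.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical sample series: timestamps (epoch seconds) kept sorted,
# with the value recorded at each timestamp in the parallel list.
timestamps = [100, 160, 220, 280, 340, 400]
values = [12.0, 15.5, 14.2, 18.9, 17.3, 16.0]

def query_range(start: int, end: int) -> List[Tuple[int, float]]:
    """Return every point with start <= timestamp < end."""
    # Binary search finds both boundaries in O(log n) instead of a full scan.
    lo = bisect.bisect_left(timestamps, start)
    hi = bisect.bisect_left(timestamps, end)
    return list(zip(timestamps[lo:hi], values[lo:hi]))

def query_point(ts: int) -> Optional[float]:
    """Return the value recorded exactly at ts, or None if there is none."""
    i = bisect.bisect_left(timestamps, ts)
    if i < len(timestamps) and timestamps[i] == ts:
        return values[i]
    return None

print(query_range(150, 300))            # [(160, 15.5), (220, 14.2), (280, 18.9)]
now = 400
print(query_range(now - 120, now + 1))  # last two minutes, including "now"
print(query_point(220))                 # 14.2
```

The half-open interval `[start, end)` is a design choice that lets adjacent ranges tile without counting a boundary point twice.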
Some benefits of using a TSDB include:
- The ability to scan extremely large quantities of data at once.
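- If data is collected every millisecond, the database can roll it up (downsample it) into one-minute or even longer intervals, reducing the space it occupies.
- TSDBs provide write APIs (application programming interfaces) so applications can send data to them directly.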
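Rolling millisecond-level data up into coarser intervals is usually done by grouping points into fixed-width time buckets and aggregating each bucket. The sketch below assumes averaging as the aggregate and a hypothetical `downsample` helper; TSDBs generally let users choose other aggregates such as minimum, maximum or sum. Unlike lossless compression, this discards the individual raw points.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def downsample(points: List[Tuple[int, float]], bucket_ms: int = 60_000) -> List[Tuple[int, float]]:
    """Average raw (timestamp_ms, value) points into fixed-width time buckets.

    Returns a sorted list of (bucket_start_ms, mean_value).
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts_ms, value in points:
        bucket = ts_ms - ts_ms % bucket_ms  # align to the start of its bucket
        sums[bucket] += value
        counts[bucket] += 1
    return [(b, sums[b] / counts[b]) for b in sorted(sums)]

# Raw points collected at millisecond resolution (illustrative values).
raw = [(0, 10.0), (1, 20.0), (59_999, 30.0), (60_000, 40.0), (60_500, 60.0)]
print(downsample(raw))  # [(0, 20.0), (60000, 50.0)]
```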
Use cases and examples
A time series database typically separates fixed and dynamic data points. For example, when CPU utilization is measured to track performance, the fixed characteristics tracked might include name, data range, time range and units of measurement.
The dynamic metrics might include anything from timestamps to CPU usage percentage to efficiency metrics, since these data points change as they are tracked. Keeping fixed data separate from dynamic data makes it easier for TSDBs to search for and retrieve specific data points quickly.
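One way to picture this separation is a store that keeps each series' fixed description once and records only timestamp-value pairs on every write. The sketch below is a hypothetical data structure, loosely similar in spirit to the metric-name-plus-tags or labels model some TSDBs use; the `host` field and the class names are illustrative assumptions, not part of any real product's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class SeriesMetadata:
    """Fixed characteristics, stored once per series (fields are illustrative)."""
    name: str
    host: str  # hypothetical field identifying the monitored machine
    unit: str

@dataclass
class SeriesStore:
    # Fixed data: series ID -> metadata.
    metadata: Dict[int, SeriesMetadata] = field(default_factory=dict)
    # Dynamic data: series ID -> list of (timestamp, value) samples.
    samples: Dict[int, List[Tuple[int, float]]] = field(default_factory=dict)
    ids: Dict[SeriesMetadata, int] = field(default_factory=dict)

    def series_id(self, meta: SeriesMetadata) -> int:
        """Return the ID for this metadata, registering it on first use."""
        if meta not in self.ids:
            new_id = len(self.ids)
            self.ids[meta] = new_id
            self.metadata[new_id] = meta
            self.samples[new_id] = []
        return self.ids[meta]

    def write(self, meta: SeriesMetadata, ts: int, value: float) -> None:
        # Each write stores only the changing part: a timestamp and a value.
        self.samples[self.series_id(meta)].append((ts, value))

    def find(self, name: str, host: str) -> List[Tuple[int, float]]:
        # Filter on the small metadata table first, then read matching samples.
        return [
            point
            for sid, meta in self.metadata.items()
            if meta.name == name and meta.host == host
            for point in self.samples[sid]
        ]

store = SeriesStore()
cpu_a = SeriesMetadata("cpu_usage", "web-1", "percent")
cpu_b = SeriesMetadata("cpu_usage", "web-2", "percent")
store.write(cpu_a, 1000, 41.0)
store.write(cpu_b, 1000, 63.5)
store.write(cpu_a, 2000, 44.2)
print(store.find("cpu_usage", "web-1"))  # [(1000, 41.0), (2000, 44.2)]
```

A search narrows down the handful of matching series by their fixed metadata before touching any samples, which is what keeps lookups fast as the number of points grows.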
For example, if a company received a complaint that a shipment delivered the wrong product to a customer on a specific date, time series records can show what product was in the shipping container when it shipped. From there, the company can begin to understand and correct the mistake.
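Answering "what was in the container when it shipped?" is an as-of lookup: find the most recent record at or before a given time. The sketch below assumes a hypothetical `contents_log` that appends a (timestamp, SKU) record whenever the container's contents change; the SKUs and timestamps are made up for illustration.

```python
import bisect
from typing import Optional

# Hypothetical contents log: (epoch seconds, product SKU), sorted by time and
# appended whenever the container's contents change.
contents_log = [
    (1_700_000_000, "SKU-A"),
    (1_700_003_600, "SKU-B"),
    (1_700_007_200, "SKU-C"),
]

def contents_as_of(ts: int) -> Optional[str]:
    """Return the most recent recorded contents at or before ts."""
    times = [t for t, _ in contents_log]
    # bisect_right counts records with time <= ts; the last of those applies.
    i = bisect.bisect_right(times, ts)
    return contents_log[i - 1][1] if i > 0 else None

shipped_at = 1_700_005_000
print(contents_as_of(shipped_at))  # SKU-B
```

Because earlier records were never overwritten, the state at any past moment can be reconstructed this way.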
Examples of time series database options include InfluxDB, KairosDB, Prometheus and ClickHouse. These examples are open source, meaning their source code is publicly available and anyone can study, modify and redistribute it under the terms of its license.
Other popular TSDBs include:
- OpenTSDB and
Some time series databases are built as extensions of PostgreSQL and share many of its features, while others, such as InfluxDB and Prometheus, use their own storage engines. How a database can be used depends heavily on its features, but most can create, read, update and delete time-value pairs and their associated points. Some TSDBs also perform calculations, interpolation, filtering and analysis.
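Interpolation estimates a value at a time where no sample was recorded, for example to align two series sampled at different moments. The sketch below shows linear interpolation over sorted (timestamp, value) pairs using a hypothetical `interpolate` helper; individual TSDBs may offer other methods, such as carrying the previous value forward.

```python
import bisect
from typing import List, Optional, Tuple

def interpolate(points: List[Tuple[int, float]], ts: int) -> Optional[float]:
    """Linearly interpolate a value at ts from sorted (timestamp, value) points.

    Returns None when ts falls outside the recorded range.
    """
    times = [t for t, _ in points]
    i = bisect.bisect_left(times, ts)
    if i < len(times) and times[i] == ts:
        return points[i][1]  # an exact sample exists
    if i == 0 or i == len(times):
        return None  # no neighbor on one side, so don't extrapolate
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)

readings = [(0, 10.0), (10, 20.0), (30, 40.0)]
print(interpolate(readings, 5))   # 15.0
print(interpolate(readings, 20))  # 30.0
```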
One difficulty with time series databases is the massive scale of the data collected. A significant amount of storage is needed, since large quantities of data must be indexed as each point is saved. Most companies should develop a practical retention policy that automatically deletes information that is no longer relevant, which ensures there is enough space for new information. Furthermore, the applications used to access TSDBs often require more code, and more complex code.
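A retention policy can be sketched by grouping points into time partitions and dropping whole partitions once they fall entirely outside the retention window, which is cheaper than deleting points one by one. The example below assumes a hypothetical `PartitionedSeries` with one partition per day and timestamps in epoch seconds; actual partition sizes and retention mechanisms vary between TSDBs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DAY = 86_400  # seconds

class PartitionedSeries:
    """Hypothetical store that groups points into one partition per day."""

    def __init__(self) -> None:
        # day_start -> list of (timestamp, value) points in that day
        self.partitions: Dict[int, List[Tuple[int, float]]] = defaultdict(list)

    def write(self, ts: int, value: float) -> None:
        self.partitions[ts - ts % DAY].append((ts, value))

    def enforce_retention(self, now: int, retention: int) -> List[int]:
        """Drop every partition that ends at or before now - retention.

        Returns the start times of the dropped partitions.
        """
        cutoff = now - retention
        expired = [start for start in self.partitions if start + DAY <= cutoff]
        for start in expired:
            del self.partitions[start]  # one delete removes a whole day of points
        return expired

metrics = PartitionedSeries()
for ts, value in [(0, 1.0), (DAY + 5, 2.0), (8 * DAY + 10, 3.0), (9 * DAY, 4.0)]:
    metrics.write(ts, value)

expired = metrics.enforce_retention(now=10 * DAY, retention=7 * DAY)
print(expired)                     # [0, 86400]
print(sorted(metrics.partitions))  # [691200, 777600]
```

A partition that straddles the cutoff is kept until all of its points have expired, so some data may briefly outlive the nominal retention period; that is the trade-off for cheap bulk deletion.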