The breakup: Why CISOs are decoupling data from their SIEMs
Breaking up is hard to do -- but some CISOs find that decoupling SIEMs from security log data feeds is worth it. Learn about the benefits and challenges.
The traditional enterprise SIEM pulls security log data from sources across the IT environment, then normalizes it, analyzes it and retains it. But because SIEM providers typically charge more to hold more data, organizations generally must retain less data than they would prefer and accept the limitations of subsequent analyses.
Additionally, SIEMs retain data in their own, often proprietary formats. In fact, how SIEM vendors parse and normalize data is one way they differentiate themselves from competitors. Each seeks to use unique schemas, compression techniques and specialized databases to improve both result quality and speed. Consequently, enterprises have limited input into how their data is ingested and digested, and proprietary parsing and formats can make it harder to change vendors.
Some CISOs -- finding the limitations and trade-offs of data ingestion and retention in SIEM too constricting -- are choosing to decouple their security log data feeds from their SIEMs. By doing so, they typically gain freer access to the data, increase control over retention timelines, improve analytical capabilities, rein in SIEM costs and break free of vendor lock-in. But decoupling data from the SIEM also has its challenges and requires significant commitment, investment and planning.
How decoupling data from the SIEM works
To decouple security data sources from the SIEM, security teams insert systems that they control in the middle of these data flows. In practice, this means establishing a separate, dedicated data store to hold the security log data, typically a data lake living in a comparatively inexpensive cloud storage service. It also means establishing a new data pipeline that takes in log data, preprocesses and normalizes it and then dumps it in the data lake. The enterprise then feeds its SIEM with data from the lake.
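The flow described above -- ingest, normalize, land in the lake, then forward to the SIEM -- can be sketched in a few lines. This is a minimal illustration, not production code: the schema fields, source names and in-memory "lake" are all placeholders for what would, in practice, be an enterprise-defined schema and an object store such as S3.

```python
import json
from datetime import datetime, timezone

def normalize(raw_line: str, source: str) -> dict:
    """Parse a raw log line into an enterprise-controlled schema.
    The fields here are illustrative, not a standard."""
    return {
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "message": raw_line.strip(),
    }

def pipeline(raw_lines, source, lake):
    """Preprocess each raw line and append it to the data lake
    (a list here; in practice, cheap cloud bulk storage)."""
    for line in raw_lines:
        lake.append(normalize(line, source))

def feed_siem(lake, source_filter=None):
    """Feed the SIEM from the lake, optionally filtering by source --
    the enterprise, not the SIEM vendor, decides what is forwarded."""
    for record in lake:
        if source_filter is None or record["source"] == source_filter:
            yield json.dumps(record)

lake = []
pipeline(["Failed password for root", "Accepted publickey for alice"],
         "sshd", lake)
print(len(list(feed_siem(lake, "sshd"))))  # 2 -- both records forwarded
```

The key point is the direction of control: the SIEM consumes whatever the `feed_siem` step chooses to emit, rather than dictating ingestion itself.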
Benefits of decoupling SIEMs from data pipelines and storage
Establishing an independent, enterprise-controlled data layer between the sources of security log data and the applications that consume it -- e.g., SIEMs and other tools such as user and entity behavior analytics (UEBA) -- enables the enterprise to do the following:
- Dictate the data schema for log records.
- Completely control filtering of records and easily vary it by destination.
- Completely control the retention horizons for every kind of data from each platform.
- Accurately and easily track all security data sources and all security data consumers.
- Easily enforce consistent adherence to institutional policies on data collection and retention.
- Easily add new security tools that need access to existing data feeds.
- Easily change -- and even drop -- SaaS and SIEM vendors without losing data.
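Two of the benefits above -- per-destination filtering and easily adding new consumers -- amount to a routing table the enterprise owns. A sketch, with hypothetical destination names and rules chosen purely for illustration:

```python
# Hypothetical per-destination routing rules: each consumer gets its own
# filter predicate and retention horizon, set by the enterprise rather
# than by any one vendor. Adding a new tool is one more entry.
ROUTES = {
    "siem": {"keep": lambda r: r["severity"] >= 3, "retention_days": 90},
    "ueba": {"keep": lambda r: r["type"] == "auth", "retention_days": 30},
}

def route(record: dict) -> list:
    """Return the destinations that should receive this record."""
    return [dest for dest, rule in ROUTES.items() if rule["keep"](record)]

print(route({"severity": 5, "type": "auth"}))     # ['siem', 'ueba']
print(route({"severity": 1, "type": "netflow"}))  # []
```

Because the rules live in enterprise-controlled code, swapping SIEM vendors means changing one destination entry, not re-plumbing every data source.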
Trading costlier SIEM-based storage for cheaper cloud bulk storage will also probably reduce the cost of storing security data itself. But -- and this is important to understand -- that reduction might not translate into net savings, as the cost of new tools or services and of staff time could outweigh it.
Challenges of decoupling SIEM from the data layer
Of course, along with its benefits, decoupling data from SaaS or SIEM platforms also comes with challenges. These include the following:
- Designing a powerful, secure, scalable and cost-efficient data lake and data pipeline, including selecting appropriate data exchange protocols and data storage schemata.
- Engineering that data lake and pipeline, including selecting the tools and services to build them with and testing them adequately before putting them into production.
- Migrating to the new architecture without data loss or interruptions in security scanning.
- Operating and supporting the data lake and pipeline efficiently, including ensuring backups and continuity of service in the face of disruptions.
- Coping with latency created by interposing the new layer -- requiring attention in the design, engineering and operations phases, as well as continuous monitoring to ensure latency is within acceptable limits.
- Coping with compliance, as the new data layer must respect and enforce any applicable requirements -- depending on company type, sector and geography -- for data at rest and in motion.
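The compliance challenge above largely comes down to enforcing retention horizons per data type. A minimal sketch of such enforcement, with illustrative horizons -- actual values depend on the organization's policies and applicable regulations:

```python
from datetime import datetime, timedelta, timezone

# Illustrative retention horizons in days, per data type. Real values
# come from institutional policy and regulation, not from this table.
RETENTION = {"auth": 365, "netflow": 30, "dns": 90}

def expired(record: dict, now: datetime) -> bool:
    """True if a record has outlived its type's retention horizon
    (unknown types fall back to a conservative 30 days here)."""
    horizon = timedelta(days=RETENTION.get(record["type"], 30))
    return now - record["ts"] > horizon

now = datetime.now(timezone.utc)
records = [
    {"type": "auth", "ts": now - timedelta(days=400)},  # past horizon
    {"type": "dns",  "ts": now - timedelta(days=10)},   # still retained
]
kept = [r for r in records if not expired(r, now)]
print(len(kept))  # 1
```

Running this kind of sweep as a scheduled job against the lake is one simple way to make retention policy an enforced property of the data layer rather than a setting buried in a vendor console.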
A decoupling toolbox
CISOs creating a new enterprise security data lake will need to determine their strategies in the following areas.
SaaS data extraction
SaaS data extraction tools can be built in-house using SaaS APIs. Alternatively, third-party approaches include such proprietary SaaS security posture management platforms as Obsidian Security, Netskope SSPM and AppOmni, as well as open source tools such as Mondoo and OpenASPM.
Data pipeline
The data pipeline is the ingestion and preprocessing tool that receives raw logs and emits records for the data lake in standardized format(s). Commercial products here include Cribl, Datadog and Splunk. Open source options include Vector, Logstash and Fluentd.
Data storage
Most larger organizations already have experience with data lakes, as well as preferred vendors, such as Snowflake and Google BigQuery, or open source options, such as Apache HDFS or MinIO.
Enterprises also have to consider data formats. Open standards should be everyone's first choice: the Open Cybersecurity Schema Framework (OCSF) for the log records heading out to SIEMs or elsewhere, for example, and storage formats such as Apache Parquet or Delta Lake for the data lake proper.
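To make the format point concrete, here is a sketch of mapping a raw vendor event into an OCSF-inspired record. The field layout is loosely modeled on OCSF's Authentication event class (class_uid 3002); consult the published OCSF schema for the authoritative field names before building on this:

```python
def to_ocsf_like(raw: dict) -> dict:
    """Map a raw vendor login event into an OCSF-inspired record.
    Field names are illustrative approximations of the OCSF
    Authentication class, not a verified, complete mapping."""
    return {
        "class_uid": 3002,  # Authentication class in the OCSF schema
        "time": raw["timestamp"],
        "user": {"name": raw["user"]},
        "status": "Success" if raw["ok"] else "Failure",
    }

rec = to_ocsf_like({"timestamp": 1700000000, "user": "alice", "ok": True})
print(rec["status"])  # Success
```

Normalizing to an open schema at the pipeline stage is what keeps downstream consumers interchangeable: any tool that speaks the schema can read the lake.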
By decoupling cybersecurity data ingestion and retention from their SIEM platforms, CISOs can gain control, flexibility and depth while potentially reducing costs. But they will have to invest significant resources to capture these benefits.
John Burke is CTO and a research analyst at Nemertes Research. Burke joined Nemertes in 2005 with nearly two decades of technology experience. He has worked at all levels of IT, including as an end-user support specialist, programmer, system administrator, database specialist, network administrator, network architect and systems architect.