SIEM isn't dead, it's only disrupted

4 months ago

The old SIEM vision was “put everything in one hot pile, index the hell out of it, and then run out-of-the-box content over it so you can pivot smoothly from known to unknown questions”… which sounds lovely until you have to pay for it. In that world, SIEM is the keystone of an arch: it holds the whole security process together, acting as the one place where everyone can search everything. From the entry-level SOC contractor to the deepest threat-hunting wizard, from the retail MSSP operator to the grandest member of the military-industrial complex, SIEM could be a common interface. Since SIEM began, new security products have just kind of been co-opted into it, because it was so damned useful.

In this model, the SOC and the Incident Response functions both use the SIEM to look at their data. The SOC side uses it to look for known knowns. Data is schematized so the SIEM can use it, which allows rules, dashboards, content matching, workflows, and automation to function. These functions need a fast ingest pipeline and rapid time-to-glass. Known-known rules also need to be turned into actionable alerts quickly, though compromises are inevitable. There is not a massive need for old data here: once models of past behavior have been built to feed algorithmic analysis, a week of history is arguably overkill. Known-known rule processing is about catching bad guys in the act, not discovering them later.

So there’s a bunch of CPU load to schematize and normalize data, a bunch of load to build models from the data, a bunch of load to test the data against rules… it’s not surprising that this set of functionality was designed for the on-prem world. The underpinning assumption of SIEM is buying capacity and running it hot until it depreciates, or leasing capacity under a “we-will-absolutely-burn-it-all” plan, rather than renting capacity for unpredictable and ephemeral loads. Because of that, moving SIEM to the cloud has been ruinously expensive. Customers feel trapped in the tool, juggling storage and compute credits against each other and trying to figure out which corners to cut. The SOC needs a week or two of cooked data, but the threat hunters need a year of cooked and raw, and compliance suggests many years of raw!
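
To make those competing retention needs concrete, here is a minimal Python sketch of a tiered retention policy. The tier names and exact durations are hypothetical assumptions: the fourteen-day SOC window stands in for “a week or two,” and the seven-year compliance window stands in for “many years,” which the text does not pin down.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tier:
    name: str             # hypothetical tier name
    form: str             # "cooked" (schematized) or "raw"
    retention: timedelta  # how long this tier keeps data


# Illustrative tiers loosely following the consumers named in the text.
# All durations are assumptions, not recommendations.
TIERS = [
    Tier("soc_hot", "cooked", timedelta(days=14)),            # SOC: a week or two of cooked
    Tier("hunt_warm", "cooked", timedelta(days=365)),         # hunters: a year of cooked...
    Tier("hunt_warm_raw", "raw", timedelta(days=365)),        # ...and a year of raw
    Tier("compliance_cold", "raw", timedelta(days=7 * 365)),  # "many years" assumed to be 7
]


def tiers_holding(form: str, age: timedelta) -> list[str]:
    """Return the names of tiers that should still hold a record of this form and age."""
    return [t.name for t in TIERS if t.form == form and age <= t.retention]


print(tiers_holding("cooked", timedelta(days=3)))  # ['soc_hot', 'hunt_warm']
print(tiers_holding("raw", timedelta(days=400)))   # ['compliance_cold']
```

The point of the sketch is that a single hot index has to satisfy the longest row in this table, which is exactly where the cost pressure comes from.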

Meanwhile, “unpredictable and ephemeral load” better describes the Incident Response team. Schematization is generally less important on the Incident Response side: responders will use it if it’s there, but they’ll also disagree about which fields matter and how to model them, so it may be easier to give them access to the raw data. Worse, they’ll want to run extensive needle-in-a-haystack searches across long time periods, scanning petabytes of data to find where an indicator first showed up. Massive needle-in-a-haystack searches are the worst-case scenario for cost management, and security operations needs lots of them. Responders will also use threat intel and UEBA for risk models, and then layer on more nebulous open-source intel, behavioral analytics, and gut-driven searches to look for unknown knowns. Now you’re talking about lots of memory and disk for lookup storage, expensive network address range matching, and constant recalculation of per-entity risk models. How many entities should be risk modeled in a large enterprise? Two hundred fifty thousand employees, each with a laptop and a mobile phone, plus twenty thousand long-lived server systems, and let’s not even think about containerized microservices right now… corners are going to be cut.
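
For a rough sense of scale, the entity question above can be worked as back-of-envelope arithmetic. This Python sketch assumes that each employee counts as one identity plus two devices, and that risk scores are recalculated every five minutes; both are illustrative assumptions, not figures from any product.

```python
# Back-of-envelope count of risk-modeled entities using the figures from the text.
# Counting each employee as one identity plus two devices, and the 5-minute
# refresh interval, are illustrative assumptions.
EMPLOYEES = 250_000
DEVICES_PER_EMPLOYEE = 2  # a laptop and a mobile phone
SERVERS = 20_000          # long-lived servers; containers deliberately excluded
REFRESH_MINUTES = 5       # assumed risk-score recalculation interval


def entity_count(employees: int, devices_per_employee: int, servers: int) -> int:
    """Identities plus endpoint devices plus servers."""
    identities = employees
    devices = employees * devices_per_employee
    return identities + devices + servers


def recalcs_per_day(entities: int, refresh_minutes: int) -> int:
    """How many per-entity risk recalculations happen in one day."""
    return entities * (24 * 60 // refresh_minutes)


entities = entity_count(EMPLOYEES, DEVICES_PER_EMPLOYEE, SERVERS)
print(entities)                                    # 770000
print(recalcs_per_day(entities, REFRESH_MINUTES))  # 221760000
```

Even before containers enter the picture, that is hundreds of millions of model updates per day under these assumptions, which is why corners get cut.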

It’s no surprise that the central SIEM vision has been showing stress all along. The earliest implementations depended on a vertically scaled RDBMS core that couldn’t handle the load, so second-generation systems were rebuilt on more flexible big data cores. That created the problem of index-time versus search-time normalization, along with the opportunity to build sidecar content. Third-generation systems have tried various degrees of component explosion, from content engines you can run in your own account, to “everything’s a cost (but don’t worry about it),” to an “if you build it, they will come” field of dreams (also pay-as-you-go, of course). The common core of this more modern vision is “put everything in a data lake for storage, and pull out parts to do specific tasks in specific tools.” There’s disagreement over where to use schematized data and where to use raw data, but everyone agrees on one thing: in this world, SIEM is a function that is fed only selected, schematized data that it knows how to work with. SIEM is great for known-known detection, basic UEBA and risk scoring, and SOAR automation. So we give it what it knows how to use in reasonable amounts, let it do what it needs to do, and stop asking it to support incident review and forensic investigation over months and years of archive. That’s what archive storage is for.
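
The “lake first, SIEM gets a curated subset” pattern can be sketched as a tiny router. This is a minimal Python illustration, not any vendor’s pipeline: the lake and SIEM are stand-in in-memory lists, the set of supported source types is hypothetical, and “schematized” is approximated as “has parsed fields.”

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical source types the SIEM has parsers and detection content for.
SIEM_SUPPORTED = {"auth", "edr_alert", "firewall"}


@dataclass
class Event:
    source_type: str
    raw: str
    fields: dict[str, Any] = field(default_factory=dict)  # parsed, schematized fields


@dataclass
class Router:
    lake: list[Event] = field(default_factory=list)           # stand-in for object storage
    siem: list[dict[str, Any]] = field(default_factory=list)  # stand-in for SIEM ingest

    def route(self, event: Event) -> None:
        # Everything lands in the lake, raw included, for hunting, IR, and compliance.
        self.lake.append(event)
        # Only source types the SIEM knows how to use, and only events that have
        # parsed fields (our proxy for "schematized"), go on to known-known detection.
        if event.source_type in SIEM_SUPPORTED and event.fields:
            self.siem.append({"source_type": event.source_type, **event.fields})


router = Router()
router.route(Event("auth", "<raw auth log line>", {"user": "alice", "outcome": "failure"}))
router.route(Event("dns", "<raw dns log line>"))  # no SIEM content for DNS: lake only
print(len(router.lake), len(router.siem))  # 2 1
```

The design choice worth noticing is that the lake write is unconditional while the SIEM write is filtered, so long needle-in-a-haystack searches run against cheap storage instead of the SIEM’s hot index.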

So does SIEM become useless? After all, continually scanning for known knowns has its weaknesses: there are only so many concurrent rules you can afford to run, and heuristics and UEBA come off the top of that budget. We have been down this road before with antivirus software. You can’t not do it, though; that would be silly, and besides, there are rules requiring it. It does seem tempting to think about automation and AI here, and to keep building SOAR into or alongside SIEM. On the positive side, past-predicts-future automation fits well into a world of known inputs and outputs; if you’re already doing it with a three-ring binder of response documents, it’s worth investigating whether a script can do it faster and cheaper. On the negative side, the old days of AV, HIPS, and spam-fighting blocklists also saw attackers figure out the automation and use it to deny service to legitimate users. That’s losing the A in Confidentiality, Integrity, and Availability, not the kind of thing CISOs like to explain to boards. I expect automation will be judiciously added by most, aggressively pursued by some, and ignored by others, like any tech fad. But I also expect that SIEM isn’t going to be completely discarded; it’s too useful.

SIEM is no longer the monolithic keystone of the security operations arch. It’s now a component, a modular data analysis tool that fits into a broader world of hierarchical storage, data lakes, and just-in-time analysis. It’s been disrupted.