From a Cybersecurity Architect Who’s Seen the Struggles Firsthand
Over the years, we’ve migrated more than a few SIEM environments to Microsoft Sentinel. And no matter how different the organizations were, the same headaches kept showing up:
- 🔍 What logs do we really need to keep for detection?
- 💾 What can we afford to store long-term?
- 💸 And how do we not break the bank doing it?
Let’s be real: Microsoft Sentinel has always been powerful, but calling it “cost-effective” felt like a bit of a stretch – especially when it came to storing secondary/compliance logs, the kinds of data you don’t need every day but desperately want during incident investigations.
In past projects, we used all sorts of workarounds: filtering logs, sampling, dumping less-used data elsewhere. In recent months, we relied heavily on Auxiliary logs. It sort of worked, but it always felt like a compromise.
✨ Now There’s a Better Way

With the new Microsoft Sentinel Data Lake, we have an approach that sidesteps the problem entirely.
Instead of cramming everything into the Analytics tier and paying high ingestion and query costs, the Data Lake lets you store huge volumes of data far more cheaply – and still access it when you need to.
If you’ve ever had to make that painful choice (“Can we afford to keep these logs?”), this is the answer.
Here’s what I’m seeing in the real world:
- 💰 Costs drop significantly: Based on early testing, we’re seeing around 85% savings by pushing logs to the Data Lake tier instead of Analytics.
- 🗄️ Long-term retention is finally realistic: You can store logs for up to 12 years without burning your budget.
- 🔎 Investigations are smoother: Running historical KQL queries to dig into old incidents is actually doable now, no more “we didn’t keep those logs” regrets.
🤔 So What’s the Catch?
Honestly? Not much – just some trade-offs you need to plan for.

The Analytics tier is still where your hot data lives: logs you use for real-time detection, alerts, dashboards, and fast queries. That’s not changing.
The Data Lake tier is more like cold storage: it’s where your context-rich but high-volume data lives – endpoint telemetry, flow data, infrastructure logs, and so on. It supersedes the old “Auxiliary” tier.
📊 Quick Comparison
| Feature | Analytics Tier (Hot) | Data Lake Tier (Cold) |
| --- | --- | --- |
| Main Use | Real-time detection, dashboards, threat hunting | Long-term storage, compliance, historical hunting |
| Query Performance | Fast and indexed | Slower (full table scans) |
| Costs | Higher (standard ingestion + query) | Low ingestion, pay-per-query GB scanned |
| Retention | Up to 2 years | Up to 12 years |
| Great For | Azure AD sign-ins, alerts, DNS | EDR logs, network flow, anything high-volume |
The cool part? The two tiers work together. You keep the signal in Analytics and push the noise (but still important noise) to the Data Lake, and you can now build queries that span both tiers – see the sketch below. It’s not about logging less, it’s about logging smarter.
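To make that concrete, here’s a minimal sketch of what a cross-tier hunt could look like in the data lake exploration KQL editor. It assumes your Analytics-tier tables (like SigninLogs) are mirrored into the lake alongside lake-only tables; DeviceNetworkEvents standing in as a lake-tier table, and the column choices, are illustrative rather than prescriptive.

```kql
// Hypothetical lake exploration query (Defender portal KQL editor).
// Correlate a hot-tier signal with high-volume cold-tier context.
// Table names are illustrative; swap in your own tiering choices.
let suspiciousIPs =
    SigninLogs                          // mirrored from the Analytics tier
    | where TimeGenerated > ago(7d)
    | where ResultType != "0"           // failed sign-ins
    | distinct IPAddress;
DeviceNetworkEvents                     // high-volume table kept in the lake tier
| where TimeGenerated > ago(90d)
| where RemoteIP in (suspiciousIPs)
| summarize Connections = count(),
            FirstSeen = min(TimeGenerated),
            LastSeen  = max(TimeGenerated)
          by DeviceName, RemoteIP
| order by Connections desc
```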
💬 “The Sentinel Data Lake isn’t just cheaper storage—it’s the missing piece that lets us keep more context, hunt smarter, and finally stop choosing between cost and visibility.”
— Maxim Deweerdt, Solution Architect
🛠 A Few Tips If You’re Considering the Switch
Here’s what I’ve learned working with teams making this move:
- ⚡ Existing logs: Once you enable Sentinel Data Lake, your existing logs from the Analytics tier are already available there too: no need to duplicate anything.
- 📦 Auxiliary logs: If you were already using Auxiliary logs, these will now show up in the new Data Lake tier – no ingestion changes needed. One caveat: after onboarding, Auxiliary log tables are no longer available in Microsoft Defender Advanced Hunting. Instead, you access them through data lake exploration KQL queries in the Defender portal.
Also:
- 🕵️ Historical queries: You can run KQL queries directly against the Data Lake for threat hunting or investigations. It’s slower than querying hot data, but it works, and it’s a game-changer when you need historical context.
- 🚀 Promote logs on demand: You can even temporarily promote logs from cold to hot if you’re working on a specific case and want faster access.
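Promotion is set up through the portal as a KQL job, so there’s no new syntax to learn – the job body is plain KQL that selects the slice you want in the Analytics tier. A minimal sketch, where the table name, device name, and time window are all hypothetical:

```kql
// Hypothetical body of a KQL job that promotes a case-relevant slice
// of a lake-tier table into the Analytics tier. The destination table
// is configured when you create the job in the portal.
DeviceNetworkEvents
| where TimeGenerated between (datetime(2025-03-01) .. datetime(2025-03-31))
| where DeviceName =~ "srv-fileshare-01"   // scope tightly to keep promotion costs down
```

Scoping the job tightly matters: as noted below, frequent or broad promotions are one of the places hidden costs creep in.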
⚠️ Potential Drawbacks of Moving to Sentinel Data Lake
While Microsoft Sentinel Data Lake brings significant cost savings and long-term retention benefits, there are trade-offs to consider:
- Data Lake Interactive Queries limitations: Interactive queries are limited to 30,000 rows or 64 MB of data and time out after 10 minutes. When selecting a broad time range, your query may exceed these limits. Use KQL jobs for queries that would exceed them, or aggregate early (see the sketch after this list).
- Cost Complexity: The pay-per-query GB pricing model makes cost forecasting harder, and frequent cold-to-hot data promotions can add hidden costs.
- Tooling & Workflow Adjustments: After enabling Sentinel Data Lake, Auxiliary logs no longer appear in Defender’s Advanced Hunting. Instead, they can be accessed only via the Data Lake Exploration KQL interface in the Defender portal.
- Learning Curve: Analysts need stronger KQL optimization skills to minimize scan costs and improve performance.
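One practical pattern for staying under the interactive limits is to summarize before returning rows, so the result set stays small even over long time ranges. A minimal sketch – the table and columns are illustrative:

```kql
// Aggregate early so an interactive lake query stays under the
// 30,000-row / 64 MB / 10-minute limits, even across months of data.
// Table and column names are illustrative.
DeviceNetworkEvents
| where TimeGenerated > ago(180d)
| summarize Connections = count() by bin(TimeGenerated, 1d), RemoteIP
| top 1000 by Connections
```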
Microsoft Sentinel Data Lake: Pros vs. Cons
| ✅ Pros | ⚠️ Cons |
| --- | --- |
| 85%+ cost savings compared to the Analytics tier for high-volume logs | Slower queries due to lack of indexing |
| Long-term retention (up to 12 years) | Pay-per-query costs add unpredictability |
| Seamless integration with the Analytics tier (hot + cold) | No real-time detection on cold data |
| Ability to store all context-rich logs without filtering or sampling | Requires a log tiering strategy and operational changes |
| Historical hunting possible with KQL queries | Learning curve for query optimization |
| Promote cold data to hot for faster access when needed | Defender Advanced Hunting limitations (Auxiliary data moved to the Data Lake) |
🔮 Looking Ahead: Why This Matters for the Future of the SOC
We talk a lot about AI in security – some of it hype, some of it real. At NVISO, we’re already pretty far down that road. Our SOC is highly automated, and AI helps speed up triage and investigations.
But if you want to get to the next level (where AI agents actually make decisions and take action) they need data. Lots of it. From everywhere. Over time.
That’s where the Data Lake becomes essential. It’s not just cheap storage – it’s the foundation for data-driven, AI-augmented security operations. Without that rich data layer, your AI won’t be effective.
So yeah, this isn’t just a technical tweak. It’s a shift in how we think about log management and SIEM design.
💡 Final Thoughts
If you’re working with Sentinel already, or planning to move to it, the Data Lake isn’t just “nice to have” – it’s quickly becoming essential. It helps you store more, hunt deeper, and build toward a more intelligent SOC.
My advice?
- ✅ For NVISO MSS customers, we are taking care of it 😉 Otherwise, turn on the Data Lake preview and start testing – just remember that it’s still in private preview.
- ✅ Figure out which logs should live where. Your cost savings depend on good tiering decisions (see the query sketch after this list).
- ✅ Ask for help if you need it. This shift touches architecture, analytics, retention policy, and SOC processes.
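A good starting point for those tiering decisions is to rank your tables by ingestion volume. This query runs against the standard Usage table in your Log Analytics workspace (Quantity is reported in MB); the 30-day window is just a reasonable default:

```kql
// Rank tables by billable ingestion over the last 30 days to spot
// high-volume candidates for the Data Lake tier.
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize IngestedGB = round(sum(Quantity) / 1024, 2) by DataType
| order by IngestedGB desc
```

Tables that dominate this list but rarely feed detections (EDR telemetry, flow logs) are usually the first candidates for the lake tier.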
At NVISO, we’re already helping clients map this out. If you want to talk it through and get real expert advice, reach out.

Maxim Deweerdt
Maxim is the solution architect for NVISO’s Managed Services. He’s an expert in building modern SOC environments and a principal instructor for SANS, where he teaches LDR551 (Building and Leading Security Operations Centers) and SEC511 (Cybersecurity Engineering).