How to Tell If Your Snowflake Investment Is Working

TL;DR: A functional Snowflake environment and a productive one are not the same thing, and the gap between them is costing organizations more than they realize. After walking into dozens of these environments, the signals I see are consistent: cost overruns concentrated in four predictable places, business users quietly reverting to parallel data sources, governance policies that protect data so broadly they block access people should legitimately have, and no semantic layer to connect what the business means to what the data says. This post covers how to read those signals, what a healthy environment looks like from the inside, and the five questions I ask before recommending anything.

What this post covers:

The ratio I check before running a single diagnostic query
The four most common Snowflake cost leaks and how to find them
Why the most common governance failure is over-restriction, not under-restriction
Why every Snowflake AI initiative needs a semantic layer first
Five diagnostic questions for every Snowflake environment assessment

How to Tell If Your Snowflake Investment Is Actually Being Used

When I walk into a Snowflake environment that’s been running for a year or two, I’m not looking for what’s broken. I’m looking at whether the investment is actually being used.

There is a ratio that tells me most of what I need to know before I run a single diagnostic query. It’s the ratio of what the organization spends to prepare data to what it spends to use it. In a healthy Snowflake environment, that split is roughly 30% preparation and 70% active use. When the ratio inverts, when the majority of Snowflake activity is ingesting, transforming, and maintaining data rather than querying it, the platform is functioning as infrastructure, not intelligence.
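
As a rough illustration, the split can be computed from credit consumption once each warehouse is tagged as preparation or use. The sketch below uses made-up numbers; the warehouse names, the "prep" and "use" tags, and the credit figures are all assumptions for illustration, not data from a real account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarehouseSpend:
    name: str       # hypothetical warehouse name
    purpose: str    # "prep" or "use"; a tag you assign, not a Snowflake field
    credits: float  # credits consumed over the period being reviewed


def prep_use_split(spend):
    """Return (prep_share, use_share) as fractions of total credits."""
    prep = sum(w.credits for w in spend if w.purpose == "prep")
    use = sum(w.credits for w in spend if w.purpose == "use")
    total = prep + use
    if total == 0:
        raise ValueError("no credits recorded for the period")
    return prep / total, use / total


# Illustrative figures only.
sample = [
    WarehouseSpend("LOAD_WH", "prep", 420.0),
    WarehouseSpend("TRANSFORM_WH", "prep", 280.0),
    WarehouseSpend("BI_WH", "use", 300.0),
]

prep_share, use_share = prep_use_split(sample)
inverted = prep_share > use_share
print(f"{prep_share:.0%} preparation, {use_share:.0%} use, inverted={inverted}")
```

With these sample figures the split comes out 70/30 in the wrong direction, which is the inverted pattern described above. The hard part in practice is the tagging, since a single warehouse often serves both purposes.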

According to a global study of 1,500 enterprise leaders commissioned by Seagate and conducted by IDC, only 32% of data available to enterprises is actually put to work. The remaining 68% goes unleveraged. That figure is from 2020. In my experience, it has not materially improved. Organizations have added more data, more pipelines, and more platforms, but the data that reaches decisions has not kept pace.

I call the extreme version of this condition a data cemetery: data exists, pipelines run, warehouses bill, and almost nobody queries anything. It’s not a Snowflake problem. It’s an architectural one, and I see it more often than most organizations expect.

The second thing I check: are business users really going to Snowflake as their primary source, or have they quietly built parallel sources elsewhere? If a business user has reverted to a spreadsheet, a separate BI tool, or a direct database pull, that tells me more than any utilization metric. It means the investment exists, and the return is going elsewhere.

The Four Most Common Snowflake Cost Leaks (And How to Find Them)

Enterprises now spend an average of $29.3 million annually on data programs, making data one of the largest line items in enterprise technology budgets. And yet nearly 97% of senior data and technology leaders say pipeline failures have slowed analytics or AI programs, with data teams dedicating 53% of their engineering time to maintenance rather than forward work.

Nine times out of 10, when I’m brought into a conversation about Snowflake costs, the organization already knows it has a problem. What it’s missing is visibility into exactly where the problem is concentrated. In my experience, it almost always shows up in one of four places.

Warehouses running continuously without a defined business reason

Snowflake is not a traditional database. Warehouses are not meant to stay on. When I see a warehouse running 24 hours a day, I have to dig into why. And more often than not, it’s a configuration that was set up for a reason that no longer exists. A scheduled job. A refresh cadence. An integration that was built and never reviewed. The fix is usually straightforward once the behavior is visible. But someone has to look.
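
One way to make that behavior visible is to count how many hours in a review window each warehouse actually consumed credits. The sketch below assumes hourly metering rows shaped as (warehouse name, hour start, credits used), similar in spirit to what Snowflake's ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY view exposes; the tuple layout, the warehouse names, and the 95% threshold are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def always_on_warehouses(rows, min_active_fraction=0.95):
    """Return warehouses that consumed credits in nearly every hour observed.

    rows: iterable of (warehouse_name, hour_start, credits_used).
    """
    active_hours = defaultdict(set)
    all_hours = set()
    for name, hour_start, credits in rows:
        all_hours.add(hour_start)
        if credits > 0:
            active_hours[name].add(hour_start)
    if not all_hours:
        return []
    return sorted(
        name
        for name, hours in active_hours.items()
        if len(hours) / len(all_hours) >= min_active_fraction
    )


# Two days of made-up metering data.
start = datetime(2024, 1, 1)
rows = []
for i in range(48):
    hour = start + timedelta(hours=i)
    rows.append(("ETL_WH", hour, 1.0))  # never suspends
    business_hours = 9 <= hour.hour < 17
    rows.append(("BI_WH", hour, 1.0 if business_hours else 0.0))

always_on = always_on_warehouses(rows)
print(always_on)
```

A warehouse on this list is not automatically a problem. It is a prompt to ask which job or integration is keeping it awake and whether that reason still exists.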

Queries running against incorrectly sized warehouses

The first reports I install in any Snowflake environment track usage over the previous 30 days and flag queries that aren’t running against the correct warehouse size. The relationship between warehouse size and cost is not linear: each step up in size doubles the credits billed per hour. Queries routed to warehouses that are inappropriately sized burn credits without a proportional return. Monitoring this on a regular cycle keeps it from compounding quietly.
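
The arithmetic behind that is short. The sketch below uses Snowflake's standard credits-per-hour figures for the smaller warehouse sizes; the runtimes are invented, and the calculation ignores the per-resume minimum billing and any concurrency effects.

```python
# Standard warehouse credit rates: each size up doubles the hourly rate.
CREDITS_PER_HOUR = {
    "XSMALL": 1,
    "SMALL": 2,
    "MEDIUM": 4,
    "LARGE": 8,
    "XLARGE": 16,
}


def query_credits(size, runtime_seconds):
    """Approximate credits attributable to one query running alone."""
    return CREDITS_PER_HOUR[size] * runtime_seconds / 3600


# Hypothetical query: 10 minutes on MEDIUM, and only a third faster on LARGE.
on_medium = query_credits("MEDIUM", 600)
on_large = query_credits("LARGE", 400)
upsizing_pays_off = on_large <= on_medium
print(round(on_medium, 3), round(on_large, 3), upsizing_pays_off)
```

The larger warehouse only breaks even if it cuts the runtime at least in half. In this made-up case the query finishes sooner on LARGE but costs more, which is the kind of mismatch a 30-day usage report is meant to surface.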

AI and Cortex features applied without reviewing token costs

This is newer, and I’m seeing it more. Cortex capabilities are genuinely powerful, and they are easy to deploy broadly across data flows without anyone asking whether the token cost is proportional to the value returned. It’s very easy to use AI in ways that cost significantly more than the benefit justifies. The discipline of asking, "Which LLM can best perform the task for the lowest cost?" is not yet an instinct for most data teams.
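
That question can be turned into a small selection rule: among the models that are accurate enough for the task, pick the cheapest. The sketch below is illustrative only. The model names, per-token prices, and accuracy scores are hypothetical placeholders, not Snowflake's published rates, and the accuracy would need to come from your own evaluation set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                          # hypothetical model label
    credits_per_million_tokens: float  # hypothetical price
    task_accuracy: float               # measured on your own evaluation set


def cheapest_adequate_model(options, min_accuracy):
    """Return the lowest-cost model that meets the accuracy bar, or None."""
    adequate = [m for m in options if m.task_accuracy >= min_accuracy]
    if not adequate:
        return None
    return min(adequate, key=lambda m: m.credits_per_million_tokens)


def monthly_credits(model, tokens_per_row, rows_per_month):
    """Estimate monthly credits for applying the model to every row."""
    total_tokens = tokens_per_row * rows_per_month
    return model.credits_per_million_tokens * total_tokens / 1_000_000


options = [
    ModelOption("large-model", 3.0, 0.95),
    ModelOption("mid-model", 0.6, 0.92),
    ModelOption("small-model", 0.1, 0.81),
]

choice = cheapest_adequate_model(options, min_accuracy=0.90)
cost_choice = monthly_credits(choice, tokens_per_row=400, rows_per_month=2_000_000)
cost_largest = monthly_credits(options[0], tokens_per_row=400, rows_per_month=2_000_000)
print(choice.name, round(cost_choice), round(cost_largest))
```

With these placeholder numbers the mid-sized model clears the bar at a fifth of the largest model's cost. The point is the comparison, not the figures.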

Storage costs elevated by settings that were never revisited

Time Travel retention set higher than the data warrants. Data movement that was necessary for an integration that has since changed. Redundant copies across environments. None of these are dramatic individually. But together, they add up, and they tend to go unreviewed because storage is invisible until it appears on an invoice.
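
A simple review pass is to list tables whose retention exceeds whatever the team has agreed is reasonable, ordered by how much Time Travel storage they hold. The sketch below assumes a list of dictionaries with retention_days and time_travel_bytes keys; the table names, the numbers, and the 7-day threshold are assumptions for illustration.

```python
def retention_review(tables, max_days=7):
    """Return tables retaining history longer than max_days, largest first."""
    flagged = [t for t in tables if t["retention_days"] > max_days]
    return sorted(flagged, key=lambda t: t["time_travel_bytes"], reverse=True)


# Made-up inventory; real values would come from your account's storage metadata.
tables = [
    {"name": "MART.REVENUE", "retention_days": 1, "time_travel_bytes": 1_000_000_000},
    {"name": "STAGING.ORDERS_TMP", "retention_days": 30, "time_travel_bytes": 250_000_000_000},
    {"name": "RAW.EVENTS", "retention_days": 90, "time_travel_bytes": 4_000_000_000_000},
]

to_review = [t["name"] for t in retention_review(tables)]
print(to_review)
```

Staging and raw tables with long retention are the usual candidates, since they can typically be reloaded from source and rarely need weeks of recoverable history.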

Snowflake Governance and RBAC: Why Over-Restriction Is the Real Problem

Gartner predicted in February 2024 that 80% of data and analytics governance initiatives will fail by 2027 due to a lack of a real or manufactured crisis. I think the crisis has arrived, and it has a name: AI.

For years, governance had a reputation problem. It felt like overhead: process-heavy, progress-slowing, the thing that turned a two-week data request into a six-week one. I understand why organizations treated it that way. The cost of imperfect governance was visible primarily in compliance conversations, not in daily operations.

AI changes that calculation. When a model or agent operates on ungoverned data, it doesn’t slow down and wait for human review. It uses whatever authority it has to work around access issues it encounters. Without governance architecture that AI can operate inside safely, you’re not deploying intelligence at speed. You’re producing untrustworthy outputs at speed, which is a different thing entirely.

But here is what I want to be clear about, because it almost never gets said directly: the most common governance failure I encounter in production Snowflake environments is not that data is too accessible, but that it’s too restricted.

The pattern looks like this. An organization protects sensitive data by restricting access at the domain level: a wall around everything in the HR schema, or the finance schema, or the customer records table. Simple to configure. Easy to audit. And exactly wrong for operational use. The policy that was designed to prevent unauthorized access also blocks the analytics and AI access that people in the organization are legitimately entitled to have. The data scientist trying to model employee retention cannot get the data they need. The AI agent trying to surface operational insights sees a permission error and routes around it.

Snowflake's own Data Trends 2024 report found that enterprises that doubled their use of key governance features in the platform increased their use of governed data by nearly 150%. Better governance did not restrict access. It expanded it, because governance applied at a granular, role-based level is what makes data safe to use broadly, rather than theoretically available but practically locked.

What good Snowflake RBAC looks like in practice

SSO and SCIM in place, with role assignments automated at onboarding rather than managed manually after the fact. A well-defined role hierarchy. Data protections applied at the lowest level rather than thrown as a wall around entire domains. Where I see experienced Snowflake DBAs, governance tends to be solid. When organizations assign Snowflake to teams without deep Snowflake experience, SSO and SCIM are usually in place, but the role hierarchy and granular data security policies have not kept pace with the platform's growth.

The rule I apply is that security policies should be specific enough to protect sensitive data and permissive enough to allow the legitimate access that analytics and AI require. If your governance model cannot satisfy both conditions simultaneously, it needs rearchitecting before you layer AI on top of it.
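
The difference between the two approaches can be shown with a toy model. This is a conceptual sketch in Python, not Snowflake's policy syntax; the role names, column names, and mask string are hypothetical.

```python
SENSITIVE_COLUMNS = {"ssn", "salary"}      # hypothetical classification
ROLES_CLEARED_FOR_SENSITIVE = {"HR_ADMIN"}  # hypothetical role
ROLES_WITH_SCHEMA_ACCESS = {"HR_ADMIN"}     # the domain-level wall


def domain_wall(role, row):
    """All-or-nothing access: anyone outside the wall gets nothing."""
    if role not in ROLES_WITH_SCHEMA_ACCESS:
        raise PermissionError(f"{role} has no access to the HR schema")
    return dict(row)


def column_level(role, row):
    """Granular access: everyone can read the row, sensitive columns are masked."""
    if role in ROLES_CLEARED_FOR_SENSITIVE:
        return dict(row)
    return {
        col: ("***MASKED***" if col in SENSITIVE_COLUMNS else value)
        for col, value in row.items()
    }


employee = {"employee_id": 1042, "tenure_months": 38, "salary": 98000, "ssn": "000-00-0000"}

try:
    domain_wall("DATA_SCIENTIST", employee)
    wall_blocked = False
except PermissionError:
    wall_blocked = True

granular_view = column_level("DATA_SCIENTIST", employee)
print(wall_blocked, granular_view)
```

Under the wall, the retention model never gets built. Under the granular policy, the data scientist sees tenure and never sees salary or SSN, which satisfies both conditions at once.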

Why Every Snowflake AI Initiative Needs a Semantic Layer First

The question I hear most often when organizations are trying to deploy AI on top of Snowflake is some version of: "We have all this data. Why can't we just ask it questions?"

The answer is almost always the same. The data exists. The layer that translates business questions into governed queries does not.

A Snowflake semantic layer is the mapping between how a business talks about its data and how that data is physically structured in the platform. It’s what allows a query phrased as "what is our churn risk this quarter?" to return an answer that reflects what the business means by "churn risk," not a technically correct answer to a technically malformed question. Without a semantic layer, analysts spend their time reconciling terminology instead of producing insight. AI returns outputs that are precise and wrong. New team members require weeks of context transfer before they can query anything useful.
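
In its simplest form, that mapping is a governed dictionary from business terms to query definitions. The sketch below is a deliberately naive Python illustration, not how any particular product implements it; the metric, its synonyms, the table name, and the SQL fragments are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    synonyms: tuple  # phrases the business actually uses
    expression: str  # governed SQL expression for the metric
    table: str       # hypothetical physical table
    where: str       # governed filter that encodes the business meaning


SEMANTIC_LAYER = [
    Metric(
        name="churn_risk",
        synonyms=("churn risk", "at-risk customers"),
        expression="AVG(churn_score)",
        table="analytics.customer_health",
        where="snapshot_date >= DATE_TRUNC('quarter', CURRENT_DATE)",
    ),
]


def resolve(question):
    """Map a business question to a governed query, or None if no term matches."""
    text = question.lower()
    for metric in SEMANTIC_LAYER:
        if any(phrase in text for phrase in metric.synonyms):
            return (
                f"SELECT {metric.expression} AS {metric.name} "
                f"FROM {metric.table} WHERE {metric.where}"
            )
    return None


sql = resolve("What is our churn risk this quarter?")
print(sql)
```

The substring matching here is only a stand-in for real language understanding. What matters is that "churn risk" resolves to one reviewed definition rather than to whatever column name looks closest.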

How to know if you need one

The diagnostic I use is simple. You need a semantic layer if:

Business users are using different terms than your data models reflect
AI queries are returning technically correct answers that no business user would recognize as accurate
Onboarding a new function to Snowflake requires a data engineering project before anyone can run a useful query
Different teams define the same metric differently (e.g., "revenue," "churn risk," "approved vendor") and there is no single governed definition
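
The last of those signals is easy to test for once each team's definitions are written down. The sketch below groups definitions by term and reports any term with more than one distinct expression; the teams, terms, and expressions are invented for illustration.

```python
from collections import defaultdict


def conflicting_metrics(definitions):
    """Return {term: {team: expression}} for terms defined inconsistently.

    definitions: iterable of (team, term, expression).
    Whitespace differences are ignored; any other difference counts as a conflict.
    """
    by_term = defaultdict(dict)
    for team, term, expression in definitions:
        normalized = " ".join(expression.split())
        by_term[term.lower()][team] = normalized
    return {
        term: teams
        for term, teams in by_term.items()
        if len(set(teams.values())) > 1
    }


definitions = [
    ("Sales", "revenue", "SUM(booked_amount)"),
    ("Finance", "revenue", "SUM(recognized_amount)"),
    ("Procurement", "approved vendor", "status = 'APPROVED'"),
    ("Legal", "Approved Vendor", "status  =  'APPROVED'"),
]

conflicts = conflicting_metrics(definitions)
print(conflicts)
```

Here "revenue" is flagged because Sales and Finance compute it differently, while "approved vendor" passes because the two definitions differ only in spacing. Each flagged term is a candidate for a single governed definition.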

What it enables in production

A global identity and access management company had hundreds of governed reports and data models in Snowflake. Insight delivery was entirely pull-based. Sales and customer success teams navigated multiple tools to assemble signals that should have been available in a single query, slowing execution and introducing inconsistency into revenue-critical decisions. Sparq deployed Ask.IQ to translate natural-language questions into governed Snowflake queries, aligned business terminology with the underlying data through a semantic layer, and captured every interaction to build a continuous feedback loop. The result was instant natural-language access to governed data, 40-plus distinct use cases identified across the business, and 100% of user questions captured in a feedback loop that surfaced what the organization needed to know but had not yet thought to ask. The data had been there for years. The semantic layer was the unlock.

For a deeper look at what that architecture requires, and the four gaps that separate a reporting estate from one built to carry intelligence, read From Reporting to Intelligence: The Snowflake Estate Built for Agents.

Five Diagnostic Questions for Every Snowflake Environment Assessment

These are the questions I walk through in every Snowflake environment assessment. They take about 30 minutes to answer honestly, and they surface more than most formal audits do.

1. Do your business users trust the data in Snowflake?

If the answer is uncertain, that is the most important signal in the entire assessment. Distrust is the root cause of parallel data sources, manual reconciliation, and AI outputs that get verified against spreadsheets before anyone acts on them. You cannot build a productive data platform on a foundation people do not trust.

2. Are you using Snowflake as a database or a data platform?

A database answers queries. A data platform powers AI connections, external data shares, application integrations, and analytics for every team that needs it. The distinction determines what your Snowflake investment is structurally capable of returning. Most organizations I encounter are operating somewhere between the two without a clear read on which side they are closer to.

3. Are your security policies granular enough for AI?

Domain-level walls are not the answer. The organizations getting the most from Snowflake are applying policy at the role level: specific enough to protect sensitive data, permissive enough for the legitimate access that analytics and AI require. If your policies cannot satisfy both conditions, that is the first thing to fix.

4. Can you onboard another business unit without a multi-month project?

This question surfaces the scalability of the current architecture faster than any technical review. If the answer is no, that constraint will compound with every new team, every new AI workflow, and every new capability the organization wants to deploy.

5. What is your ratio of data preparation cost to active data use?

The target is approximately 30% preparation, 70% use. The further that ratio drifts toward preparation, the more the platform is functioning as a data cemetery rather than a performance engine. If you have not measured this ratio, measuring it is the most useful thing you can do this week.

What to Do With the Answers

If these questions surface something worth going deeper on, our Snowflake Health Scorecard gives you a systematic read across all four areas—data accessibility, cost efficiency, AI and agentic readiness, and governance—in about 15 minutes.

Score your Snowflake estate. Get the Snowflake Health Scorecard →

If you want to work through what you find with someone who has been inside environments like yours, we run weekly office hours for exactly this kind of conversation.

Bring your hardest question. Join a Sparq Snowflake Office Hours session →

Sparq is a Snowflake Elite and CoCo Preferred Partner.

Frequently Asked Questions

How do I optimize Snowflake warehouse costs?

Start with usage reports that track queries over the previous 30 days and flag mismatches between query load and warehouse size. Look for warehouses running continuously without a defined operational reason; these are almost always configurations that were set up for a purpose that has since changed. Review AI and Cortex feature usage to confirm token costs are proportional to the value being returned. Take a look at serverless costs as well: excess spend might come from background services such as automatic clustering, Snowpipe, or serverless tasks. Storage costs deserve a separate pass: Time Travel settings and redundant data copies are the two places I see the most silent spend accumulation.

How do I know if my Snowflake environment has a cost problem?

The four most reliable indicators are warehouses running continuously without a defined operational reason, queries routed to incorrectly sized warehouses, AI and Cortex features applied broadly without reviewing whether token costs are proportional to the value returned, and storage costs elevated by time travel settings or data movement that has not been reviewed since it was configured. Most organizations carrying a cost problem already know it. What they typically lack is visibility into which of these four places accounts for the majority of the overrun.

What does the 30/70 ratio mean for Snowflake?

It refers to the split between what an organization spends preparing data — ingesting, transforming, maintaining pipelines — versus actively using it through queries, analytics, and AI. A healthy environment runs approximately 30% preparation, 70% use. When the ratio inverts, the platform is functioning as infrastructure rather than intelligence, and the return on the investment reflects that.

What are Snowflake RBAC best practices?

Apply security policies at the role level, not the domain level. Domain-level walls are easy to configure but routinely block legitimate access alongside the unauthorized access they were designed to prevent. SSO and SCIM should be in place, with role assignments automated at user onboarding rather than managed manually. A well-defined role hierarchy — specific enough to reflect actual job function and data sensitivity — is what makes it possible to give AI agents and analytics tools the access they need without creating governance exposure.

Why do Snowflake governance policies sometimes block access people should have?

The most common configuration mistake I see is applying security policies at the domain level rather than the role level. A policy that protects sensitive HR data by placing a wall around the entire HR schema also blocks analytics access that non-sensitive data in that schema would legitimately support. Applying security at the lowest level — granular, role-based controls that reflect actual job function and data sensitivity — is the approach that makes data broadly usable while protecting what actually needs protecting.

What is a Snowflake semantic layer and how do I know if I need one?

A semantic layer is the mapping between business terminology and the underlying data models in Snowflake. It is what allows a query phrased in business language to return an answer that reflects business meaning rather than raw database logic. You need one if business users are using different terms than your data models reflect, if AI queries return technically correct but contextually wrong answers, or if onboarding a new team to Snowflake requires weeks of context transfer before anyone can run a useful query.

How long does it take to build a Snowflake semantic layer?

With the right architectural approach, a foundational semantic layer can be established in weeks on an existing Snowflake estate without replacing the underlying platform. The timeline depends primarily on how well-documented the underlying data models are and how much alignment exists between business terminology and current data structure. Cortex Semantic Models can accelerate the development cycle significantly, though governance review is still required to ensure the semantic definitions reflect how the business actually uses the data.

How is this different from what Snowflake's new agentic capabilities require?

This post focuses on the present-state questions that determine whether a Snowflake investment is currently returning what it should. The governance architecture, semantic layer, and cost management practices covered here are also the foundation that agentic workloads will require. Organizations that address these gaps now are building infrastructure that compounds in value as Snowflake's agentic capabilities — Agent Identity, Horizon Context, Datastream — move into production. The two conversations are sequential, not separate.
