Building Conversational Insights That Earn Trust: Our Approach to Snowflake Intelligence

By Nathan Miller, December 18, 2025


Marketing and analytics leaders have heard the conversational insights pitch: natural language access to your data, instant answers, no technical skills required. The vision is real. But the path from demo to production, where people actually use and trust the system, is where most implementations stall.

At Merkle, we've been building conversational insights on Snowflake Intelligence for clients across retail, financial services, and consumer brands. We've seen what separates successful deployments from shelfware. It comes down to two things: proving accuracy through rigorous evaluation, and designing for more than reactive Q&A.

Why Snowflake

We've built conversational insights across cloud platforms; the use case itself is platform-agnostic. But for many organizations, Snowflake is where the data gravity already exists, and that's a meaningful starting point.

What makes Snowflake Intelligence compelling isn't just the text-to-SQL capability; the surrounding AI tooling lives natively in the platform. Cortex Analyst handles natural language queries against structured data. Cortex Search enables semantic retrieval across unstructured content. Cortex Agents orchestrate across both, along with custom tools. The platform also provides the building blocks for automation: Tasks for scheduling and Alerts for threshold-based triggers, which can be combined with the AI capabilities to move beyond reactive Q&A. And Snowflake Intelligence pairs a functional UI with API access for custom integration.

Beyond Reactive Q&A

Traditional BI, and most analytics workflows today, are inherently reactive. Someone notices a problem, requests a report, and waits for an analyst. Conversational interfaces improve the experience with near-instant, deep-dive analysis. But the user still has to initiate, often without knowing the right questions to ask.

The opportunity for most organizations today is to shift this posture. Scheduled analyses that surface what matters before anyone asks. Trigger-based trend alerts that catch issues before they grow. Proactive delivery of insights when they're still actionable instead of after the opportunity has passed. The name of the game for modern analytics isn't just faster answers. It's rapid delivery of quality, relevant, actionable intelligence. 
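The trigger-based pattern above can be sketched in a few lines of plain Python. This is illustrative only: the names (`MetricReading`, `trend_alerts`) and the 20% threshold are hypothetical, and in a Snowflake deployment the condition would typically live in an Alert or scheduled Task, with flagged metrics handed to an agent for explanation before delivery.

```python
from dataclasses import dataclass

@dataclass
class MetricReading:
    name: str
    value: float      # latest observed value
    baseline: float   # expected value (e.g., trailing average)

def trend_alerts(readings, threshold_pct=20.0):
    """Flag metrics that deviate from baseline by more than threshold_pct.

    Sketch of a threshold-based trigger: instead of waiting for someone
    to ask, the system surfaces deviations as soon as they cross a line.
    """
    alerts = []
    for r in readings:
        if r.baseline == 0:
            continue  # skip brand-new metrics with no baseline
        change = (r.value - r.baseline) / r.baseline * 100
        if abs(change) >= threshold_pct:
            direction = "up" if change > 0 else "down"
            alerts.append(f"{r.name} is {direction} {abs(change):.0f}% vs. baseline")
    return alerts
```

The point of the design is the handoff: a cheap, deterministic check decides *when* to act, and the expensive conversational layer is only invoked to explain *why*.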

The Accuracy Standard

Here's the benchmark that matters: open-source evaluations like BIRD show that human SQL experts achieve roughly 93% accuracy on complex text-to-SQL tasks. That's the standard that conversational systems need to approach, because that's what users are implicitly comparing against when they ask a question and expect a reliable answer.

The question becomes: how do you validate whether your implementation has achieved that accuracy? Where does it succeed and where does it struggle? What needs to change to close the gap?

The only way to answer these questions is through a rigorous evaluation framework. This has become a core focus of our team. We've drawn from top approaches, like Uber's QueryGPT methodology and Microsoft's G-Eval framework, to build a comprehensive evaluation system that assesses accuracy, query quality, semantic correctness, and edge case handling.

We've packaged this into an application that streamlines the process. Organizations can run evaluations against their Golden Set (a curated collection of real business questions mapped to expected SQL outputs and behaviors) in one click, all within Snowflake's native infrastructure. Human-in-the-loop tagging lets analytics teams diagnose what needs updating under the hood before re-evaluating, all in a single UI. The application is expected to be available on Snowflake Marketplace in Q1 2026.
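The core of a Golden Set evaluation can be sketched as follows. The harness below is a minimal illustration, not our product's implementation: it scores execution accuracy (the metric BIRD reports) by comparing executed result sets rather than SQL text, since two differently written queries that return the same rows should count as a match. The names `execution_accuracy` and `run_query` are hypothetical; a real harness would execute generated SQL in Snowflake and queue failures for human tagging.

```python
def execution_accuracy(golden_set, run_query):
    """Score a text-to-SQL system against a Golden Set.

    golden_set: list of dicts with 'question' and 'expected_rows'
    run_query:  callable mapping a question to the rows the system returns
    Returns (accuracy, failed_questions) so failures can be tagged and
    diagnosed before re-evaluating.
    """
    failures = []
    for case in golden_set:
        actual = run_query(case["question"])
        # Order-insensitive comparison: most analytics answers are sets of rows
        if sorted(map(tuple, actual)) != sorted(map(tuple, case["expected_rows"])):
            failures.append(case["question"])
    passed = len(golden_set) - len(failures)
    return passed / len(golden_set), failures
```

Returning the failed questions, not just the score, is what makes the loop closable: accuracy tells you whether you've reached the benchmark, while the failure list tells you which semantic-layer changes to make next.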

A structured path to close the AI trust gap


Building for the Long Term

These systems are not “set it and forget it.” They require ongoing attention. Data models evolve. New questions emerge. The semantic layer that performed well at launch will degrade without maintenance.

Every implementation we deliver includes the scaffolding for continuous improvement: evaluation pipelines, feedback loops, and training so client teams can maintain and extend the system independently. We're not building dependencies; we're building capabilities our clients own.

What Comes Next

The organizations getting value from AI analytics aren't the ones with the fanciest demos. They're the ones who can demonstrate, with evidence, that their systems work reliably.

That's what we build. If you're exploring Snowflake Intelligence or evaluating your current implementation, we'd welcome the conversation. You can learn more about our full Snowflake partnership here.
