National specialty retail · 700+ units

Conversational BI layer that lets category and district managers ask questions about 700+ stores in plain language.

5d → <1m · Report wait time for business users during the pilot
700+ · Stores covered by the conversational layer
50,000+ · SKUs in the underlying product taxonomy
Role-based · Different answers for category, district, and finance roles
2 · LLMs (Claude and Gemini) tested side-by-side on the actual data
Client context

A national specialty retailer running 700+ stores and 50,000+ SKUs, with business users frustrated by the five-day wait for analysts to pull reports. Leadership wanted a world where walking in Monday morning and asking "what do I need to know?" returned role-specific insights in minutes.

The problem

Their data was already clean in BigQuery and Looker handled the dashboards, but business users couldn't interact with it in their own language. Category managers cared about one product category across all 700 stores. District managers cared about all categories in their 70 stores. Both were frustrated with the same five-day analyst queue, and most of the questions they were waiting for were ones they could have answered themselves if the system spoke their language.

Generic AI tools failed when they tried them. The tools didn't know what product-specific terms meant in context (a 'soft' category at this retailer means something different from what it means anywhere else), didn't know how margin was calculated across the half-dozen rules the merchandising team applied, and didn't know which calculations lived buried in stored procedures versus LookML. Most conversational BI attempts across the industry fail to reach production for the same reason: the tools never get customized for the business they're supposed to serve.

What we built
01

Foundation: business-context prompt engineering

Custom prompts that taught the LLM their business: what product categories mean in their taxonomy, how margin gets calculated under each merchandising rule, what 'underperforming' actually signals in their context, what a category or district manager cares about on a Monday morning. The prompts are versioned, reviewed by the analytics team, and updated alongside the underlying data models.
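A versioned business-context prompt can be sketched roughly as below. The glossary entries, margin rules, and version string are illustrative stand-ins, not the client's actual taxonomy or prompt library:

```python
# Sketch of a versioned, business-context system prompt. All terms,
# rules, and the version label are hypothetical examples.

PROMPT_VERSION = "v3"  # bumped when the analytics team reviews an update

GLOSSARY = {
    "soft": "apparel and textile categories, per this retailer's taxonomy",
    "underperforming": "trailing the 13-week sales plan for the scope asked about",
}

MARGIN_RULES = [
    "promo items: margin computed net of vendor funding",
    "clearance items: margin computed against last-received cost",
]

def build_system_prompt(role: str) -> str:
    """Assemble the business context injected ahead of every question."""
    glossary = "\n".join(f"- '{term}': {meaning}" for term, meaning in GLOSSARY.items())
    rules = "\n".join(f"- {rule}" for rule in MARGIN_RULES)
    return (
        f"[prompt version {PROMPT_VERSION}]\n"
        f"You answer retail analytics questions for a {role}.\n"
        f"Business terms:\n{glossary}\n"
        f"Margin rules:\n{rules}"
    )
```

Because the prompt is assembled from plain data structures, the analytics team can review and update the glossary and rules alongside the data models without touching application code.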

02

Integration: data tagging and metadata layer

A metadata layer mapping how transactions connect to items, categories, managers, stores, regions, inventory, and margin rules. Business logic that lived in stored procedures and LookML files got documented and surfaced to the LLM through structured metadata, so the model knows where to look when a question hits a calculation that isn't a simple SQL join.
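The shape of that metadata layer might look like the sketch below, with hypothetical table and measure names; the key idea is recording where each calculation is defined, so the model is told when a question can't be answered by a plain join:

```python
# Minimal sketch of the metadata layer. Table, measure, and file names
# are illustrative, not the client's actual schema.

METADATA = {
    "transactions": {
        "joins": {"items": "sku_id", "stores": "store_id"},
        "source": "bigquery",
    },
    "gross_margin": {
        "kind": "measure",
        "defined_in": "lookml",              # logic lives in LookML, not raw SQL
        "definition_ref": "views/margin.view.lkml",
    },
    "inventory_position": {
        "kind": "measure",
        "defined_in": "stored_procedure",
        "definition_ref": "sp_inventory_rollup",
    },
}

def context_for(entities):
    """Collect the metadata entries a question touches, flagging any
    measure whose logic lives outside plain SQL."""
    hits = {name: METADATA[name] for name in entities if name in METADATA}
    non_sql = [
        name for name, meta in hits.items()
        if meta.get("defined_in") in ("lookml", "stored_procedure")
    ]
    return hits, non_sql
```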

03

Activation: smart layer + role-based UX

A Python and FastAPI smart layer with a React frontend, PostgreSQL for the metadata and prompt library, and Redis for caching. Role-based filtering on prompts and results means the same question from a CFO, a category manager, and a district manager returns three different, relevant answers. We tested Claude and Gemini side by side on their actual data and let the results pick the production model.
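The role-based filtering can be sketched as a scoping step applied before a question reaches the model and the query layer. Role names and scope filters here are illustrative assumptions (the real system sits behind the FastAPI smart layer):

```python
# Sketch of role-based scoping: the same question gets a different data
# scope per role. Role names and filters are hypothetical examples.

ROLE_SCOPES = {
    "category_manager": {"filter": "category = :my_category", "stores": "all"},
    "district_manager": {"filter": "store_id IN :my_stores", "categories": "all"},
    "cfo": {"filter": None, "rollup": "company"},  # unfiltered, company-level view
}

def scope_question(question: str, role: str) -> dict:
    """Attach the role's data scope so downstream prompt selection and
    query generation see the question through that role's lens."""
    return {"question": question, "role": role, "scope": ROLE_SCOPES[role]}
```

The same question string goes in; what changes per role is the scope that travels with it, which is why a CFO, a category manager, and a district manager get three different, relevant answers.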

How we worked

The Blueprint ran three weeks of intense discovery. We sat with category managers, district managers, and the finance team during their actual decision moments (the Monday review, the Wednesday merchandising sync, the monthly close), cataloged every business term that needed prompt context, and traced which calculations lived in LookML versus stored procedures. We came out with a prompt-engineering roadmap and an explicit decision on which LLMs to evaluate in the pilot.

The ten-week Pilot was scoped tightly: a single category across all 700 stores and a single district across all categories, enough surface area to stress-test the role-based filtering but tight enough to ship in the window. We ran twice-weekly working sessions with the analytics team and a daily standup during the pilot. Claude and Gemini ran in parallel against the same prompts; we kept the better model and retired the other rather than running both in production.
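A side-by-side evaluation of that kind reduces to a small harness: both models answer the same prompt set, answers are graded, and the scores pick the production model. The model callables and grader below are stand-ins, not real API clients:

```python
# Sketch of the LLM evaluation harness. `models` maps a name to any
# callable taking a prompt and returning an answer; `grade` is a
# stand-in for analyst review of answer quality.

def evaluate(models, prompts, grade):
    """Run every model over every prompt; return per-model accuracy."""
    scores = {}
    for name, ask in models.items():
        correct = sum(grade(prompt, ask(prompt)) for prompt in prompts)
        scores[name] = correct / len(prompts)
    return scores

def pick_winner(scores):
    """Keep the better model; the other gets retired, not run in parallel."""
    return max(scores, key=scores.get)
```

Because both models see identical prompts and identical grading, the comparison isolates model quality from prompt quality, which is what makes "let the results pick the production model" defensible.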

Knowledge transfer ran from day one. The analytics team owned the prompt library by the end of week six and the metadata mapping by the end of the pilot. Our team handled the smart-layer infrastructure and the LLM evaluation harness; theirs handled the business-context work, which is where the system's accuracy actually lives.

Results
  • Pilot report wait time dropped from 5 days to under 1 minute for business users
  • Category managers got one-category-across-all-stores views that used to take analyst time
  • District managers got all-categories-in-my-stores views on demand
  • Role-based prompts meant the same question from a CFO and a district manager returned different, relevant answers
  • Pilot proof points set up the production build across all stores and personas
"Our category managers used to wait five days for a report to find out why a category was underperforming. Now they ask the question in plain English and get an answer in a minute, with the right calculation logic baked in."
VP of Merchandising Analytics, National specialty retail client

Outcomes start with a Blueprint. We plan, build, and run from there.

Thirty minutes with an 829 Analytics partner. You leave with a prioritized view of what to build first, what's worth waiting on, and the business metric anchoring each move, whether or not we end up working together.