A national specialty retailer running 700+ stores and 50,000+ SKUs had business users frustrated by the five-day wait for analysts to pull reports. Leadership wanted a world where walking in Monday morning and asking "What do I need to know?" returned role-specific insights in minutes.
Their data was already clean in BigQuery and Looker handled the dashboards, but business users couldn't interact with it in their own language. Category managers cared about one product category across all 700 stores. District managers cared about all categories in their 70 stores. Both were frustrated with the same five-day analyst queue, and most of the questions they were waiting for were ones they could have answered themselves if the system spoke their language.
Generic AI tools failed when the team tried them. The tools didn't know what product-specific terms meant in context (a 'soft' category at this retailer means something different than it does at any other retailer), didn't know how margin was calculated across the half-dozen rules the merchandising team applied, and didn't know which calculations lived buried in stored procedures versus LookML. Most conversational BI attempts across the industry fail to reach production for the same reason: the tools don't get customized for the business they're supposed to serve.
Custom prompts that taught the LLM their business: what product categories mean in their taxonomy, how margin gets calculated under each merchandising rule, what 'underperforming' actually signals in their context, what a category or district manager cares about on a Monday morning. The prompts are versioned, reviewed by the analytics team, and updated alongside the underlying data models.
A metadata layer mapping how transactions connect to items, categories, managers, stores, regions, inventory, and margin rules. Business logic that lived in stored procedures and LookML files got documented and surfaced to the LLM through structured metadata, so the model knows where to look when a question hits a calculation that isn't a simple SQL join.
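To make the idea of structured metadata concrete, here is a minimal sketch of how a metric definition might record where its calculation lives and be rendered as context for the LLM. Everything in it is an assumption for illustration: the `MetricDefinition` fields, the metric names, and the `location` identifiers are hypothetical and do not come from the retailer's actual models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One entry in a hypothetical metadata catalog."""
    name: str
    description: str
    source: str    # where the logic lives: "lookml" or "stored_procedure"
    location: str  # illustrative identifier, not a real object name


# Illustrative entries only; real entries would come from the documented
# stored procedures and LookML files.
METADATA = {
    "gross_margin": MetricDefinition(
        name="gross_margin",
        description="Margin after merchandising rules are applied",
        source="stored_procedure",
        location="example_schema.calc_gross_margin",
    ),
    "units_sold": MetricDefinition(
        name="units_sold",
        description="Count of units on completed transactions",
        source="lookml",
        location="example_model.transactions.units_sold",
    ),
}


def metadata_context(question: str) -> str:
    """Return one context line per known metric mentioned in the question."""
    text = question.lower()
    lines = []
    for metric in METADATA.values():
        # Naive matching on the metric name; a real system would likely
        # use synonyms or embeddings instead.
        if metric.name.replace("_", " ") in text:
            lines.append(
                f"{metric.name}: {metric.description} "
                f"(defined in {metric.source}: {metric.location})"
            )
    return "\n".join(lines)
```

Calling `metadata_context("Why did gross margin drop last week?")` would return only the `gross_margin` line, which tells the model that the calculation sits in a stored procedure rather than in a plain join.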
A Python and FastAPI smart layer with a React frontend, PostgreSQL for the metadata and prompt library, and Redis for caching. Role-based filtering on prompts and results means the same question from a CFO, a category manager, and a district manager returns three different, relevant answers. We tested Claude and Gemini side by side on their actual data and let the results pick the production model.
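The role-based filtering can be pictured as a small function that turns a user's role into a result scope. The sketch below uses only the standard library and leaves out FastAPI, PostgreSQL, and Redis; the role names, the `UserScope` type, and the sample rows are hypothetical and chosen to mirror the category-manager and district-manager split described earlier.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class UserScope:
    """Hypothetical description of who is asking."""
    role: str
    category: Optional[str] = None
    district: Optional[str] = None


def scope_filter(user: UserScope) -> Dict[str, str]:
    """Map a role to the column filters its results should carry."""
    if user.role == "category_manager":
        if user.category is None:
            raise ValueError("category_manager requires a category")
        return {"category": user.category}
    if user.role == "district_manager":
        if user.district is None:
            raise ValueError("district_manager requires a district")
        return {"district": user.district}
    if user.role == "cfo":
        return {}  # assumed to see everything
    raise ValueError(f"unknown role: {user.role}")


def apply_scope(rows: List[dict], user: UserScope) -> List[dict]:
    """Keep only the rows that match every filter for this user."""
    wanted = scope_filter(user)
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in wanted.items())
    ]


# Made-up rows for illustration.
SAMPLE_ROWS = [
    {"category": "footwear", "district": "D1", "sales": 10},
    {"category": "apparel", "district": "D1", "sales": 5},
    {"category": "footwear", "district": "D2", "sales": 7},
]
```

With these sample rows, a category manager scoped to `footwear` sees that category in both districts, a district manager scoped to `D1` sees both categories in that district, and the CFO sees every row. In a production system the same filter would more likely be pushed into the generated SQL than applied to rows after the fact.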
The Blueprint ran three weeks of intense discovery. We sat with category managers, district managers, and the finance team during their actual decision moments (the Monday review, the Wednesday merchandising sync, the monthly close), cataloged every business term that needed prompt context, and traced which calculations lived in LookML versus stored procedures. We came out with a prompt-engineering roadmap and an explicit decision on which LLMs to evaluate in the pilot.
The ten-week Pilot was scoped tightly to a single category across all 700 stores and a single district across all categories: enough surface area to stress-test the role-based filtering, but narrow enough to ship in the window. We ran twice-weekly working sessions with the analytics team and a daily standup during the pilot. Claude and Gemini ran in parallel against the same prompts; we kept the better model and retired the other rather than running both in production.
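A side-by-side comparison of this kind can be sketched as a small harness that runs every candidate model over the same cases and keeps the top scorer. This is an illustration under stated assumptions, not the evaluation harness used in the pilot: the exact-match scoring rule, the stub models, and the test cases are all hypothetical, and real model calls would replace the stubs.

```python
from typing import Callable, Dict, List, Tuple

ModelFn = Callable[[str], str]
Case = Tuple[str, str]  # (prompt, expected answer)


def accuracy(model: ModelFn, cases: List[Case]) -> float:
    """Fraction of cases where the model's answer matches exactly.

    Exact match is an assumed scoring rule; analyst review or a
    tolerance on numeric answers may be more realistic.
    """
    if not cases:
        raise ValueError("at least one case is required")
    correct = sum(
        1 for prompt, expected in cases if model(prompt).strip() == expected
    )
    return correct / len(cases)


def pick_model(
    models: Dict[str, ModelFn], cases: List[Case]
) -> Tuple[str, Dict[str, float]]:
    """Score every model on the same cases and return the best name."""
    scores = {name: accuracy(fn, cases) for name, fn in models.items()}
    # On a tie, max() keeps the first name in insertion order.
    best = max(scores, key=lambda name: scores[name])
    return best, scores


# Stub models standing in for real API calls.
def model_a(prompt: str) -> str:
    return {"margin last week?": "31%"}.get(prompt, "unknown")


def model_b(prompt: str) -> str:
    return {"margin last week?": "31%", "units last week?": "1200"}.get(
        prompt, "unknown"
    )


CASES: List[Case] = [
    ("margin last week?", "31%"),
    ("units last week?", "1200"),
]
```

Calling `pick_model({"model_a": model_a, "model_b": model_b}, CASES)` scores both stubs against identical prompts, which is the property that makes the comparison fair: the prompts and expected answers stay fixed while only the model varies.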
Knowledge transfer ran from day one. The analytics team owned the prompt library by the end of week six and the metadata mapping by the end of the pilot. Our team handled the smart-layer infrastructure and the LLM evaluation harness; theirs handled the business-context work, which is where the system's accuracy actually lives.
"Our category managers used to wait five days for a report to find out why a category was underperforming. Now they ask the question in plain English and get an answer in a minute, with the right calculation logic baked in."
Thirty minutes with an 829 Analytics partner. You leave with a prioritized view of what to build first, what's worth waiting on, and the business metric anchoring each move. Whether or not we end up working together.