ML Infrastructure RAG

Gromopo — One System, Three Components

by Logan Rudd

Jun 20, 2026

chat (RAG service) · gromopo (platform) · vouched (on-chain)

Architecture at a glance

Customer order (Next.js 15 · USDC on Solana)
  → addReview → Vouched Anchor program (PDA: one per wallet × restaurant)
  → batch indexer (getProgramAccounts + Borsh)
  → RAG ingest (Vertex AI embeddings · text-embedding-004)
  → Qdrant (multi-tenant · business_id payload filter)
  → FastAPI + LLM query parser (dynamic-k)
  → streaming answers on owner dashboard

The problem

Gromopo answers a deceptively simple question for a restaurant owner — “what are my customers actually saying?” — across review sources that don’t agree on shape or trust model: Google Takeout exports and purchase-verified, on-chain reviews. The system has to unify those, isolate every tenant’s data, and serve grounded answers. It’s my strongest evidence for general infra roles, because most of the work is the production thinking around the ML, not the ML itself. Built solo.

The three components

chat is the RAG service (Python 3.13, FastAPI, Vertex AI gemini-2.5-flash-lite + text-embedding-004, Qdrant over gRPC) — where the infra thinking lives. gromopo is the customer ordering platform and owner dashboard (Next.js 15 App Router, Firebase auth/Firestore, Solana wallet-adapter + SPL USDC payments). vouched is a Solana Anchor program (Rust) storing each review as a PDA, one per wallet per restaurant, cryptographically tied to the purchase wallet.

How they’re tied together

The three components meet at two integration boundaries, and I treat them differently on purpose. On-chain, there’s a real contract: Vouched’s Anchor #[account] struct is the origin, and its generated IDL is the canonical schema — the platform submits addReview, and the indexer reads accounts back via getProgramAccounts and Borsh decoding. Over HTTP, the platform-to-RAG call is coordinated by convention: a JSON request body (query, business_id, …), a shared ingest secret, and FastAPI’s auto-generated OpenAPI docs as the only spec.
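As a rough illustration of what "coordinated by convention" means on the HTTP side, here is a minimal standard-library sketch of the two checks such a boundary relies on. It validates only the two body fields named above (query and business_id) and leaves the others out. The names QueryRequest, parse_query_body, and ingest_secret_matches are hypothetical and not taken from the service; how the secret is transported (header, body field) is also left open.

```python
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRequest:
    """Only the two body fields named in the text; any others are omitted."""
    query: str
    business_id: str


def parse_query_body(raw: bytes) -> QueryRequest:
    """Parse a JSON body and reject it if a required field is missing or empty."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("body must be a JSON object")
    for name in ("query", "business_id"):
        value = data.get(name)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {name}")
    return QueryRequest(query=data["query"], business_id=data["business_id"])


def ingest_secret_matches(presented: str, expected: str) -> bool:
    """Compare in constant time so response timing does not leak the secret."""
    return hmac.compare_digest(presented.encode(), expected.encode())
```

Nothing here is enforced across languages: the TypeScript caller has to send the same field names by agreement, which is exactly the difference from the IDL-backed on-chain boundary.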

What was hard, and what I decided

Integration contracts across a polyglot system. The same review shape lives in three languages — a Rust program, a TypeScript platform, a Python service. The Anchor IDL is the natural source of truth, but today it’s copied into each consumer, and the Python side even re-derives the Borsh byte offsets by hand — so a struct change in the program wouldn’t propagate to either downstream consumer. The lesson I took: in a multi-language system, the schema has to be generated from one origin, not mirrored by hand.
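To make the hand-derived offsets concrete, here is a sketch of what decoding an Anchor account by hand looks like in Python. The field layout (reviewer pubkey, u8 rating, string text) is an assumption for illustration, not the actual Vouched struct. The fixed parts are standard: Anchor prefixes each account with an 8-byte discriminator, public keys are 32 bytes, and Borsh encodes a string as a little-endian u32 length followed by UTF-8 bytes.

```python
import struct
from dataclasses import dataclass

DISCRIMINATOR_LEN = 8  # Anchor prefixes every account with an 8-byte discriminator
PUBKEY_LEN = 32        # Solana public keys are 32 bytes


@dataclass(frozen=True)
class Review:
    reviewer: bytes  # hypothetical field: raw 32-byte wallet pubkey
    rating: int      # hypothetical field: u8
    text: str        # hypothetical field: Borsh string


def decode_review(data: bytes) -> Review:
    """Decode one account's bytes, assuming the hypothetical layout above."""
    offset = DISCRIMINATOR_LEN  # skip the discriminator

    reviewer = data[offset:offset + PUBKEY_LEN]
    if len(reviewer) != PUBKEY_LEN:
        raise ValueError("account data too short for reviewer pubkey")
    offset += PUBKEY_LEN

    (rating,) = struct.unpack_from("<B", data, offset)  # u8
    offset += 1

    (text_len,) = struct.unpack_from("<I", data, offset)  # u32 LE length prefix
    offset += 4

    text_bytes = data[offset:offset + text_len]
    if len(text_bytes) != text_len:
        raise ValueError("account data truncated inside review text")

    return Review(reviewer=reviewer, rating=rating, text=text_bytes.decode("utf-8"))
```

Every offset in this function is a silent assumption about field order and width. Reordering or inserting a field in the Rust struct would make it read the wrong bytes without raising, which is the failure mode that generating the decoder from the IDL would avoid.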

Retrieval that’s correct, and measured. Naive flat top-k is a trap for analytical questions. I built a recall@k evaluation harness (556-review corpus, 20 ground-truth queries), and it showed flat k=50 recovered only 17% of relevant documents on analytical queries — you’d need k=500 to reach 97%. Adding an LLM query parser that extracts structured Qdrant filters (a rating range, for example) pushed recall to 100% by k=50. Retrieval is now query-type-aware: analytical queries fan out wide, comparisons and example-fetches use much smaller k.
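For reference, the metric behind those numbers can be sketched in a few lines. This is the standard definition of recall@k, not the harness itself; the function names and the shape of the inputs (ranked lists of review IDs plus a ground-truth set per query) are assumptions.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant IDs that appear in the first k retrieved IDs."""
    if not relevant:
        raise ValueError("recall is undefined when no documents are relevant")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    if not scores:
        raise ValueError("no queries to evaluate")
    return sum(scores) / len(scores)
```

The analytical case is hard for flat top-k because the relevant set is large: if a question touches every low-rated review, recall at a small k is capped by k divided by the size of that set, no matter how good the ranking is. Filtering first shrinks the candidate pool to the relevant slice, which is why the parser helps more than raising k.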

Multi-tenancy as a hard boundary. Every point carries a business_id, and every query is payload-filtered on it, so one tenant can never retrieve another’s reviews — isolation enforced at the vector-store query, not in application glue.
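One way to keep that boundary hard is to make the tenant condition impossible to omit when a filter is built. The sketch below produces a filter in the JSON shape Qdrant's REST API accepts (a must list of field conditions). The service itself talks to Qdrant over gRPC, so this shows the structure rather than the actual call. The build_tenant_filter name and the rating payload key are assumptions.

```python
from typing import Any, Dict, List, Optional


def build_tenant_filter(
    business_id: str,
    min_rating: Optional[int] = None,
    max_rating: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a Qdrant-style filter that always includes the tenant condition."""
    if not business_id:
        raise ValueError("business_id is required for every query")

    # The tenant condition goes in first and is never optional.
    must: List[Dict[str, Any]] = [
        {"key": "business_id", "match": {"value": business_id}}
    ]

    # Optional structured condition, e.g. a rating range from the query parser.
    rating_range: Dict[str, int] = {}
    if min_rating is not None:
        rating_range["gte"] = min_rating
    if max_rating is not None:
        rating_range["lte"] = max_rating
    if rating_range:
        must.append({"key": "rating", "range": rating_range})

    return {"must": must}
```

Because parser-derived conditions are only ever appended to the same must list, a malformed or adversarial query can narrow a tenant's results but cannot widen them past the business_id match.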

Idempotent ingestion. Point IDs derive from review_id, so re-ingesting an export overwrites rather than duplicates; re-runs and retries are safe.
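A common way to derive a stable point ID is a name-based UUID, since Qdrant accepts UUIDs as point IDs. The sketch below assumes that approach; the namespace string and the point_id_for name are hypothetical, and the real derivation may differ.

```python
import uuid

# Hypothetical fixed namespace; any constant UUID works as long as it never changes.
REVIEW_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/reviews")


def point_id_for(review_id: str) -> str:
    """Derive a deterministic UUIDv5 point ID from a review_id."""
    return str(uuid.uuid5(REVIEW_NAMESPACE, review_id))


# Upserting by derived ID means a second pass overwrites instead of appending.
store = {}
export = [("r1", "Great tacos"), ("r2", "Slow service")]
for _ in range(2):  # ingest the same export twice
    for review_id, text in export:
        store[point_id_for(review_id)] = text
```

After both passes the store still holds two entries, one per review, which is the property that makes retries after a partial failure safe.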

Observability and CI as defaults. The service uses structured logging (structlog) throughout, and GitHub Actions runs CI. A multi-tenant service you can’t see into isn’t production-ready, regardless of how good the model is.