ChemMasters Data Platform

AI Data Platform

Production AI platform for a chemicals distributor, built as two integrated halves.

(1) Email automation: a Celery Beat dispatcher polls inbound mail every 60s, acquires a PostgreSQL advisory lock per email for crash-safe mutual exclusion, and hands off to a Unified Email Agent (Claude) that classifies, extracts, and routes; a companion Email Response Agent drafts replies. A truth-set eval harness keyed by Sage OR# regression-tests every Claude prompt change against curated real-world cases before promotion.

(2) Data platform: unifies legacy Sage product/customer data with semantic search (Vertex AI Search), AI trip-report generation, and a document pipeline that extracts data from SDS/TDS PDFs via GCP Document AI. An MCP server and REST APIs expose the unified data to AI agents and a Slack/React surface.

Impact & Results

Inbound email triage automated — Claude classifies and extracts per message; humans focus on judgement calls
Zero observed double-processing under worker crash/retry, thanks to PG advisory locks
Prompt iterations ship safely — every change must pass the Sage OR#-keyed truth set
Hours of manual data entry replaced by AI trip reports

Technical Architecture

Celery Beat 60s dispatcher with FSM-modeled per-email lifecycle
PostgreSQL advisory locks for crash-safe mutual exclusion across worker pool
Unified Email Agent (Claude): classify + extract + route in one pass
Email Response Agent with HITL approval before send
Truth-set evaluation harness keyed by Sage OR# (e.g. AU00165) gating every prompt promotion
GCP Document AI for SDS/TDS PDF extraction
Vertex AI Search for semantic product search
FastMCP server architecture
Legacy system integration (Sage MySQL)
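
To illustrate what an FSM-modeled per-email lifecycle can look like, here is a minimal sketch. The state names and the transition table are assumptions made for the example, not the platform's actual states; the point is only that every status change goes through one table that rejects illegal moves.

```python
from enum import Enum


class EmailState(str, Enum):
    # State names are hypothetical; the real lifecycle is not listed here.
    RECEIVED = "received"
    QUEUED = "queued"
    PROCESSING = "processing"
    AWAITING_APPROVAL = "awaiting_approval"  # draft waits for a human (HITL)
    SENT = "sent"
    FAILED = "failed"


# Allowed moves; anything absent from this table is rejected.
TRANSITIONS = {
    EmailState.RECEIVED: {EmailState.QUEUED},
    EmailState.QUEUED: {EmailState.PROCESSING},
    EmailState.PROCESSING: {
        EmailState.AWAITING_APPROVAL,
        EmailState.FAILED,
        EmailState.QUEUED,  # a stuck email can be put back on the queue
    },
    EmailState.AWAITING_APPROVAL: {EmailState.SENT, EmailState.FAILED},
    EmailState.SENT: set(),  # terminal
    EmailState.FAILED: {EmailState.QUEUED},  # retry
}


def advance(current: EmailState, target: EmailState) -> EmailState:
    """Return target if the move is legal, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the table as data makes it easy to review which moves are legal and to unit-test them without touching the mail or database layers.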

Challenges & Solutions

Challenge 1

Inbound order/inquiry email volume mixed with replies, forwards, and noise — needed reliable classification + extraction

Solution 1

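Unified Email Agent (Claude) classifies, extracts, and routes each message in a single pass over raw mail, fed by a Celery Beat dispatcher (60s) with orphan recovery, dispatch, and auto-queue phases; the Email Response Agent then drafts replies for human approval before send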
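
The dispatcher's three phases could be sketched roughly as below. This is an in-memory illustration under stated assumptions: the task name, the status strings, the `stale_after` threshold, and what each phase does in detail are guesses made for the example. Only the 60s interval and the phase names come from the description above. The dict mirrors the shape of a Celery `beat_schedule` entry and would be assigned to `app.conf.beat_schedule` in a real app.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Shape of a Celery beat_schedule entry; the task name is hypothetical.
BEAT_SCHEDULE = {
    "dispatch-inbound-email": {
        "task": "email.dispatch_tick",  # hypothetical task name
        "schedule": 60.0,               # seconds, matching the 60s poll
    },
}


@dataclass
class Email:
    id: int
    status: str              # "received" | "queued" | "processing" (assumed)
    claimed_at: float = 0.0  # time a worker last picked the email up


def dispatch_tick(
    emails: List[Email],
    now: float,
    enqueue: Callable[[int], None],
    stale_after: float = 300.0,  # assumed threshold for a dead worker
) -> Dict[str, int]:
    """Run one dispatcher pass and return how many emails each phase touched."""
    # Phase 1, orphan recovery: requeue emails stuck in "processing" too long.
    recovered = 0
    for e in emails:
        if e.status == "processing" and now - e.claimed_at > stale_after:
            e.status = "queued"
            recovered += 1

    # Phase 2, dispatch: hand every queued email to a worker.
    dispatched = 0
    for e in emails:
        if e.status == "queued":
            e.status = "processing"
            e.claimed_at = now
            enqueue(e.id)
            dispatched += 1

    # Phase 3, auto-queue: newly received mail becomes eligible next tick.
    queued = 0
    for e in emails:
        if e.status == "received":
            e.status = "queued"
            queued += 1

    return {"recovered": recovered, "dispatched": dispatched, "queued": queued}
```

Running recovery first means an email orphaned by a crashed worker is redispatched in the same tick instead of waiting another 60s.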

Challenge 2

Worker crashes / retries could double-process the same email — needed strict mutual exclusion

Solution 2

PG advisory locks per-email — workers can crash safely without double-processing
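
A minimal sketch of the per-email lock, assuming a DB-API cursor (for example from psycopg) and an integer email ID used directly as the lock key. `pg_try_advisory_lock` and `pg_advisory_unlock` are standard PostgreSQL functions; the helper name and the keying scheme are assumptions for illustration.

```python
from contextlib import contextmanager


@contextmanager
def email_lock(cursor, email_id: int):
    """Try to take a session-level advisory lock for one email.

    Yields True if this worker holds the lock, False if another worker
    already does. The call never blocks.
    """
    cursor.execute("SELECT pg_try_advisory_lock(%s)", (email_id,))
    acquired = bool(cursor.fetchone()[0])
    try:
        yield acquired
    finally:
        # Only release a lock this session actually holds.
        if acquired:
            cursor.execute("SELECT pg_advisory_unlock(%s)", (email_id,))
```

A worker would wrap its processing in `with email_lock(cur, email.id) as held:` and skip the email when `held` is false. Because PostgreSQL releases session-level advisory locks when the session ends, a worker that crashes mid-email does not leave a stale lock behind.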

Challenge 3

Prompt changes risked silent regression on real-world cases — needed a regression gate

Solution 3

Truth-set eval harness keyed by Sage OR# (e.g. AU00165); every Claude prompt change is run against curated real-world cases before promotion
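
The regression gate that Challenge 3 calls for might look like the sketch below: curated cases keyed by OR#, each run through the agent, with promotion allowed only when no expected field differs. The `agent` callable stands in for the Claude call, and the field names and sample email text are made up for illustration.

```python
from typing import Callable


def evaluate(truth_set: dict, agent: Callable[[str], dict]) -> dict:
    """Run every curated case through the agent and collect field mismatches."""
    failures = {}
    for or_number, case in truth_set.items():
        actual = agent(case["email"])
        # Keep only fields where the agent disagrees with the curated answer.
        diff = {
            field: {"expected": want, "actual": actual.get(field)}
            for field, want in case["expected"].items()
            if actual.get(field) != want
        }
        if diff:
            failures[or_number] = diff
    return {"total": len(truth_set), "failures": failures, "promote": not failures}


# Illustrative case only: the email text and field names are made up.
TRUTH_SET = {
    "AU00165": {
        "email": "Please confirm our order of 4 drums.",
        "expected": {"category": "order", "quantity": 4},
    },
}
```

Keying failures by OR# lets a reviewer jump from a failed case straight to the corresponding Sage record when deciding whether the prompt or the truth set is wrong.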

Challenge 4

Legacy Sage MySQL database with inconsistent data formats

Solution 4

Legacy system integration with Sage MySQL: product/customer data unified behind the MCP server and REST APIs

Challenge 5

Hundreds of SDS/TDS PDFs with varying layouts

Solution 5

GCP Document AI pipeline for automated SDS/TDS extraction and classification

Challenge 6

Sales team needed fast product search across thousands of chemicals

Solution 6

Vertex AI Search for semantic product search

Technology Stack

FastAPI, FastMCP, SQLModel, Celery, Celery Beat, PostgreSQL, PostgreSQL Advisory Locks, Redis, MySQL (legacy Sage), Vertex AI Search, GCP Document AI, Anthropic Claude (Sonnet/Opus), Vertex AI / Gemini, React Widget, Slack SDK