Public retrospectives of platform-scale work at a Fortune-50 healthcare customer, signed bylines on GenAI infrastructure (TrueFoundry, Linux Foundation MCP Dev Summit), Toptal-led client builds shipping under real production load, and open-source projects you can read end-to-end.
A platform-based blueprint for cost-effective, production-grade voice agents — telephony, orchestration, LLM and TTS swaps, evaluation, and the unglamorous reliability work that makes them stick in regulated environments.
- ~95% containment on outbound flows in production deployments.
- Telephony, agent runtime, and evaluation layered into a reusable platform — new agents ship in days, not quarters.
- Public write-up of the architecture pattern on the CVS Health Tech Blog.
Python · Twilio / SIP · FastAPI · Kubernetes · LLM orchestration · Streaming TTS/ASR · Evals
Replacing a brittle, decade-old IVR with LLM-driven conversational flows — without throwing away the integrations, compliance posture, or operational muscle memory the legacy system carried.
- Legacy IVR flows migrated to an agentic conversation layer with measurable lift in self-service resolution.
- Operations team kept their dashboards: existing analytics, transcript review, and quality scoring all carried over.
- Pattern documented publicly on the CVS Health Tech Blog.
Python · LLM tool-use · Streaming ASR/TTS · Telephony · Observability
Senior Engineer · Architecture, design, build, ship · 2025–2026
A persistent, two-tier (short-term + long-term) memory service for LLM applications. Distilled facts replace verbatim history, semantic retrieval surfaces what matters, and the same memory follows users across any model.
- ~10 ms similarity search and ~45 ms end-to-end context preparation; the latter is dominated by embedding latency, which runs in parallel with the DB fetches.
- Two-tier memory architecture (STM: running summary + last 20 messages; LTM: vector-stored, importance-scored, semantically deduplicated facts) modelled after working vs. long-term human memory.
- Model-agnostic by design: memory lives outside the model, so consumers can swap GPT, Claude, or a fine-tuned in-house model without losing user context.
Python · FastAPI · Postgres · pgvector · Redis (queues) · OpenAI / Anthropic / TrueFoundry AI Gateway
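A minimal in-memory sketch of the two-tier idea described above, for illustration only. The class and field names, the 0.95 deduplication threshold, and the rule of keeping the higher-importance fact are all assumptions; the real service stores facts in Postgres with pgvector rather than in a Python list.

```python
import math
from collections import deque
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Fact:
    text: str
    embedding: list[float]
    importance: float  # hypothetical 0..1 score


@dataclass
class TwoTierMemory:
    dedup_threshold: float = 0.95  # assumed cutoff for "this is the same fact"
    summary: str = ""  # STM: running summary
    # STM: only the last 20 messages are kept; older ones fall off the left.
    recent: deque = field(default_factory=lambda: deque(maxlen=20))
    # LTM: distilled facts (a stand-in for a vector table).
    facts: list[Fact] = field(default_factory=list)

    def add_message(self, role: str, content: str) -> None:
        self.recent.append((role, content))

    def remember(self, fact: Fact) -> bool:
        """Store a fact unless a near-duplicate exists. Returns True if added."""
        for i, known in enumerate(self.facts):
            if cosine(known.embedding, fact.embedding) >= self.dedup_threshold:
                # Near-duplicate: keep whichever version scores as more important.
                if fact.importance > known.importance:
                    self.facts[i] = fact
                return False
        self.facts.append(fact)
        return True

    def recall(self, query_embedding: list[float], k: int = 3) -> list[Fact]:
        """Return the k facts most similar to the query embedding."""
        ranked = sorted(
            self.facts,
            key=lambda f: cosine(f.embedding, query_embedding),
            reverse=True,
        )
        return ranked[:k]
```

Because nothing here depends on a specific model, the same `recall` output could be prepended to a prompt for any provider, which is the point of keeping memory outside the model.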
An MCP proxy that replaces hundreds of tool schemas with three meta-tools — cutting 160k+ startup tokens to 456 (a 99.7% reduction) — and adds scoping, audit, and OAuth-aware fan-out to any upstream MCP server, with no client changes.
- 99.7% reduction in MCP startup token cost — 160,143 → 456 tokens — across 248 PostgreSQL + 41 GitHub tools.
- Proxy overhead ≈ -8 ms measured against real upstream servers, i.e. within measurement noise and effectively zero; 93% hit rate on keyword-based tool search.
- 27/27 scope-enforcement security checks pass; OAuth, bearer, header, and pass-through auth modes all supported per-server.
- Conference talk accepted at the Linux Foundation MCP Dev Summit Bengaluru (9–10 June 2026).
Python · MCP · asyncio · FastAPI · tiktoken · Docker · PostgreSQL MCP · GitHub MCP
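A rough sketch of how a keyword-based tool search with per-server scoping might look, plus the arithmetic behind the 99.7% figure. This is not the proxy's actual code: the `Tool` type, the `search_tools` function, and the substring scoring rule are hypothetical, and the three meta-tools themselves are not named in this write-up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    server: str
    name: str
    description: str


def search_tools(
    catalog: list[Tool],
    query: str,
    allowed_servers: set[str],
    limit: int = 5,
) -> list[Tool]:
    """Rank tools by how many query terms appear in their name or description."""
    terms = [t for t in query.lower().split() if t]
    scored = []
    for tool in catalog:
        if tool.server not in allowed_servers:
            continue  # scope enforcement: out-of-scope servers are never searched
        haystack = f"{tool.name} {tool.description}".lower()
        score = sum(1 for term in terms if term in haystack)
        if score:
            scored.append((score, tool))
    # Highest score first; ties broken by tool name for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [tool for _, tool in scored[:limit]]


def reduction_pct(before: int, after: int) -> float:
    """Percentage reduction from `before` tokens to `after` tokens."""
    return round(100 * (1 - after / before), 1)
```

With the numbers quoted above, `reduction_pct(160_143, 456)` evaluates to `99.7`. The client only pays for full schemas of the handful of tools a search returns, instead of all 289 at startup.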
A SaaS that ingests a PDF and produces a structured, narrated course — personas, ABCD learning objectives, failure-mode analysis, a hierarchical course outline, per-lesson MCQs, AI-narrated videos, and a contextual chat over the source — driven by a resumable 10-state instructional-design state machine.
- Live product — getpraxium.ai — covering ingestion, parsing, instructional-design generation, narrated video, payments, and RAG-powered explore mode.
- Resumable 10-state instructional-design pipeline with per-stage checkpointing — personas → objectives → failure modes → outline → per-subsection content.
- Async Synthesia video generation with polling, per-key-point clips, structured `videos.json` lifecycle, and full delete/regenerate flow.
- RAG explore mode over course chunks with pgvector similarity search and grounded answers with source references.
React 18 · TypeScript · Vite · Tailwind · shadcn/Radix · FastAPI · SQLAlchemy 2.0 (async) · PostgreSQL · pgvector · AWS S3 · Anthropic Claude · OpenAI embeddings · agentic-doc (Landing.ai) · Synthesia · Stripe · Clerk · Langfuse / OpenTelemetry
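One way to picture the resumable, checkpointed pipeline is the sketch below. It is an illustration under assumptions: it covers only the five stages named above (the product has ten states), uses a JSON file as the checkpoint store, and the `run_pipeline` function and handler signature are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# The five stages named in the text; the remaining states are not listed there.
STAGES = ["personas", "objectives", "failure_modes", "outline", "content"]


def run_pipeline(
    checkpoint: Path,
    handlers: dict[str, Callable[[dict], object]],
) -> dict:
    """Run stages in order, skipping any already recorded in the checkpoint."""
    state = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for stage in STAGES:
        if stage in state:
            continue  # finished in an earlier run; do not redo the work
        # Each handler sees the outputs of every earlier stage.
        state[stage] = handlers[stage](state)
        # Persist after every stage so a crash loses at most one stage of work.
        checkpoint.write_text(json.dumps(state))
    return state
```

If a stage raises, the checkpoint still holds every stage completed before it, so calling `run_pipeline` again with the same path picks up at the failed stage rather than from the start.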
A live product surface that lets any team browse 47+ Model Context Protocol servers, complete OAuth or API-key handshakes, and try every tool from the browser — without a local agent host.
- Unified discovery + auth + invocation across 47+ third-party MCP servers (SEO, finance, productivity, research, project management, and more) from a single browser surface.
- Three auth modes — OAuth2, API key, anonymous — routed through one consistent UX so consumers don't need per-provider wiring.
- An interactive 'Try it' surface that lets engineers exercise any registered tool against real providers without standing up a local agent host.
MCP · Python · FastAPI · React · TypeScript · OAuth2 · API-key vault
A complete RAG application built to be understood: ingestion, embeddings, retrieval, an evaluation harness, and a UI — small enough to read, real enough to deploy.
- End-to-end reference implementation people can clone, read, and ship from.
- Companion frontend and backend repos kept intentionally small so the architecture is the documentation.
- Multi-tenant RAG framework with semantic chunking, a multi-LLM eval harness, and a real operator console.
Python · FastAPI · ARQ · Unstructured.io · Docling · pgvector · Qdrant · LiteLLM · Vite · React · Tailwind · Docker
An AI meeting and voice-agent platform — capture, transcribe, summarize, retrieve, and run live agents on top — built as a clean separation of a Python backend and a modern web operator console so each side can evolve independently.
- Full meeting-intelligence + voice-agent surface: capture, transcript, structured summary, retrieval over past meetings, and live agents on top.
- Clear API contract between backend and operator console so either side can be swapped or extended.
- Public companion repos for ideas explored in the large-scale voice and agentic work.
Python · FastAPI · LiveKit Agents · LangGraph · Postgres · Google APIs · Vite · React · TypeScript · Tailwind
An AI product-matching and ordering layer for a Latin-American B2B hardware distributor: hybrid lexical + semantic retrieval over the live SKU catalog, agentic ordering flows for resellers, and a production-grade data plane on Postgres + AWS RDS with full APM tracing.
- Live product surface — ferremundo.com.ec — backed by an AI matching service for the distributor's reseller channel.
- Hybrid retrieval pipeline (lexical + semantic) over the full SKU catalog, aliases, and barcodes — engineered for noisy, multilingual B2B query patterns.
- Production data plane on AWS RDS with cron-safe, incremental, idempotent syncs from the warehouse and soft-delete semantics for retired SKUs.
- Full Datadog APM + structured-JSON logging in production; CDK-managed infrastructure for repeatable deployment.
Python · FastAPI · PostgreSQL (AWS RDS) · Alembic · OpenSearch / hybrid retrieval · Embedding models · Datadog APM · AWS CDK · Docker
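The write-up does not say how the lexical and semantic result lists are combined. As an illustration, reciprocal rank fusion (RRF) is one common way to merge two ranked lists without having to calibrate their scores against each other. The function name and the SKU identifiers are made up for the example; `k = 60` is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of SKU ids into a single ranking.

    Each list contributes 1 / (k + rank) per item, so items ranked highly
    by more than one retriever rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, sku in enumerate(ranking, start=1):
            scores[sku] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by SKU id for determinism.
    return sorted(scores, key=lambda sku: (-scores[sku], sku))


lexical = ["SKU-1", "SKU-2", "SKU-3"]   # e.g. keyword / barcode matches
semantic = ["SKU-2", "SKU-4", "SKU-1"]  # e.g. embedding neighbours
fused = reciprocal_rank_fusion([lexical, semantic])
# fused == ["SKU-2", "SKU-1", "SKU-4", "SKU-3"]
```

Rank-based fusion is attractive for noisy queries because it only needs each retriever's ordering, not comparable scores.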
A production-grade NL-to-SQL agent built with Google ADK + FastAPI: a SQL Generator, a SQL Executor with retry-with-feedback, and a Response Generator, orchestrated as a sequential + loop agent that converts business questions into safe, read-only Postgres queries and natural-language answers.
- Multi-agent ADK pipeline: SQL Generator (LlmAgent) → SQL Executor (Custom) → Response Generator (LlmAgent), orchestrated by SequentialAgent + LoopAgent with bounded retry.
- Safe execution by default: read-only SELECT, statement timeouts, row caps, connection pooling, and schema-aware prompting.
- Session-aware conversation: maintains query history across turns so follow-ups ("and for last month?") resolve against the same context.
- Two reference implementations — Python (ADK) and TypeScript — to fit the client's preferred deployment stack.
Python 3.12 · Google ADK · FastAPI · PostgreSQL · LLM provider (OpenAI / Anthropic) · uv · Pytest
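A framework-free sketch of the two safety ideas above: a read-only guard with a row cap, and a bounded retry loop that feeds the failure back to the generator. It deliberately avoids the ADK API. `guard_select`, `answer_with_retry`, and the `generate` / `execute` callables are hypothetical names, and the keyword screen is intentionally conservative (it would also reject a harmless string literal containing a word like "update"). In a real deployment this would sit on top of a read-only database role and a statement timeout, not replace them.

```python
import re
from typing import Callable, Optional

FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke|copy)\b",
    re.IGNORECASE,
)


def guard_select(sql: str, row_cap: int = 1000) -> str:
    """Return a single, row-capped SELECT statement or raise ValueError."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?i)(select|with)\b", statement):
        raise ValueError("only SELECT queries are allowed")
    if FORBIDDEN.search(statement):
        raise ValueError("write or DDL keyword found")
    # Append a cap only when the query does not already end in LIMIT n.
    if not re.search(r"(?i)\blimit\s+\d+\s*$", statement):
        statement = f"{statement} LIMIT {row_cap}"
    return statement


def answer_with_retry(
    question: str,
    generate: Callable[[str, Optional[str]], str],  # (question, feedback) -> SQL
    execute: Callable[[str], list],                 # runs SQL, returns rows
    max_attempts: int = 3,
) -> list:
    """Generate SQL, run it, and retry with the error as feedback on failure."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate(question, feedback)
        try:
            return execute(guard_select(sql))
        except Exception as exc:  # guard rejection or database error
            feedback = f"{sql!r} failed: {exc}"
    raise RuntimeError(f"gave up after {max_attempts} attempts: {feedback}")
```

Bounding the loop keeps a confused generator from retrying forever, and passing the error text back gives the next attempt something concrete to correct.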