Bangalore, India

Senior Forward Deployed Engineer.
GenAI Architect. Tech Lead.

I help enterprise teams ship production GenAI — voice agents, agentic workflows, RAG, and the infrastructure to make them stick.

Prathamesh Saraf

Top 3% talent · AI specialization · vetted by Toptal

By the numbers

4.4k+ GitHub stars on Cognita (TrueFoundry's OSS RAG framework I co-built)
~95% containment on the enterprise voice-agent platform I led
99.7% MCP startup-token reduction in mcp-guardian (160k → 456 tokens)
Speaker, Linux Foundation MCP Dev Summit Bengaluru, June 2026

About

I'm a senior forward-deployed engineer specializing in GenAI — voice agents, agentic workflows, RAG, and the infrastructure underneath. I ship in customer environments, with customer teams, against customer constraints, and I stay through the part where systems have to actually keep working.

For the last two years I've led platform-scale GenAI builds for a Fortune-50 healthcare buyer through TrueFoundry, with a signed engineering byline on the TrueFoundry blog for the memory layer I designed (TrueMem). Before that I was the founding tech lead at an AI startup. Toptal vetted me into its top 3% of engineers, with an AI specialization.

On the side I write My Adventures with LLMs — the book I wish I'd had when I started — speak at the Linux Foundation (MCP Dev Summit Bengaluru, June 2026, on mcp-guardian), and keep most of the code open.

Engagements

Where I've been doing the work

CVS Health × TrueFoundry · Senior Forward Deployed Engineer

Feb 2024 – Present · Remote (USA)

  • Embedded with a Fortune-50 healthcare buyer as the technical owner of their enterprise GenAI platform — voice agents, agentic workflows, and the platform plumbing underneath.
  • Led the design and rollout of a voice-agent platform that replaced legacy IVR flows; outbound deployments reached ~95% containment without escalation.
  • Core contributor to Cognita — TrueFoundry's open-source RAG framework (4.4k+ GitHub stars).
  • Served as forward-deployed solutions architect on enterprise GenAI engagements (large pharma, industrial): scoping, prototyping, and hand-off to customer teams.
  • Built internal platforms: a meeting-intelligence + realtime voice product, an MCP registry/guardian for tool governance, and an enterprise RAG framework.

ChatOwl · Technical Lead · Founding Engineer

Dec 2022 – Jan 2024 · Remote (USA)

  • Founding tech lead for an AI-augmented therapeutic-sessions platform — owned roadmap, architecture, and the full backend / infra stack from zero to production.
  • Shipped weekly releases across a cross-functional team of engineers, designers, and clinical advisors; established the engineering standards and review process.
  • Mentored junior engineers; ran hiring loops and onboarding for the first technical hires.

Saarthi.ai · Chatbot Developer

Aug 2020 – Aug 2021 · Bangalore, India

  • Built multilingual text and IVR chatbots on RASA for BFSI and edtech customers across Hindi, English, and regional Indian languages.
  • Authored an automated RASA conversation-testing harness — cut manual QA time by ~50% and made every release reproducible.
  • Drove an analytics-driven lead-generation loop (~20% lift in outbound reach) and a serverless containerization migration that trimmed cloud spend by ~15%.

Selected case studies

A few of the things I've shipped

Public retrospectives of platform-scale work at a Fortune-50 healthcare customer, signed bylines on GenAI infrastructure (TrueFoundry, Linux Foundation MCP Dev Summit), Toptal-led client builds shipping under real production load, and open-source projects you can read end-to-end.

A voice-agent platform for a Fortune-50 healthcare buyer

Senior Forward Deployed Engineer · Tech Lead · 2024 – Present

A platform-based blueprint for cost-effective, production-grade voice agents — telephony, orchestration, LLM and TTS swaps, evaluation, and the unglamorous reliability work that makes them stick in regulated environments.

  • ~95% containment on outbound flows in production deployments.
  • Telephony, agent runtime, and evaluation layered into a reusable platform — new agents ship in days, not quarters.
  • Public write-up of the architecture pattern on the CVS Health Tech Blog.

Python · Twilio / SIP · FastAPI · Kubernetes · LLM orchestration · Streaming TTS/ASR · Evals

From IVR to agentic — modernizing customer interactions

Senior Forward Deployed Engineer · Tech Lead · 2024 – Present

Replacing a brittle, decade-old IVR with LLM-driven conversational flows — without throwing away the integrations, compliance posture, or operational muscle memory the legacy system carried.

  • Legacy IVR flows migrated to an agentic conversation layer with measurable lift in self-service resolution.
  • Operations team kept their dashboards: existing analytics, transcript review, and quality scoring all carried over.
  • Pattern documented publicly on the CVS Health Tech Blog.

Python · LLM tool-use · Streaming ASR/TTS · Telephony · Observability

TrueMem — a model-agnostic memory layer for AI applications

Senior Engineer · Architecture, design, build, ship · 2025–2026

A persistent, two-tier (short-term + long-term) memory service for LLM applications. Distilled facts replace verbatim history, semantic retrieval surfaces what matters, and the same memory follows users across any model.

  • ~10 ms similarity search and ~45 ms end-to-end context preparation, dominated by embedding latency and parallelized against DB fetches.
  • Two-tier memory architecture (STM: running summary + last 20 messages; LTM: vector-stored, importance-scored, semantically deduplicated facts) modelled after working vs. long-term human memory.
  • Model-agnostic by design: memory lives outside the model, so consumers can swap GPT, Claude, or a fine-tuned in-house model without losing user context.
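
To make the two-tier idea concrete, here is a minimal sketch, not TrueMem's actual code. It keeps a short-term tier (running summary plus the last 20 messages) and a long-term tier of importance-scored facts that are deduplicated by similarity. The bag-of-words `embed` function, the `dedup_threshold` value, and the ranking formula are all assumptions made for illustration; a real service would call an embedding model and a vector store.

```python
import math
from collections import Counter, deque
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words stand-in for an embedding model (assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Fact:
    text: str
    importance: float
    vector: Counter


@dataclass
class TwoTierMemory:
    dedup_threshold: float = 0.9  # hypothetical cutoff for "same fact"
    summary: str = ""  # short-term tier: running summary
    recent: deque = field(default_factory=lambda: deque(maxlen=20))  # last 20 messages
    facts: list = field(default_factory=list)  # long-term tier

    def add_message(self, message: str) -> None:
        self.recent.append(message)  # the oldest message drops off automatically

    def add_fact(self, text: str, importance: float) -> bool:
        """Store a distilled fact unless a near-duplicate already exists."""
        vec = embed(text)
        for fact in self.facts:
            if cosine(vec, fact.vector) >= self.dedup_threshold:
                fact.importance = max(fact.importance, importance)
                return False
        self.facts.append(Fact(text, importance, vec))
        return True

    def retrieve(self, query: str, k: int = 3) -> list:
        """Rank facts by similarity weighted by importance (illustrative scoring)."""
        q = embed(query)
        ranked = sorted(self.facts, key=lambda f: cosine(q, f.vector) * f.importance, reverse=True)
        return [f.text for f in ranked[:k]]
```

Because the memory object holds only text, scores, and vectors, nothing in it depends on which model produced or consumes the context, which is the property the model-agnostic bullet describes.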

Python · FastAPI · Postgres · pgvector · Redis (queues) · OpenAI / Anthropic / TrueFoundry AI Gateway

mcp-guardian — putting MCP on a diet

Author · Maintainer · Speaker · 2026

An MCP proxy that replaces hundreds of tool schemas with three meta-tools — cutting 160k+ startup tokens to 456 (a 99.7% reduction) — and adds scoping, audit, and OAuth-aware fan-out to any upstream MCP server, with no client changes.

  • 99.7% reduction in MCP startup token cost — 160,143 → 456 tokens — across 248 PostgreSQL + 41 GitHub tools.
  • Measured proxy overhead of about -8 ms against real upstream servers, which is indistinguishable from zero within measurement noise; 93% hit rate on keyword-based tool search.
  • 27/27 scope-enforcement security checks pass; OAuth, bearer, header, and pass-through auth modes all supported per-server.
  • Conference talk accepted at the Linux Foundation MCP Dev Summit Bengaluru (9–10 June 2026).
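
The meta-tool pattern can be sketched in a few lines. This is an illustration under stated assumptions, not mcp-guardian's code: the three function names (`search_tools`, `describe_tool`, `call_tool`), the in-memory `UPSTREAM` registry, and the `ALLOWED` scope set are hypothetical. The idea it shows is that the client sees three small schemas at startup and discovers everything else on demand, with scope checks and an audit trail at the single call path. The last helper reproduces the token arithmetic quoted above.

```python
# Hypothetical upstream registry: tool name -> (description, handler).
UPSTREAM = {
    "pg_list_tables": (
        "List tables in a PostgreSQL schema",
        lambda schema="public": [f"{schema}.users"],
    ),
    "github_create_issue": (
        "Create an issue in a GitHub repository",
        lambda repo, title: f"{repo}#1: {title}",
    ),
}
ALLOWED = {"pg_list_tables"}  # hypothetical per-client scope
AUDIT_LOG = []  # (tool name, allowed?) for every call attempt


def search_tools(query: str) -> list:
    """Meta-tool 1: keyword search over tool names and descriptions."""
    words = query.lower().split()
    return sorted(
        name
        for name, (desc, _) in UPSTREAM.items()
        if any(w in f"{name} {desc}".lower() for w in words)
    )


def describe_tool(name: str) -> str:
    """Meta-tool 2: fetch one tool's description only when it is needed."""
    return UPSTREAM[name][0]


def call_tool(name: str, **kwargs):
    """Meta-tool 3: enforce scope, record the attempt, then forward upstream."""
    allowed = name in ALLOWED
    AUDIT_LOG.append((name, allowed))
    if not allowed:
        raise PermissionError(f"tool {name!r} is outside this client's scope")
    return UPSTREAM[name][1](**kwargs)


def reduction_pct(before: int, after: int) -> float:
    """Percentage reduction in startup tokens, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)
```

With the figures above, `reduction_pct(160143, 456)` evaluates to 99.7.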

Python · MCP · asyncio · FastAPI · tiktoken · Docker · PostgreSQL MCP · GitHub MCP

Praxium — turning any PDF into an interactive, narrated course

Lead engineer · Toptal engagement · 2025

A SaaS that ingests a PDF and produces a structured, narrated course — personas, ABCD learning objectives, failure-mode analysis, a hierarchical course outline, per-lesson MCQs, AI-narrated videos, and a contextual chat over the source — driven by a resumable 10-state instructional-design state machine.

  • Live product — getpraxium.ai — covering ingestion, parsing, instructional-design generation, narrated video, payments, and RAG-powered explore mode.
  • Resumable 10-state instructional-design pipeline with per-stage checkpointing — personas → objectives → failure modes → outline → per-subsection content.
  • Async Synthesia video generation with polling, per-key-point clips, structured `videos.json` lifecycle, and full delete/regenerate flow.
  • RAG explore mode over course chunks with pgvector similarity search and grounded answers with source references.
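
A resumable pipeline with per-stage checkpointing can be sketched as below. This is a simplified illustration, not Praxium's implementation: it uses only the five stages named above (the full ten states are not listed here), a JSON file as the checkpoint store, and hypothetical handler functions supplied by the caller.

```python
import json
import tempfile
from pathlib import Path

# Only the five stages named above; the remaining states are not listed in the source.
STAGES = ["personas", "objectives", "failure_modes", "outline", "content"]


def run_pipeline(checkpoint: Path, handlers: dict) -> dict:
    """Run each stage once, saving a checkpoint after every completed stage."""
    state = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for stage in STAGES:
        if stage in state:  # finished in an earlier run, so skip it on resume
            continue
        state[stage] = handlers[stage](state)  # may raise; earlier stages stay saved
        checkpoint.write_text(json.dumps(state))
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "course.json"
        # Placeholder handlers; real ones would call an LLM with the prior state.
        handlers = {s: (lambda state, s=s: f"{s} done") for s in STAGES}
        print(run_pipeline(path, handlers))
```

If a handler raises partway through, rerunning `run_pipeline` with the same checkpoint path picks up at the failed stage instead of regenerating the earlier ones.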

React 18 · TypeScript · Vite · Tailwind · shadcn/Radix · FastAPI · SQLAlchemy 2.0 (async) · PostgreSQL · pgvector · AWS S3 · Anthropic Claude · OpenAI embeddings · agentic-doc (Landing.ai) · Synthesia · Stripe · Clerk · Langfuse / OpenTelemetry

MCP Gateway Catalog — one catalog, many tools, unified auth

Senior FDE · Platform design & delivery · 2025

A live product surface that lets any team browse 47+ Model Context Protocol servers, complete OAuth or API-key handshakes, and try every tool from the browser — without a local agent host.

  • Unified discovery + auth + invocation across 47+ third-party MCP servers (SEO, finance, productivity, research, project management, and more) from a single browser surface.
  • Three auth modes — OAuth2, API key, anonymous — routed through one consistent UX so consumers don't need per-provider wiring.
  • An interactive 'Try it' surface that lets engineers exercise any registered tool against real providers without standing up a local agent host.

MCP · Python · FastAPI · React · TypeScript · OAuth2 · API-key vault

Yukti — a workable, end-to-end RAG stack

Author · Maintainer · 2024

A complete RAG application built to be understood: ingestion, embeddings, retrieval, an evaluation harness, and a UI — small enough to read, real enough to deploy.

  • End-to-end reference implementation people can clone, read, and ship from.
  • Companion frontend and backend repos kept intentionally small so the architecture is the documentation.
  • Multi-tenant RAG framework with semantic chunking, multi-LLM eval harness, and a real operator console.

Python · FastAPI · ARQ · Unstructured.io · Docling · pgvector · Qdrant · LiteLLM · Vite · React · Tailwind · Docker

AIME — meeting intelligence + voice agents, end-to-end

Author · Maintainer · 2024

An AI meeting and voice-agent platform — capture, transcribe, summarize, retrieve, and run live agents on top — built as a clean separation of a Python backend and a modern web operator console so each side can evolve independently.

  • Full meeting-intelligence + voice-agent surface: capture, transcript, structured summary, retrieval over past meetings, and live agents on top.
  • Clear API contract between backend and operator console so either side can be swapped or extended.
  • Public companion repos for ideas explored in the voice and agentic work at scale.

Python · FastAPI · LiveKit Agents · LangGraph · Postgres · Google APIs · Vite · React · TypeScript · Tailwind

Ferremundo AI — product matching + ordering agents for B2B distribution

Forward Deployed Engineer · Toptal engagement · 2025–2026

An AI product-matching and ordering layer for a Latin-American B2B hardware distributor: hybrid lexical + semantic retrieval over the live SKU catalog, agentic ordering flows for resellers, and a production-grade data plane on Postgres + AWS RDS with full APM tracing.

  • Live product surface — ferremundo.com.ec — backed by an AI matching service for the distributor's reseller channel.
  • Hybrid retrieval pipeline (lexical + semantic) over the full SKU catalog, aliases, and barcodes — engineered for noisy, multilingual B2B query patterns.
  • Production data plane on AWS RDS with cron-safe, incremental, idempotent syncs from the warehouse and soft-delete semantics for retired SKUs.
  • Full Datadog APM + structured-JSON logging in production; CDK-managed infrastructure for repeatable deployment.
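
How the lexical and semantic result lists are combined is not described above. One common option is reciprocal rank fusion (RRF), shown here purely as an assumed example of hybrid retrieval and not as the method this project uses. The SKU identifiers and the constant `k = 60` are illustrative.

```python
def rrf_fuse(rankings: list, k: int = 60) -> list:
    """Merge several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores 1 / (k + rank) for every list it appears in;
    documents found by both retrievers therefore rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending, breaking ties by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["sku-1", "sku-2", "sku-3"]   # e.g. keyword or barcode matches
semantic = ["sku-2", "sku-4", "sku-1"]  # e.g. embedding-similarity matches
fused = rrf_fuse([lexical, semantic])
```

RRF needs only ranks, not comparable scores, which is convenient when the lexical and embedding scorers use different scales.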

Python · FastAPI · PostgreSQL (AWS RDS) · Alembic · OpenSearch / hybrid retrieval · Embedding models · Datadog APM · AWS CDK · Docker

Natural-language database agent for an enterprise SaaS

Senior engineer · Toptal engagement · 2025

A production-grade NL-to-SQL agent built with Google ADK + FastAPI: a SQL Generator, a SQL Executor with retry-with-feedback, and a Response Generator, orchestrated as a sequential + loop agent that converts business questions into safe, read-only Postgres queries and natural-language answers.

  • Multi-agent ADK pipeline: SQL Generator (LlmAgent) → SQL Executor (Custom) → Response Generator (LlmAgent), orchestrated by SequentialAgent + LoopAgent with bounded retry.
  • Safe execution by default: read-only SELECT, statement timeouts, row caps, connection pooling, and schema-aware prompting.
  • Session-aware conversation: maintains query history across turns so follow-ups ("and for last month?") resolve against the same context.
  • Two reference implementations — Python (ADK) and TypeScript — to fit the client's preferred deployment stack.
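
The executor's guard-and-retry loop can be illustrated without ADK. The sketch below is an assumption-laden stand-in, not the client code: it uses the standard-library `sqlite3` module in place of Postgres, a deliberately conservative single-`SELECT` check, a row cap, and a caller-supplied `generate_sql(question, feedback)` function that plays the role of the SQL Generator. Statement timeouts and connection pooling are out of scope here.

```python
import re
import sqlite3

MAX_ROWS = 100      # illustrative row cap
MAX_ATTEMPTS = 3    # illustrative retry bound


def is_read_only_select(sql: str) -> bool:
    """Conservative guard: accept exactly one statement that starts with SELECT."""
    stmt = sql.strip().rstrip(";")
    if ";" in stmt:  # rejects stacked statements (and, conservatively, ';' in literals)
        return False
    return re.match(r"(?is)^\s*select\b", stmt) is not None


def execute_with_retry(conn: sqlite3.Connection, question: str, generate_sql) -> list:
    """Ask the generator for SQL, feeding back the last error until a query succeeds."""
    feedback = None
    for _ in range(MAX_ATTEMPTS):
        sql = generate_sql(question, feedback)
        if not is_read_only_select(sql):
            feedback = "Only a single SELECT statement is allowed."
            continue
        try:
            return conn.execute(sql).fetchmany(MAX_ROWS)
        except sqlite3.Error as exc:
            feedback = f"Query failed: {exc}"
    raise RuntimeError(f"No valid query after {MAX_ATTEMPTS} attempts; last feedback: {feedback}")
```

A string check like this is only a first line of defense; a database role with read-only privileges is the stronger control in production.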

Python 3.12 · Google ADK · FastAPI · PostgreSQL · LLM provider (OpenAI / Anthropic) · uv · Pytest

Open source

MCP infrastructure, RAG stacks, voice-agent platforms, devtools and personal utilities — repos I maintain or have meaningfully contributed to, all readable end-to-end.

Cognita

★ 4.4k+

Open-source RAG framework I co-built at TrueFoundry — production-ready primitives for ingestion, retrieval, and serving.

Python · LangChain · FastAPI

MCP Gateway Catalog

A browsable, in-browser catalog of Model Context Protocol servers — 47+ providers behind unified OAuth and API-key flows, with a live try-it surface. Built and shipped as a public product UI.

Python · FastAPI · React · TypeScript · MCP

mcp-guardian

MCP proxy that replaces hundreds of tool schemas with three meta-tools — 99.7% startup-token reduction, scoping, audit, OAuth-aware fan-out. Talk at Linux Foundation MCP Dev Summit Bengaluru, June 2026.

Python · MCP · FastAPI · asyncio · Docker

Yukti

RAG-as-a-service backend — Unstructured.io + Docling ingestion, pgvector + Qdrant retrieval, LiteLLM gateway, FastAPI + ARQ workers, multi-LLM eval harness.

Python · FastAPI · pgvector · Qdrant · LiteLLM · archived

Yukti — Frontend

Vite + React operator console for the Yukti RAG backend — shadcn/Radix design system, OAuth + RBAC.

Vite · React · TypeScript · Tailwind · archived

AIME — Backend

AI meeting intelligence + voice agents — capture, transcript, structured summary, retrieval, and live agents on top. LiveKit Agents + LangGraph + FastAPI + Postgres.

Python · FastAPI · LiveKit Agents · LangGraph · Postgres · archived

AIME — Frontend

Vite + React operator console for the AIME meeting-intelligence and voice-agent backend — meetings, calendar, integrations, secrets.

Vite · React · TypeScript · Tailwind · archived

mal-code

Companion code for the book *My Adventures with LLMs* — Transformers to DeepSeek in PyTorch, from scratch.

Python · PyTorch

PaymentTracking

PWA expense / income tracker for freelance sole-proprietors — Claude-powered OCR over invoices and FIRA certificates, live Google Sheets ledger, India-tax calculator (Sec 44ADA + new regime), all on Cloudflare Pages + R2.

React · Vite · Hono · Cloudflare Pages Functions · R2 · Claude Haiku

attendee (fork)

Self-hosting fork of attendee-labs/attendee with deployment hardening for meeting-bot workloads — see upstream for active development.

Python · Django · archived

Contact

Working on something hard?

I take on a small number of forward-deployed engagements at a time. The fastest path is email or LinkedIn; Toptal is the cleanest contracting route.