Remote (US) · Hiring loops: write me

Senior Forward Deployed Engineer.
GenAI Architect.

I help enterprise teams ship production GenAI: voice agents, agentic workflows, RAG, and the infrastructure to make them stick.

Prathamesh Saraf

TOP 3% TALENT

Vetted by Toptal

AI Specialization

By the numbers

1.5M / day Customer interactions classified by the IVR intent pipeline I led (per CVS Health Tech Blog)
~95% Containment on the enterprise voice-agent platform I led
99.7% MCP startup-token reduction in mcp-guardian (160k → 456 tokens)
4.4k+ GitHub stars on Cognita, TrueFoundry's OSS RAG framework I co-built

About

I'm a senior forward-deployed engineer specializing in GenAI: voice agents, agentic workflows, RAG, and the infrastructure underneath. I ship in customer environments, with customer teams, against customer constraints, and I stay through the part where systems have to actually keep working.

For the last two years I've led platform-scale GenAI builds for a Fortune-5 healthcare buyer through TrueFoundry, with a signed engineering byline on the TrueFoundry blog for TrueMem, the memory layer I designed. Before that I was the founding tech lead at an AI startup. Toptal vetted me into its top 3% of engineers, with an AI specialization.

On the side, I wrote My Adventures with LLMs, the book I wish I'd had when I started. I was also a speaker at the Linux Foundation MCP Dev Summit Bengaluru (June 2026), talking about mcp-guardian.

Engagements

Where I've been doing the work

CVS Health × TrueFoundry · Senior Forward Deployed Engineer

Feb 2024 to present · Remote (USA)

  • Embedded with a Fortune-5 healthcare buyer as the technical owner of their enterprise GenAI platform: voice agents, agentic workflows, and the platform plumbing underneath.
  • Led the design and rollout of a voice-agent platform that replaced legacy IVR flows; outbound deployments reached ~95% containment without escalation.
  • Core contributor to Cognita, TrueFoundry's open-source RAG framework (4.4k+ GitHub stars).
  • Served as forward-deployed solutions architect on enterprise GenAI engagements (large pharma, industrial): scoping, prototyping, and hand-off to customer teams.
  • Built internal platforms: a meeting-intelligence and realtime voice product, an MCP registry/guardian for tool governance, and an enterprise RAG framework.

ChatOwl · Technical Lead · Founding Engineer

Dec 2022 to Jan 2024 · Remote (USA)

  • Founding tech lead for an AI-augmented therapeutic-sessions platform; owned roadmap, architecture, and the full backend and infra stack from zero to production.
  • Shipped weekly releases across a cross-functional team of engineers, designers, and clinical advisors; established the engineering standards and review process.

Saarthi.ai · Chatbot Developer

Aug 2020 to Aug 2021 · Bangalore, India

  • Built multilingual text and IVR chatbots on RASA for BFSI and edtech customers across Hindi, English, and regional Indian languages.
  • Authored an automated RASA conversation-testing harness that cut manual QA time by ~50% and made every release reproducible.
  • Drove an analytics-driven lead-generation loop (~20% lift in outbound reach) and a serverless containerization migration that trimmed cloud spend by ~15%.

Selected case studies

A few of the things I've shipped

Public retrospectives of platform-scale work at a Fortune-5 healthcare customer, signed bylines on GenAI infrastructure (TrueFoundry, Linux Foundation MCP Dev Summit), Toptal-led client builds shipping under real production load, and open-source projects you can read end-to-end.

Ferremundo AI: fine-tuned retrieval, image embeddings, and LangGraph ordering for B2B hardware distribution

Forward Deployed Engineer · Direct client engagement · 2025 to present · Ongoing

An AI product-matching and ordering layer for a Latin American B2B hardware distributor. Fine-tuned multilingual E5 text embeddings, contrastively fine-tuned SigLIP-2 image…

Praxium: source-grounded, narrated courses from any PDF

Lead engineer · Toptal engagement · 2025 to present · Ongoing

A SaaS that ingests a PDF and produces a structured, narrated course: personas, ABCD learning objectives, failure-mode analysis, a hierarchical outline, source-grounded…

A voice-agent platform for a Fortune-5 healthcare buyer

Senior Forward Deployed Engineer · 2024 to present

A platform-based blueprint for cost-effective, production-grade voice agents: telephony, orchestration, LLM and TTS swaps, evaluation, and the unglamorous reliability work that…

From IVR to agentic: multi-vector retrieval for pharmacy intent classification

Senior Forward Deployed Engineer · 2024 to present

Replacing a BERT plus LLaMA hybrid intent classifier in CVS Health's pharmacy IVR with a multi-vector retrieval pipeline on Qdrant: BM25 sparse retrieval, a fine-tuned dense…

MCP Gateway Catalog: one catalog, many tools, unified auth

Senior FDE · Platform design and delivery · 2025

A live product surface that lets any team browse 47+ Model Context Protocol servers, complete OAuth or API-key handshakes, and try every tool from the browser, without a local…

mcp-guardian: putting MCP on a diet

Author · Maintainer · Speaker · 2026

An MCP proxy that replaces hundreds of tool schemas with three meta-tools, cutting 160k+ startup tokens to 456 (a 99.7% reduction), and adds scoping, audit, and OAuth-aware…

TrueMem: a model-agnostic memory layer for AI applications

Senior Engineer · Architecture, design, build, ship · 2025 to 2026

A persistent, two-tier (short-term + long-term) memory service for LLM applications. Distilled facts replace verbatim history, semantic retrieval surfaces what matters, and the…

CogenticAI DB Agent: natural-language database queries for an enterprise SaaS

Senior engineer · Toptal engagement (CogenticAI) · 2025

A production-grade NL-to-SQL agent built with Google ADK and FastAPI: a SQL Generator, a SQL Executor with retry-with-feedback, and a Response Generator orchestrated as a…

AIME: meeting intelligence and voice agents, end-to-end

Author · Maintainer · 2024

An AI meeting and voice-agent platform (capture, transcribe, summarize, retrieve, and run live agents on top) built as a clean separation of a Python backend and a modern web…

Yukti: a workable, end-to-end RAG stack

Author · Maintainer · 2024

A complete RAG application built to be understood: ingestion, embeddings, retrieval, an evaluation harness, and a UI. Small enough to read, real enough to deploy.

Publications

Featured writing

Open source

MCP infrastructure, RAG stacks, voice-agent platforms, devtools, and personal utilities. Repos I maintain or have meaningfully contributed to, all readable end-to-end.

mcp-guardian

MCP proxy that replaces hundreds of tool schemas with three meta-tools, a 99.7% startup-token reduction with scoping, audit, and OAuth-aware fan-out. Talk at Linux Foundation MCP Dev Summit Bengaluru, June 2026.

Python · MCP · FastAPI · asyncio · Docker

mal-code

Companion code for the book *My Adventures with LLMs*. Transformers to DeepSeek in PyTorch, from scratch.

Python · PyTorch

AIME Frontend

Vite + React operator console for the AIME meeting-intelligence and voice-agent backend: meetings, calendar, integrations, secrets.

Vite · React · TypeScript · Tailwind · archived

AIME Backend

AI meeting intelligence and voice agents: capture, transcript, structured summary, retrieval, and live agents on top. LiveKit Agents, LangGraph, FastAPI, Postgres.

Python · FastAPI · LiveKit Agents · LangGraph · Postgres · archived

Yukti Frontend

Vite + React operator console for the Yukti RAG backend with a shadcn/Radix design system, OAuth, and RBAC.

Vite · React · TypeScript · Tailwind · archived

Yukti Backend

RAG-as-a-service backend with Unstructured.io and Docling ingestion, pgvector and Qdrant retrieval, LiteLLM gateway, FastAPI and ARQ workers, and a multi-LLM eval harness.

Python · FastAPI · pgvector · Qdrant · LiteLLM · archived

attendee (fork)

Self-hosting fork of attendee-labs/attendee with deployment hardening for meeting-bot workloads. See upstream for active development.

Python · Django · archived

PaymentTracking

PWA expense and income tracker for freelance sole-proprietors. Claude-powered OCR over invoices and FIRA certificates, live Google Sheets ledger, India-tax calculator (Sec 44ADA, new regime), all on Cloudflare Pages and R2.

React · Vite · Hono · Cloudflare Pages Functions · R2 · Claude Haiku

Cognita

★ 4.4k+

Open-source RAG framework I co-built at TrueFoundry. Production-ready primitives for ingestion, retrieval, and serving.

Python · LangChain · FastAPI · archived

Leadership & community

Contact

Working on something hard?

I take on a small number of forward-deployed engagements at a time. The fastest path is email or LinkedIn; Toptal is the cleanest contracting route.