Prathamesh Saraf · Senior Forward Deployed Engineer

verified credentials

Certified Architect

TOP 3% TALENT

AI Specialization

Speaker

built and shipped for

CVS Health
TrueFoundry
Praxium
Ferremundo
Toptal
CogenticAI
ChatOwl
IISc
IEEE
Saarthi.ai

By the numbers

1.5M / day Customer interactions classified by the IVR intent pipeline I led (per the CVS Health Tech Blog)

~95% Containment on the enterprise voice-agent platform I led

99.7% MCP startup-token reduction in MCP-Guardian (160k → 456 tokens)

4.4k+ GitHub stars on Cognita, TrueFoundry's OSS RAG framework I co-built

About

I'm a senior forward-deployed engineer specializing in GenAI: voice agents, agentic workflows, RAG, and the infrastructure underneath. I ship in customer environments, with customer teams, against customer constraints, and I stay through the part where systems have to actually keep working.

I've led platform-scale GenAI builds for a Fortune-5 healthcare buyer, with a signed engineering byline on TrueMem, the memory layer I designed. Before that I was the founding tech lead at an AI startup. Toptal vetted me into its top 3% of talent, with an AI specialization.

On the side, I wrote My Adventures with LLMs, the book I wish I'd had when I started. I also spoke at the Linux Foundation MCP Dev Summit Bengaluru (June 2026) on MCP-Guardian.

Engagements

Where I've been doing the work

CVS Health × TrueFoundry · Senior Forward Deployed Engineer

Feb 2024 to present · Remote (USA)

Embedded with a Fortune-5 healthcare buyer as the technical owner of their enterprise GenAI platform: voice agents, agentic workflows, and the platform plumbing underneath.
Led the design and rollout of a voice-agent platform that replaced legacy IVR flows; outbound deployments reached ~95% containment without escalation.
Core contributor to Cognita, TrueFoundry's open-source RAG framework (4.4k+ GitHub stars).
Served as forward-deployed solutions architect on enterprise GenAI engagements (large pharma, industrial): scoping, prototyping, and hand-off to customer teams.
Built internal platforms: a meeting-intelligence and realtime voice product, an MCP registry/guardian for tool governance, and an enterprise RAG framework.

ChatOwl · Technical Lead · Founding Engineer

Dec 2022 to Jan 2024 · Remote (USA)

Founding tech lead for an AI-augmented therapeutic-sessions platform; owned roadmap, architecture, and the full backend and infra stack from zero to production.
Shipped weekly releases across a cross-functional team of engineers, designers, and clinical advisors; established the engineering standards and review process.

Indian Institute of Science (IISc) · Graduate Researcher, Cloud Systems Lab

Aug 2021 to Apr 2024 · Bangalore, India

M.Tech (Research) in Computational and Data Science at the Cloud Systems Lab; CGPA 8.1 / 10.0.
Lead author on CARL: Cost-Optimized Online Container Placement on VMs using Adversarial RL (IEEE Transactions on Cloud Computing). Recast container-to-VM placement as adversarial RL on top of a semi-optimal vector-bin-packing teacher; the agent learns its own reward function for VM cost minimization and ends up outperforming the teacher it imitates.
Evaluated on realistic Google and Alibaba production cluster traces (5k–10k container requests across 2k–8k VMs): ~16% lower VM cost than classic heuristics and SOTA RL baselines, ~1,900 placement decisions/sec onto ~8,900 candidate VMs, and robust to inference-time workload distribution shift.

Saarthi.ai · Chatbot Developer

Aug 2020 to Aug 2021 · Bangalore, India

Built multilingual text and IVR chatbots on RASA for BFSI and edtech customers across Hindi, English, and regional Indian languages.
Authored an automated RASA conversation-testing harness that cut manual QA time by ~50% and made every release reproducible.
Drove an analytics-driven lead-generation loop (~20% lift in outbound reach) and a serverless containerization migration that trimmed cloud spend by ~15%.

Featured writing

Blog

Notes & deep dives

Long-form, intuition-first writeups: the idea in plain language, then the math, a worked example, and runnable code.

Selected case studies

A few of the things I've shipped

Public retrospectives of platform-scale work at a Fortune-5 healthcare customer, signed bylines on GenAI infrastructure (TrueFoundry, Linux Foundation MCP Dev Summit), Toptal-led client builds shipping under real production load, and open-source projects you can read end-to-end.

SuperHype: human-in-the-loop employee advocacy for LinkedIn

Sole Developer · Product Owner · 2026

A platform that turns one announcement into genuine, varied LinkedIn advocacy from your team: each post AI-drafted to the person's voice, approved by a real human, and published…

MCP-Guardian: putting MCP on a diet

Author · Maintainer · Speaker · 2026

An MCP proxy that replaces hundreds of tool schemas with three meta-tools, cutting 160k+ startup tokens to 456 (a 99.7% reduction), and adds scoping, audit, and OAuth-aware…

MCP Gateway Catalog: one catalog, many tools, unified auth

Senior FDE · Platform design and delivery · 2025

A live product surface that lets any team browse 47+ Model Context Protocol servers, complete OAuth or API-key handshakes, and try every tool from the browser, without a local…

Setu: daily UPSC current affairs turned into practice

Sole Developer · Product Owner · 2026 · Ongoing

A full-stack SaaS that scrapes Indian Express and PIB daily, classifies articles for UPSC relevance, and generates syllabus-tagged MCQs, available to aspirants the same day.

TrueMem: a model-agnostic memory layer for AI applications

Senior Engineer · Architecture, design, build, ship · 2025 to 2026

A persistent, two-tier (short-term + long-term) memory service for LLM applications. Distilled facts replace verbatim history, semantic retrieval surfaces what matters, and the…

Scalable, cost-effective voice agents: a platform-based blueprint

Senior Forward Deployed Engineer · Co-author on the public blueprint · 2024 to present

A hierarchical voice-agent platform for a Fortune-5 healthcare buyer handling millions of daily customer interactions. A Master Agent orchestrates specialized SLM- and LLM-powered…

From IVR to agentic: multi-vector retrieval for pharmacy intent classification

Senior Forward Deployed Engineer · 2024 to present

Replacing a BERT plus LLaMA hybrid intent classifier in CVS Health's pharmacy IVR with a multi-vector retrieval pipeline on Qdrant: BM25 sparse retrieval, a fine-tuned dense…

Praxium: source-grounded, narrated courses from any PDF

Lead engineer · Toptal engagement · 2025 to present · Ongoing

A SaaS that ingests a PDF and produces a structured, narrated course: personas, ABCD learning objectives, failure-mode analysis, a hierarchical outline, source-grounded…

Ferremundo AI: fine-tuned retrieval, image embeddings, and LangGraph ordering for B2B hardware distribution

Forward Deployed Engineer · Direct client engagement · 2025 to present · Ongoing

An AI product-matching and ordering layer for a Latin American B2B hardware distributor. Fine-tuned multilingual E5 text embeddings, contrastively fine-tuned SigLIP-2 image…

CogenticAI DB Agent: natural-language database queries for an enterprise SaaS

Senior engineer · Toptal engagement (CogenticAI) · 2025

A production-grade NL-to-SQL agent built with Google ADK and FastAPI: a SQL Generator, an SQL Executor with retry-with-feedback, and a Response Generator orchestrated as a…

AIME: meeting intelligence and voice agents, end-to-end

Author · Maintainer · 2024

An AI meeting and voice-agent platform (capture, transcribe, summarize, retrieve, and run live agents on top) built as a clean separation of a Python backend and a modern web…

Yukti: a workable, end-to-end RAG stack

Author · Maintainer · 2024

A complete RAG application built to be understood: ingestion, embeddings, retrieval, an evaluation harness, and a UI. Small enough to read, real enough to deploy.

Publications

CARL: Cost-Optimized Online Container Placement on VMs Using Adversarial Reinforcement Learning

IEEE Transactions on Cloud Computing · 2025

Adversarial reinforcement learning formulation for online container-to-VM placement, framed as a multi-dimensional vector-bin-packing problem. A learner agent mimics an offline semi-optimal teacher solver while automatically learning a reward function for VM cost reduction, and ends up outperforming the teacher it imitates. Evaluated on Google and Alibaba production cluster traces (5k–10k container requests across 2k–8k VMs): ~16% lower VM cost than classic heuristics and SOTA RL methods, ~1,900 placements per second onto ~8,900 candidate VMs, and robust to inference-time workload distribution shift.
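
To make the vector-bin-packing framing concrete, here is a minimal Python sketch of online first-fit placement, one of the simple classic heuristics for this kind of problem. It is not CARL and not the paper's teacher solver. The function name, the single uniform VM capacity, and the flat per-VM cost are assumptions made purely for illustration.

```python
def first_fit_online(requests, vm_capacity, vm_cost=1.0):
    """Place each request, in arrival order, on the first open VM that has
    room in every resource dimension; open a new VM when none fits.

    requests:    iterable of demand vectors, e.g. (cpu, mem)
    vm_capacity: capacity vector shared by all VMs (illustrative assumption)
    vm_cost:     flat cost per opened VM (illustrative assumption)
    """
    used = []        # used[i] is the resource vector consumed on VM i
    assignment = []  # assignment[k] is the VM index chosen for request k
    for demand in requests:
        if any(d > c for d, c in zip(demand, vm_capacity)):
            raise ValueError(f"request {demand} exceeds VM capacity {vm_capacity}")
        for i, load in enumerate(used):
            # A VM is feasible only if every dimension stays within capacity.
            if all(l + d <= c for l, d, c in zip(load, demand, vm_capacity)):
                break
        else:
            # No open VM fits: open a new, empty one.
            used.append([0.0] * len(vm_capacity))
            i = len(used) - 1
        used[i] = [l + d for l, d in zip(used[i], demand)]
        assignment.append(i)
    return assignment, vm_cost * len(used)


# Four (cpu, mem) requests onto VMs with 4 CPUs and 8 GB each.
assignment, cost = first_fit_online([(2, 4), (2, 4), (3, 1), (1, 1)], vm_capacity=(4, 8))
print(assignment, cost)  # [0, 0, 1, 1] 2.0
```

A greedy rule like this decides each placement from the current state alone, which is why a learned policy has room to lower total VM cost on real traces.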

Open source

MCP infrastructure, RAG stacks, voice-agent platforms, devtools, and personal utilities. Repos I maintain or have meaningfully contributed to, all readable end-to-end.

MCP-Guardian

MCP proxy that replaces hundreds of tool schemas with three meta-tools for a 99.7% startup-token reduction, and adds scoping, audit, and OAuth-aware fan-out. Talk at Linux Foundation MCP Dev Summit Bengaluru, June 2026.

github

Python · MCP · FastAPI · asyncio · Docker
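
A rough sketch of the meta-tool idea in plain Python, with no MCP SDK involved. This is not MCP-Guardian's code: the class, the three meta-tool names (`search_tools`, `describe_tool`, `call_tool`), and the demo registry are all hypothetical, and character counts stand in for tokens.

```python
import json


class MetaToolProxy:
    """Hypothetical proxy: advertise three meta-tools instead of every schema."""

    # Illustrative names only; the real project's meta-tools may differ.
    META_TOOLS = [
        {"name": "search_tools", "description": "Find tools by keyword."},
        {"name": "describe_tool", "description": "Fetch one tool's full schema."},
        {"name": "call_tool", "description": "Invoke a tool by name."},
    ]

    def __init__(self, registry):
        # registry: name -> {"description": str, "schema": dict, "handler": callable}
        self._registry = registry

    def startup_payload(self):
        # Only the three stubs are sent at startup, however many tools exist.
        return self.META_TOOLS

    def search_tools(self, query, limit=5):
        q = query.lower()
        hits = [name for name, tool in self._registry.items()
                if q in name.lower() or q in tool["description"].lower()]
        return sorted(hits)[:limit]

    def describe_tool(self, name):
        tool = self._registry[name]
        return {"name": name, "description": tool["description"], "schema": tool["schema"]}

    def call_tool(self, name, arguments):
        return self._registry[name]["handler"](**arguments)


def build_demo_registry(n=200):
    """Build n fake tools so the size difference is visible."""
    registry = {}
    for i in range(n):
        registry[f"tool_{i}"] = {
            "description": f"Demo tool number {i}",
            "schema": {"type": "object",
                       "properties": {"x": {"type": "integer"}},
                       "required": ["x"]},
            "handler": (lambda x, i=i: x + i),
        }
    return registry


registry = build_demo_registry()
proxy = MetaToolProxy(registry)

# Size of advertising every schema up front versus only the meta-tools.
full = json.dumps([proxy.describe_tool(name) for name in registry])
lean = json.dumps(proxy.startup_payload())
print(len(full), len(lean))

print(proxy.search_tools("number 42"))        # ['tool_42']
print(proxy.call_tool("tool_42", {"x": 1}))   # 43
```

The trade-off is an extra round trip: the model first searches and fetches a schema on demand, then calls the tool, rather than holding every schema in context from the start.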

MAL-Code

Companion code for the book My Adventures with LLMs. Transformers to DeepSeek in PyTorch, from scratch.

github

Python · PyTorch

AIME

AI meeting intelligence and voice agents, end-to-end: capture, transcribe, summarize, retrieve, and run live agents on top, with a self-hosted meeting bot for ingest.

3 repos →

Python · FastAPI · LiveKit · LangGraph · React

Yukti

A complete, end-to-end RAG stack built to be understood: ingestion, embeddings, retrieval, an eval harness, and an operator console. Small enough to read, real enough to deploy.

2 repos →

Python · FastAPI · pgvector · Qdrant · React

PaymentTracking

PWA expense and income tracker for freelance sole-proprietors. Claude-powered OCR over invoices and FIRA certificates, live Google Sheets ledger, India-tax calculator (Sec 44ADA, new regime), all on Cloudflare Pages and R2.

github

React · Vite · Hono · Cloudflare Pages Functions · R2 · Claude Haiku

Cognita

★ 4.4k+

Open-source RAG framework I co-built at TrueFoundry. Production-ready primitives for ingestion, retrieval, and serving.

github

Python · LangChain · FastAPI

Leadership & community

Speaker, Linux Foundation MCP Dev Summit Bengaluru (June 2026), on putting MCP on a diet.
Toptal Top 3% Talent, AI Specialization.
Author, My Adventures with LLMs (Leanpub), with companion code on GitHub.
Student volunteer, IEEE/ACM CCGRID 2023.
Founding member, Coding Dojo (DKTES); ran Python workshops for non-CS students.

Contact

Working on something hard?

I take on a small number of forward-deployed engagements at a time. The fastest path is email or LinkedIn; Toptal is the cleanest contracting route.

Email · pratamesh1867@gmail.com
LinkedIn · linkedin.com/in/sarafpr
X · @S1LV3R_J1NX
GitHub · @S1LV3RJ1NX
Toptal · Hire me