SANJAY SRIRAM

APPLIED ML ENGINEER  ·  CS/CE PURDUE  ·  BUILDING AI THAT SHIPS

I build production AI systems: agents, RAG pipelines,
and the infrastructure underneath them. Currently at
Rox.

EXPERIENCE

03
2026

Rox

CURRENT

ENGINEERING INTERN

Building agents at a revenue intelligence startup. Working on the systems that power autonomous outbound: agent orchestration, context retrieval, and real-time execution pipelines that operate without a human in the loop.

Agents · LLMs · Revenue AI
2025

Ernst & Young

APPLIED ML INTERN

Architected production LLM systems at enterprise scale. Built a containerized MCP server with LangGraph to generate architecture diagrams in near real-time. Designed a hybrid RAG system (BM25 + dense vector) that cut critical process turnaround from days to under an hour. Shipped an autonomous document extraction agent for fragmented tax and tariff data.

LangGraph · Hybrid RAG · MCP · FastAPI · BM25
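
A minimal sketch of how a BM25 ranking and a dense-vector ranking can be merged into one hybrid result list. It uses reciprocal rank fusion, which is one common fusion method; the fusion method actually used at EY is not stated here, so the function name, the constant k=60, and the document IDs are all illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a keyword (BM25) and a dense-vector retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Rank-based fusion sidesteps the problem that BM25 scores and cosine similarities live on different scales, which is why it is a frequent default for hybrid retrieval.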
2025

Boxsy.io

APPLIED AI SUB-TEAM LEAD

Led design and deployment of an LLM-powered, context-aware AI Investor Update Agent that reduced manual drafting by 70%. Built end-to-end similarity search and personalization pipelines using pgvector and hybrid retrieval. Defined AI workflows on Vertex AI, integrated with FastAPI and Next.js for real-time delivery.

pgvector · Vertex AI · FastAPI · Next.js

PROJECTS

06
01

Sift

MULTIMODAL LOCAL SEARCH ENGINE

High-performance local retrieval engine for instant semantic search across text, images, audio, and video. The backbone is Qwen3-VL-Embedding-2B, mapping all modalities into a unified 2048-dim vector space. Audio is handled via a CLAP-to-Qwen adapter, a learned 2-layer MLP trained with a contrastive InfoNCE loss on AudioSetCaps. A filesystem watchdog daemon auto-indexes files using BLAKE3 change detection. Results are bundled via hybrid scoring: embedding similarity + temporal proximity + filename Jaccard.

Qwen3-VL · CLAP · Qdrant · Whisper · InfoNCE · PySide6
GITHUB ↗
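
The hybrid scoring used to bundle Sift results (embedding similarity + temporal proximity + filename Jaccard) can be sketched roughly as follows. The weights, the exponential time decay, the one-hour scale, and the filename tokenizer are assumptions made for illustration, not Sift's actual parameters.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def temporal_proximity(t1, t2, scale_s=3600.0):
    """1.0 for identical timestamps, decaying toward 0 as they drift apart."""
    return math.exp(-abs(t1 - t2) / scale_s)


def filename_jaccard(name_a, name_b):
    """Jaccard overlap of lowercase alphanumeric filename tokens."""
    tokens_a = set(re.findall(r"[a-z0-9]+", name_a.lower()))
    tokens_b = set(re.findall(r"[a-z0-9]+", name_b.lower()))
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def hybrid_score(a, b, w_embed=0.6, w_time=0.25, w_name=0.15):
    """Weighted sum of the three signals (weights are hypothetical)."""
    return (
        w_embed * cosine(a["vec"], b["vec"])
        + w_time * temporal_proximity(a["mtime"], b["mtime"])
        + w_name * filename_jaccard(a["name"], b["name"])
    )


photo = {"vec": [0.9, 0.1], "mtime": 1_700_000_000, "name": "trip_paris_01.jpg"}
clip = {"vec": [0.8, 0.3], "mtime": 1_700_000_600, "name": "trip_paris_02.mp4"}
print(round(hybrid_score(photo, clip), 3))
```

Files created close together with similar names and similar content score high and would be grouped into one bundle.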
02

KernelFusion

CUSTOM CUDA + TRITON BENCHMARKING

GPU kernel fusion study: fusing element-wise Add + ReLU into a single kernel to eliminate the global memory round-trip between operations. Benchmarked custom CUDA, Triton, and torch.compile implementations, profiled with NVIDIA Nsight Systems. Key finding: the Triton kernel generated by torch.compile reads inputs once into on-chip memory, computes add+clamp, and writes once to VRAM, rivaling hand-written CUDA while removing the intermediate memory round-trip. The fused kernel is still bandwidth-bound, but it moves less data.

CUDA · Triton · PyTorch · Nsight · torch.compile
GITHUB ↗
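
A back-of-the-envelope sketch of why fusing Add + ReLU helps, counting only global-memory transfers. It assumes float32 tensors, no caching effects, and a hypothetical tensor size; the function name is illustrative and this is a traffic model, not a benchmark.

```python
def global_memory_bytes(num_elements, bytes_per_element=4, fused=False):
    """Bytes moved to and from global memory for out = relu(a + b)."""
    if fused:
        # One kernel: read a, read b, write out.
        transfers_per_element = 3
    else:
        # Add kernel: read a, read b, write tmp.
        # ReLU kernel: read tmp, write out.
        transfers_per_element = 5
    return transfers_per_element * num_elements * bytes_per_element


n = 1 << 24  # hypothetical tensor size (about 16.8M elements)
unfused = global_memory_bytes(n)
fused = global_memory_bytes(n, fused=True)
saving = 1 - fused / unfused

print(f"unfused: {unfused / 1e6:.0f} MB, fused: {fused / 1e6:.0f} MB")
print(f"traffic reduction: {saving:.0%}")
```

Under this simple model the fused kernel moves three transfers per element instead of five, a 40% reduction in memory traffic for an operation whose runtime is dominated by memory bandwidth.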
03

LPCVC 2026

EDGE VISION-LANGUAGE RETRIEVAL · COMPETITION · IN PROGRESS

Entry for the 2026 Low-Power Computer Vision Challenge (Track 1: Open-World Image-to-Text Retrieval). Hard constraint: combined image + text encoder latency under 35ms on Qualcomm XR2 Gen 2. Architecture: MobileCLIP-S2 (Apple, CVPR 2024; 63.7% average accuracy across 38 zero-shot benchmarks). 3-stage pipeline: COCO contrastive warmup → attribute-discriminative fine-tuning on RefCOCO + Visual Genome with hard negative mining → WiSE-FT weight interpolation. Quantized to INT8 via AIMET PTQ and compiled via Qualcomm AI Hub.

MobileCLIP · AIMET · INT8 · Qualcomm AI Hub · WiSE-FT · InfoNCE · ONNX
LPCV.AI/2026LPCVC ↗
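
The WiSE-FT step in the pipeline above is a linear interpolation between the zero-shot weights and the fine-tuned weights. A minimal sketch, using plain Python lists in place of real tensors; the parameter names, values, and alpha = 0.5 are placeholders, not the settings used in this entry.

```python
def wise_ft(zero_shot, fine_tuned, alpha=0.5):
    """Interpolate two checkpoints: (1 - alpha) * zero_shot + alpha * fine_tuned.

    alpha = 0 returns the zero-shot model, alpha = 1 the fine-tuned one.
    Both checkpoints must share the same parameter names and shapes.
    """
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("checkpoints have different parameter names")
    return {
        name: [
            (1 - alpha) * z + alpha * f
            for z, f in zip(zero_shot[name], fine_tuned[name])
        ]
        for name in zero_shot
    }


# Toy checkpoints with one hypothetical parameter each.
zero_shot_weights = {"proj.weight": [0.0, 2.0]}
fine_tuned_weights = {"proj.weight": [1.0, 4.0]}

merged = wise_ft(zero_shot_weights, fine_tuned_weights, alpha=0.5)
print(merged)
```

In practice alpha is swept on a held-out set to trade off in-distribution gains from fine-tuning against the zero-shot model's robustness.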
04

Outsync

AGENTIC OUTREACH EMAIL GENERATION

LangGraph multi-step agent that generates personalized academic outreach emails grounded in your actual resume. Pipeline: scrape professor URLs → RAG over resume chunks (ChromaDB + sentence-transformers) → draft → structured refinement pass. Outsync is built on the principle that multi-step workflows outperform single-prompt generation. Deployed on Vercel + Render with Google OAuth.

LangGraph · RAG · ChromaDB · FastAPI · Gemini
GITHUB ↗
05

Glimpse

AI NEWSLETTER DIGEST SAAS

Full-stack SaaS that ingests Gmail newsletters and delivers a single AI-generated daily digest with audio playback. Celery orchestrates parallel summarization: multiple newsletters are processed concurrently, then synthesized into a cohesive narrative. Shipped with ElevenLabs TTS, daily email delivery, and 23+ production deploys on Vercel.

Gemini · Celery · FastAPI · ElevenLabs · Next.js · Redis
GITHUB ↗
06

Backtest Engine

MODULAR TRADING STRATEGY FRAMEWORK

Modular backtesting engine for testing trading strategies on historical equity data. Configurable execution model with market and limit orders, static and dynamic slippage (market-impact based on dollar volume), and multi-asset support via OpenBB. Performance analytics include Sharpe, Sortino, Max Drawdown, and annualized return. Streamlit UI for one-click strategy configuration and metric display.

Python · OpenBB · Streamlit · Pandas · NumPy
GITHUB ↗
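
For reference, the performance metrics listed above can be computed from a series of daily returns roughly as follows. This is a sketch using common textbook definitions, assuming 252 trading days per year and a zero risk-free rate by default; the engine's own conventions may differ, and the sample returns are made up.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization factor


def annualized_return(daily_returns):
    """Compound the daily returns, then scale to a one-year horizon."""
    growth = math.prod(1 + r for r in daily_returns)
    return growth ** (TRADING_DAYS / len(daily_returns)) - 1


def sharpe_ratio(daily_returns, risk_free_daily=0.0):
    """Annualized mean excess return divided by its standard deviation."""
    excess = [r - risk_free_daily for r in daily_returns]
    spread = statistics.stdev(excess)
    if spread == 0:
        return float("nan")
    return math.sqrt(TRADING_DAYS) * statistics.mean(excess) / spread


def sortino_ratio(daily_returns, target=0.0):
    """Like Sharpe, but penalizes only returns below the target."""
    excess = [r - target for r in daily_returns]
    downside = math.sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    if downside == 0:
        return float("nan")
    return math.sqrt(TRADING_DAYS) * statistics.mean(excess) / downside


def max_drawdown(daily_returns):
    """Worst peak-to-trough decline of the equity curve, as a negative fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst


sample_returns = [0.01, 0.02, -0.005, 0.003, -0.012]  # made-up data
print(f"Sharpe: {sharpe_ratio(sample_returns):.2f}")
print(f"Max drawdown: {max_drawdown(sample_returns):.2%}")
```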

ABOUT

I care about building AI systems that actually work in production — not demos.

CS/CE student at Purdue University, graduating Dec 2027. I've spent the last year shipping production AI at EY and Boxsy, and this summer I'm at Rox building revenue agents. My work sits at the intersection of applied ML, systems design, and the infrastructure that makes LLMs reliable at scale.

  • CURRENTLY: Rox  ·  Engineering Intern
  • EDUCATION: Purdue University, CS/CE  ·  Dec 2027
  • FOCUS: LLM systems, agents, ML infra
  • LOCATION: West Lafayette, IN

CONTACT