Engineering Manager · Applied AI · Cloud
I've spent 20+ years leading engineering teams at State Street, Centene, and EY, managing up to 12 engineers across the US, India, and Poland. The products I've built serve 40+ enterprise customers, have driven $250K/month in cost savings, and operate under real regulatory scrutiny, where a bad deployment means financial loss.
Now I'm bringing that same discipline to AI. I'm building a series of production-grade AI systems that cover RAG pipelines, embedding fine-tuning, and multi-agent orchestration. Every project has evaluation frameworks, architecture decision records, and metrics I'd actually trust in a code review. The goal is to lead AI engineering teams with the same rigor I bring to building the systems myself.
🌐 rubyjha.dev · 💼 LinkedIn
These aren't API wrappers. Each project solves a real engineering problem with measurable outcomes, reproducible from committed code.
| # | Project | What I Proved | Key Result | Stack |
|---|---|---|---|---|
| P1 | Synthetic Data Pipeline | Self-correcting generation with 5-layer validation | 36 failures → 0 · 81.7% inter-rater agreement | Python · Pydantic · OpenAI · Instructor |
| P2 | RAG Evaluation Framework | 16-config grid search; reranking gave the single biggest lift | Recall@5 0.625 → 0.747 (+19.5%) · 384+ tests | Python · FAISS · LangChain · RAGAS · Cohere |
| P3 | Contrastive Embedding Fine-Tuning | LoRA reached 96.9% of full fine-tuning performance while training only 0.32% of parameters | Spearman -0.22 → +0.85 · AUC-ROC 0.994 | Python · Sentence-Transformers · PEFT/LoRA |
| P4 | AI Resume Coach | Template choice has a statistically significant effect on scoring | Chi² = 32.74 (p<0.001) · 532 tests · 99% coverage | Python · OpenAI · ChromaDB · FastAPI |

| # | Project | What It Does | Stack |
|---|---|---|---|
| P5 | ShopTalk Knowledge Agent | Production RAG with configurable chunking, hybrid retrieval, reranking, and LLM-as-Judge eval | Python · FAISS · LiteLLM · Instructor |
🗓️ Up Next: Multi-agent systems with CrewAI (P6–P9) covering writing clones, feedback intelligence, Jira automation, and DevOps root-cause analysis. See the full roadmap.
- How I Calibrated an LLM Judge That Approved Everything – my first LLM judge had a 0% failure rate, which meant it was useless.
- Building 9 AI Projects (While Working Full-Time) – the portfolio, the progression, and what I've learned so far.
Leadership: People Management · Hiring & Team Building · Performance & Promotions · Executive Communication · Technical Strategy
Technical: Python · Java · TypeScript · OpenAI API · LangChain · CrewAI · FastAPI · ChromaDB · Azure · Docker · Kubernetes · React · Spring Boot
I build AI systems and the teams that ship them.
rubyjha.dev · LinkedIn · AI Portfolio