ENTC Undergrad | ML & Systems | Building at the edge of AI and Hardware | Semiconductor AI Enthusiast
- 🎓 Curious, hardworking undergraduate in Electronics & Telecommunication Engineering at Pune Institute of Computer Technology (PICT)
- 💼 Currently a Software Development Intern @ DeepTek.ai, working at the intersection of medical AI, Transformer workflows, and scalable backend systems
- ⚡ Passionate about GPU computing and AI systems — from writing low-level CUDA kernels to deploying end-to-end ML pipelines
- 🎯 Driven by a long-term vision of becoming an AI Engineer in the semiconductor space — where hardware meets intelligence
- 🏍️ Fitness enthusiast, bike rider, and occasional swimmer — I believe a strong body fuels a sharper mind
CUDA · C++ · Parallel Computing · GPU Architecture
- Engineered a FlashAttention-style GPU kernel with shared-memory tiling, online softmax, and fused attention to minimize global-memory (off-chip DRAM) traffic
- Achieved a 254× speedup over a CPU baseline and 70.69× over a naive GPU baseline, reaching 303 GFLOP/s on an NVIDIA RTX 3090
- Applied kernel fusion, warp-synchronous computation, and shared-memory (SRAM) reuse, so the N×N attention score matrix is never materialized in global memory
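The online-softmax idea behind this kernel can be illustrated in plain Python for a single query row. This is a scalar sketch, not the CUDA kernel itself: the function names and the `tile` parameter are hypothetical, and in a GPU kernel each K/V tile would typically be staged in shared memory and processed by a thread block rather than looped over serially. Between tiles, only a running max `m`, a running normalizer `l`, and an unnormalized output accumulator are kept, yet the result matches full softmax attention.

```python
import math


def naive_attention_row(q, K, V):
    """Reference: softmax(q . K^T / sqrt(d)) . V for one query row (materializes all scores)."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, V)) / total for j in range(len(V[0]))]


def tiled_attention_row(q, K, V, tile=2):
    """Online-softmax attention: visit K/V one tile at a time (hypothetical tile size)."""
    d = len(q)
    m = -math.inf            # running max of scores seen so far
    l = 0.0                  # running sum of exp(score - m)
    acc = [0.0] * len(V[0])  # running unnormalized weighted sum of V rows
    for start in range(0, len(K), tile):
        K_t = K[start:start + tile]
        V_t = V[start:start + tile]
        s = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K_t]
        m_new = max(m, max(s))
        # Rescale the old state to the new max; exp(-inf) == 0.0 on the first tile.
        scale = math.exp(m - m_new)
        p = [math.exp(x - m_new) for x in s]
        l = l * scale + sum(p)
        acc = [a * scale + sum(pi * v[j] for pi, v in zip(p, V_t)) for j, a in enumerate(acc)]
        m = m_new
    # Normalize once at the end instead of after every tile.
    return [a / l for a in acc]


if __name__ == "__main__":
    q = [1.0, 0.5]
    K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, 0.2]]
    V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    print(naive_attention_row(q, K, V))
    print(tiled_attention_row(q, K, V, tile=2))
```

Because each new tile only rescales the previous state by `exp(m_old - m_new)`, the tile size does not change the result beyond floating-point rounding. That is what lets a kernel pick a tile size that fits in shared memory.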
Python · FastAPI · FAISS · BM25 · Whisper · Docker · PostgreSQL
- Distributed RAG system converting video into a searchable knowledge base via Whisper transcription, semantic chunking, and hybrid FAISS+BM25 retrieval with CrossEncoder re-ranking
- LLM-based Q&A (llama.cpp / Phi-3) with timestamp-level retrieval, Redis caching, and a PostgreSQL metadata store
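The bullets don't specify how the FAISS and BM25 results are merged. One common option, assumed here purely for illustration, is Reciprocal Rank Fusion (RRF). Each retriever returns a ranked list of chunk IDs, and each chunk earns `1 / (k + rank)` from every list it appears in. The names `reciprocal_rank_fusion` and `fuse_with_timestamps`, the `k=60` constant, and the toy chunk IDs and timestamps are all hypothetical. In a pipeline like the one described, the fused top candidates would then go to the cross-encoder re-ranker, which is not shown.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk IDs (best first) with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list adds 1 / (k + rank); chunks found by both retrievers accumulate.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def fuse_with_timestamps(dense_ids, sparse_ids, timestamps, top_n=3):
    """Return (chunk_id, start_s, end_s) for the top fused chunks.

    dense_ids:  ranking from the vector index (e.g. FAISS), hypothetical input
    sparse_ids: ranking from BM25, hypothetical input
    timestamps: chunk_id -> (start_s, end_s) within the source video
    """
    fused = reciprocal_rank_fusion([dense_ids, sparse_ids])
    return [(cid, *timestamps[cid]) for cid in fused[:top_n]]


if __name__ == "__main__":
    timestamps = {"c1": (0.0, 12.5), "c3": (30.0, 44.0), "c4": (44.0, 58.5), "c7": (90.0, 101.0)}
    dense = ["c3", "c1", "c7"]   # toy FAISS ranking
    sparse = ["c1", "c4", "c3"]  # toy BM25 ranking
    print(fuse_with_timestamps(dense, sparse, timestamps))
    # -> [('c1', 0.0, 12.5), ('c3', 30.0, 44.0), ('c4', 44.0, 58.5)]
```

RRF only needs ranks, not comparable scores. That is convenient here because cosine or inner-product similarities from a vector index and BM25 scores live on different scales.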
Python · OpenCV · TensorFlow Lite · MediaPipe · Flutter · Firebase
- Real-time workout evaluation system achieving ~95% accuracy in posture detection and rep counting
- Optimized TFLite inference, cutting latency by 40% for edge deployment; full-stack app built with Flutter + Firebase
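Rep counting from pose keypoints is often done by tracking a joint angle and counting a repetition only after the angle crosses a "down" threshold and then an "up" threshold. This hysteresis ignores jitter between the two thresholds. The sketch below assumes that approach; the project's actual logic is not described here. `joint_angle`, `RepCounter`, and the 70°/160° thresholds are hypothetical. The (x, y) points stand in for landmarks such as shoulder, elbow, and wrist that a pose model like MediaPipe would supply; no MediaPipe API is called.

```python
import math


def joint_angle(a, b, c):
    """Angle ABC in degrees from 2D keypoints (x, y); b is the joint vertex."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    # Fold into [0, 180] so the result is the interior angle.
    return 360.0 - ang if ang > 180.0 else ang


class RepCounter:
    """Counts a rep on each down -> up transition (hypothetical thresholds)."""

    def __init__(self, down_deg=70.0, up_deg=160.0):
        self.down_deg = down_deg
        self.up_deg = up_deg
        self.stage = "up"
        self.reps = 0

    def update(self, angle):
        if self.stage == "up" and angle < self.down_deg:
            self.stage = "down"          # joint fully flexed
        elif self.stage == "down" and angle > self.up_deg:
            self.stage = "up"            # joint extended again: one full rep
            self.reps += 1
        return self.reps


if __name__ == "__main__":
    counter = RepCounter()
    # Simulated per-frame elbow angles, e.g. from joint_angle(shoulder, elbow, wrist).
    for angle in [170, 120, 60, 65, 100, 165, 170, 50, 162, 80]:
        counter.update(angle)
    print(counter.reps)  # -> 2
```

Angles between the two thresholds leave the state unchanged, so small oscillations near a single cutoff cannot produce spurious reps.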
- 🏅 Machine Learning Specialization — Andrew Ng
- 🏅 Complete Data Science, ML, DL, NLP Bootcamp — Krish Naik
- 🏅 Data Analysis Bootcamp — Alexander Freberg
"Transforming attention from a memory-bound workload into a compute-efficient kernel — one CUDA thread at a time."