Senior Cloud Infrastructure Engineer / SRE with 10+ years building and operating mission‑critical data and ML platforms on AWS. Focused on Kubernetes (EKS), Terraform, GitOps, and observability for large‑scale Hadoop/Spark/Kafka/Solr systems supporting AI/ML workloads.
- Uptime: 99.9%+ on shared data/ML platforms
- Migrations: bare metal Cloudera → AWS/EKS with zero downtime
- Scale: petabyte‑scale HDFS, tens of millions of Kafka messages per day
- 🧱 Kubernetes platform engineering (EKS admin, upgrades, autoscaling, Helm, Kustomize)
- ☁️ Infrastructure‑as‑Code with Terraform (multi‑account VPC, IAM, EKS, RDS, DR)
- 📊 Observability for data/ML systems (Prometheus, Grafana, Splunk, OpenTelemetry)
- 🔄 Data & streaming platforms (Hadoop/Cloudera, Spark, Kafka, Solr, Trino, Iceberg, Airflow)
- 🔐 Security & access control (Kerberos, Ranger, IAM/RBAC, Okta/Cognito SSO)
- 🤝 Enabling data and ML teams with self‑service platforms and GitOps workflows
Primary: AWS (EKS, EC2, S3, RDS, CloudFormation, Route 53, ALB/NLB).
Azure: hybrid IaC with Terraform/Ansible from earlier roles.
- SLO/SLI design • error budgets • incident response • DR drills • on‑call rotation
- Zero Trust • RBAC • SSO (Okta/Cognito) • Kerberos • Ranger policies • least‑privilege IAM
⚠️ If these cards show “empty” stats, it usually means most work is in private repos or the stats service is rate‑limited. That’s normal for enterprise work.
Data & ML Infrastructure
- Migrating on‑prem Cloudera/Hadoop to AWS (EKS + S3 + RDS) with zero downtime
- Running large‑scale Spark, Kafka, Solr, Trino, and Iceberg platforms for ML workloads
- Building self‑service data platforms with Terraform modules and GitOps
Kubernetes & Platform Engineering
- EKS upgrades (control plane + nodes), AL2023 AMIs, RBAC, HPA/VPA, Cluster Autoscaler
- Helm + Kustomize for multi‑env deployments (Dev/QA/Prod)
- Istio for canary rollouts, traffic splitting, and mTLS between services
Observability & SRE
- Prometheus + Grafana dashboards, SLO/error budget tracking, OpenTelemetry traces
- Splunk‑based log analytics for data pipelines and microservices
- PagerDuty on‑call design (L1/L2/L3) and incident management playbooks
Security & Compliance
- Kerberos + Ranger for Hadoop security and fine‑grained access
- Okta SAML SSO + AWS Cognito OAuth2/OIDC for internal services
- SOC 2 / HIPAA / GDPR controls on data platforms
- Zero‑downtime migration of shared AML data platforms from bare metal to AWS
- 45%+ runtime reduction and 30% cost reduction for critical Spark pipelines
- 40%+ reduction in MTTD/MTTR via unified metrics, logs, and PagerDuty workflows
- Built Terraform‑based self‑service platform used by 5+ internal teams to provision data/ML environments in under an hour
- ☁️ AWS Cloud Practitioner
- ⎈ Certified Kubernetes Application Developer (CKAD)
- 🏗️ HashiCorp Terraform (training/cert)
- 🤖 AI & Machine Learning for Business
- 📊 AWS: Design and Implement Systems
- 🧠 OCI 2025 Certified AI Foundations Associate
Interested in data/ML platform engineering, Kubernetes on AWS, observability & SRE, and GitOps‑driven infra.

