Senior Cloud Infrastructure Engineer - Kubernetes
Experience: 5–9 years | 3+ years with EKS/Kubernetes in production
Location: Office – Coimbatore/Bengaluru
About Aivar Innovations
Aivar is an AI-first technology partner where cutting-edge technology meets industry expertise to supercharge your projects. Our AI-augmented teams accelerate development, reduce time-to-market, and deliver exceptional code quality. We bring together the best minds in tech to craft scalable, repeatable solutions that drive real momentum for your business.
Technical Focus
This is a foundational hire for the AI Ops stack.
Own the entire EKS platform: hardened cluster configurations, Terraform modules, Karpenter GPU-aware autoscaling, multi-tenancy (RBAC, namespace isolation, network policies), multi-region DR, and cost optimization. Build infrastructure that runs Llama 70B at sub-second latency on multi-GPU instances.
Functional Expectations
- Design hardened EKS clusters: private endpoints, IMDSv2, Pod Security Admission, image scanning, audit logging
- UltraCluster scale: experience building HPC environments and large clusters suited to running AI Ops for models ranging from SLMs to LLMs
- Build Terraform modules for the complete Kubogent stack: VPC, EKS, GPU/CPU node groups, IAM, networking, storage
- Configure Karpenter for GPU-aware autoscaling across instance families (G6e, P4d, P5, Inferentia)
- Implement multi-tenancy: namespace isolation, resource quotas, RBAC, network policies, fair-share scheduling
- Build multi-region DR with automated failover, cross-region replication, and failover testing
- Optimize cloud spend: Capacity Blocks, Spot instances, reserved pricing, right-sizing, KubeCost integration
- Design robust network architecture: VPC CNI, private subnets, security groups, Transit Gateway, private endpoints
Must-Have Technical Skills
- AWS infrastructure: deep VPC, IAM, networking, multi-account (5+ years)
- Kubernetes/EKS: production clusters, networking (CNI), storage, RBAC (3+ years)
- Terraform expert: large module codebases, remote state, workspaces, CI/CD integration
- Karpenter or Cluster Autoscaler in production
- GPU instances on AWS: G-series (L40S), P-series (A100), NVIDIA GPU operator/device plugins
- Security hardening: Pod Security Admission, OPA/Gatekeeper, image scanning, secrets management
- Linux systems: performance tuning, storage (EBS, EFS, FSx for Lustre), kernel parameters
Core Tech Stack
Terraform, AWS (EKS, EC2 GPU, VPC, IAM, EBS/EFS/FSx, ECR), Karpenter, Helm, Kustomize, ArgoCD, NVIDIA GPU Operator/DCGM, Calico, Istio, Prometheus/Grafana/KubeCost, OPA/Gatekeeper, Falco, Trivy
Benefits
Why You’ll Love Working at Aivar
- Learn from Experts: Work directly with former AWS leaders and AI pioneers.
- Direct Ownership: Lead high-impact "greenfield" projects from concept to global launch.
- Modern Tech: Master the latest Generative AI frameworks and cloud-native architectures.
- Real-World Impact: Build mission-critical systems used by major global enterprises.
- Rapid Growth: Scale your career quickly in a high-speed
Diversity and Inclusion
Aivar Innovations is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to gender, gender identity, sexual orientation, religion, disability, age, marital status, caste, or any other protected characteristic, and we are committed to building a diverse, inclusive, and respectful workplace for everyone.