GPU/ML Systems Engineer

Job Type: Full Time

Job Description - GPU/ML Systems Engineer

Experience: 3–7 years | Hands-on GPU optimization required

Location: Office – Coimbatore/Bengaluru

 

About Aivar Innovations

Aivar is an AI-first technology partner where cutting-edge technology meets industry expertise to supercharge your projects. Our AI-augmented teams accelerate development, reduce time-to-market, and deliver exceptional code quality. We bring together the best minds in tech to craft scalable, repeatable solutions that drive real momentum for your business.


Technical Focus

The specialist who takes AI deployments from “it works” to “sub-second latency at 40% lower cost.” Own vLLM/Triton configurations, model quantization (INT8, FP16, 4-bit), tensor parallelism on multi-GPU instances, AWS Inferentia optimization, and performance benchmarking. Proven results: 40% cost reduction on Whisper ASR, 0.41s TTFT on Llama 70B, 85% throughput gain on YOLO via Inferentia.


Functional Expectations
  • Deploy and tune vLLM with multi-GPU tensor parallelism, dynamic batching, PagedAttention, and KV cache optimization for LLMs
  • Configure NVIDIA Triton for production multi-model serving with custom backends and model ensembles
  • Build TensorRT-LLM optimized model binaries for maximum throughput on L40S, A100, and H100 GPUs
  • Implement AWS Inferentia deployments using the Neuron SDK: model compilation, operator support, and performance tuning
  • Run comprehensive load testing (Locust) to map performance cliffs, optimal concurrency, and scaling thresholds
  • Execute model quantization (INT8, FP16, GPTQ, AWQ) with rigorous quality-accuracy tradeoff analysis
  • Produce detailed benchmark reports with instance selection, scaling strategy, and cost-per-token recommendations
  • Neuron: Optimize models for custom accelerators such as AWS Inferentia and Trainium
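For candidates wondering what a cost-per-token recommendation might look like in practice, here is a minimal sketch. It assumes a benchmark run has already produced a sustained tokens-per-second figure for each configuration. The configuration labels, hourly prices, and throughput numbers are hypothetical placeholders, not Aivar results or real AWS prices.

```python
from typing import List, Tuple


def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """USD cost to generate one million tokens at a sustained throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


def rank_by_cost(rows: List[Tuple[str, float, float]]) -> List[Tuple[str, float]]:
    """Sort (label, hourly price, tokens/s) rows from cheapest to most expensive per token."""
    costs = [(label, cost_per_million_tokens(price, tps)) for label, price, tps in rows]
    return sorted(costs, key=lambda item: item[1])


# Hypothetical benchmark rows: (label, hourly price in USD, measured tokens/s).
candidates = [
    ("config-a", 4.00, 2000.0),
    ("config-b", 10.00, 6000.0),
]

if __name__ == "__main__":
    for label, cost in rank_by_cost(candidates):
        print(f"{label}: ${cost:.3f} per million tokens")
```

In this made-up comparison the more expensive instance comes out cheaper per token because its throughput grows faster than its price, which is the kind of result a benchmark report would need to surface.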


Must-Have Technical Skills
  • GPU-accelerated ML workloads in production (3+ years)
  • LLM serving: vLLM, TensorRT-LLM, or Triton Inference Server (hands-on)
  • GPU architecture: memory hierarchy, tensor cores, NVLink, NCCL multi-GPU communication
  • Model quantization: INT8, FP16, mixed precision, GPTQ/AWQ
  • CUDA ecosystem: drivers, cuDNN, NVIDIA container toolkit
  • Performance engineering: profiling (Nsight, nvidia-smi, DCGM), bottleneck analysis, load testing
  • AWS GPU instances: G-series (L40S), P-series (A100), instance selection methodology
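As a rough illustration of how quantization and instance selection interact, the sketch below estimates the weight memory of a model at different precisions and the smallest power-of-two tensor-parallel degree that fits it. The `usable_fraction` parameter (the share of GPU memory left for weights after KV cache and activations) and the power-of-two rounding are simplifying assumptions for this example, not a prescribed methodology.

```python
import math

# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def min_tensor_parallel_degree(
    num_params: float,
    precision: str,
    gpu_memory_gb: float,
    usable_fraction: float = 0.7,  # assumption: the rest is kept for KV cache and activations
) -> int:
    """Smallest power-of-two GPU count whose combined weight budget fits the model."""
    budget_per_gpu = gpu_memory_gb * usable_fraction
    needed = math.ceil(weight_memory_gb(num_params, precision) / budget_per_gpu)
    degree = 1
    while degree < needed:
        degree *= 2  # assumption: restrict to 1, 2, 4, 8 GPUs per instance
    return degree


if __name__ == "__main__":
    params_70b = 70e9
    for precision in ("fp16", "int8", "int4"):
        for gpu_name, mem in (("48 GB GPU", 48), ("80 GB GPU", 80)):
            tp = min_tensor_parallel_degree(params_70b, precision, mem)
            size = weight_memory_gb(params_70b, precision)
            print(f"{precision} ({size:.0f} GB weights) on {gpu_name}: TP={tp}")
```

A real sizing exercise would also account for context length, batch size, and KV cache growth, so this estimate is only a starting point for shortlisting instances.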


Core Tech Stack

vLLM, NVIDIA Triton, TensorRT-LLM, KServe, CUDA/cuDNN/NCCL/DCGM, AWS Inferentia/Neuron SDK, GPTQ/AWQ/bitsandbytes, Locust, Nsight Systems, Prometheus + DCGM Exporter, AWS (EC2 GPU, EKS, Capacity Blocks)



Benefits

Why You’ll Love Working at Aivar
  • Learn from Experts: Work directly with former AWS leaders and AI pioneers.
  • Direct Ownership: Lead high-impact "greenfield" projects from concept to global launch.
  • Modern Tech: Master the latest Generative AI frameworks and cloud-native architectures.
  • Real-World Impact: Build mission-critical systems used by major global enterprises.
  • Rapid Growth: Scale your career quickly in a high-speed

Diversity and Inclusion

Aivar Innovations is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to gender, gender identity, sexual orientation, religion, disability, age, marital status, caste, or any other protected characteristic, and we are committed to building a diverse, inclusive, and respectful workplace for everyone.

Original job posting: GPU/ML Systems Engineer, posted on GrabJobs.