GPU/ML Systems Engineer

Company: Aivar Innovations

Job Type: Full Time

Bengaluru, India

Job Description - GPU/ML Systems Engineer

Experience: 3–7 years | Hands-on GPU optimization required

Location: Office – Coimbatore/Bengaluru

About Aivar Innovations

Aivar is an AI-first technology partner where cutting-edge technology meets industry expertise to supercharge your projects. Our AI-augmented teams accelerate development, reduce time-to-market, and deliver exceptional code quality. We bring together the best minds in tech to craft scalable, repeatable solutions that drive real momentum for your business.

Technical Focus

The specialist who takes AI deployments from “it works” to “sub-second latency at 40% lower cost.” Own vLLM/Triton configurations, model quantization (INT8, FP16, 4-bit), tensor parallelism on multi-GPU instances, AWS Inferentia optimization, and performance benchmarking. Proven results: 40% cost reduction on Whisper ASR, 0.41s TTFT on Llama 70B, 85% throughput gain on YOLO via Inferentia.

Functional Expectations

Deploy and tune vLLM with multi-GPU tensor parallelism, dynamic batching, PagedAttention, and KV cache optimization for LLMs
Configure NVIDIA Triton for production multi-model serving with custom backends and model ensembles
Build TensorRT-LLM optimized model binaries for maximum throughput on L40S, A100, and H100 GPUs
Implement AWS Inferentia deployments using Neuron SDK: model compilation, operator support, performance tuning
Run comprehensive load testing (Locust) to map performance cliffs, optimal concurrency, and scaling thresholds
Execute model quantization (INT8, FP16, GPTQ, AWQ) with rigorous quality-accuracy tradeoff analysis
Produce detailed benchmark reports with instance selection, scaling strategy, and cost-per-token recommendations
Neuron: Experience optimizing models for custom accelerators such as AWS Inferentia and Trainium

Must-Have Technical Skills

GPU-accelerated ML workloads in production (3+ years)
LLM serving: vLLM, TensorRT-LLM, or Triton Inference Server (hands-on)
GPU architecture: memory hierarchy, tensor cores, NVLink, NCCL multi-GPU communication
Model quantization: INT8, FP16, mixed precision, GPTQ/AWQ
CUDA ecosystem: drivers, cuDNN, NVIDIA container toolkit
Performance engineering: profiling (Nsight, nvidia-smi, DCGM), bottleneck analysis, load testing
AWS GPU instances: G-series (L40S), P-series (A100), instance selection methodology

Core Tech Stack

vLLM, NVIDIA Triton, TensorRT-LLM, KServe, CUDA/cuDNN/NCCL/DCGM, AWS Inferentia/Neuron SDK, GPTQ/AWQ/bitsandbytes, Locust, Nsight Systems, Prometheus + DCGM Exporter, AWS (EC2 GPU, EKS, Capacity Blocks)

Benefits

Why You’ll Love Working at Aivar

Learn from Experts: Work directly with former AWS leaders and AI pioneers.
Direct Ownership: Lead high-impact "greenfield" projects from concept to global launch.
Modern Tech: Master the latest Generative AI frameworks and cloud-native architectures.
Real-World Impact: Build mission-critical systems used by major global enterprises.
Rapid Growth: Scale your career quickly in a high-speed

Diversity and Inclusion

Aivar Innovations is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to gender, gender identity, sexual orientation, religion, disability, age, marital status, caste, or any other protected characteristic, and we are committed to building a diverse, inclusive, and respectful workplace for everyone.
