Material Group

Inference Performance Engineer

Job Type: Full Time

Job Description - Inference Performance Engineer

About the role

Serving frontier models at scale requires solving novel systems problems at every layer of the stack. As an Inference Performance Engineer, you'll own the runtime that turns accelerators into a production serving system, optimizing throughput, latency, and cost across thousands of nodes. You'll work alongside hardware and compiler teams operating at the frontier of AI silicon design.

What you'll do

  • Build and improve the inference runtime

  • Design scheduling, continuous batching, KV cache, and prefill/decode disaggregation

  • Implement low-precision kernels and speculative decoding

  • Drive improvements in throughput, latency, and cost per token

  • Collaborate with hardware teams on kernels, operators, and graph optimizations

  • Own the OpenAI-compatible API surface and serving protocol

  • Build benchmarking, profiling, and regression infrastructure

What you'll need

  • BS in CS, EE, or related field, or equivalent experience

  • Software engineering experience: Rust, Go, Python, or C++

  • Understanding of concurrency, memory, and tail latency

  • Understanding of modern inference: transformers, attention, KV cache, batching, speculative decoding, quantization

  • Experience with model serving frameworks: vLLM, TGI, SGLang, TensorRT-LLM, llama.cpp, or custom runtimes

  • GPU or ASIC programming experience: CUDA, ROCm, Triton, or vendor-native toolchains

  • Experience with low-precision inference (FP8, FP4, INT4)

  • Profiling and benchmarking experience: Nsight, perf, custom harnesses

What we offer

  • Top-tier compensation structured to recognize and retain the best talent

  • Meaningful equity

  • Comprehensive medical, dental, vision, life, and disability insurance

  • Parental leave for all new parents, including adoptive and surrogate journeys

  • Flexible PTO

  • Paid holidays

  • Relocation support

 

Equal Employment Opportunity

We're an Equal Opportunity Employer and do not discriminate on the basis of any protected status under applicable law.

Original job posting: Inference Performance Engineer, posted on GrabJobs.