Inference Performance Engineer

Company: Material Group

Job Type: Full Time

New York, United States

Job Description - Inference Performance Engineer

About the role

Serving frontier models at scale requires solving novel systems problems at every layer of the stack. As an Inference Performance Engineer, you'll own the runtime that turns accelerators into a production serving system, optimizing throughput, latency, and cost across thousands of nodes. You'll work alongside hardware and compiler teams operating at the frontier of AI silicon design.

What you'll do

Build and improve the inference runtime
Design scheduling, continuous batching, KV cache, and prefill/decode disaggregation
Implement low-precision kernels and speculative decoding
Drive improvements in throughput, latency, and cost per token
Collaborate with hardware teams on kernels, operators, and graph optimizations
Own the OpenAI-compatible API surface and serving protocol
Build benchmarking, profiling, and regression infrastructure

What you'll need

BS in CS, EE, or related field, or equivalent experience
Software engineering experience: Rust, Go, Python, or C++
Understanding of concurrency, memory, and tail latency
Understanding of modern inference: transformers, attention, KV cache, batching, speculative decoding, quantization
Experience with model serving frameworks: vLLM, TGI, SGLang, TensorRT-LLM, llama.cpp, or custom runtimes
GPU or ASIC programming experience: CUDA, ROCm, Triton, or vendor-native toolchains
Experience with low-precision inference (FP8, FP4, INT4)
Profiling and benchmarking experience: Nsight, perf, custom harnesses

What we offer

Top-tier compensation structured to recognize and retain the best talent
Meaningful equity
Comprehensive medical, dental, vision, life, and disability insurance
Parental leave for all new parents, including adoptive and surrogate journeys
Flexible PTO
Paid holidays
Relocation support

Equal Employment Opportunity

We're an Equal Opportunity Employer and do not discriminate on the basis of any protected status under applicable law.

Original job Inference Performance Engineer posted on GrabJobs ©. To flag any issues with this job please use the Report Job button on GrabJobs.
