LLM Inference Engineer

Company: Majestic Labs

Job Type: Full-time

Location: Los Altos, United States

Job Description - LLM Inference Engineer

The Role

In this high-impact role, you are the bridge between cutting-edge custom silicon and production-grade AI. You will own the end-to-end LLM serving stack on Majestic hardware, architecting everything from serving APIs down to KV cache management, batching, and scheduling. Your primary mission is to port leading frameworks like vLLM and SGLang to our accelerator and optimize them for peak performance. Because our architecture offers memory headroom, you won't just match traditional GPUs; you will shatter their limits on throughput, batch sizes, and context lengths. As you hunt down bottlenecks, your insights will directly steer our future kernel, compiler, and hardware development.

What You'll Own

The serving stack, end to end — bring up and adapt a modern inference framework (vLLM, SGLang, or similar) to run on Majestic hardware.
The runtime hot path — continuous batching, the scheduler, paged KV cache, and prefill/decode disaggregation.
Distributed inference at scale — tensor, pipeline, and expert parallelism across accelerators, wired into our collective communication library (CCL).
The multi-modal pipeline — image, audio, and video preprocessing, encoder integration, and mixed-modality batching.
Inference-time techniques — speculative decoding, prefix caching, and structured decoding.
End-to-end performance — profile, benchmark, and hunt down bottlenecks across the full serving path, feeding findings back to the kernel, compiler, and hardware teams.

What We're Looking For

3+ years building or operating production LLM inference and serving systems (5+ preferred).
Deep, hands-on work with a modern inference framework (vLLM, SGLang, TensorRT-LLM, Fireworks, or similar), including its scheduler, paged attention / KV cache, model executor, and backend integration points.
Strong Python and C++, with the ability to move fluidly between the two.
A real grasp of transformer inference: the prefill/decode split, KV cache behavior, and how batching dynamics shape latency and throughput.
Distributed inference experience: tensor and pipeline parallelism across multiple devices.
An instinct for performance: you can profile an end-to-end stack and chase a regression from the serving API all the way down to the kernel.

Original job LLM Inference Engineer posted on GrabJobs ©.

About the Company

Majestic Labs

We are seeking a highly skilled Senior Power Engineer to take a leadership role in defining the board-level and rack-level power delivery architecture for our next-generation, multi-kilowatt AI server platforms. This critical position owns the end-to-end design of high-current power delivery paths t...
