Kog builds the fastest LLM inference engine on standard datacenter GPUs. Our Kog Inference Engine generates 3,000 output tokens per second per request on a single 8× AMD MI300X node and 2,100 on an 8× NVIDIA H200 node (FP16, batch size 1, no speculative decoding).
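To put those throughput figures in per-token terms, the arithmetic can be sketched as below. This is a back-of-the-envelope conversion only: it assumes a steady decoding rate and ignores prompt processing and time to first token, which the figures above do not cover. The function names and the 1,000-token example length are illustrative choices.

```python
# Convert the quoted per-request throughput into per-token latency.
# Assumption: tokens arrive at a constant rate once decoding has started.

def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average time between consecutive output tokens, in milliseconds."""
    return 1000.0 / tokens_per_second


def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens output tokens at a constant rate, in seconds."""
    return num_tokens / tokens_per_second


# Figures quoted in the posting (FP16, batch size 1, no speculative decoding).
QUOTED_THROUGHPUT = {
    "8x AMD MI300X": 3000,
    "8x NVIDIA H200": 2100,
}

for node, tps in QUOTED_THROUGHPUT.items():
    latency = per_token_latency_ms(tps)
    total = generation_time_s(1000, tps)  # 1,000 tokens is an illustrative length
    print(f"{node}: {latency:.3f} ms/token, {total:.2f} s per 1,000 tokens")
```

At these rates a single request spends roughly a third of a millisecond per token on the MI300X node and just under half a millisecond on the H200 node.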
We co-design the model architecture and the execution engine together. Our Laneformer model uses Delayed Tensor Parallelism (DTP), a novel architecture that restructures the Transformer dependency graph so inter-GPU communication overlaps with computation rather than blocking it.
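Why overlap matters at batch size 1 can be illustrated with a toy cost model. This is not Kog's implementation or a description of how DTP works internally; it is a generic comparison between a layer that waits for communication and one where communication is fully hidden behind computation. All timings and the layer count are hypothetical.

```python
# Toy cost model: blocking vs. fully overlapped inter-GPU communication.
# All numbers are hypothetical and chosen only to make the arithmetic visible.

def blocking_step_us(compute_us: int, comm_us: int, num_layers: int) -> int:
    """Each layer computes, then waits for communication to finish."""
    return num_layers * (compute_us + comm_us)


def overlapped_step_us(compute_us: int, comm_us: int, num_layers: int) -> int:
    """Idealized overlap: each layer costs whichever of the two is slower."""
    return num_layers * max(compute_us, comm_us)


COMPUTE_US = 8   # hypothetical per-layer compute time, microseconds
COMM_US = 4      # hypothetical per-layer communication time, microseconds
LAYERS = 32      # hypothetical layer count

blocking = blocking_step_us(COMPUTE_US, COMM_US, LAYERS)
overlapped = overlapped_step_us(COMPUTE_US, COMM_US, LAYERS)
print(f"blocking:   {blocking} us per token")
print(f"overlapped: {overlapped} us per token")
print(f"speedup:    {blocking / overlapped:.2f}x")
```

Under this idealized model the gain is bounded by the ratio of communication to computation time, so overlap helps most when the two are comparable.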
We pretrained a 2B-parameter DTP model on 6T tokens on 256 H100 GPUs.
We are a team of 11 people, including 10 engineers and 4 PhDs.
Test it at playground.kog.ai. Read the technical details on the Kog Labs blog.
You will run experiments to understand how architectural decisions propagate through training dynamics and inference behavior, propose and test variants, and turn findings into measurable gains in model quality and generation speed.
Design new model architectures, including routing strategies, attention mechanisms, and MoE structure, with execution constraints as a first-order design input.
Extend the Laneformer thesis by exploring inference-aware architectural variants such as DTP, Ladder Residual, and PT-Transformer, and finding what compounds at scale.
Own the training pipeline across convergence stability, distributed training efficiency, data pipeline design, and evaluation methodology.
Scale the stack to large MoE models such as DeepSeek v4 and Qwen 3, working through routing, expert parallelism, and communication patterns at inference time.
Contribute to building AI agents that will perform architecture research and training experiments autonomously, starting from the research foundations we are building now.
You have trained large models from scratch and understand training dynamics at the level where you can diagnose issues from first principles.
You must show your work to move forward in the process: a paper, a repository, or a codebase with concrete evidence of what you built and what it produced.
Strong signals include experience with architecture-hardware co-design or inference-aware training, fluency in Transformers and MoE with enough depth to reason across trade-offs, and production-grade training code in PyTorch or JAX.
Post-training work such as speculative decoding, quantization-aware training, or preference optimization is an acceptable starting point if it demonstrates a genuine understanding of how those methods affect model behavior and deployment.
You will spend at least 50% of your time in our Paris office.
Direct access to AMD and NVIDIA datacenter GPUs from day one
A team where creativity and technical judgment carry weight and where the people closest to the problem shape the key decisions
Problems that sit on the critical path of model execution speed and that directly influence what the system can become
A remote-friendly working model, with at least 50% of your time spent in our Paris office
Compensation aligned with top technical profiles in the Paris GPU engineering market, including equity