Astera Labs (NASDAQ: ALAB) provides rack-scale AI infrastructure through purpose-built connectivity solutions. By collaborating with hyperscalers and ecosystem partners, Astera Labs enables organizations to unlock the full potential of modern AI. Astera Labs’ Intelligent Connectivity Platform integrates CXL®, Ethernet, NVLink, PCIe®, and UALink™ semiconductor-based technologies with the company’s COSMOS software suite to unify diverse components into cohesive, flexible systems that deliver end-to-end scale-up and scale-out connectivity. The company’s custom connectivity solutions business complements its standards-based portfolio, enabling customers to deploy tailored architectures to meet their unique infrastructure requirements. Discover more at www.asteralabs.com.
We are seeking a Performance Analysis Engineer to drive system-level performance optimization across large-scale AI training and inference environments. In this role, you will analyze, profile, and optimize distributed workloads running on high-density accelerator clusters, working across the full stack, from ML frameworks and communication libraries to network fabrics and hardware architecture.
You will play a critical role in ensuring that next-generation AI workloads achieve near-peak hardware efficiency, while directly influencing software architecture, infrastructure design, and future silicon and networking roadmaps.
Responsibilities:
Execute and profile state-of-the-art training and inference workloads (e.g., LLMs, diffusion models) across large-scale accelerator clusters.
Identify and resolve bottlenecks across compute, memory bandwidth, and interconnect latency that impact end-to-end Job Completion Time (JCT).
Tune and optimize distributed communication backends such as NCCL, RCCL, and MPI.
Improve the efficiency of collective operations, including All-Reduce, All-to-All, Reduce-Scatter, and Broadcast, to minimize synchronization overhead.
Conduct deep-dive analysis of network performance, diagnosing issues such as packet loss, congestion, head-of-line blocking, and tail latency.
Partner with infrastructure teams to improve network behavior under real-world AI workloads.
Design and implement intelligent load-balancing strategies and traffic-shaping algorithms.
Prevent network and compute “hot spots” in high-density AI clusters and improve workload fairness and throughput.
Leverage advanced PyTorch capabilities, including DistributedDataParallel (DDP), Fully Sharded Data Parallel (FSDP), and torch.compile.
Optimize execution graphs, runtime traces, and memory usage for maximum hardware efficiency.
Apply best practices in kernel fusion, mixed-precision execution (FP16/FP8/INT8), and memory management.
Reduce idle “bubble” time and drive sustained peak FLOPS utilization during training and inference.
Build automated benchmarking suites and performance regression tests.
Develop quantitative models to predict how architectural changes (e.g., attention mechanisms, batch sizes, parallelism strategies) scale across different cluster topologies.
Collaborate closely with systems, infrastructure, and silicon teams to translate performance findings into actionable requirements.
Influence the design of next-generation AI accelerators, NICs, and interconnects.
Education:
Bachelor’s, Master’s, or PhD in Computer Engineering, Electrical Engineering, or a related field.
We know that creativity and innovation happen more often when teams include diverse ideas, backgrounds, and experiences, and we actively encourage everyone with relevant experience to apply, including people of color, LGBTQ+ and non-binary people, veterans, parents, and individuals with disabilities.