Performance Analysis Engineer (NCG 2026)

Company: Astera Labs Early Career

Job Type: Full Time

San Jose, CA


Job Description - Performance Analysis Engineer (NCG 2026)

Astera Labs (NASDAQ: ALAB) provides rack-scale AI infrastructure through purpose-built connectivity solutions. By collaborating with hyperscalers and ecosystem partners, Astera Labs enables organizations to unlock the full potential of modern AI. Astera Labs’ Intelligent Connectivity Platform integrates CXL®, Ethernet, NVLink, PCIe®, and UALink™ semiconductor-based technologies with the company’s COSMOS software suite to unify diverse components into cohesive, flexible systems that deliver end-to-end scale-up and scale-out connectivity. The company’s custom connectivity solutions business complements its standards-based portfolio, enabling customers to deploy tailored architectures to meet their unique infrastructure requirements. Discover more at www.asteralabs.com.

About the Role

We are seeking a Performance Analysis Engineer to drive system-level performance optimization across large-scale AI training and inference environments. In this role, you will analyze, profile, and optimize distributed workloads running on high-density accelerator clusters, working across the full stack, from ML frameworks and communication libraries to network fabrics and hardware architecture.

You will play a critical role in ensuring that next-generation AI workloads achieve near-peak hardware efficiency, while directly influencing software architecture, infrastructure design, and future silicon and networking roadmaps.

Job Duties

Cluster-Scale Performance Profiling

Execute and profile state-of-the-art training and inference workloads (e.g., LLMs, diffusion models) across large-scale accelerator clusters.

Identify and resolve bottlenecks across compute, memory bandwidth, and interconnect latency that impact end-to-end Job Completion Time (JCT).

Collective Library Optimization

Tune and optimize distributed communication backends such as NCCL, RCCL, and MPI.

Improve the efficiency of collective operations, including All-Reduce, All-to-All, Reduce-Scatter, and Broadcast, to minimize synchronization overhead.

Network Fabric Analysis

Conduct deep-dive analysis of network performance, diagnosing issues such as packet loss, congestion, head-of-line blocking, and tail latency.

Partner with infrastructure teams to improve network behavior under real-world AI workloads.

Advanced Load Balancing & Traffic Optimization

Design and implement intelligent load-balancing strategies and traffic-shaping algorithms.

Prevent network and compute “hot spots” in high-density AI clusters and improve workload fairness and throughput.

PyTorch Stack Optimization

Leverage advanced PyTorch capabilities including DistributedDataParallel (DDP), Fully Sharded Data Parallel (FSDP), and torch.compile.

Optimize execution graphs, runtime traces, and memory usage for maximum hardware efficiency.

GPU & Accelerator Utilization

Apply best practices in kernel fusion, mixed-precision execution (FP16/FP8/INT8), and memory management.

Reduce idle “bubble” time and drive sustained peak FLOPS utilization during training and inference.

Performance Modeling & Benchmarking

Build automated benchmarking suites and performance regression tests.

Develop quantitative models to predict how architectural changes (e.g., attention mechanisms, batch sizes, parallelism strategies) scale across different cluster topologies.

Hardware–Software Co-Design

Collaborate closely with systems, infrastructure, and silicon teams to translate performance findings into actionable requirements.

Influence the design of next-generation AI accelerators, NICs, and interconnects.

Requirements & Qualifications

Education:
Bachelor’s, Master’s, or PhD in Computer Engineering, Electrical Engineering, or a related field.

Hands-on experience optimizing distributed ML workloads across multi-node accelerator clusters.

Strong understanding of data parallelism, model parallelism, and pipeline parallelism.

Deep knowledge of GPU or accelerator architectures, including compute units, memory hierarchies, and interconnects (PCIe, NVLink, or equivalents).

Experience working with NCCL, RCCL, MPI, or similar collective communication frameworks.

Strong understanding of high-performance networking technologies (Ethernet, InfiniBand, RoCE) and their impact on distributed workloads.

PyTorch & ML Systems Proficiency

Advanced experience with PyTorch, including distributed training internals and execution tracing.

Ability to diagnose and optimize framework-level and runtime bottlenecks.

Comfortable debugging issues across software, firmware, and hardware boundaries.

Strong proficiency in Python and C/C++.

Experience building performance analysis tools, automation, and benchmarking frameworks.

Ability to clearly communicate complex performance findings to cross-functional teams.

Comfortable working in fast-moving, ambiguous environments.

We know that creativity and innovation happen more often when teams include diverse ideas, backgrounds, and experiences, and we actively encourage everyone with relevant experience to apply, including people of color, LGBTQ+ and non-binary people, veterans, parents, and individuals with disabilities.

Original job Performance Analysis Engineer (NCG 2026) posted on GrabJobs ©.


About the Company

Astera Labs Early Career

Purpose-built Connectivity Solutions For Intelligent Systems
