L

IT Engineer 4

icon building Company : Lam Research
icon briefcase Job Type : Full Time

Number of Applicants

 : 

000+

Click to reveal the number of candidates who applied for this job.
icon loader
Apply Now
icon loader Apply Now

Let AI Supercharge Your Job Hunt!

JobCopilot scans 500,000+ company career sites daily to find jobs for you

Never miss an opportunity Save hours by auto-filling applications forms Land more interviews with tailored applications
happy man
thunder iconActivate JobCopilot

Job Description - IT Engineer 4

Cluster Lifecycle Management: Lead the evaluation, planning, configuration, and physical/virtual deployment of multiple large-scale CPU + GPU clusters. System Administration: Perform expert-level Linux system administration, including kernel tuning, security hardening, and OS lifecycle management (e.g., RHEL, Ubuntu, or Rocky Linux). Workload Management: Act as the subject matter expert for SLURM, managing complex partitioning, resource quality of service (QoS), and scheduling optimization for mixed workloads. Infrastructure Design: Architect and build the physical and logical infrastructure for HPC, including high-speed fabric integration (InfiniBand/Ethernet) and power/cooling planning. Software Stack & Modules: Maintain and curate the HPC application stack using software management tools like LMOD or Tcl Modules, ensuring researchers have access to optimized compilers, libraries (MPI, CUDA), and applications. GPU Optimization: Spec and tune GPU environments (e.g., NVIDIA H100/B200), focusing on GPUDirect, NVLink topologies, and containerized runtimes like Apptainer/Singularity. Troubleshooting & Performance: Conduct deep-dive root cause analysis for complex system failures and performance bottlenecks across compute, network, and software layers. Cross-Functional Leadership: Closely own infrastructure projects by coordinating with Networking (low-latency fabric) and Security (compliance, identity management) to ensure all builds meet enterprise standards. Experience with GPU-aware MPI implementations and performance profiling tools (e.g., NVIDIA Nsight, Tau). Knowledge of container orchestration in HPC (e.g., Kubernetes for AI/ML workloads alongside SLURM). Certifications such as RHCE (Red Hat Certified Engineer) or relevant NVIDIA/InfiniBand technical training. Education: BS/MS in Computer Science, Electrical Engineering, or a related field. HPC Experience: 6+ years of hands-on experience managing production-grade HPC clusters. Scheduler Expertise: Deep proficiency in SLURM administration, including writing custom prolog/epilog scripts and managing GRES (Generic Resources) for GPUs. Linux Mastery: Advanced knowledge of Linux internals, shell scripting (Bash), and at least one high-level language (Python or Go). Automation: Extensive experience with configuration management and provisioning tools (e.g., Ansible, Terraform, xCAT, or Warewulf). Networking: Familiarity with HPC-specific networking such as InfiniBand (NDR/HDR) and RoCE v2.
Original job IT Engineer 4 posted on GrabJobs ©. To flag any issues with this job please use the Report Job button on GrabJobs.
Apply Now
Share Job
Share Job

Auto-Apply to IT Engineer Jobs with your AI JobCopilot

thunder icon Auto-Apply with AI

Similar IT Engineer Jobs in India

GrabJobs is the no1 job portal in India, connecting you to thousands of jobs fast! Find the best jobs in India, apply in 1 click and get a job today!

Mobile Apps

Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.