$20,000 - $28,000 monthly
Want to be part of the largest, most innovative, and fastest-growing data centre company in Asia Pacific & Middle East (APME)?
AirTrunk is a technology company with a powerful purpose - to scale and sustain the relentless growth of the region’s digital future. We do this by continuously redefining and delivering hyperscale data centres that meet the needs of our customers - the world’s most transformational companies. And we’re doing so sustainably, for today and tomorrow.
Having opened Australia’s first and largest hyperscale data centres in 2017, we set our eyes on rapid expansion and now operate a platform of hyperscale data centres across the APME region. With backing from our investors, including Blackstone, this is just the beginning…
Come join the A-Team at AirTrunk, where the cloud meets the ground.
A Snapshot
We are seeking a senior infrastructure leader to lead the deployment, performance, and reliability of large-scale bare-metal GPU clusters at the core of next-generation AI factory environments. This individual will play a pivotal role in bringing new compute capacity into production, ensuring infrastructure is commissioned effectively, operated reliably, and continuously optimised to support mission-critical AI workloads.
This is a high-impact leadership role at the intersection of infrastructure engineering, systems performance, production reliability, and operational scale-up.
Responsibilities
Lead the design, deployment, commissioning, and operational readiness of large-scale bare-metal GPU clusters.
Ensure infrastructure performance, availability, stability, and reliability across production environments.
Build and scale the engineering capability responsible for cluster bring-up, validation, monitoring, and lifecycle management.
Drive performance optimisation across compute, networking, storage, and systems layers to support demanding AI workloads.
Establish robust operational standards across observability, incident management, fault remediation, change control, and production support.
Develop deployment playbooks, readiness criteria, and repeatable processes for bringing new cluster capacity online.
Lead root cause analysis and resolution of infrastructure incidents, performance degradation, and hardware or systems failures.
Partner closely with adjacent teams across data centre, network, platform, and operations to ensure seamless deployment and stable ongoing operations.
Act as a key technical partner in hardware sourcing and capacity planning, helping define infrastructure requirements and inform GPU, server, network, and related vendor selection.
Support supplier and partner evaluation through technical diligence, architecture assessment, deployment feasibility review, and performance validation.
Define and track key operational metrics, including cluster health, utilisation, throughput, availability, and failure trends.
Support capacity planning and infrastructure scaling as the platform expands.
Key Requirements
Proven experience leading the design, deployment, and operation of large-scale compute infrastructure, ideally including GPU clusters, HPC environments, or similarly performance-critical distributed systems.
Deep technical grounding in infrastructure engineering, systems performance, and production reliability.
Strong understanding of GPU-based environments and the operational requirements of compute-intensive workloads.
Experience optimising infrastructure across servers, accelerators, networking, storage, and Linux-based systems.
Strong background in observability, incident response, root cause analysis, and operational resilience within mission-critical environments.
Experience with infrastructure automation, provisioning, and configuration management at scale.
Working knowledge of cluster orchestration and scheduling environments such as Kubernetes, Slurm, or similar, with a clear understanding of their impact on utilisation, performance, and reliability.
Experience contributing to hardware roadmap decisions, technical vendor assessment, or infrastructure sourcing for large-scale compute environments.
Demonstrated ability to build and lead high-performing technical teams in complex, fast-scaling organisations.
Strong judgement, execution capability, and the ability to operate effectively across both technical and leadership audiences.
Preferred Background
Experience in AI infrastructure, hyperscale cloud, HPC, supercomputing, or advanced data centre environments.
Exposure to large-scale NVIDIA GPU deployments and related software ecosystems.
Experience working with OEMs, ODMs, server vendors, or silicon ecosystem partners on technical evaluation and deployment planning.
Experience with high-performance networking, including InfiniBand and/or high-speed Ethernet.
Familiarity with high-density compute environments where power, cooling, and physical design constraints materially affect infrastructure reliability and performance.
Experience in greenfield buildouts or scaling new infrastructure platforms from early deployment into steady-state operations.
Why this role?
This is an opportunity to take on a foundational leadership position within a strategically important growth platform. The successful candidate will have the chance to shape the infrastructure standards, operational model, and engineering capability behind a new generation of large-scale bare-metal AI compute environments, while also influencing critical hardware and capacity decisions at scale.
The AirTrunk Culture
Working at AirTrunk is a once-in-a-lifetime opportunity to fast-track your career and amplify your impact. Whilst you’re helping scale our region’s digital future, we’ll help you Grow@Hyperscale and unleash your full potential.
The pace at which we operate means you’ll feel an electric atmosphere at AirTrunk. We are a team of challengers and collaborative problem solvers who break new ground every day. We do this by living our values, going above and beyond, and being dynamic, transparent, and responsive.
Every AirTrunker brings their own unique background and diverse perspective to find solutions to problems that matter. We make sure you have everything you need to make your mark and thrive in a flexible and safe working environment, where everyone feels welcome. Our benefits empower AirTrunkers to stay positively charged.
Now’s your chance to Grow@Hyperscale.
**To all recruitment agencies: AirTrunk does not accept agency resumes. Please do not forward resumes to our jobs alias, AirTrunk employees or any other organisation location. AirTrunk is not responsible for any fees related to unsolicited resumes.**
Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.