
High Performance Computing Specialist

Salary: $140,000 - $180,000 yearly

Job Type: Full Time

Job Description - High Performance Computing Specialist

An elite Montreal-based trading firm is seeking an HPC Systems Specialist to join a team responsible for designing and operating high-performance GPU platforms that support advanced AI and machine learning workloads. This role sits at the intersection of infrastructure engineering, distributed systems, and performance tuning, with ownership spanning from physical hardware through large-scale model serving. You will work closely with ML practitioners and infrastructure peers to build reliable, scalable, and highly optimized compute environments.

What You'll Do

  • Build, operate, and continuously improve GPU-based compute platforms supporting large-scale inference and ML workloads
  • Design and deploy distributed model serving architectures across multi-node, multi-GPU environments
  • Operate and evolve Kubernetes clusters with GPU scheduling for AI and ML use cases
  • Configure and tune networking components such as load balancers, firewall rules, and high-throughput interconnects for GPU clusters
  • Develop and optimize storage solutions for model artifacts, checkpoints, and inference caches
  • Diagnose and resolve performance and stability issues across hardware, drivers, networking, and application layers
  • Partner with ML engineers to benchmark models, analyze performance characteristics, and apply inference acceleration strategies
  • Evaluate new GPU hardware, serving frameworks, and infrastructure patterns to improve efficiency and scalability
  • Improve system reliability through observability, alerting, capacity planning, and on-call/incident response processes
  • Automate provisioning and lifecycle management using infrastructure-as-code and scripting

What You Bring

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related discipline
  • 5+ years of experience managing high-performance computing environments
  • Hands-on experience operating GPU compute environments for ML inference or training
  • Familiarity with modern model serving frameworks (e.g., vLLM, SGLang, or similar) and GPU driver/runtime management
  • Strong Linux systems expertise, including networking, storage, and kernel-level performance considerations
  • Practical experience running GPU workloads on Kubernetes at scale
  • Experience with infrastructure automation tools such as Terraform, Ansible, or equivalent
  • Solid understanding of distributed systems concepts, networking fundamentals (TCP/IP, HTTP/2), and load-balancing strategies
  • Proficiency in Python and shell scripting for tooling and automation
  • Experience with monitoring and observability platforms such as Prometheus, Grafana, or comparable tools

This is a hybrid role based in the firm's Montreal office, requiring 3 days per week onsite and 2 days remote.




Original job posting: High Performance Computing Specialist, posted on GrabJobs.

Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.