Lead/Manager Site Reliability Engineering Team (Amsterdam)

Company: Together AI
Job Type: Full Time


Job Description - Lead/Manager Site Reliability Engineering Team (Amsterdam)

About the Role


Lead a team of Site Reliability Engineers (SREs) at Together AI, based out of our office in Amsterdam. You and the SRE team are responsible for keeping all user-facing services and production systems running smoothly. You are a blend of pragmatic operator and software engineer who applies sound engineering principles, operational discipline, and mature automation to our operating environments and codebase.


You specialize in systems (operating systems, storage subsystems, networking) and implement best practices for availability, reliability, and scalability, with broader interests in algorithms and distributed systems.


Responsibilities



  • Participate in an on-call (PagerDuty) rotation to respond to incidents that affect availability

  • Manage, develop, and coach the SRE team

  • Build and run our infrastructure with Ansible, Terraform, and Kubernetes so that it scales to a massive number of concurrent users

  • Build monitoring systems to ensure the highest quality service for our customers

  • Design and implement operational processes (such as deployments and upgrades)

  • Debug production issues across all services and levels of the stack

  • Identify improvements for the product architecture from the reliability, performance and availability perspectives 

  • Plan the growth of Together AI’s infrastructure


Requirements



  • 7+ years of professional SRE or related experience

  • Ideally, 2 years of experience as a Lead SRE

  • Bachelor's degree in Computer Science or a related field or equivalent work experience

  • Expert knowledge of Ansible (roles, playbooks), Terraform, and Kubernetes

  • Proficiency in programming/scripting languages

  • Direct experience in monitoring and observability practices

  • Advanced knowledge of cloud services

  • Ability to thrive in a collaborative environment involving different stakeholders and subject matter experts


About Together AI


Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers on our journey to build the next generation of AI infrastructure.


Equal Opportunity


Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.


Please see our privacy policy at https://www.together.ai/privacy  

Original job Lead/Manager Site Reliability Engineering Team (Amsterdam) posted on GrabJobs ©.

About the Company

Together AI

Run and fine-tune generative AI models with easy-to-use APIs and highly scalable infrastructure. Train & deploy models at scale on our AI Acceleration Cloud and scalable GPU clusters. Optimize performance and cost.

Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.