Job Description - Engineering Manager, AI Observability
At Netflix, our mission is to entertain the world. Together, we are writing the next episode - pushing the boundaries of storytelling and global fandom, and making the unimaginable a reality. We are a dream team obsessed with the uncomfortable excitement of discovering what happens when you merge creativity, intuition and cutting-edge technology. Come be a part of what’s next.

AI and ML power innovation in all areas of the business, including helping members choose the right title for them through personalization, better understanding our audience and our content slate, creating high-quality subtitles, dubbings, images, trailers, and other assets, optimizing our payment processing, and much more. The Artificial Intelligence Platform (AIP) organization builds highly scalable, differentiated AI infrastructure to maximize the business impact of all AI/ML practitioners at Netflix, which is key to accelerating this innovation.

The Opportunity

The AI Observability team makes AI, ML, and Agentic systems transparent, reliable, and production-ready at scale. We build end-to-end observability for ML and GenAI workloads, capturing model inputs, features, predictions, outcomes, and behavior across online and batch systems. Our platform enables teams to monitor model performance, data quality, drift, latency, and failures, turning an ML system from a black box into an explainable, debuggable one. We provide developer-friendly libraries, dashboards, and alerts so teams can debug issues, respond to incidents, and ship AI-powered products with confidence.

We are looking for an experienced AI/ML infrastructure engineering leader to build and lead the next generation of our AI observability platform.
You will lead this newly formed team to architect, design, develop, test, and launch a brand-new platform to enable ML practitioners across different business domains to effortlessly collect model inputs, features, and predictions for thousands of large-scale models, including Large Language Models (LLMs), computer vision, and foundation models.

We are a highly collaborative team. You will partner cross-functionally with other engineering, product management, machine learning, and data teams to take Netflix’s AI/ML initiatives to the next level. To succeed in this role, you will need a strong background in AI infrastructure and a passion for building scalable, robust systems that enable and accelerate the application of AI Observability to large, complex ML models across diverse domains.

In this role, you will:

 * Partner with ML researchers, engineers, and platform teams to embed “observability-by-default” into new AI services, ensuring telemetry, monitoring, and evaluation are built into systems from day one.
 * Lead the end-to-end observability strategy for AI workloads, including LLMs, generative AI systems, and classical ML models, driving build vs. buy decisions and scaling solutions across model training, online inference, and agent orchestration.
 * Drive the evolution of LLM evaluation frameworks, covering prompt instrumentation, response quality measurement, grounding correctness, hallucination rates, and human/LLM-as-a-judge scoring.
 * Define and execute a platform roadmap focused on incremental delivery, with clear success metrics, migration goals, and strong adoption across teams.
 * Communicate progress to stakeholders, customers, and senior leadership.
 * Hire, grow, and mentor a high-performing engineering team while fostering an inclusive and collaborative culture.

To succeed in this role, you will need:

 * 10+ years of software engineering experience and 3+ years of management experience.
 * Experience leading teams responsible for building high-traffic distributed systems and ML infrastructure.
 * Deep familiarity with AI and ML operations, including model evaluation, drift detection, and continuous monitoring at scale.
 * Experience with AI observability and monitoring tools (e.g., Arize AI, Fiddler AI, Weights & Biases, Vertex AI Model Monitoring, SageMaker Model Monitor).
 * Exposure to LLM or generative AI systems, including prompt/result logging, evaluation metrics, LLM-as-a-judge frameworks, and human-in-the-loop review.
 * Strong technical acumen, with the ability to act as a credible technical advisor to the team, set and enforce a high quality bar for code and system design, and mentor the team.
 * Strong communication and collaboration skills, and the ability to build strong relationships with internal customers and external partners.
 * A demonstrated ability to develop, drive, and execute a technical vision and roadmap.
 * Experience managing a hybrid team with partners and team members distributed across (US) geographies & time zones.

To learn more about our AI Platform, you can review the relevant talks/blog posts on the Netflix AI Platform Research website.

Generally, our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top-of-market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience to set your compensation within the market range.
The range for this role is $523,000.00 - $920,000.00.

Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits. We also offer paid leave of absence programs. Full-time hourly employees accrue 35 days annually for paid time off to be used for vacation, holidays, and sick paid time off. Full-time salaried employees are immediately entitled to flexible time off. See more details about our Benefits here.

Netflix has a unique culture and environment. Learn more here.

Inclusion is a Netflix value and we strive to host a meaningful interview experience for all candidates. If you want an accommodation/adjustment for a disability or any other reason during the hiring process, please send a request to your recruiting partner.

We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.