Resilience Engineer

Company: Vodafone
Job Type: Full-time

Job Description - Resilience Engineer

Developing and governing resilience strategies across system architecture, deployment, monitoring, and incident response;
Defining and tracking stability KPIs (e.g., MTTD, MTTR, error budgets), partnering with performance and operations teams to meet or exceed targets;
Designing and implementing fault injection testing, chaos engineering practices, and scenario-based simulations to validate platform robustness;
Collaborating with product, infrastructure, architecture, and development teams to redesign services with built-in redundancy, failover, and graceful degradation;
Driving automation and observability improvements to reduce noise, increase fault detection speed, and support predictive failure mitigation;
Contributing to the design and maintenance of our Business Continuity and Disaster Recovery (BCDR) Plan, ensuring IoT systems remain resilient and recoverable in the face of unexpected disruptions;
Owning the resilience roadmap and continuously assessing emerging threats, technologies, and architectural shifts to guide the evolution of stability practices;
Evangelizing a culture of resilience through internal communication, workshops, and post-incident learning programs;
Delivering high-quality engineering solutions while continuously strengthening the resilience, scalability, and cost efficiency of our IoT platform;
Consistently meeting or exceeding delivery expectations by prioritizing the highest-leverage resilience initiatives that improve customer experience, business outcomes, and financial performance;
Building trusted, transparent, and outcome-driven relationships by providing clear technical direction and trade-off recommendations to business and engineering stakeholders.

Educated to BSc degree level in Software Engineering or a related Computer Science discipline;
Strong scripting and automation experience (e.g., Python, Bash, Go, PowerShell), with a demonstrated ability to replace manual processes with reliable, scalable automation;
Proven experience designing and operating high-availability, fault-tolerant systems, including the use of chaos engineering techniques and proactive failure-mitigation strategies;
Experience applying Business Continuity and resilience standards (e.g., ISO 22301) in the context of real-world platform design and operational readiness;
Hands-on experience designing or integrating monitoring, alerting, and automated testing frameworks to support early fault detection and system validation;
Broad experience working with Linux-based platforms across on-premises and cloud environments, with an understanding of how infrastructure choices impact reliability, scalability, and recovery;
Deep expertise in Site Reliability Engineering principles, including SLOs/SLIs, error budgets, observability, toil reduction, and automation, with the ability to apply them at platform and system scale to guide architectural decisions and long-term resilience strategy;
Proven ability to balance long-term platform stability with delivery velocity by making clear, data-driven trade-offs;
Strong understanding of security principles, practices, and standards, and the ability to incorporate them into resilient, real-world technical solutions;
Deep command of telemetry, logging, and alerting ecosystems (e.g., Prometheus, Grafana, ELK, Datadog, Splunk), with the ability to design signals that enable early fault detection and informed decision-making;
Experience defining meaningful SLIs and building dashboards that drive architectural insight, prioritization, and corrective action;
Proven experience leading blameless post-incident reviews, root cause analysis, and systemic improvements across multiple teams;
Expertise in identifying and addressing system bottlenecks, latency issues, and throughput constraints in distributed environments;
Proficiency in forecasting demand, planning capacity, and managing system growth in a cost-efficient and sustainable manner;
Strong track record of partnering with software engineering, infrastructure, product, and business teams to embed resilience into the full development lifecycle;
Fluency in English.
Original job Resilience Engineer posted on GrabJobs ©. To flag any issues with this job please use the Report Job button on GrabJobs.

Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.