Resilience Engineer

Company: Vodafone

Employment type: Full-time

Location: Lisbon

Job Description - Resilience Engineer

Responsibilities:

- Developing and governing resilience strategies across system architecture, deployment, monitoring, and incident response
- Defining and tracking stability KPIs (e.g., MTTD, MTTR, error budgets) and partnering with performance and operations teams to meet or exceed targets
- Designing and implementing fault injection testing, chaos engineering practices, and scenario-based simulations to validate platform robustness
- Collaborating with product, infrastructure, architecture, and development teams to redesign services with built-in redundancy, failover, and graceful degradation
- Driving automation and observability improvements to reduce noise, speed up fault detection, and support predictive failure mitigation
- Contributing to the design and maintenance of our Business Continuity and Disaster Recovery (BCDR) Plan, ensuring IoT systems remain resilient and recoverable in the face of unexpected disruptions
- Owning the resilience roadmap and continuously assessing emerging threats, technologies, and architectural shifts to guide the evolution of stability practices
- Evangelizing a culture of resilience through internal communication, workshops, and post-incident learning programs
- Delivering high-quality engineering solutions while continuously strengthening the resilience, scalability, and cost efficiency of our IoT platform
- Consistently meeting or exceeding delivery expectations by prioritizing the highest-leverage resilience initiatives that improve customer experience, business outcomes, and financial performance
- Building trusted, transparent, and outcome-driven relationships by providing clear technical direction and trade-off recommendations to business and engineering stakeholders

Requirements:

- Educated to BSc degree level in Software Engineering, Computer Science, or a related discipline
- Strong scripting and automation experience (e.g., Python, Bash, Go, PowerShell), with a demonstrated ability to replace manual processes with reliable, scalable automation
- Proven experience designing and operating high-availability, fault-tolerant systems, including the use of chaos engineering techniques and proactive failure-mitigation strategies
- Experience applying business continuity and resilience standards (e.g., ISO 22301) to real-world platform design and operational readiness
- Hands-on experience designing or integrating monitoring, alerting, and automated testing frameworks to support early fault detection and system validation
- Broad experience working with Linux-based platforms across on-premises and cloud environments, with an understanding of how infrastructure choices affect reliability, scalability, and recovery
- Deep expertise in Site Reliability Engineering (SRE) principles, including SLOs/SLIs, error budgets, observability, toil reduction, and automation, with the ability to apply them at platform and system scale to guide architectural decisions and long-term resilience strategy
- Proven ability to balance long-term platform stability with delivery velocity by making clear, data-driven trade-offs
- Strong understanding of security principles, practices, and standards, and the ability to incorporate them into resilient, real-world technical solutions
- Deep command of telemetry, logging, and alerting ecosystems (e.g., Prometheus, Grafana, ELK, Datadog, Splunk), with the ability to design signals that enable early fault detection and informed decision-making
- Experience defining meaningful SLIs and building dashboards that drive architectural insight, prioritization, and corrective action
- Proven experience leading blameless post-incident reviews, root cause analysis, and systemic improvements across multiple teams
- Expertise in identifying and addressing system bottlenecks, latency issues, and throughput constraints in distributed environments
- Proficiency in forecasting demand, planning capacity, and managing system growth in a cost-efficient and sustainable manner
- Strong track record of partnering with software engineering, infrastructure, product, and business teams to embed resilience into the full development lifecycle
- Fluency in English

Original job posting (Resilience Engineer) published on GrabJobs ©.