Site Reliability Engineer

Company: Fit Pioneer Resources Sdn Bhd

Job Type: Full-Time

Kuala Lumpur, Kuala Lumpur

Job Description - Site Reliability Engineer

This is a Site Reliability Engineer role outsourced to Huawei. You may like it because you will keep services at ≥ 99.99% reliability, drive fault recovery against a target MTTR of under 89 minutes, and take part in dual-cloud drills and change flow management.

RM 3800 - RM 7500

28th floor of G-Tower CSI office near KLCC in KL

Full-Time

Job Description

Outsourced to Huawei

Service Lifecycle Management:

Engage in and improve the lifecycle of services from launch to deployment, operation, and optimization for reliability and user experience.
Ensure service reliability by measuring and monitoring availability, latency, and overall system health.

KPIs:

Service Reliability: Maintain service reliability at a Service Level Agreement (SLA) of ≥ 99.99%, so that annual downtime does not exceed 52.56 minutes. Prevent live-network incidents caused by manual operations.
Fault Recovery: Ensure Mean Time to Recovery (MTTR) meets the department's Key Performance Indicator (KPI). The 2022 target is less than 89 minutes, subject to annual updates. Achieve a timely closure rate of ≥ 95% for major and critical alarms, resolving each within 24 hours.
Dual-Cloud Drill: Complete 100% of the required dual-cloud drills, which are conducted twice a year. Ensure drill summary materials are archived properly.
Change Flow Management: Meet the annual KPI requirements for the average closure duration of change flows. The 2022 target is less than 4 days, subject to annual updates.
On-Call Duties: Provide on-call support to handle daily alerts, work orders, and upgrades.
Task Execution: Complete tasks such as OS patch upgrades and security hardening according to project schedules.
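
For candidates who want to see how the reliability figures above fit together, here is a minimal Python sketch. It assumes a 365-day year, that MTTR is the plain average of recovery durations, and that an alarm is "timely" if it is closed within 24 hours. The function names and the sample incidents are hypothetical and are not taken from Huawei's tooling.

```python
from datetime import datetime, timedelta

MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year, 525,600 minutes


def downtime_budget_minutes(sla_percent: float) -> float:
    """Maximum yearly downtime (minutes) allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)


def mttr_minutes(incidents: list[tuple[datetime, datetime]]) -> float:
    """Mean time to recovery: average of (recovered - detected), in minutes."""
    total = sum((end - start for start, end in incidents), timedelta())
    return total.total_seconds() / 60 / len(incidents)


def timely_closure_rate(closure_hours: list[float], limit_hours: float = 24.0) -> float:
    """Percentage of alarms closed within the time limit."""
    on_time = sum(1 for hours in closure_hours if hours <= limit_hours)
    return 100 * on_time / len(closure_hours)


if __name__ == "__main__":
    # 99.99% availability leaves about 52.56 minutes of downtime per year.
    print(round(downtime_budget_minutes(99.99), 2))  # 52.56

    # Hypothetical incidents as (detected, recovered) timestamp pairs.
    sample = [
        (datetime(2022, 3, 1, 10, 0), datetime(2022, 3, 1, 11, 0)),   # 60 min
        (datetime(2022, 6, 9, 22, 0), datetime(2022, 6, 9, 23, 40)),  # 100 min
    ]
    print(mttr_minutes(sample) < 89)  # True: the 80-minute average meets the 2022 target

    # Hypothetical alarm closure times in hours; the KPI asks for >= 95%.
    print(timely_closure_rate([2, 10, 30, 23]))  # 75.0
```

The first result matches the 52.56-minute annual downtime limit stated in the Service Reliability KPI.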

Job Requirements

Technical Skills:

Proficiency in debugging scripts and automating routine tasks across OS, network, database, or application servers.
Advanced coding experience beyond simple scripting.
Programming skills in at least one language: Java, Python, or Go.
Scripting skills in at least one of the following: Shell, Terraform, Ansible, Chef, or Puppet.
Deep understanding of Unix/Linux operating systems, virtual machines, containers, container management systems, enterprise cloud platforms, and data structures.
Recent graduates with a foundation in Linux are acceptable.
Experienced SREs should have strong Linux proficiency and experience in BSS, OCS, and CBS projects, including testing, maintenance, and a deep understanding of infrastructure, hardware, and system upgrades. Experience managing and maintaining cloud servers (Alibaba, Tencent, AWS, etc.) is a

Professional Skills: