This is a Site Reliability Engineer role at Huawei. You will keep services at ≥ 99.99% reliability, drive fault recovery toward a target MTTR of under 89 minutes, and take part in dual-cloud drills and change flow management.
RM 3800 - RM 7500
28th floor of G-Tower CSI office near KLCC in KL
Full-Time
Job Description
Outsourced to Huawei
Service Lifecycle Management:
- Engage in and improve the lifecycle of services from launch to deployment, operation, and optimization for reliability and user experience.
- Ensure service reliability by measuring and monitoring availability, latency, and overall system health.
KPIs:
- Service Reliability: Maintain service reliability with a Service Level Agreement (SLA) of ≥ 99.99%, ensuring annual downtime does not exceed 52.56 minutes. Prevent live-network accidents due to manual operations.
- Fault Recovery: Ensure Mean Time to Recovery (MTTR) meets department Key Performance Indicators (KPI). The 2022 target is less than 89 minutes, subject to annual updates. Achieve a timely closure rate of ≥ 95% for major and critical alarms. Address and resolve major and critical alarms within 24 hours.
- Dual-Cloud Drill: Complete 100% of the required dual-cloud drills, which are conducted twice a year. Ensure drill summary materials are archived properly.
- Change Flow Management: Meet the annual KPI requirements for the average closure duration of change flows. The 2022 target is less than 4 days, subject to annual updates.
- On-Call Duties: Provide on-call support to handle daily alerts, work orders, and upgrades.
- Task Execution: Complete tasks such as OS patch upgrades and security hardening according to project schedules.
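For candidates who want to check the reliability figures above, the arithmetic can be sketched as follows. This is an illustrative sketch only, not the department's actual tooling: the function names and the sample incident durations are hypothetical, and it assumes a 365-day year and that MTTR is the simple mean of per-incident recovery times.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_budget_minutes(sla_percent: float) -> float:
    """Maximum yearly downtime (in minutes) allowed by an availability SLA."""
    return MINUTES_PER_YEAR * (100 - sla_percent) / 100


def mttr_minutes(recovery_minutes: list[float]) -> float:
    """Mean Time to Recovery: the average recovery duration per incident."""
    if not recovery_minutes:
        raise ValueError("at least one incident is required")
    return sum(recovery_minutes) / len(recovery_minutes)


# A 99.99% SLA leaves roughly 52.56 minutes of downtime per year.
print(round(downtime_budget_minutes(99.99), 2))

# Hypothetical recovery times (minutes) for three incidents.
sample_incidents = [42, 95, 61]
mttr = mttr_minutes(sample_incidents)
print(mttr, mttr < 89)  # compare against the 2022 target of under 89 minutes
```

The first result matches the 52.56-minute figure quoted in the Service Reliability KPI. The second shows how an individual incident can run longer than 89 minutes while the average still meets the target.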
Job Requirements
Technical Skills:
- Proficiency in debugging scripts and automating routine tasks across OS, network, database, or application servers.
- Advanced coding experience beyond simple scripting.
- Programming skills in at least one language: Java, Python, or Go.
- Scripting skills in at least one of the following: Shell, Terraform, Ansible, Chef, or Puppet.
- Deep understanding of Unix/Linux operating systems, virtual machines, containers, container management systems, enterprise cloud platforms, and data structures.
- Recent graduates with a foundation in Linux are acceptable.
- Experienced SREs should have strong Linux proficiency and experience in BSS, OCS, and CBS projects, including testing, maintenance, and a deep understanding of infrastructure, hardware, and system upgrades. Experience managing and maintaining cloud servers (Alibaba, Tencent, AWS, etc.) is a
Professional Skills:
- In-depth knowledge of the SRE role and DevOps processes.
- Strong observation and critical thinking skills to handle business emergencies.
- Adaptability to dynamic environments and proficient problem-solving skills.
- Excellent written and verbal communication skills.
Educational Background:
- Diploma or higher in Computer Science, Electronics, or Communication
- Malay
Skills
Python (Programming Language)
Unix
Additional Info
Experience Level
1 - 5 Years of Experience
Junior Executive
Job Specialisation
Cybersecurity / Network Security, Hardware / Network / Infrastructure (On-Premises / Cloud), System & IT Helpdesk / Database Administrator