Job Purpose:
- To ensure Incident Management processes are followed and quality services are delivered by managing the service delivery of regional support teams
- To ensure notification and escalation to support teams and management so that the right level of attention is given
- To ensure a robust and effective operational model for alert management
- To establish and maintain strong working relationships within SRE and across other lines of business, and to align IT strategy with business needs and objectives
- To transform and evolve to Site Reliability Engineering
Key Accountabilities:
• Prevent recurrence of alerts and incidents
• Proactively identify issues through incident and problem trending analysis
• Assist in workaround identification and resolution if needed
• Improve Mean Time To Detect (MTTD) and Mean Time To Repair (MTTR) across incident life cycles through process monitoring, automation and optimizing the effort spent at each stage
• Enhance operational productivity through process automation and AIOps
Responsibilities:
• Notifying the participants in the Incident Management process when standards and procedures are not being followed
• Rerouting misdirected Incidents that have not been handled in a timely manner
• Escalating issues in a timely and appropriate fashion
• Identifying incidents which need special attention or escalation
• Identifying exceptions and deviations, as well as management of these situations
• Facilitating the resolution of issues with items not complying with the process
• Overseeing the creation and availability of process reports, and analyzing those reports
• Overseeing completeness and integrity of information collected to conduct daily operations
• Establishing measurements and targets to improve process effectiveness and efficiency
• Acting as an escalation focal point for all roles involved in the process
• Collaborating with stakeholders on process improvement, testing, cutover verification and implementation
Requirements:
• Bachelor's degree, preferably in a relevant field of study with a major in IT
• Ability to work under pressure (especially during incident/crisis management)
• Adaptable to a dynamic/volatile environment and knows when and how to take control during a crisis
• Good communication skills (ability to communicate clearly with different levels of management/staff)
• Focused on the big picture and able to balance business and control requirements
• Able to manage challenges and knows when to escalate to senior management
• Good knowledge of the utilisation and management of IT-related hardware, software, data communication facilities, procedures and people
• Good understanding of ITIL principles, processes and disciplines, and of adopting best practices in the current environment
• Sound knowledge of cost allocation/recovery methodologies
• Good knowledge of IT trends and directions and their implications to data centre operations
• Good problem management skills
• Able to supervise and manage a team of operations staff to ensure delivery of quality incident management services to the Bank
• Able to work effectively and closely with technical support staff, regulators, other IT groups (such as infrastructure & applications and architecture & engineering), Finance, and internal and external auditors