Site Reliability (Site Reliability Engineering)

Salary: $9,000 - $9,500 monthly

Job Type: Full Time

Job Description

Role Overview

We are seeking a highly skilled Site Reliability Engineer (SRE) to lead the reliability, scalability, and performance of our operations. You will be the primary owner of the AWS cloud infrastructure and the end-to-end DevOps pipelines. Your mission is to treat "operations as a software problem," automating away manual toil and ensuring our AWS environment delivers a seamless experience for both agents and customers.

Key Responsibilities

1. Amazon Connect & Service Desk Reliability

- Infrastructure Management: Design, deploy, and maintain the Amazon Connect ecosystem, including Contact Flows, Lambda integrations, Lex Bots, and claimed phone numbers, using Infrastructure as Code (Terraform/CloudFormation).

- Service Availability: Maintain the "always-on" state of the service desk. Manage voice and chat channel reliability, ensuring low latency and high audio quality.

- Integration Support: Oversee the reliability of integrations between Amazon Connect and ITSM tools (e.g., ServiceNow, Jira Service Management, or Salesforce).

- Capacity Planning: Proactively monitor and scale telephony quotas, concurrent tasks, and backend compute resources to handle peak service desk traffic.

2. Cloud Infrastructure & Security

- AWS Foundation: Manage core AWS services supporting the platform (EC2, ECS/EKS, S3, Lambda, DynamoDB, and VPC networking).

- Security & Compliance: Implement IAM least-privilege policies, encrypt data at rest and in transit (KMS), and ensure the platform meets industry standards (SOC 2, HIPAA, or PCI-DSS if applicable).

- Cost Optimization: Monitor cloud spend and implement FinOps practices to optimize Amazon Connect and infrastructure costs.

3. DevOps & CI/CD Pipeline Engineering

- Pipeline Ownership: Build and maintain robust CI/CD pipelines (GitLab CI, GitHub Actions, or Jenkins) to automate the deployment of Lambda functions, Lex bots, and infrastructure changes.

- Automated Testing: Integrate automated testing into the pipeline to validate contact flow logic and API integrations before they hit production.

- Reliability as Code: Standardize deployment patterns to ensure environment parity between Sandbox, Staging, and Production.

4. Observability & Incident Response

- Monitoring & Alerting: Develop comprehensive dashboards and alerts using CloudWatch, X-Ray, and third-party tools (Grafana, Datadog, or Splunk) to track SLIs.

- Incident Management: Lead troubleshooting for critical production outages. Conduct blameless post-mortems to identify root causes and prevent recurrence.

- Error Budgets: Define and manage Service Level Objectives (SLOs) and Error Budgets for the service desk platform.

Qualifications

Technical Skills:

- AWS Expertise: Deep knowledge of Amazon Connect (Contact Flows, CTRs, CCP customization) and general AWS services (Lambda, DynamoDB, S3, IAM).

- Infrastructure as Code (IaC): Proficient in Terraform (preferred), CloudFormation, or AWS CDK.

- CI/CD Tools: Experience building pipelines in GitLab, GitHub Actions, or AWS CodePipeline.

- Programming: Strong scripting skills in Python or Node.js (specifically for AWS Lambda development).

- Observability: Hands-on experience with AWS CloudWatch, Kinesis (for stream analysis), and logging stacks (ELK or Splunk).

Experience & Education:

- 3+ years of experience in an SRE or DevOps role.

- 2+ years of hands-on experience specifically with Amazon Connect or similar CCaaS (Contact Center as a Service) platforms.

- Experience supporting high-volume Service Desk or Call Center environments.

- Preferred Certifications: AWS Certified DevOps Engineer – Professional or AWS Certified SysOps Administrator.

Original job Site Reliability (Site Reliability Engineering) posted on GrabJobs.

About the Company

ITCAN PTE. LIMITED

ITCAN PTE LTD, headquartered in Singapore, offers a full spectrum of integrated IT software solutions and services. It is empowered to deliver enterprise client-server or web-based solutions across the entire value chain, spanning on-site consulting services to turnkey software projects. Regional Offices: ...

Copyright © 2026 Grabjobs Pte.Ltd. All Rights Reserved.