Site Reliability (Site Reliability Engineering)

Salary: $9,000 - $9,500 monthly

Company: ITCAN PTE. LIMITED

Job Type: Full Time

30 Cecil Street, Prudential Tower, 049712

Job Description

Role Overview

We are seeking a highly skilled Site Reliability Engineer (SRE) to lead the reliability, scalability, and performance of our operations. You will be the primary owner of the AWS cloud infrastructure and the end-to-end DevOps pipelines. Your mission is to treat "operations as a software problem," automating away manual toil and ensuring our AWS environment delivers a seamless experience for both agents and customers.

Key Responsibilities

1. Amazon Connect & Service Desk Reliability

- Infrastructure Management: Design, deploy, and maintain the Amazon Connect ecosystem, including Contact Flows, Lambda integrations, Lex Bots, and claimed phone numbers, using Infrastructure as Code (Terraform/CloudFormation).

- Service Availability: Maintain the "always-on" state of the service desk. Manage voice and chat channel reliability, ensuring low latency and high audio quality.

- Integration Support: Oversee the reliability of integrations between Amazon Connect and ITSM tools (e.g., ServiceNow, Jira Service Management, or Salesforce).

- Capacity Planning: Proactively monitor and scale telephony quotas, concurrent tasks, and backend compute resources to handle peak service desk traffic.

2. Cloud Infrastructure & Security

- AWS Foundation: Manage core AWS services supporting the platform (EC2, ECS/EKS, S3, Lambda, DynamoDB, and VPC networking).

- Security & Compliance: Implement IAM least-privilege policies, encrypt data at rest and in transit (KMS), and ensure the platform meets industry standards (SOC 2, HIPAA, or PCI-DSS if applicable).

- Cost Optimization: Monitor cloud spend and implement FinOps practices to optimize Amazon Connect and infrastructure costs.

3. DevOps & CI/CD Pipeline Engineering

- Pipeline Ownership: Build and maintain robust CI/CD pipelines (GitLab CI, GitHub Actions, or Jenkins) to automate the deployment of Lambda functions, Lex bots, and infrastructure changes.

- Automated Testing: Integrate automated testing into the pipeline to validate contact flow logic and API integrations before they hit production.

- Reliability as Code: Standardize deployment patterns to ensure environment parity between Sandbox, Staging, and Production.

4. Observability & Incident Response

- Monitoring & Alerting: Develop comprehensive dashboards and alerts using CloudWatch, X-Ray, and third-party tools (Grafana, Datadog, or Splunk) to track SLIs.

- Incident Management: Lead troubleshooting for critical production outages. Conduct blameless post-mortems to identify root causes and prevent recurrence.

- Error Budgets: Define and manage Service Level Objectives (SLOs) and Error Budgets for the service desk platform.

Qualifications

Technical Skills:

- AWS Expertise: Deep knowledge of Amazon Connect (Contact Flows, CTRs, CCP customization) and general AWS services (Lambda, DynamoDB, S3, IAM).

- Infrastructure as Code (IaC): Proficient in Terraform (preferred), CloudFormation, or AWS CDK.

- CI/CD Tools: Experience building pipelines in GitLab, GitHub Actions, or AWS CodePipeline.

- Programming: Strong scripting skills in Python or Node.js (specifically for AWS Lambda development).

- Observability: Hands-on experience with Amazon CloudWatch, Kinesis (for stream analysis), and logging stacks (ELK or Splunk).

Experience & Education:

- 3+ years of experience in an SRE or DevOps role.

- 2+ years of hands-on experience specifically with Amazon Connect or similar CCaaS (Contact Center as a Service) platforms.

- Experience supporting high-volume Service Desk or Call Center environments.

- Preferred Certifications: AWS Certified DevOps Engineer – Professional or AWS Certified SysOps Administrator

Original job posting: Site Reliability (Site Reliability Engineering), posted on GrabJobs.

About the Company

ITCAN PTE. LIMITED

ITCAN PTE LTD, headquartered in Singapore, offers a full spectrum of integrated IT software solutions and services. It delivers enterprise client-server and web-based solutions across the entire value chain, from on-site consulting services to turnkey software projects. Regional Offices: ...
