This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer in Brazil.
In this role, you will help operate and support large-scale, distributed production systems that power critical public infrastructure. You will gain hands-on exposure to real-world reliability engineering, observability, and incident response while working alongside experienced engineers in a mentorship-driven environment. This position offers an opportunity to build strong foundations in site reliability engineering and cloud operations. You will contribute directly to improving system performance, stability, and user experience for developers and users worldwide. Operating in a remote-first, collaborative setting, you will learn through real production challenges and continuous improvement. If you are curious, proactive, and eager to grow in infrastructure and reliability engineering, this role provides strong learning and career development potential.
Accountabilities:
- Monitor production systems, dashboards, logs, and alerts to ensure high availability and performance across distributed environments.
- Assist in incident detection, triage, escalation, and resolution, following structured on-call rotations with mentorship support.
- Maintain, follow, and continuously improve runbooks, operational procedures, and incident response workflows.
- Support routine operational tasks such as service restarts, upgrades, configuration changes, and system maintenance.
- Contribute to the improvement of monitoring, logging, and alerting systems to enhance observability and reduce operational noise.
- Collaborate with cross-functional teams to investigate production issues, assess user impact, and support root-cause analysis.
- Continuously build knowledge of distributed systems, cloud infrastructure, networking, and blockchain fundamentals.
Requirements:
- Foundational understanding of Linux systems, system processes, and basic networking concepts.
- Familiarity with at least one scripting or programming language such as Python, Bash, or Go.
- Strong interest in site reliability engineering, production operations, and infrastructure monitoring.
- Clear written and verbal communication skills, with a collaborative mindset and willingness to learn.
- Ability to stay calm, structured, and responsive during incidents and operational events.
- Preferred qualifications include exposure to cloud platforms (AWS or GCP), containerization and orchestration tools (Docker, Kubernetes), observability stacks (Grafana, Prometheus, Datadog, ELK), and a basic understanding of blockchain or Web3 systems.
Benefits:
- Fully remote, LATAM-friendly work environment.
- Comprehensive medical, dental, and vision healthcare coverage, depending on country and plan.
- Company-matched retirement plan where applicable.
- Home office setup allowance and monthly internet or phone reimbursement.
- Flexible time-off policy supporting work-life balance.
- Company-issued laptop and modern work equipment.
- Mental health support, wellness programs, and family-focused benefits.
- Additional benefits and coverage depending on geographic location.
Why Apply Through Jobgether?
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are made by humans. If you would like more information about how your data is processed, please contact us.