At Tribute Technology, we make end-of-life celebrations memorable, meaningful, and effortless through thoughtful and innovative technology solutions. Our mission is to help communities around the world celebrate life and pay tribute to those we love. Our comprehensive platform brings together software and technology to provide a fully integrated experience for all users, whether that is a family, a funeral home, or an online publisher. We are the market leader in the US and Canada, with global expansion plans and a growing international team of more than 400 people across the US, Canada, the Philippines, and Ukraine.
ABOUT YOU:
We are looking for a detail-oriented, experienced Site Reliability Engineer to join our team. You will design and implement scalable solutions that meet system and application performance goals, and you will be responsible for monitoring, troubleshooting, and fixing system errors and resolving any related issues.
WHAT YOU'LL DO:
- Develop and refine automation and tooling to drive efficiency and reliability in our systems.
- Contribute to the design and implementation of monitoring, alerting, and logging solutions to proactively identify and address potential runtime issues.
- Participate in on-call rotations and resolve complex production issues.
- Identify and mitigate malicious bots and scrapers.
- Participate in incident response and root cause analysis efforts to ensure the stability and resilience of the applications.
- Accurately document all interactions, troubleshooting steps, and resolutions in the ticketing system to maintain a clear record of activities and support continuous learning within the team.
- Assist in the creation and continuous updating of support documentation, FAQs, and knowledge base articles to enhance client self-service and reduce repeat inquiries.
- Identify and resolve technical issues, and act on automation opportunities. Take ownership of creating and implementing monitoring procedures, tools, and standards to ensure system and application security.
WHAT YOU'LL BRING:
- Strong experience with cloud platforms (AWS, GCP), web servers such as Apache, and container technologies such as Kubernetes.
- Solid understanding of CI/CD pipelines and tools.
- Familiarity with monitoring, alerting, and observability concepts and tools such as Datadog and New Relic.
- Experience using Amazon Web Services (AWS) and/or Microsoft Azure for cloud-hosted operations is a huge plus.