We are seeking a skilled Site Reliability Engineer (SRE) with expertise in webMethods to join our team. This role bridges software development and IT operations, focusing on the automation, reliability, and performance of integration platforms. You will be responsible for ensuring the availability, scalability, and resilience of systems powered by webMethods and other enterprise technologies.
Duties & Responsibilities
Engage and collaborate with cross-functional Product, Engineering, Security, Operations, and Infrastructure teams and vendors to improve MTTD and MTTR
Design, develop, and implement infrastructure & application monitoring to ensure optimal platform availability and performance
Research, analyze, and recommend approaches for solving challenging operational issues
Maintain fault-tolerant webMethods integrations and infrastructure.
Support automated failover, load balancing, and redundancy strategies.
Develop and maintain robust knowledge documentation for the Site Reliability Engineering team and its partners
Proactively perform analysis and identify opportunities to innovate, automate, improve efficiency, and achieve cost savings
Perform webMethods version upgrades and environment migrations
Requirements
Basic Qualifications
Bachelor’s degree in Computer Science or related field with continuous and progressive experience
Minimum of 4 years of related experience working with some of these technologies:
Hands-on experience with webMethods Integration Server, Broker, and My webMethods Server (MWS).
Experience with Jenkins and CI/CD.
Experience with Apache ActiveMQ.
Strong knowledge of Linux operating systems.
Strong understanding of distributed systems and cloud platforms (Azure, GCP).
Experience working with agile methodologies: Scrum, Kanban, and SAFe (Scaled Agile Framework) principles.
Excellent troubleshooting, communication, and documentation skills.
Application Performance Management and Monitoring tools such as New Relic, AppDynamics, SiteSpect, and Datadog
Infrastructure monitoring tools such as Zabbix and Prometheus
Databases, e.g., MongoDB, Oracle, Couchbase, Redis, and MySQL
webMethods suite of products, versions 10.x and 11.x
Log analytics tools such as Splunk and ELK/Elastic
Preferred Qualifications
Awareness of AI/ML applications in observability and incident response. Familiarity with LLMs and AI-driven automation tools. Understanding of AI-enhanced anomaly detection and predictive analytics.