Job Description: LEAD ADMINISTRATOR L1
Dublin, 3 days a week / 1 year
Job Description: SRE (Observability & Database Reliability Engineer)
Mandatory Skills: Oracle Database Admin.
Experience: 5-8 years.
Role Summary:
We are seeking a Site Reliability Engineer (SRE) with strong Database Reliability and Observability expertise to ensure high availability, performance, and operational visibility of business-critical platforms. This role has a strong emphasis on dashboards, observability, Splunk, and operational reporting, along with hands-on database operations in complex production environments.
Key Responsibilities:
SRE & Reliability Engineering
- Own end-to-end reliability, availability, and performance of applications and database platforms.
- Define, implement, and track SLIs, SLOs, and error budgets.
- Proactively identify reliability risks using metrics, trends, and capacity analysis.
- Lead production incident management, root cause analysis (RCA), and post-incident reviews.
- Drive automation to reduce operational toil and improve MTTR.
- Participate in on-call rotations and support 24x7 production environments.
Observability, Dashboards & Reporting (Primary Focus)
- Design and maintain end-to-end observability covering metrics, logs, alerts, and traces.
- Build and manage real-time operational and executive dashboards for system health, availability, latency, and database performance.
- Strong hands-on experience with Splunk, including log ingestion, SPL queries, dashboards, alerts, and reports.
- Correlate application, infrastructure, and database events to detect issues proactively.
- Create and publish operational reports (daily / weekly / monthly) covering availability, incidents, SLO compliance, performance KPIs, and capacity trends.
- Translate technical metrics into actionable insights for engineering and leadership teams.
Database Reliability & Operations
- Support and operate enterprise databases such as PostgreSQL or Oracle (mandatory experience in at least one).
- Monitor and tune database performance, including queries, indexes, and resource utilization.
- Design and support high availability, replication, backup, and disaster recovery solutions.
- Perform database upgrades, patching, migrations, and routine health checks.
- Integrate database monitoring and logs with observability platforms.
Required Skills & Experience
- 10+ years of experience in SRE, Production Support, DevOps, or Reliability Engineering roles.
- Strong expertise in observability and monitoring tools, with mandatory hands-on experience in Splunk.
- Proven experience in dashboard building and operational reporting.
- Strong hands-on experience with PostgreSQL or Oracle databases.
- Solid Linux/Unix administration and troubleshooting skills.
- Experience with incident response, RCA, and production on-call support.
- Proficiency in scripting using Python, Shell, or Bash.
- Strong analytical and communication skills.
Preferred Skills
- Experience with cloud platforms such as AWS or Azure.
- Exposure to Kubernetes, Docker, and containerized environments.
- Experience with Infrastructure as Code tools such as Terraform or Ansible.
- Knowledge of capacity planning, forecasting, and performance baselining.
- Experience supporting regulated or high-availability systems.
Deliver
| No | Performance Parameter | Measure |
|---|---|---|
| 1 | Operations of the tower | SLA adherence; knowledge management; CSAT / customer experience; identification of risk issues and mitigation plans |
| 2 | New projects | Timely delivery; avoid unauthorised changes; no formal escalations |