The AI Operations Engineer is responsible for supporting the reliability and operational intelligence of cloud-hosted and AI-enabled services. This role focuses on observability, CI/CD-integrated reliability, alerting, and automated remediation to reduce noise, detect issues early, and improve incident response.
What’s on Offer at D&B Ireland
- 25 days annual leave (plus 2 paid volunteer days & 1 paid un-sick day)
- Holiday buy & sell (the option to buy or sell up to 5 additional days per year)
- Flexible working - hybrid model
- Employee Health Insurance
- Mental Health Support program
- Pension Contribution
- Family Friendly Leave (Maternity, Paternity, Parental, Marriage and Bereavement)
- Life Assurance
- Educational Assistance Program
- Life-Style Account (D&B will match your contributions up to €40 per month; the account can be used to claim for a range of health-related, leisure or lifestyle activities)
Responsibilities:
- Build and maintain observability standards (logs, metrics, traces, events) and dashboards using Splunk Observability.
- Configure and tune alerts and SLOs to reduce noise and improve signal quality.
- Embed observability and reliability checks into CI/CD pipelines.
- Analyze telemetry to detect anomalies and support faster incident triage and root-cause analysis.
- Implement automated runbooks and remediation workflows using scripts and tooling.
- Operate and optimize telemetry pipelines with a focus on data quality, scale, and cost efficiency.
- Support monitoring of AI/LLM services for latency, errors, and cost anomalies where applicable.
- Own and evolve observability platforms, standards, dashboards, alerting strategies, and SLOs.
Essential skills and/or Certifications:
- Bachelor’s degree in Computer Science, Artificial Intelligence or a related field.
- Hands-on experience with observability and monitoring tools (e.g., Splunk).
- Experience working in cloud-native environments (GCP preferred).
- Experience with CI/CD pipelines and automation (Python, Bash, or similar).
- Solid understanding of production incidents and operational workflows.
- Deep experience with automation, event correlation, and auto-remediation.
- Proficiency in the Microsoft Office suite.
- Show an ownership mindset in everything you do: be a curious, proactive problem solver who takes action and seeks ways to collaborate and connect with people and teams to drive success.
- Maintain a continuous growth mindset: keep learning through social experiences and relationships with stakeholders, experts, colleagues and mentors, and broaden your competencies through structured courses and programs.
- Where applicable, fluency in English and languages relevant to the working market.