Observability

Design and implement robust pipelines that collect and aggregate telemetry data (logs, metrics, events, and traces) from various cloud-native sources.
Configure AI-driven anomaly detection to move beyond static thresholds, allowing the system to identify unusual behavior before it triggers a critical outage.
Collaborate with software teams to integrate auto-instrumentation libraries into the CI/CD pipeline, ensuring every new service is "observable by default."
Automate the deployment of dashboards, alerting rules, and SLO (Service Level Objective) tracking via IaC to ensure consistent visibility across development, staging, and production.
Leverage AI-driven operations (AIOps) and distributed tracing to reduce Mean Time to Resolution (MTTR), and lead root-cause analysis for complex, cross-functional system failures.
Monitor security event logs (e.g., flow logs, firewall logs) to identify vulnerabilities and ensure systems comply with legal regulations.
Design and manage automated deployment pipelines using industry-standard tools (Spacelift/HCP Terraform).
Establish continuous reconciliation systems that automatically detect and correct unauthorized changes to infrastructure, maintaining the intended state without human intervention.

Minimum of 12 years of related experience with a Bachelor's degree in Computer Science Engineering or a related field.
Expert-level knowledge of AWS and Azure, including networking topology, IAM, and serverless architectures.
Hands-on experience implementing cloud-native observability solutions.
Proficiency in Prometheus, Grafana, OpenTelemetry, ELK/Splunk, and modern platforms like Datadog, New Relic, or Dynatrace.
Mastery of Terraform/OpenTofu and Pulumi (for programming-based IaC).
Expert-level knowledge of OpenTelemetry (OTel) and W3C Trace Context.
Proficiency in Go, Python, or Bash to build custom automation scripts and CLI tools.
Experience using AI-assisted tools for code generation and infrastructure cost/performance optimization.
Bachelor's/Master's degree in Computer Science Engineering or a related field.
Certifications in Azure, AWS, DevOps, or Terraform.
Experience in large-scale enterprise environments.