Intermediate Site Reliability Engineer at PointClickCare, designing AI-powered solutions for observability and automation in healthcare systems. Focus on building resilient infrastructure using AI and ML.
Responsibilities
Build ML-based anomaly detection and pattern recognition systems.
Enhance telemetry with smart tagging and metadata for better AI insights.
Develop event-driven workflows and self-healing systems using AI triggers.
Automate incident response with generative AI and custom AI agent orchestration.
Use time-series forecasting and predictive modelling to anticipate failures.
Optimise infrastructure with AI-powered autoscaling and cost-aware resource allocation.
Build scalable, fault-tolerant systems in a cloud-native environment.
Participate in on-call rotations and lead incident response for critical systems.
Integrate APIs for streamlined data exchange and system connectivity.
Run internal AIOps workshops and help teams adopt AI maturity models.
Champion responsible AI practices and ethical automation.
Requirements
5+ years of experience in software engineering.
Experience with SRE principles.
Experience with AI/ML in production environments.
A passion for automation, intelligent systems, and operational excellence.
Strong debugging, problem-solving, and system design skills.
Senior DevOps Engineer responsible for cloud infrastructure and deployments. Optimizing AWS services and ensuring system security and reliability for Verizon.
Senior DevOps Engineer responsible for automating infrastructure and building CI/CD pipelines for a collaborative robotics company. Collaborating with global engineering teams from the Bangalore office.
Site Reliability Engineer Intern at Tencent working on gaming services and cloud-native solutions. Collaborating with global teams to eliminate toil and enhance reliability.
Cloud/DevOps Specialist at N5X managing and optimizing critical cloud infrastructures for Brazilian energy trading. Collaborating with a multidisciplinary team to ensure high availability and performance.
Cloud/DevOps Specialist responsible for designing a hybrid architecture combining cloud and on-premises infrastructure for energy trading systems. Collaborating with a multidisciplinary team in a dynamic environment.
Reliability Engineering Specialist utilizing reliability tools and models to improve asset performance at Enbridge. Collaborating across teams to guide investment decisions for safe operations.
DevOps Engineer responsible for structuring and supporting cloud DevOps architecture in Brazil. Working strategically on automation and CI/CD practices with development teams in Pernambuco.
DevSecOps Software Engineer developing secure CI/CD pipelines for Boeing's military software systems. Collaborating with cross-functional teams and implementing automation and security best practices.
DevOps Manager responsible for managing a team delivering multi-cloud solutions in support of the USAF Cloud One project. Focus on scalable cloud-native solutions and CI/CD practices.
Lead Site Reliability Engineer overseeing SRE practices across Azure and GCP platforms. Driving reliability improvements and leading a team at Lloyds Banking Group.