Senior Azure Site Reliability Engineer ensuring the reliability and performance of the Vew SaaS platform on Microsoft Azure. Collaborating with teams to design and implement resilient systems.
Responsibilities
Implement and maintain highly available, scalable, and fault-tolerant systems on Azure
Monitor system health and performance metrics to ensure reliability and proactively address issues
Develop and maintain automation scripts and tools for provisioning, deployment, monitoring, and scaling of services
Configure and maintain monitoring solutions to provide real-time visibility into system health and performance
Respond to and resolve incidents, including root cause analysis, mitigation, and communication with stakeholders
Ensure systems and infrastructure adhere to security best practices and compliance requirements
Identify areas for optimization and implement solutions to improve system reliability, performance, and efficiency
Requirements
Bachelor's degree in Computer Science, Engineering, or related field
Proven experience as a Site Reliability Engineer or similar role, preferably in a SaaS environment
Strong proficiency in Microsoft Azure services, including compute, networking, storage, and monitoring
Experience with automation tools and scripting languages such as PowerShell
Solid understanding of containerization technologies (e.g., Docker, Kubernetes) and orchestration tools
Experience with Bicep/Terraform and ARM templates for Infrastructure as Code (IaC)
Hands-on experience with monitoring and logging tools such as Azure Monitor, Grafana, Prometheus, or Datadog
Knowledge of security best practices, compliance standards (e.g., ISO 27001, SOC 2, GDPR), and relevant regulations
Excellent problem-solving skills and the ability to troubleshoot complex technical issues
Strong communication and collaboration skills, with the ability to work effectively in a cross-functional team environment
Azure certifications such as Azure Administrator Associate or Azure Solutions Architect Expert are nice to have