Senior Azure Site Reliability Engineer ensuring the reliability and performance of the Vew SaaS platform on Microsoft Azure. Collaborating with teams to design and implement resilient systems.
Responsibilities
Implement and maintain highly available, scalable, and fault-tolerant systems on Azure
Monitor system health and performance metrics to ensure reliability and proactively address issues
Develop and maintain automation scripts and tools for provisioning, deployment, monitoring, and scaling of services
Configure and maintain monitoring solutions to provide real-time visibility into system health and performance
Respond to and resolve incidents, including root cause analysis, mitigation, and communication with stakeholders
Ensure systems and infrastructure adhere to security best practices and compliance requirements
Identify areas for optimization and implement solutions to improve system reliability, performance, and efficiency
Requirements
Bachelor's degree in Computer Science, Engineering, or related field
Proven experience as a Site Reliability Engineer or similar role, preferably in a SaaS environment
Strong proficiency in Microsoft Azure services, including compute, networking, storage, and monitoring
Experience with automation tools and scripting languages such as PowerShell
Solid understanding of containerization technologies (e.g., Docker) and orchestration tools (e.g., Kubernetes)
Experience with Bicep/Terraform and ARM templates for Infrastructure as Code (IaC)
Hands-on experience with monitoring and logging tools such as Azure Monitor, Grafana, Prometheus, or Datadog
Knowledge of security best practices, compliance standards (e.g., ISO 27001, SOC 2, GDPR), and relevant regulations
Excellent problem-solving skills and the ability to troubleshoot complex technical issues
Strong communication and collaboration skills, with the ability to work effectively in a cross-functional team environment
Azure certifications such as Azure Administrator Associate or Azure Solutions Architect Expert are nice to have
Senior DevOps Engineer responsible for leading CI/CD pipeline design and optimization. Collaborating with teams to drive DevOps maturity across the enterprise while managing infrastructure automation.
Cloud Operations Engineer ensuring reliable performance of cloud systems at 2Innovate. Focused on automation, incident management, cloud security, and infrastructure monitoring in cloud environments.
AWS DevOps Engineer responsible for delivering scalable digital experiences for EXL's MarTech ecosystem. Engaging in development, maintenance, and collaboration across stakeholders and services.
Senior Site Reliability Engineer managing critical infrastructure at Hornetsecurity. Collaborating with product teams to ensure performance and reliability across services.
Site Reliability Engineer enhancing platform reliability for AI workflows at WRITER. Overseeing automated solutions and cloud infrastructure supporting high-traffic AI systems.
Site Reliability Engineer ensuring 24/7 availability of AI-powered workflows at WRITER. Developing and automating robust platforms for high-traffic AI demands.
Site Reliability Engineer maintaining cloud infrastructure for Tricentis SaaS Products. Collaborating closely with engineers, focusing on observability and performance.