Site Reliability Engineer ensuring reliability and performance of Equisoft’s SaaS applications. Collaborating with development and operations teams while managing incidents and optimizing infrastructure.
Responsibilities
Monitor daily SaaS operations to ensure consistent performance, reliability, and availability of services for customers.
Ensure adherence to SLAs (Service Level Agreements) by proactively monitoring and addressing potential issues to maintain high uptime and service quality.
Execute incident management procedures for outages or performance issues, including troubleshooting, root cause analysis, and post-mortem reviews.
Work on improving the operational efficiency of SaaS applications by fine-tuning infrastructure, monitoring systems, and optimizing performance.
Ensure all SaaS applications meet required security and compliance standards, conducting regular audits and addressing vulnerabilities proactively.
Identify areas for process improvement, driving automation initiatives to streamline workflows, reduce manual work, and enhance operational efficiency.
Act as a point of escalation for customer issues related to SaaS applications, working with support teams to resolve high-priority cases.
Monitor, analyze, and report on operational metrics (uptime, response times, incident counts), providing stakeholders with regular updates and up-to-date documentation.
Participate in disaster recovery exercises, ensuring regular backups and testing recovery processes for business continuity.
Ensure SaaS operations align with industry standards and best practices to provide a structured and effective service management approach.
Work closely with development and operations teams to ensure seamless integration and deployment.
Address and resolve production issues promptly to minimize downtime.
Participate in on-call rotations, troubleshooting incidents and performing root cause analysis to ensure 24/7 system availability.
Requirements
Technical Bachelor’s Degree in Computer Engineering or Information Technology, or a College Diploma combined with 3 years of relevant experience
3+ years of experience in a similar role (Site Reliability Engineer, Production Support Engineer, DevOps Engineer, Programmer, or related)
Proven track record of managing and optimizing production systems
Strong knowledge of system administration, networking, and Azure cloud services
Experience with CI/CD pipelines and infrastructure as code (e.g. Terraform)
Experience with monitoring and alerting tools (e.g. Azure Monitor, Application Insights)
Hands-on experience with Azure Kubernetes Service (AKS), Azure Container Instances, and container orchestration
Experience working closely with software development teams
Ability to read and understand code (e.g. .NET, C#, Java, or Python) to assist in debugging and identifying root causes of issues
Familiarity with application logs, stack traces, and performance profiling tools to pinpoint problems efficiently
Solid understanding of Azure SQL Database, Cosmos DB, and other Azure data services
Excellent knowledge of English (spoken and written)
Benefits
medical
dental
term life/personal accident coverage
wellness sessions
telemedicine program
flexible hours
educational support (LinkedIn Learning, LOMA Courses and Equisoft University)