About the role

As a Site Reliability Engineer, you will ensure the reliability and performance of Equisoft’s SaaS applications, collaborating with development and operations teams while managing incidents and optimizing infrastructure.

Responsibilities

Monitor daily SaaS operations to ensure consistent performance, reliability, and availability of services for customers.
Ensure adherence to SLAs (Service Level Agreements) by proactively monitoring and addressing potential issues to maintain high uptime and service quality.
Execute incident management procedures for outages or performance issues, including troubleshooting, root cause analysis, and post-mortem reviews.
Work on improving the operational efficiency of SaaS applications by fine-tuning infrastructure, monitoring systems, and optimizing performance.
Ensure all SaaS applications meet required security and compliance standards, conducting regular audits and addressing vulnerabilities proactively.
Identify areas for process improvement, driving automation initiatives to streamline workflows, reduce manual work, and enhance operational efficiency.
Act as a point of escalation for customer issues related to SaaS applications, working with support teams to resolve high-priority cases.
Monitor, analyze, and report on operational metrics (uptime, response times, incident counts), providing regular updates to stakeholders and keeping documentation current.
Participate in disaster recovery exercises, ensuring regular backups and testing recovery processes for business continuity.
Ensure SaaS operations align with industry standards and best practices to provide a structured and effective service management approach.
Work closely with development and operations teams to ensure seamless integration and deployment.
Address and resolve production issues promptly to minimize downtime.
Participate in on-call rotations, troubleshooting issues and performing root cause analysis to ensure 24/7 system availability.

Requirements

Technical bachelor’s degree in Computer Engineering or Information Technology, or a college diploma combined with 3 years of relevant experience
3+ years of experience in a similar role (Site Reliability Engineer, Production Support Engineer, DevOps, Programmer or related)
Proven track record of managing and optimizing production systems
Strong knowledge of system administration, networking, and Azure cloud services
Experience with CI/CD pipelines and infrastructure as code (e.g. Terraform)
Experience with monitoring and alerting tools (e.g. Azure Monitor, Application Insights)
Hands-on experience with Azure Kubernetes Service (AKS), Azure Container Instances, and container orchestration
Experience working closely with software development teams
Ability to read and understand code (e.g. .NET, C#, Java or Python) to assist in debugging and identifying root causes of issues
Familiarity with application logs, stack traces, and performance profiling tools to pinpoint problems efficiently
Solid understanding of Azure SQL Database, Cosmos DB, and other Azure data services
Excellent knowledge of English (spoken and written)

Benefits

medical
dental
term life/personal accident coverage
wellness sessions
telemedicine program
flexible hours
educational support (LinkedIn Learning, LOMA courses and Equisoft University)

Hybrid Senior Site Reliability Support Engineer

at Equisoft
