Lead Site Reliability Engineer at Saviynt | Hybrid Hired

About the role

Implement monitoring and alerting systems to guarantee high availability and performance, focused on SLA and availability metrics
Collaborate with engineering and operations teams to identify critical components requiring enhanced availability measures
Design and implement strategies, tooling, and processes to enhance system uptime and reliability
Continuously evaluate and recommend improvements to platform infrastructure and processes
Align the platform with customer needs and business goals by working closely with cross-functional teams
Run the production environment by monitoring availability and taking a holistic view of system health
Build software and systems to monitor platform infrastructure and applications
Monitor and improve reliability, quality, and time-to-market of our suite of software solutions
Measure and optimize system performance, get ahead of customer needs, and innovate for continual improvement
Provide primary operational support and engineering for multiple large-scale distributed software applications
Gather and analyze metrics from operating systems and applications to assist in performance tuning and fault finding

Requirements

Bachelor’s degree or higher in a technology-related field (e.g. Engineering, Computer Science) required; Master’s degree a plus
6+ years of professional experience in monitoring and alerting roles on major cloud platforms (AWS, Azure)
4+ years of experience in cloud development (AWS, Azure) with observability skills
3+ years of experience in software development with Python, NodeJS, or Java with a focus on SDLC and automation
Hands-on experience with container orchestration, preferably with Kubernetes
Hands-on experience building observability, monitoring, and alerting for large-scale distributed systems
Experience leading or designing application and/or infrastructure migration projects from on-premises to cloud
Cloud architecture design and implementation experience
Familiarity with current AWS solutions; Azure experience also considered
Experience with containerized workloads (Helm, AKS, EKS, Docker, JFrog)
Experience with logging and monitoring tools (Prometheus, Grafana, Datadog, AWS CloudWatch, Azure Monitor, Log Analytics, Fluentd, ELK/OpenSearch, OpenTelemetry)
Network security knowledge (IAM/Policy, Azure Policy, VPN, Active Directory/RBAC, ACLs, NSG rules, private endpoints)
Proven experience implementing advanced observability practices and techniques at scale
Ability to automate alert resolution using scripting languages (Python, Golang, Shell)
Knowledge of managing systems using infrastructure-as-code tools (Terraform, ARM, Chef)
Solid understanding of cloud computing and DevOps concepts
Proven experience in maintaining scalability and resiliency of complex environments
Ability to triage, execute root cause analysis, and be decisive under pressure
Experience managing and interpreting large datasets using query languages and visualization tools
Strong communication skills and ability to work with diverse teams
