Senior Site Reliability Engineer at ABBYY, working on critical production service designs and reliability improvements on Azure cloud applications
Responsibilities
Co-own critical production service designs to ensure high reliability is achievable and measurable
Drive reliability and observability improvements in the services within the engineering verticals
Using monitoring and telemetry data, help teams make informed decisions on where reliability challenges may exist and help design and build solutions to improve them
Build SRE dashboards from SLIs to measure SLO adherence
Support production applications hosted in the Azure cloud
Build and improve internal tools and automation software to make maintaining production services easier and safer
Lead reliability-focused practices such as Failure Analysis, Load and Capacity Planning, Service Reviews, Architecture Designs, Incident Postmortems, and others
Develop Infrastructure as Code
Define (from design to implementation details) necessary auto-healing and fault-tolerant systems
Act as the point of contact for production application issues, working closely with engineering leadership
Requirements
7-10 years of IT experience
Proven experience with at least one cloud technology (Azure or AWS), preferably Azure
Proficient in Kubernetes, AKS, Azure Functions, Storage accounts, and others
Proven experience in Microsoft technologies: Windows Server, IIS (preferred)
Distributed monitoring experience with Grafana: logging, metrics, tracing, etc.
Years of experience matching the level, in an Infrastructure, SRE, DevOps, or CloudOps role
Experience working in an SRE team in a dynamic, fast-paced environment
Experience programming in one or more of the following: C#, Java, Python, .NET, Node.js, Go
Experience with Terraform, Ansible, or a similar Infrastructure as Code tool
Experience with cloud-performant microservices and event-driven architectures
Experience with Kubernetes administration is an added advantage.
Azure DevOps IT Engineer at iKnowHealth managing cloud and hybrid solutions with Microsoft Azure. Responsible for optimizing infrastructure and ensuring system performance in healthcare software.
SRE Manager leading a team in reliability engineering at WEX. Overseeing system stability and balancing feature delivery within the Microsoft Azure ecosystem.
Lead DevOps Architect guiding AWS and LaunchDarkly solutions. Overseeing enterprise-grade feature management and technical leadership with hands-on implementation.
Technical liaison for deployment and release infrastructure at Avaya. Resolving complex issues within the CI/CD pipeline and optimizing cloud operations with high autonomy.
Design, develop, and implement intelligent automation and AI-driven solutions at Securonix. Focus on enhancing reliability and efficiency across SaaS Ops environments with AI integration.
DevOps Engineer automating and optimizing software development lifecycle processes at COSMOTE Global Solutions. Designing and managing containerized infrastructure on Azure and implementing CI/CD.
Senior DevOps Engineer at Elliptic shaping DevOps culture and driving automation across engineering teams, providing expertise and leadership across the stack.
Senior Data Reliability Engineer ensuring software reliability and quality across enterprise applications. Collaborating with teams to implement robust on-call processes and maintain data fidelity.
Infrastructure & Cloud Operations Engineer managing AWS and hybrid environments for CV-Library. Hands-on role focused on reliability, automation, and operational excellence.
Site Reliability Engineer building reliable and scalable infrastructure for fintech startup Pave Bank. Collaborating with internal teams to enhance banking platform performance and reliability.