Senior Site Reliability Engineer at ABBYY, working on critical production service designs and reliability improvements on Azure cloud applications
Responsibilities
Co-own critical production service designs to ensure high reliability is achievable and measurable
Drive reliability and observability improvements in the services within the engineering verticals
Using monitoring and telemetry data, help teams make informed decisions on where reliability challenges may exist and help design and build solutions to improve them
Build SRE dashboards from SLIs to measure SLO adherence
Support production applications hosted in the Azure cloud
Build and improve internal tools and automation software to make maintaining production services easier and safer
Lead reliability-focused practices such as Failure Analysis, Load and Capacity Planning, Service Reviews, Architecture Designs, Incident Postmortems, and others
Develop Infrastructure as Code
Define (from design to implementation details) necessary auto-healing and fault-tolerant systems
Act as the point of contact for production application issues, working closely with engineering leadership
Requirements
7-10 years of IT experience
Proven experience with at least one cloud platform (Azure or AWS), preferably Azure
Proficient in Kubernetes, AKS, Azure Functions, Azure Storage accounts, and related services
Proven experience with Microsoft technologies, Windows Server, IIS (preferred)
Distributed monitoring experience in Grafana: logging, metrics, tracing, etc.
Years of experience matching the level, in an Infrastructure, SRE, DevOps, or CloudOps role
Experience working in an SRE team in a dynamic, fast-paced environment
Experience programming in one or more of the following: C#, Java, Python, .NET, Node.js, Go
Experience with Terraform, Ansible, or a similar infrastructure-as-code or configuration management tool
Experience with cloud-performant microservices and event-driven architectures
Experience with Kubernetes administration is an added advantage.