Principal Site Reliability Engineer at Early Warning, partnering with teams to design and implement resilience patterns. Managing incidents, overseeing performance, and mentoring staff.
Responsibilities
Design and implement software and tools to improve performance (availability, scalability, and latency) while delivering end products to customers with the highest efficiency and meeting all security standards.
Support the company’s commitment to risk management and protecting the integrity and confidentiality of systems and data.
Build automation and tooling around application management, such as deployments, configuration changes and disaster recovery scenarios.
Design, implement, and evangelize observability and monitoring systems to proactively detect problems and identify their causes.
Evaluate capacity of the application on a continuous basis to provide stats to the Product/Business teams and recommend an efficient path to scale for future needs.
Identify performance bottlenecks and work with cross-functional teams to troubleshoot and resolve issues.
Serve as a technical liaison for the application and provide documents and runbooks to Level 1 and Level 2 teams.
Participate in a 24x7 on-call rotation.
Be a champion of excellent processes; take the initiative in developing repeatable patterns and standard, re-usable work across teams.
Work directly with application development teams to provide feedback and technical requirements to the software development lifecycle, implementing best-practice microservice design patterns and other modern software development approaches.
Understand and support the adoption of best-practice microservice design patterns and other modern software reliability approaches and techniques.
Be a thought leader: a senior point of expertise on site reliability engineering issues, industry trends and developing technologies.
Be a role model to others on the team.
Coach and mentor team members.
Requirements
Education and experience typically obtained through completion of a Bachelor’s Degree in Business and/or Computer Science or related field.
12+ years of related experience managing large, complex projects in a technical or software development environment, inclusive of a post-graduate degree
Proven ability to lead a team through high-priority incidents and improve the RCA process
Excellent troubleshooting skills and proven experience resolving technical issues in complex environments
Hands-on experience designing and developing with one or more of the following technologies: Python, Go, Java
Experience with Docker and microservices architecture
Messaging frameworks such as Kafka, SQS or JMS
Database technologies such as Oracle, DynamoDB, and Aurora
Caching layers such as Redis and Memcached
Strong understanding of Linux administration
Experience with CI/CD pipeline implementation, including Git, Chef, Maven, and Jenkins
Strong understanding of networking fundamentals
Experience in leading cross-functional teams to create technical solutions.
Proven track record designing and building complex end-to-end systems (full stack developer)
Background and drug screen
Benefits
Healthcare Coverage – Competitive medical (PPO/HDHP), dental, and vision plans as well as company contributions to your Health Savings Account (HSA) or pre-tax savings through flexible spending accounts (FSA) for commuting, health & dependent care expenses.
401(k) Retirement Plan – Featuring a 100% Company Safe Harbor Match on your first 6% deferral immediately upon eligibility.
Paid Time Off – Flexible Time Off for Exempt (salaried) employees, as well as generous PTO for Non-Exempt (hourly) employees, plus 11 paid company holidays and a paid volunteer day.
12 weeks of Paid Parental Leave
Maven Family Planning – Support throughout your parenting journey, including egg freezing, fertility, adoption, surrogacy, pregnancy, postpartum, early pediatrics, and returning to work.
DevOps Engineer working in Iasi at Ness Digital Engineering, managing cloud environments and deploying software releases in a dynamic team. Responsibilities include monitoring, testing, and debugging systems.
Director of DevOps & Infrastructure at KingMakers overseeing platform and infrastructure strategy for major brands. Leading cross-functional teams in a cloud-native technology environment.
Middleware SRE Global Lead managing web operations for State Street. Leading a team and optimizing middleware platforms in a high-transaction environment.
DevOps Engineer improving infrastructure and automating processes for GPI Consulting GmbH in Hamburg. Collaborating with teams to deliver stable and scalable systems, focusing on innovative digital solutions.
DevOps Engineer focusing on cloud infrastructure for VM-based applications at Polarstern Experts. Join a friendly team with up to 80% remote work opportunities in Köln.
DevOps Engineer developing monitoring solutions and enhancing the AIOps and Observability Platform at Organon, a global healthcare company. Collaborating across teams to ensure compliance with industry standards and optimizing data processing.
Senior DevOps Engineer responsible for full stack development of energy trading products at Deutsche Börse. Implementing automated solutions and collaborating across product teams in a hybrid setting.
Cloud DevOps Engineer at RELX supporting CI/CD platform and development teams. Focusing on resilient software delivery and troubleshooting across various environments.
Join Boeing AvionX as a Software DevOps Engineer driving automation and CI/CD pipelines for cloud-native systems. Lead initiatives improving deployment pipelines and mentor the engineering team.
Senior SRE responsible for ensuring system reliability and performance at Aggrandize. Collaborating with cross-functional teams and implementing SRE best practices.