Site Reliability Engineer responsible for application reliability and scalability at Early Warning Services. Implementing best practices while collaborating with development teams in a hybrid work environment.
Responsibilities
Implement software and tools to improve the performance - availability, scalability, and latency, while delivering end products to customer with the highest efficiency and meeting all security standards.
Build automation and tooling around application management, such as deployments, configuration changes and disaster recovery scenarios.
Implement and evangelize Observability and monitoring systems to proactively detect problems and identify cause.
Evaluate capacity of the application on a continuous basis to provide stats to the Product/Business teams and recommend an efficient path to scale for future needs.
Identify performance bottlenecks and work with cross-functional teams to troubleshoot and resolve issues.
Implement standards across multiple disciplines, systems and practices to improve the overall application delivery.
Work directly with application development teams to provide feedback and technical requirements to the software development lifecycle, implementing best-practice microservice design patterns and other modern software development approaches.
Serve as a technical liaison for the application and provide documents and runbooks to Level 1 and Level 2 teams.
Participate in 24 X 7 on-call rotation.
Be a champion of excellent processes; take the initiative in developing repeatable patterns and standard, re-usable work across teams.
Support the company's commitment to protect the integrity and confidentiality of systems and data.
Requirements
Education and experience typically obtained through completion of a Bachelor’s Degree in Business and/or Computer Science or related field.
3+ years of related experience managing large complex projects in a technical or software development environment inclusive of post-graduate degree
Demonstrated experience in effective Incident and Problem Management
Proven related work experience in a medium to large scale enterprise.
Strong understanding of scripting languages
Hands on experience implementing and using modern Observability solutions.
Linux systems administration
Good knowledge of Git
Experienced with security and encryption protocols.
Comfortable with facilitating collaboration, open communication and reaching across functional borders.
Excellent oral and written communication and people skills.
High level of customer responsiveness, excellent documentation and communication skills and attention to detail.
Background and drug screen.
Benefits
Healthcare Coverage – Competitive medical (PPO/HDHP), dental, and vision plans as well as company contributions to your Health Savings Account (HSA) or pre-tax savings through flexible spending accounts (FSA) for commuting, health & dependent care expenses.
401(k) Retirement Plan – Featuring a 100% Company Safe Harbor Match on your first 6% deferral immediately upon eligibility.
Paid Time Off – Flexible Time Off for Exempt (salaried) employees, as well as generous PTO for Non-Exempt (hourly) employees, plus 11 paid company holidays and a paid volunteer day.
12 weeks of Paid Parental Leave
Maven Family Planning – provides support through your Parenting journey including egg freezing, fertility, adoption, surrogacy, pregnancy, postpartum, early pediatrics, and returning to work.
Release Engineer at Air Apps responsible for optimizing release processes and collaborating with cross - functional teams. Focused on smooth, reliable, and efficient application delivery.
DevOps Engineer responsible for maintaining and optimizing infrastructure at Tenet3. Focused on security, automation, and technical operations within a collaborative team environment.
Site Reliability Engineer II at LexisNexis Risk Solutions building Terraform modules and CI/CD pipelines. Responsible for developing cloud infrastructure and ensuring reliability, security, and observability.
Journeyman Cloud Operations Engineer maintaining cloud infrastructure across DoD organizations. Supporting DevSecOps and ensuring compliance with security requirements in a high - visibility program.
DevOps Engineer supporting cloud modernization for the Department of the Air Force on the Cloud One contract. Involved in systems analysis, security practices, and collaboration with engineering teams.
DevOps Engineer managing cloud - native platforms for Capgemini. Collaborating with development, data/ML, and security teams to deliver scalable solutions on Azure.
Head of IT & DevSecOps at JamLoop, managing internal technology and security improvements. Leading strategy and implementation of cloud infrastructure for efficiency and reliability.
I&E Maintenance and Reliability Engineer at LyondellBasell focused on asset maintenance strategies in a multidisciplinary environment. Collaborating for operational excellence and safety performance at the Pasadena facility.
Manager, DevOps & Cloud Infrastructure overseeing security and operational efficiency in a hybrid environment at Thomson Reuters. Leading teams to deliver secure solutions in on - premises and cloud setups.
DevOps Engineer responsible for building and maintaining the infrastructure of IONOS' AI platform. Collaborating on CI/CD pipelines and ensuring system optimization across various locations.