Site Reliability Engineer responsible for application reliability and scalability at Early Warning Services. Implementing best practices while collaborating with development teams in a hybrid work environment.
Responsibilities
Implement software and tools to improve performance (availability, scalability, and latency) while delivering end products to customers with the highest efficiency and meeting all security standards.
Build automation and tooling around application management, such as deployments, configuration changes and disaster recovery scenarios.
Implement and evangelize observability and monitoring systems to proactively detect problems and identify their causes.
Evaluate capacity of the application on a continuous basis to provide stats to the Product/Business teams and recommend an efficient path to scale for future needs.
Identify performance bottlenecks and work with cross-functional teams to troubleshoot and resolve issues.
Implement standards across multiple disciplines, systems and practices to improve the overall application delivery.
Work directly with application development teams to provide feedback and technical requirements to the software development lifecycle, implementing best-practice microservice design patterns and other modern software development approaches.
Serve as a technical liaison for the application and provide documents and runbooks to Level 1 and Level 2 teams.
Participate in a 24x7 on-call rotation.
Be a champion of excellent processes; take the initiative in developing repeatable patterns and standard, re-usable work across teams.
Support the company's commitment to protect the integrity and confidentiality of systems and data.
Requirements
Education and experience typically obtained through completion of a Bachelor’s Degree in Business and/or Computer Science or related field.
3+ years of related experience managing large, complex projects in a technical or software development environment, inclusive of post-graduate degree
Demonstrated experience in effective Incident and Problem Management
Proven related work experience in a medium- to large-scale enterprise.
Strong understanding of scripting languages
Hands-on experience implementing and using modern observability solutions.
Linux systems administration
Good knowledge of Git
Experienced with security and encryption protocols.
Comfortable with facilitating collaboration, open communication and reaching across functional borders.
Excellent oral and written communication and people skills.
High level of customer responsiveness, excellent documentation and communication skills, and attention to detail.
Background and drug screen.
Benefits
Healthcare Coverage – Competitive medical (PPO/HDHP), dental, and vision plans as well as company contributions to your Health Savings Account (HSA) or pre-tax savings through flexible spending accounts (FSA) for commuting, health & dependent care expenses.
401(k) Retirement Plan – Featuring a 100% Company Safe Harbor Match on your first 6% deferral immediately upon eligibility.
Paid Time Off – Flexible Time Off for Exempt (salaried) employees, as well as generous PTO for Non-Exempt (hourly) employees, plus 11 paid company holidays and a paid volunteer day.
12 weeks of Paid Parental Leave
Maven Family Planning – provides support through your parenting journey, including egg freezing, fertility, adoption, surrogacy, pregnancy, postpartum, early pediatrics, and returning to work.
Lead Infrastructure Engineer at U.S. Bank responsible for managing and configuring cloud systems and infrastructure technologies while promoting automation practices.
Site Reliability Engineer focused on automation and optimization of software application performance. Collaborating with cross-functional teams to enhance scalability and reliability in Chennai/Bangalore.
Site Reliability Engineer ensuring the availability and performance of services for autonomous vehicle operations. Collaborating on system design and automation in a robotics-focused environment.
DevOps Engineer automating continuous deployment and monitoring on AWS for Crown Equipment Corporation. Bridging developers, IT, and external providers for operational efficiency.
Senior DevOps Engineer responsible for leading CI/CD pipeline design and optimization. Collaborating with teams to drive DevOps maturity across the enterprise while managing infrastructure automation.
Cloud Operations Engineer ensuring reliable performance of cloud systems at 2Innovate. Focused on automation, incident management, cloud security, and infrastructure monitoring in cloud environments.
AWS DevOps Engineer responsible for delivering scalable digital experiences for EXL's MarTech ecosystem. Engaging in development, maintenance, and collaboration across stakeholders and services.
Senior Site Reliability Engineer managing critical infrastructure at Hornetsecurity. Collaborating with product teams to ensure performance and reliability across services.
Site Reliability Engineer enhancing platform reliability for AI workflows at WRITER. Overseeing automated solutions and cloud infrastructure supporting high-traffic AI systems.