Site Reliability Engineer maintaining systems and infrastructure to ensure reliability and performance. Collaborating with developers and automating operational tasks for a robust cloud environment.
Responsibilities
Design and maintain reliable systems and infrastructure
Monitor system reliability and performance
Collaborate with development teams to ensure system robustness
Automate operational tasks and processes
Troubleshoot and resolve issues in production environments
Implement best practices for system availability, security, and performance
Mentor junior SRE team members
Requirements
5+ years of experience in site reliability engineering
Strong background in Linux/Unix systems
Proficient in scripting languages (Python, Bash, etc.)
Experience with cloud providers (AWS, Azure, Google Cloud)
Knowledge of CI/CD tools and processes
Understanding of application architecture and microservices
Excellent troubleshooting skills
Good communication skills and the ability to work in a team environment
Background in the networking stack and protocols
Availability for on-call rotations as needed
Benefits
Medical provided through Cigna (PPO, HSA, EPO options)
Medical provided through Kaiser (HMO option only) for California employees
Dental provided through Cigna (DPPO & DHMO options)
Vision provided nationwide through VSP
Flexible Spending Account for Health & Dependent Care
Pre-Tax Account for Commuter Benefit/Parking & Transit (location-specific)
Continuing Education and Professional Development via various integrated platforms (e.g., Udemy and Coursera)
Corporate Wellness Program
Employee Assistance Program
Wellness Days
401k Plan
Basic Life, Accidental Life, Supplemental Life Insurance
Short Term & Long Term Disability
Critical Illness, Critical Hospital, and Voluntary Accident Insurance
Tuition Reimbursement (available 6 months after start date, capped)
Paid Time Off (accrued and prorated, maximum of 120 hours annually)
Paid Holidays
Any other statutory leaves, paid time, or other fringe benefits required under state and federal law