Site Reliability Engineer at healthcare startup Heidi. Improving operational reliability and collaborating with engineers in a hybrid work environment.
Responsibilities
Participate in on-call and incident response:
Respond to production incidents, contribute to service restoration, and support clear communication during incidents.
Improve operational reliability:
Identify recurring issues and reliability risks, and drive fixes through better alerting, automation, system changes, or process improvements.
Own parts of the production environment:
Operate and improve Kubernetes clusters, cloud infrastructure, and core platform services.
Strengthen observability:
Improve dashboards, alerts, logs, and traces so issues are detected earlier.
Reduce operational toil:
Automate repetitive tasks, simplify runbooks, and improve tooling for day-to-day operations.
Support safe change:
Improve deployments, rollback mechanisms, and operational readiness.
Contribute to operational practices:
Write and maintain runbooks, and participate in blameless post-mortems.
Collaborate closely with engineers:
Work with product and feature teams to improve production readiness.
Requirements
3–6+ years of experience in SRE, DevOps, Platform, or operations-heavy engineering roles.
Experience supporting production systems and participating in on-call rotations.
Comfortable debugging live systems under pressure.
Senior DevOps Engineer designing and maintaining CI/CD pipelines for Solace Cloud. Collaborating with teams on AWS and Kubernetes to enhance developer experiences.
Analyzing vulnerabilities and implementing security strategies within the software development cycle at Redbelt Security. Ensuring compliance with security requirements and providing guidance to the development team.
Data Center Network Deployment Engineer for NVIDIA's HPC/AI Infrastructure team. Deploying and managing large scale AI Data Centers with a focus on networking and automation.
Deployment Engineer at Megaport, expanding its global network as part of a collaborative team of problem solvers. Engaging with stakeholders to deliver effective networking solutions.
Senior DevOps/Infrastructure Engineer at Thndr focusing on cloud infrastructure and DevOps best practices. Leading initiatives to improve scalable and secure financial applications.
DevOps Engineer assisting developers in leveraging DevOps tooling and best practices for Cat Digital applications. Collaborating closely with development teams to optimize delivery and troubleshooting.
Reliability Engineer providing strategic support at the Y-12 National Security Complex. Enhancing equipment reliability and maintainability through proactive maintenance strategies.
Upper Steering System Design and Release Engineer responsible for managing steering components and suppliers. Engaging in design and development of upper steering systems for Ford vehicles in a hybrid capacity.
Senior DevOps Engineer implementing CI/CD solutions for software projects. Requires expertise in Docker, Azure, and IaC tools in a hybrid work environment.