Technical Lead for the DevOps team at Atria Health, ensuring reliable, scalable systems and mentoring engineers. Leads cloud infrastructure initiatives using Google Cloud Platform and Terraform.
Responsibilities
Define and drive the technical vision for DevOps practices across the organization
Lead architecture decisions for infrastructure, CI/CD pipelines, and cloud resources
Serve as a technical escalation point for complex infrastructure challenges
Conduct design reviews and provide guidance on reliability, security, and scalability
Design, build, and maintain cloud infrastructure on Google Cloud Platform using Terraform
Own and improve CI/CD pipelines to enable fast, safe deployments
Implement and maintain monitoring, alerting, and observability systems
Drive incident response processes and lead post-mortems to improve system resilience
Partner with product engineering teams to understand their infrastructure needs and translate them into scalable solutions
Work closely with Security to implement and maintain compliance and security best practices
Collaborate with Product and Engineering leadership on capacity planning and technical roadmaps
Mentor and coach DevOps engineers, fostering growth and technical development
Establish and document DevOps standards, runbooks, and best practices
Champion a culture of reliability, automation, and continuous improvement
Requirements
7+ years of software engineering experience, with 3+ years focused on DevOps, SRE, or infrastructure engineering
Deep experience with cloud platforms (GCP strongly preferred; AWS or Azure acceptable)
Proficiency with infrastructure-as-code tools, particularly Terraform
Strong experience with container orchestration (Kubernetes) and CI/CD systems
Demonstrated ability to lead technical initiatives and influence without direct authority
Excellent communication skills and ability to translate complex technical concepts for varied audiences
Experience in healthcare technology or other regulated industries (Preferred)
Familiarity with our backend stack (Node, TypeScript, Express) (Preferred)
Experience building and scaling observability platforms (Preferred)
Track record of improving developer experience and deployment velocity (Preferred)
Benefits
Excellent health and wellness benefits, 100% paid by Atria, effective on the date of hire
OneMedical membership for employees & dependents giving access to 24/7 virtual care
Fertility & family planning
Company-covered preventive health screenings through partner hospitals (Calcium score)
Fitness perks, including Wellhub+
401(k) contributions with a 4% match, starting after 6 months
Flexible Time Off
Continuing medical education (CME) and CEU support for professional licensure
Time to give back and make an impact in underserved communities