Site Reliability Engineer responsible for the reliability of production systems at Modulate. Leading monitoring and incident response efforts as part of a growing engineering team.
Responsibilities
Own and operate production systems supporting Modulate’s APIs and enterprise products
Design and implement monitoring, alerting, and observability systems
Lead incident response, root cause analysis, and postmortem processes
Build and improve on-call rotations and operational workflows
Collaborate with engineers to deploy, maintain, and scale distributed systems
Partner with leadership on infrastructure decisions, roadmaps, and reliability goals
Evaluate and support deployment models including cloud, on-prem, and hybrid environments
Continuously improve system performance, resilience, and scalability
Requirements
Experience deploying and maintaining production software systems
Experience building monitoring and alerting systems for production environments
Experience with on-call rotations and incident response
Strong experience with AWS, Python, and Linux
Familiarity with tools such as CloudWatch, SNS, PagerDuty, or similar technologies
Strong debugging, systems thinking, and problem-solving skills
Ability to communicate effectively during high-pressure incidents
Experience working in fast-paced or early-stage environments
Nice to Have
Experience with AWS services such as EC2, load balancers, RDS, SQS, SES, and CloudWatch
Experience with infrastructure-as-code (e.g., Terraform, CloudFormation)
Experience supporting high-scale, distributed systems
Familiarity with hybrid or on-prem deployment models
Benefits
Competitive salary + equity
Full health, dental, and vision coverage
Flexible PTO, with a strong culture of taking it
Weekly team lunches with dietary accommodations
Hybrid work: core in-office days with flexible remote options
Regular leadership and industry learning sessions
Support for career development and continued learning
Intern assisting in developing a release management tool for SES's Software Center of Expertise. Working with Golang, APIs, and CI/CD processes in Luxembourg.
Machine Learning Engineer responsible for designing and maintaining ML infrastructure on AWS at Roche. Key role in revolutionizing drug discovery with machine learning as part of a close-knit team.
Senior Site Reliability Engineer operating scalable services in Azure and Kubernetes environments with a focus on reliability and performance improvements.
HPC Architect designing and optimizing high-performance computing solutions for semiconductor equipment. Collaborating with cross-functional teams to enhance compute workload capabilities.
Senior Site Reliability Engineer ensuring reliability, automation, and observability across cloud infrastructure. Focused on building self-service tools and improving performance in fast-paced environments.
Maintenance and Reliability Engineer optimizing preventive maintenance at VistaPrint's automated production facility in Venlo. Collaborating with cross-functional teams to drive continuous improvement in maintenance practices.
Senior Site Reliability Engineering Program & Compliance Manager leading process governance and operational maturity for infrastructure services at cloud contact center provider Five9.
Senior Site Reliability Engineer at Five9 designing Kubernetes on bare metal and hypervisor platforms within private cloud environments. Responsible for architecture, design, and standardization in infrastructure and automation.
DevOps engineer supporting a Jenkins-based CI/CD platform in Luxembourg. Managing cloud infrastructure and providing core banking systems support within a collaborative team.
Principal Software Engineer focused on a DevSecOps software factory at Northrop Grumman. Working with multidisciplinary teams to implement DevSecOps practices for aerospace programs across various locations.