Senior Site Reliability Engineer managing cloud infrastructure for SaaS solutions at PROS Holdings. Focusing on reliability, automation, and team collaboration in a hybrid work environment.
Responsibilities
Design, implement, and maintain secure, scalable infrastructure across cloud environments
Analyze cloud environment requirements from various sources, document system designs, and implement necessary modifications
Automate repetitive system tasks and manage system-related activities for internal and external clients, including Professional Services support
Ensure system reliability through robust failover mechanisms, disaster recovery processes, and 24/7 support strategies
Design, implement, and improve monitoring tools to meet SLOs, ensuring a “Monitor by Design” approach is adopted across product teams
Continuously drive reliability improvements through proactive initiatives, data-driven SLO adjustments, and advanced monitoring/alerting solutions
Lead and coordinate disaster recovery testing exercises and capacity planning to enhance system reliability
Identify and reduce operational toil through automation and tool development
Apply and enforce security best practices across cloud environments, while mentoring team members on SLO achievement
Facilitate cross-team communication, provide training, and maintain clear documentation (e.g., runbooks and procedures)
Support cloud environment management and propose technology changes to improve performance and reliability
Requirements
7+ years of experience as a System Administrator, DevOps Engineer, SRE, or in a similar role
Deep knowledge of Linux administration, including performance monitoring, tuning, and troubleshooting
Experience with cloud network design (Azure preferred, AWS or GCP also considered)
Proficiency in scripting (e.g., Bash, Python) for automation
Experience with version control software (preferably Git)
Experience with configuration management tools (e.g., Puppet, Foreman, Ansible, or similar)
Knowledge of container orchestration tools (e.g., Kubernetes, Docker Swarm)
In-depth knowledge of monitoring and logging solutions for cloud infrastructure (e.g., Prometheus, Grafana)
Bachelor’s degree in Computer Science or a related field
Excellent time management, organizational, crisis management, and problem-solving skills
Self-starter, able to work independently without direct supervision
Willingness to innovate, learn, and share knowledge
Excellent verbal and written communication skills
Experience developing and implementing IT security best practices and procedures
Willingness to participate in on-call rotations and respond to incidents in a timely and effective manner
DevOps Engineer supporting cloud modernization for the Department of the Air Force on the Cloud One contract. Involved in systems analysis, security practices, and collaboration with engineering teams.
Journeyman Cloud Operations Engineer maintaining cloud infrastructure across DoD organizations. Supporting DevSecOps and ensuring compliance with security requirements in a high-visibility program.
DevOps Engineer managing cloud-native platforms for Capgemini. Collaborating with development, data/ML, and security teams to deliver scalable solutions on Azure.
Head of IT & DevSecOps at JamLoop, managing internal technology and security improvements. Leading strategy and implementation of cloud infrastructure for efficiency and reliability.
I&E Maintenance and Reliability Engineer at LyondellBasell focused on asset maintenance strategies in a multidisciplinary environment. Collaborating for operational excellence and safety performance at the Pasadena facility.
Manager, DevOps & Cloud Infrastructure overseeing security and operational efficiency in a hybrid environment at Thomson Reuters. Leading teams to deliver secure solutions in on-premises and cloud setups.
DevOps Engineer responsible for building and maintaining the infrastructure of IONOS' AI platform. Collaborating on CI/CD pipelines and ensuring system optimization across various locations.
DevOps Engineer building and supporting cloud infrastructure at PointClickCare. Collaborate with senior engineers and software teams to enhance AI-enabled workloads and improve system reliability.
DevOps specialist working with Kubernetes and Terraform, ensuring project stability and efficiency for Convercus. Join a small, dynamic team in a hybrid work environment.
Cloud & DevOps Engineer at XTEL managing Azure infrastructure and deploying applications. Collaborating within an international team to drive technological excellence.