Hybrid Senior Site Reliability Engineer

Posted 2 weeks ago

About the role

  • Design, implement, and maintain secure, scalable infrastructure across cloud environments
  • Analyze cloud environment requirements from various sources, document system designs, and implement necessary modifications
  • Automate repetitive system tasks and manage system-related activities for internal and external clients, including Professional Services support
  • Ensure system reliability through robust failover mechanisms, disaster recovery processes, and 24/7 support strategies
  • Design, implement, and improve monitoring tools to meet SLOs, ensuring a “Monitor by Design” approach is adopted across product teams
  • Continuously drive reliability improvements through proactive initiatives, data-driven SLO adjustments, and advanced monitoring/alerting solutions
  • Lead and coordinate disaster recovery testing exercises and capacity planning to enhance system reliability
  • Identify and reduce operational toil through automation and tool development
  • Apply and enforce security best practices across cloud environments, while mentoring team members on SLO achievement
  • Facilitate cross-team communication, provide training, and maintain clear documentation (e.g., runbooks and procedures)
  • Support cloud environment management and propose technology changes to improve performance and reliability

Requirements

  • 7+ years of experience as a System Administrator, DevOps Engineer, SRE, or similar role
  • Deep knowledge of Linux administration, including performance monitoring, tuning and troubleshooting
  • Experience with cloud network design (Azure preferred, AWS or GCP also considered)
  • Proficiency in scripting (e.g., Bash, Python) for automation
  • Experience with version control software (preferably Git)
  • Experience with configuration management tools (e.g., Puppet, Foreman, Ansible)
  • Knowledge of container orchestration tools (e.g., Kubernetes, Docker Swarm)
  • In-depth knowledge of monitoring and logging solutions for cloud infrastructure (e.g., Prometheus, Grafana)
  • Bachelor’s degree in Computer Science or a related field
  • Excellent time management, organizational, crisis management, and problem-solving skills
  • Self-starter, able to work independently without direct supervision
  • Willingness to innovate, learn, and share knowledge
  • Excellent verbal and written communication skills
  • Experience developing and implementing IT security best practices and procedures
  • Willingness to participate in on-call rotations and respond to incidents in a timely and effective manner
  • Excellent command of the English language

Benefits

  • Health insurance
  • Flexible work arrangements
  • Professional development opportunities

Job title

Senior Site Reliability Engineer

Experience level

Senior

Salary

Not specified

Degree requirement

Bachelor's Degree
