Onsite Site Reliability Engineer 3

Posted 10 hours ago

About the role

  • As a Site Reliability Engineer, you will ensure the reliability and performance of FreeWheel systems, collaborating with engineering and operations teams on optimization and troubleshooting.

Responsibilities

  • Design and implement monitoring and alerting systems to ensure the stability, reliability, and performance of data platforms.
  • Participate in the on-call rotation to respond to and resolve issues quickly.
  • Develop and maintain automation tools and scripts for deployment, monitoring, backup, and disaster recovery.
  • Analyze and optimize data storage, query performance, and data flows to process large-scale datasets efficiently and reduce latency.
  • Respond quickly to platform failures, perform troubleshooting, and coordinate cross-team efforts to resolve issues and ensure high availability and reliability.
  • Work with engineering teams to analyze and forecast capacity requirements, ensuring the system can handle traffic growth and scale infrastructure accordingly.
  • Document the architecture, configurations, and operational procedures for platforms, ensuring knowledge is shared across the team and providing relevant training.
  • Ensure platforms meet security standards and compliance requirements to prevent breaches or misuse.
  • Collaborate with the engineering, product, and project management teams to support product design and implementation and to solve reliability-related issues.

Requirements

  • 3+ years of experience as an SRE, DevOps, or Operations Engineer.
  • Experience with cloud platforms (e.g. AWS, OCI, GCP, Azure) is a plus.
  • Hands-on experience with Terraform and infrastructure-as-code principles is a huge plus.
  • Experience automating system deployment with tools or frameworks such as Ansible, Terraform, Kubernetes, or Docker.
  • Proficient in at least one programming language, such as Python, Go, Java, or Scala, with the ability to write efficient scripts and automation tools.
  • Familiarity with monitoring and log management tools such as Prometheus, Grafana, the ELK Stack, or similar.
  • Excellent communication skills with the ability to convey technical information clearly and concisely to both technical and non-technical stakeholders.
  • Proactive learner eager to grow in operations and governance.

Benefits

  • Best-in-class benefits for eligible employees.
  • An array of options, expert guidance, and always-on tools to support you physically, financially, and emotionally.

Job title

Site Reliability Engineer 3

Job type

Experience level

Mid level, Senior

Salary

$99,684 - $149,526 per year

Degree requirement

Bachelor's Degree

Location requirements
