Staff Software Engineer, Site Reliability (SRE) at Character.AI | Hybrid Hired

About the role

Founding Staff Software Engineer supporting site reliability and infrastructure at Character.AI. Collaborating with the development team to ensure product reliability and scalability as the user base grows.

Responsibilities

Maintain production services and keep them operational.
Develop tools, instrumentation, and automation to monitor and optimize the performance and reliability of our service.
Develop, implement, and maintain automation tools and processes to prevent and mitigate service disruptions.
Collaborate with development teams to design and implement scalable, reliable systems and CI/CD processes for deployment.
Establish and support SLAs and SLOs for our site.
Provide system monitoring and incident alerting.
Participate in on-call rotations to provide support for critical incidents and outages.
Develop plans for site reliability and disaster recovery.

Requirements

5+ years of experience in a development-focused DevOps/SRE role within a technology organization operating at significant scale.
Deep experience and proven success developing software tools and automation wherever needed, using Python and Golang.
Expertise with SQL, Linux, CI/CD, Kubernetes, and Terraform to support a site or application running on large multi-node infrastructure with a growing user base.
Experience working with multiple cloud computing platforms, such as GCP, is a must.
Demonstrated ability to troubleshoot technical issues successfully and reliably across a range of platforms and systems.
Experience with incident management and event postmortems.
Outstanding candidates will have one or more of the following:
Familiarity with GPU clusters and/or HPC environments.
Experience with monitoring and logging tools such as Prometheus and Grafana.
Hands-on experience scaling a consumer product from early days into hypergrowth.

Benefits

🩺 Top-notch health coverage for you & your family, with the majority of the premium covered
💰 We invest in your future with a generous 401(k) contribution
🍼 New parents, we've got you covered with incredible paid leave of up to 20 weeks
🌴 4 weeks of PTO to explore, unwind & come back recharged
🍽️ Daily in-office catering plus a monthly Doordash stipend to help keep you fueled no matter where you are
✨ Monthly wellness stipend to support you in your health journey

Similar roles

Senior Machine Learning Engineer – DevOps, AI for Drug Discovery

Roche

Machine Learning Engineer responsible for designing and maintaining ML infrastructure on AWS at Roche. Key role in revolutionizing drug discovery using machine learning techniques with a close-knit team.

Onsite Role

New York City, United States | DevOps Engineer

$141,100 - $262,100 per year

Senior Site Reliability Engineer

RELX

Senior Site Reliability Engineer operating scalable services in Azure and Kubernetes environments with a focus on reliability and performance improvements.

Onsite Role

Chennai, India | DevOps Engineer

High-Performance Computing DevOps Architect

Applied Materials

HPC Architect designing and optimizing high-performance computing solutions for semiconductor equipment. Collaborating with cross-functional teams to enhance compute workload capabilities.

Onsite Role

Bangalore, India | DevOps Engineer

Site Reliability Engineer

VALCE Talent Solutions

Senior Site Reliability Engineer ensuring reliability, automation, and observability across cloud infrastructure. Focused on building self-service tools and improving performance in fast-paced environments.

Hybrid Role

Guadalajara, Mexico | DevOps Engineer

Maintenance and Reliability Engineer

Vista

Maintenance and Reliability Engineer optimizing preventive maintenance at VistaPrint's automated production facility in Venlo. Collaborating with cross-functional teams to drive continuous improvement in maintenance practices.

Onsite Role

Venlo, Netherlands | DevOps Engineer

Senior Site Reliability Engineering Program & Compliance Manager

Five9

Senior Site Reliability Engineering Program & Compliance Manager leading process governance and operational maturity for infrastructure services at cloud contact center provider Five9.

Hybrid Role

United States | DevOps Engineer

$90,000 - $250,300 per year

Senior Site Reliability Engineer – Compute Platforms

Five9

Senior Site Reliability Engineer at Five9 designing Kubernetes on bare metal and hypervisor platforms within private cloud environments. Responsible for architecture, design, and standardization in infrastructure and automation.

Hybrid Role

United States | DevOps Engineer

$82,300 - $228,800 per year

DevOps Engineer

CBTW

DevOps engineer supporting a Jenkins-based CI/CD platform in Luxembourg. Managing cloud infrastructure and providing core banking systems support within a collaborative team.

Hybrid Role

Luxembourg, Luxembourg | DevOps Engineer

Engineer Software – DevSecOps/DevOps

Northrop Grumman

Software Engineer - DevSecOps designing modern software systems for aerospace programs at Northrop Grumman. Collaborating with multi-disciplinary teams in an Agile environment to implement the DevSecOps lifecycle.

Onsite Role

San Diego, United States | DevOps Engineer

$79,300 - $137,600 per year

Principal Software Engineer – DevSecOps, DevOps

Northrop Grumman

Principal Software Engineer focused on DevSecOps software factory at Northrop Grumman. Working with multi-disciplinary teams to implement DevSecOps practices for aerospace programs across various locations.

Hybrid Role

San Diego, United States | DevOps Engineer

$98,400 - $171,000 per year