About the role

As a Senior Site Reliability Engineer (SRE) at Beyond Soluções, you will ensure the reliability of large-scale distributed systems, overseeing the data platform's SLIs and SLOs while implementing automation and advanced observability.

Responsibilities

Reliability Engineering: Define and monitor critical SLIs and SLOs for the data platform (job latency, workspace availability, Delta Lake integrity).
Advanced Observability: Implement end-to-end telemetry (logs, metrics and traces) to detect failures before they impact the business.
Automation and IaC: Eliminate manual work through automation, ensuring Databricks infrastructure is treated as code.
Incident Management and Post-mortems: Lead the diagnosis of complex incidents in Spark/Azure environments and conduct blameless root-cause analyses to prevent recurrence.
Cost Efficiency (FinOps): Optimize the consumption of compute resources (Databricks clusters) and Azure storage without compromising performance.
Self-Service Culture: Develop tools and abstractions that enable Data Engineers to operate autonomously and securely.
Capacity Planning: Manage platform capacity to support exponential growth in data volumes and AI/ML models.

Requirements

Experience in SRE or DevOps: Solid background ensuring the availability of large-scale distributed systems.
Data Ecosystem Expertise: Mandatory experience (2+ years) with Azure and Databricks (especially workspace administration and cluster optimization).
Programming and Automation: Proficient in Python for building automation tools and scripts.
Big Data Troubleshooting: Deep knowledge of debugging Apache Spark jobs and analyzing bottlenecks in Delta Lake and cloud networking.
Observability: Experience with tools such as Azure Monitor, Grafana, Prometheus or Datadog for creating intelligent alerts.

Nice to have

Proven experience with Azure and Databricks beyond the required two-year minimum.
Experience with CI/CD for Data Engineering (DataOps).
Familiarity with data governance and security (Unity Catalog).

Benefits

Flexible Meal and Food Allowance
Health Insurance
Dental Plan
Wellhub and TotalPass
Bio Ritmo gym, exclusive to employees: at the Headquarters complex
Profit Sharing (PLR)
Equity Program "Porto em Ação": supplements PLR through 2025
Sand and multipurpose courts: at the Headquarters complex
Transportation Voucher / Commuting Allowance
Van transportation services: available from the main stations serving Porto (Luz, Barra Funda, Santa Cecília and Júlio Prestes)
Extended Parental Leave: up to 40 days for all family configurations
Extended Maternity Leave: 6 months
On-site Medical Clinic offering several specialties: at the Headquarters and Barra Funda
Childcare or nanny subsidy
Life Insurance
Private Pension Plan: PortoPrev
Discounts on Products and Services
Tuition Assistance: reimbursement for undergraduate, graduate or MBA programs
Monthly running events: subsidy for major road races in São Paulo
Language course reimbursement (English or Spanish)
Porto Theater: exclusive sessions for employees
Library
Relaxation room: at the Headquarters complex
Game room: at the Headquarters complex
Massage and podiatry services: at the Headquarters complex