Senior Site Reliability Engineer leading AI-native platform operations at a growing B2B generative AI startup, ensuring infrastructure reliability and scalability for its services.
Responsibilities
Design cloud and on‑prem infrastructure, and lead Docker/Kubernetes operations (optimizing autoscaling, rollouts, security).
Develop reliable pipelines (Git/gates/automation) and implement end-to-end observability (SLOs/SLIs/SLAs, logs/metrics/tracing).
Operate microservices (service mesh, resilience patterns) and manage critical data (PostgreSQL HA/tuning).
Manage secrets, access policies, supply chain security and system hardening.
Implement Infrastructure as Code and GitOps (Terraform/Helm/ArgoCD).
Lead incident response and postmortems with data- and AI-driven continuous improvement.
Align with Engineering, Product, Data and ML teams.
Requirements
6+ years in SRE/DevOps/Platform engineering at high scale.
Strong expertise in Kubernetes, Docker, CI/CD, observability (SLOs), PostgreSQL, microservices architecture, and security, plus experience with IaC and GitOps.
Passion for applying LLMs/AI to operations.
Experience with Node.js/Python, NestJS/React, Git/Cursor, GCP (other clouds a plus), PostgreSQL, Docker/Kubernetes, Terraform/Helm/ArgoCD.
Experience with AI SDKs/LLMs, operational automations (n8n/Crew.ai), vector databases (RAG/pgvector), Kafka/RabbitMQ, FinOps, chaos engineering, SAST/DAST.
Benefits
True autonomy and a highly collaborative environment;
Direct influence on product and team development;
Opportunity to grow with the business from the ground up;
Fixed salary of R$28,000/month (PJ contract) plus a real possibility of stock options.
Development Operations Engineer supporting enterprise application development in Java and/or C. Ensuring high availability and operational excellence in modern payment solutions.
Site Reliability Engineer designing and supporting Kubernetes environments for F5's UDF platform. Collaborating with cross-functional teams to ensure reliability and operational excellence.
Senior Site Reliability Engineer ensuring operational excellence for multi-datacenter infrastructure at F5. Developing automation tools and APIs in Python and Go.
DevOps Engineer needed to develop a new OpenXDR solution on AWS, processing security data from multiple sources. Join a leading cybersecurity company in Slovakia.
DevOps Engineer at Castalia Systems automating and optimizing toolchain and CI/CD pipelines. Designing Azure infrastructure and ensuring collaboration between development and operations teams.
Senior DevOps Engineer managing Kubernetes and AI-driven workflows at Hex Trust. Supporting blockchain infrastructure while implementing best DevOps practices.
Lead DevSecOps Software Developer at Leidos enhancing automation for air traffic operations. Collaborating on safety-critical systems within a hybrid work environment.