Cloud Infrastructure Engineer responsible for Langfuse Cloud operations and observability at scale. Managing AWS and ClickHouse deployments to ensure performance and cost optimization.
Responsibilities
Own Langfuse Cloud operations: You'll run our production environments on AWS ECS Fargate and ClickHouse Cloud. You'll manage deployments, autoscaling, capacity planning, and cost optimization — making sure we stay fast and affordable as traffic scales.
Build world-class observability: You'll own our Datadog setup end to end — dashboards, alerts, and SLOs. When something degrades, you'll ensure we know before our customers do. You'll build the monitoring culture that lets the whole team ship with confidence.
Make self-hosting effortless: Thousands of teams run Langfuse on their own infrastructure. You'll own and evolve our Helm chart, Docker Compose configuration, and deployment documentation. You'll turn "works on my machine" into "works on every machine" — from a single-node setup to a multi-region enterprise deployment.
Automate everything: CI/CD pipelines, infrastructure-as-code, automated scaling, zero-downtime deployments. You'll replace manual processes with automation that makes the team faster and the platform more reliable.
Scale for what's next: We're growing fast and new product directions — like complex long-running agent observability and real-time evaluation — push the infrastructure in new ways. You'll be thinking ahead about what breaks at 10x scale and building the foundation before we get there. 10x is always just one quarter away here at Langfuse.
Harden security and compliance: As more enterprises adopt Langfuse, you'll help ensure our cloud and self-hosted deployments meet the security and compliance bar that large organizations require.
Requirements
Strong infrastructure or SRE engineer who gets excited about running systems at scale and making them better every day
Experience operating production workloads on AWS (ECS/Fargate, networking, IAM, S3, etc.) or on a comparable hyperscale cloud provider
IT Infrastructure Engineer at Sumegre delivering second-level IT support and troubleshooting assistance. Responsible for network infrastructure maintenance and collaboration with server owners to ensure reliability.
Site Infrastructure Engineer managing HVAC and utility systems at SABIC. Overseeing maintenance, project activities, and long-term asset strategies for operational efficiency.
Key engineer developing and operating Web Application Firewall (WAF) platforms at Lloyds Banking Group. Enhancing security and performance while working with modern engineering practices.
Lead Infrastructure Engineer driving Edge Security capabilities for Lloyds Banking Group. Focusing on web access protection, Zero Trust architectures, and modern security engineering approaches.
Senior System Administrator & Infrastructure Engineer managing reliable infrastructure and driving DevOps practices at IMAGO. Collaborating with development teams and providing technical guidance to ensure best practices.
Infrastructure Engineer maintaining high availability of systems at mortgage platform provider Pylon. Focus on developer productivity and codebase quality with instant feedback from peers.
Infrastructure Systems Engineer II managing production application support for Conduent. Collaborating on ITIL processes and incident management while working in a 24/7 environment.
OT Cybersecurity Specialist responsible for secure IT-OT infrastructures in industrial operations. Engaging in secure deployments, integrating cybersecurity frameworks, and providing expert support.
Infrastructure and Security Engineer collaborating on the design of secure architectures at CRG Solutions. Integrating cybersecurity best practices and managing incidents in Windows and Linux environments.
Senior Infrastructure Engineer managing global IT infrastructure for aviation solutions, focusing on VMware, Nutanix, and Windows Server environments. Collaborating with teams to ensure high availability and optimal performance in a hybrid work model.