Senior Software Engineer responsible for production incident response and system reliability for a B2B WealthTech startup. Managing Go and Node.js services in a hybrid tech environment.
Responsibilities
Lead and execute production incident response: triage, mitigation, stakeholder communication, and coordination across teams
Debug and fix issues across Go services (mandatory) and the broader stack (Node.js services where relevant)
Work across service boundaries: GraphQL/RPC, distributed tracing, dependency failures, performance bottlenecks, and safe degradation patterns
Troubleshoot Kubernetes workloads and deployments
Diagnose PostgreSQL/CNPG issues
Handle production bugs that span application + data pipelines (ETL/Snowflake mappings), including backfills/replays and data-quality validation
Apply reliability patterns common to trading/fintech platforms: a correctness and data-integrity mindset (idempotency, reconciliation), resilient partner integrations, and strong observability for critical user journeys
DevOps Engineer improving reliability and stability of cloud services at Madhive. Responsibilities include CI/CD tooling, monitoring, and cloud infrastructure management.
Site Reliability Engineer contributing to platform reliability at Trainline, Europe's leading rail ticketing platform. Collaborating with product engineering to ensure operational readiness and incident response.
Senior DevOps Analyst at Stefanini managing Azure DevOps for build and deploy automation. Collaborating with development squads and ensuring code quality with validation tools.
Senior DevOps Engineer leading design and management of CI/CD pipelines at Neuron7.ai. Collaborating on cloud infrastructure for scalable applications in an innovative tech environment.
Backend Software Engineer responsible for building robust backend systems for AI and analytics products. Collaborating with various teams to enhance platform reliability and performance.
Senior DevOps Engineer responsible for cloud ecosystem architecture at a health-tech startup. Building HIPAA/GDPR-compliant foundations and mentoring developers.
Senior Backend Engineer building product features and maintaining infrastructure for an insurance platform. Employing tools like Terraform, Kafka, Datadog, and Qovery with a strong DevOps focus.
DevOps Systems Engineer supporting customer operations in Annapolis Junction, MD. Responsible for creating, sustaining, and troubleshooting complex operational data flows.
OpenShift Fresher assisting the Cloud team in managing containerized applications using Red Hat OpenShift. Supporting CI/CD, deployment automation, and cloud-native application environments.
Site Reliability Engineer for Leidos ensuring reliability, performance, and scalability of complex distributed systems for the Navy-Marine Corps Intranet. Collaborating with teams to maintain and optimize network operations and services.