Software Engineer involved in reliability engineering at Cursor, focusing on high reliability and durability across the stack. Collaborating with product and infrastructure teams to improve overall system stability.
Responsibilities
Own reliability work end-to-end, from user-facing symptoms (crashes, latency, streaming failures) to root causes in services, infrastructure, or vendor dependencies.
Design and implement resilience patterns for upstream dependency failures (for example model providers): fallbacks, routing strategies, and degraded-mode designs.
Build and maintain reliability guardrails that make teams faster and safer: deployment safety, rollbacks, operational playbooks, automated checks, and standards for production readiness.
Improve observability (metrics, logs, traces, and client telemetry) so engineers can quickly answer 'Is it up?' and 'What changed?'.
Reduce operational toil through automation and better tooling.
Partner with product and infrastructure engineering teams as a drop-in reliability multiplier: embed on the highest-impact problems and drive them to a durable technical outcome.
Participate in an on-call rotation and help improve incident response practices over time (severity definitions, runbooks, retrospectives, and clear ownership of follow-up fixes).
You will own a small set of high-leverage reliability 'themes' at a time (for example client crash rate, streaming reliability, deploy safety). You drive these end-to-end until the reliability bar measurably moves.
Requirements
Strong experience owning reliability for production systems, including both incident response and long-term engineering fixes.
Expert-level experience in at least one of: Go, Node/TypeScript, or Python.
Deep practical knowledge of cloud infrastructure (AWS) and modern deployment/orchestration patterns (Kubernetes and/or ECS).
Experience with observability systems and practices (metrics, logs, traces, and alerting).
Software Engineer developing planning and scheduling solutions in customer project teams. Focusing on customer development and iterative enhancement using Python.
Engineering Director leading a high-performing team to develop and operate distributed cloud services. Overseeing technical excellence, collaboration, and innovation within the team.
Fullstack Engineer at Goway Travel contributing to development with Node.js and React.js. Collaborating across teams to build scalable travel technology solutions in a hybrid work environment.
Lead Software Engineer creating digital infrastructure for renewable energy assets at Arteus. Developing core logic for optimization and defining tech stack with strategic autonomy.
Senior Staff Engineer leading UI development for AI Agent Studio at Automation Anywhere. Focusing on high-quality UI components and collaborating with cross-functional teams for intelligent automation.
Technical Lead managing SAP Hybris development at Birlasoft, enhancing cloud and AI technologies. Leading integrations and software validation to improve business efficiencies in India.
Full Stack Developer responsible for coding and building high-performance applications for Walmart - International. Working in a collaborative team to design innovative solutions affecting millions of customers.
Software Engineer developing features for a schedule-based processing framework at an autonomous transportation company. Collaborating with cross-functional teams to enhance data ingestion and processing scalability.
Engineer/Senior Engineer developing and updating engineering standards for AES Indiana and Ohio. Collaborating with multiple teams to ensure project efficiency and quality standards.