Engineer at Trading Technologies improving platform stability through coding and automation. Focus on building advanced monitoring tools for global trading operations.
Responsibilities
Design, build, and maintain advanced telemetry and automation tooling to monitor global platform health and trigger automated corrective actions.
Own and improve incident response runbooks and automated remediation workflows, reducing MTTR over time.
Participate in on-call rotations, diagnosing and resolving system issues and escalations from the customer support team (this is an internal-facing role, not customer-facing).
Drive continuous improvement through post-incident reviews (PIRs) and engineering initiatives that eliminate classes of failure.
Develop advanced monitoring software in Python and Go.
Contribute to full-stack troubleshooting across our React.js frontend, Python backend services (Flask, Litestar, Celery), and AWS-managed Kafka (MSK/ESK).
Write infrastructure-as-code using Terraform, building reusable modules and submodules to provision and manage cloud resources.
Focus on coding advanced telemetry, implementing automation strategies, and building tools that proactively monitor platform health.
Rotate into an operational role to swiftly diagnose system issues and handle internal escalations, ensuring continuous platform stability.
Use insights gained during the operations week to develop automated solutions that reduce future incidents and optimize system performance.
Requirements
**Essential Skills & Experience**
**Software Development**
Extensive professional Python development experience, including object-oriented design and multi-threaded applications.
Substantial hands-on Terraform experience—able to author modules and submodules from scratch.
Experience building or supporting React.js applications.
**Cloud & Infrastructure**
Substantial hands-on AWS experience across EC2, Lambda, CloudWatch, EKS, ECS, MSK, ELB, RDS, DynamoDB, and SQS.
Solid Linux systems experience, including monitoring critical system health parameters.
**Desirable Skills & Experience**
Familiarity with trading systems, financial markets, or low-latency environments.
AWS Associate-level certification or higher (preferred but not required).
Experience with chaos engineering, SLO/SLI frameworks, or formal reliability programs.
Prior on-call experience at a high-traffic or mission-critical platform.
Working understanding of TCP/IP, DNS, HTTP, and load balancing concepts.
Experience with Go, or a clear eagerness and ability to learn it quickly.
Benefits
*We offer a comprehensive benefits package designed to support your well-being, growth, and work-life balance.*
**Health & Financial Security:**
Pension contributions
**Time Off & Flexibility:**
Enjoy the best of both worlds: the energy and collaboration of in-person work, combined with the convenience and focus of remote days. This is a hybrid position requiring three days of in-office collaboration per week, with the flexibility to work remotely for the remaining two days. Our hybrid model is designed to balance individual flexibility with the benefits of in-person collaboration: enhanced team cohesion, spontaneous innovation, hands-on mentorship opportunities, and a stronger company culture.
25 days of Paid Time Off (PTO) per year, with the option to roll over unused days.
One dedicated day per year for volunteering.
Two days per year set aside for uninterrupted professional development.
An additional PTO day added during milestone anniversary years.
Generous parental leave for all parents (including adoptive parents).
**Work-Life Support & Resources:**
Budget for tech accessories, including monitors, headphones, keyboards, and other office equipment.
Milestone anniversary bonuses.
**Wellness & Lifestyle Perks:**
Subsidy contributions toward gym memberships and health/wellness initiatives (including discounted healthcare premiums, healthy meal delivery programs, or smoking cessation support).
**Our Culture:**
Forward-thinking, culture-based organization with collaborative teams that promote diversity and inclusion.