Site Reliability Engineer focused on coding and automation for global trading operations. Join a culture of collaboration and innovation at Trading Technologies.
Responsibilities
Develop and maintain advanced telemetry and automation tools for monitoring and managing global platform health.
Actively participate in on-call rotations, swiftly diagnosing and resolving system issues and escalations from the customer support team (this is not a customer-facing role).
Implement automated solutions for incident response, system optimization, and reliability improvement.
Provide operational support for backend services and Kafka producers/consumers written in Python running on ECS.
Full-Stack Troubleshooting: Support, debug, and enhance the entire application stack, from our React.js frontend to our Python backend services (Flask, Litestar, Celery, ESK, MSK).
Must have professional, hands-on experience building and/or supporting applications written with React.js. Effectively troubleshoot issues between the frontend UI and backend APIs.
Requirements
Minimum 3 years of experience with Python
Experience with Icinga2, Prometheus, or Splunk a plus
Experience with AWS a plus
Solid understanding of functional programming, object-oriented programming, and computer science foundations
Good understanding of backend and server-side components
Ability to work an on-call support rotation with global team members on a semi-frequent basis
Strong, proven communication skills
Must be self-directed and flexible, with the ability to prioritize and handle multiple projects simultaneously
Experience working in an Agile environment a plus
Benefits
Pension contributions
Enjoy the best of both worlds: the energy and collaboration of in-person work, combined with the convenience and focus of remote days. This is a hybrid position requiring three days of in-office collaboration per week, with the flexibility to work remotely for the remaining two days.
25 days of Paid Time Off (PTO) per year, with the option to roll over unused days.
One dedicated day per year for volunteering.
Two dedicated days per year for uninterrupted professional development.
An additional PTO day added during milestone anniversary years.
Generous parental leave for all parents (including adoptive parents).
Budget for tech accessories, including monitors, headphones, keyboards, and other office equipment.
Milestone anniversary bonuses.
Subsidy contributions toward gym memberships and health/wellness initiatives (including discounted healthcare premiums, healthy meal delivery programs, or smoking cessation support).
Forward-thinking, culture-based organization with collaborative teams that promote diversity and inclusion.