Production Support & Monitoring Engineer ensuring reliability, performance, and availability for Exegy's production systems. Collaborating with teams to resolve incidents and optimize environments.
Responsibilities
Monitor production systems and infrastructure, ensuring uptime and performance metrics are met
Troubleshoot, diagnose, and resolve production issues in real time, minimizing service impact
Manage incident response, including escalation, root cause analysis, and post-mortem reporting
Collaborate with engineering teams to develop and implement monitoring tools, alert systems, and automated recovery processes
Analyze system logs, metrics, and trends to proactively identify potential risks or issues
Execute software deployments, configuration changes, and system upgrades with minimal disruption
Maintain and refine operational runbooks, escalation procedures, and best practices
Drive continuous improvement by identifying areas for process optimization and operational efficiency
Participate in an on-call rotation to provide 24/7 support for production systems
Requirements
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent work experience
2+ years of experience in production support, system administration, or a monitoring role
Strong technical skills in Linux/Unix environments, with experience in troubleshooting and debugging
Hands-on experience with monitoring tools (e.g., ITRS, Prometheus, Grafana, Splunk) and incident management platforms
Scripting experience (e.g., Python, Bash) to automate monitoring and reporting tasks
Excellent problem-solving and analytical skills, with the ability to work under pressure in a fast-paced environment
Solid understanding of networking, system performance, and application monitoring concepts
Exceptional communication and collaboration skills to coordinate with cross-functional teams effectively
Observability Engineer joining Mobileye to enhance development environment scalability and resilience. Collaborating across teams to improve observability systems and automate best practices.
Digital Manufacturing Engineer modernizing plant operations through Manufacturing Execution Systems and automation integration. Working hands-on with production teams on shop-floor applications and analytics.
Forward Deployed Engineer owning end-to-end enterprise deployments at HeyGen for dynamic AI video solutions. Collaborating with customers to integrate systems and provide technical leadership.
Senior Environmental Engineer or Geoscientist involved in environmental site investigations and remediation across multiple Canadian locations. Leading projects and mentoring staff within a collaborative team.
Project Engineer managing projects in the food & beverage sector. Collaborating with internal teams and stakeholders to deliver successful project outcomes within budget and timeline.
Senior System & Safety Engineer at Vay owning the system architecture for Remote Driving technology. Working at the intersection of hardware, software, and vehicle integration.
Junior Engineer assisting with type approval activities for maritime equipment and systems. Working with experienced engineers in an international team based in Norway or Germany.
Embedded Software Engineer at Ford developing software for Advanced Driver Assistance Systems. Collaborating cross-functionally on product lifecycle and ensuring quality through rigorous testing.
Mid-Level IAM Engineer focusing on Agentic AI management in a cloud-native environment at Crypto.com. Join a seasoned Security Team dedicated to user security and protection.
Technical Integration Engineer managing integration and deployment in complex IT environments at Consort Group. Ensuring quality service and supporting applications from project phase to production.