Production Support & Monitoring Engineer ensuring the reliability, performance, and availability of Exegy's production systems. Collaborating with cross-functional teams to resolve incidents and optimize environments.
Responsibilities
Monitor production systems and infrastructure, ensuring uptime and performance metrics are met
Troubleshoot, diagnose, and resolve production issues in real time, minimizing service impact
Manage incident response, including escalation, root cause analysis, and post-mortem reporting
Collaborate with engineering teams to develop and implement monitoring tools, alert systems, and automated recovery processes
Analyze system logs, metrics, and trends to proactively identify potential risks or issues
Execute software deployments, configuration changes, and system upgrades with minimal disruption
Maintain and refine operational runbooks, escalation procedures, and best practices
Drive continuous improvement by identifying areas for process optimization and operational efficiency
Participate in an on-call rotation to provide 24/7 support for production systems
Requirements
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent work experience
2+ years of experience in a production support, system administration, or monitoring role
Strong technical skills in Linux/Unix environments, with experience in troubleshooting and debugging
Hands-on experience with monitoring tools (e.g., ITRS, Prometheus, Grafana, Splunk) and incident management platforms
Scripting experience (e.g., Python, Bash) to automate monitoring and reporting tasks
Excellent problem-solving and analytical skills, with the ability to work under pressure in a fast-paced environment
Solid understanding of networking, system performance, and application monitoring concepts
Exceptional communication and collaboration skills to coordinate with cross-functional teams effectively
Engineer in Water Resources Planning and Engineering leading stormwater management studies and overseeing development in growth areas. Coordinating multidisciplinary teams and external consultants for effective project execution.
Sales Engineer marketing the Civil product range to the public sector. Prospects and qualifies customer needs, delivers demonstrations, and manages contracts.
Lead Commissioning Engineer coordinating commissioning processes in a dynamic environment at FläktGroup, a leader in air technology. Responsibilities include system checks and team guidance.
SRE Engineer responsible for maintaining platform reliability and performance for Coles Group, serving Australian communities. Contributing to the engineering lifecycle and implementing modern DevOps practices.
Mechanical Engineer providing engineering and technical expertise for PG&E fleet vehicles and parts. Collaborating on regulatory compliance and conducting evaluations of manufacturers and vendors.
Entry-level Project Engineer providing technical support for electric service projects at PG&E. Collaborating on project scope, cost estimates, and engineering designs in a hybrid work environment.
Intern supporting product management in optimizing healthcare technology portfolios at Dräger. Collaborating on market analysis and product strategy improvements over 3 to 6 months.
Electronics Validator Engineer for automotive projects at Capgemini Engineering. Engaging in testing and functional validation of electronic components and systems.
Packaging Engineer Co-op program developing and designing packaging solutions, offering hands-on experience in a supportive environment at Johnsonville. Join a collaborative team improving functionality and costs.
Systems Analysis Engineer developing propulsion system designs and performance analyses for defense applications. Collaborating with senior engineers and using advanced modeling tools.