Develop and lead enterprise observability and reliability capabilities for Parts Town's systems using Dynatrace. Collaborate across teams to ensure comprehensive monitoring and improve performance and incident outcomes.
Responsibilities
Own enterprise observability using Dynatrace across cloud, on-prem, ERP, WMS, eCommerce, APIs, and integrations
Design service topology, dashboards, alerts, and health indicators that reflect business impact
Apply SRE principles (SLIs, SLOs, error budgets where appropriate) to reduce incidents and improve resilience
Accelerate incident detection and root-cause analysis; lead post-incident reviews focused on systemic fixes
Identify reliability, performance, and capacity risks before they impact the business
Define observability and SRE standards and enable teams to use them effectively
Requirements
7+ years in infrastructure, platform, operations, or reliability engineering
Hands-on experience implementing and operating Dynatrace
Strong understanding of distributed systems, cloud/hybrid environments, and integrations
Practical experience with SRE or reliability engineering concepts
Comfortable operating in high-impact incident and production environments
Benefits
Quarterly profit-sharing bonus
Hybrid work schedule
Team member appreciation events and recognition programs
Volunteer opportunities
Monthly IT stipend
Casual dress code
On-demand pay options: access your pay as you earn it to cover unexpected or everyday expenses