Site Reliability Engineer leading observability and monitoring practices for hybrid infrastructure at PANTHERx. Collaborating with various teams to enhance system performance and reliability.
Responsibilities
The Site Reliability Engineer (SRE) will lead the implementation and management of observability, monitoring, and reliability practices across our hybrid infrastructure.
This role requires hands-on expertise with Datadog or similar observability platforms, strong Azure administration skills, and a deep understanding of incident response and system performance.
The SRE will work closely with Infrastructure, Support, and Application teams to ensure high availability and operational excellence across on-prem and cloud environments.
Designs, implements, and manages observability solutions using Datadog or equivalent platforms.
Develops and maintains monitoring dashboards, alerts, and telemetry pipelines for critical systems.
Leads incident response efforts, including root cause analysis and postmortem documentation.
Collaborates with Infrastructure and Application teams to improve system reliability and performance.
Supports Azure administration tasks including resource monitoring, performance tuning, and cost optimization.
Defines and enforces best practices for system health, uptime, and scalability.
Contributes to automation of operational tasks and reliability improvements.
Documents observability standards, incident workflows, and operational runbooks.
Requirements
Bachelor’s degree in Computer Science, Information Technology, or equivalent.
Minimum of five (5) years of experience in Site Reliability Engineering, Infrastructure Monitoring, or DevOps.
Proficiency with Datadog or similar observability platforms (e.g., Prometheus, New Relic, Splunk).
Strong Azure administration experience including monitoring, resource management, and automation.
Solid understanding of on-prem infrastructure and hybrid cloud environments.
Experience with incident response, RCA, and operational documentation.
Strong scripting skills (e.g., PowerShell, Python) for automation and integration.
Excellent communication and collaboration skills across technical teams.
Benefits
Hybrid, remote and flexible on-site work schedules are available, based on the position.
Excellent benefits package, including but not limited to medical, dental, and vision coverage, plus health savings and flexible spending accounts
401(k) with employer matching
Employer-paid life insurance and short- and long-term disability coverage
Employee Assistance Program
Generous paid time off for all full-time employees