Mid-level Site Reliability Engineer at WEX managing Azure Cloud systems and driving reliability practices. Collaborating with teams to enhance performance, reduce toil and automate processes.
Responsibilities
Monitor and manage system health, availability, and performance of WEX’s Microsoft Azure Cloud ecosystem.
Actively identify and reduce 'toil' (manual, repetitive work) by developing and maintaining automation tools.
Participate in on-call rotations and respond to system alerts and incidents.
Collaborate with development teams to implement reliability-focused features.
Improve observability and logging to make issues easier to troubleshoot.
Follow IT security policies and compliance requirements.
Requirements
2+ years of experience in system administration, DevOps, or SRE roles
Proficiency in scripting and automation using Python, Bash, Go, Terraform
Experience with monitoring and logging (Grafana, ELK stack, Splunk, etc.)
Knowledge of containerization and orchestration (Docker, Kubernetes)
Understanding of CI/CD pipelines and version control systems
Understanding of monitoring tools such as Prometheus, Grafana, or Splunk
Strong problem-solving skills and a willingness to learn.
Preferred Qualifications
Hands-on experience with Azure cloud platforms
Familiarity with infrastructure as code (Terraform, Ansible, CloudFormation)
Knowledge of incident response processes and SLAs
Experience developing AI-based solutions
Ability to troubleshoot and resolve performance bottlenecks
Strong communication skills and ability to work across teams
Experience in healthcare, insurance, or benefits technology
Experience working with compliance frameworks such as HIPAA, SOC 2, or HITRUST.
Senior Site Reliability Engineer maintaining the reliability and user experience of AI services for Woven by Toyota. Collaborating with engineering teams to ensure service availability and performance.
DevOps Specialist supporting the engineering and operational enablement of next-gen data center platforms at KONE. The role involves Infrastructure-as-Code deployments and daily DevOps workflows.
GitHub Enterprise Specialist managing KONE's GitHub ecosystem, ensuring secure and scalable workflows. Collaborating with teams to enhance developer productivity through AI-powered capabilities.
Senior Software Engineer responsible for designing microservices and enhancing LLM performance for Fortanix's Generative AI platform. Collaborating with data science and ML Infrastructure teams for security and optimization.
Reliability Engineering Technician conducting various verification tests and collaborating with reliability engineers. Preparing technical documentation in a well-equipped laboratory environment in Poland.
Reliability Engineer ensuring the quality and reliability of products. Conducting various verification tests in a well-equipped laboratory in Mierzyn, Poland.
Senior SRE driving incident management and operational excellence in financial software solutions. Working on innovation and technology as part of the team at Brazil's leading software company.
Salesforce DevOps Engineer focused on CI/CD pipeline management for Salesforce at S&P Global Mobility. Collaborating with cross-functional teams to ensure stable and secure releases.
Senior DevOps Engineer designing and building infrastructure for AI workloads across cloud and edge environments. Collaborating with engineering teams to implement scalable, automated solutions.
Reliability Engineer II improving efficiencies and safety in copper mining operations at Freeport-McMoRan. Developing recommendations for engineering projects and collaborating with Operations and Maintenance teams.