Site Reliability Engineer supporting U.S. Federal Government contracts at Workday. Maintaining Kubernetes-based infrastructure and collaborating on automation for enhanced reliability and performance.
Responsibilities
Ensure the Workday Kubernetes-based platform is maintained, healthy, and highly available for our customers through infrastructure automation, CI/CD pipelines, reporting, incident handling and response, and observability tools
Maintain core platform components, ensuring high availability, scalability, and security
Automate infrastructure provisioning, configuration management, and application deployments using tools like Terraform and Argo CD
Provide support for platform-related issues, working closely with development teams to resolve problems
Implement and maintain standard security methodologies for the platform, ensuring compliance with industry standards
Build and maintain comprehensive documentation for platform components and processes
Actively participate in knowledge sharing within the team
Collaborate effectively with other engineers and development teams across multiple locations and time zones
Stay up-to-date with the latest technologies and trends in the platform engineering space
Requirements
5 years of hands-on experience with large-scale cloud infrastructure, automation, and DevOps methodologies
Bachelor's degree in a computer-related field or equivalent work experience
Proficiency in infrastructure automation tools like Terraform
Experience with CI/CD pipelines and tools like Argo CD
Strong analytical and problem-solving skills
Strong technical writing skills for creating comprehensive documentation of system architecture, operations, and reliability practices
Proven ability to troubleshoot complex system issues
Benefits
Flexible work arrangements
Professional development opportunities
Workday Bonus Plan or role-specific commission/bonus
Site Reliability Engineer improving reliability and availability of Forcepoint products through automation and operational efficiency. Engaging in incident response and collaborating with development teams.
DevOps Engineer responsible for internal tooling and API development to enhance deployment and operational efficiency at Genesys Cloud. Build automation to improve service health and scalability.
Site Reliability Engineer focused on designing and maintaining observability solutions for a fintech company. Collaborating across teams and automating infrastructure for global payment processing.
Azure Security Engineer working on cloud-based security strategies and implementations for Global Payments. Collaborating with teams to enhance the security posture and mitigate risks.
Release Engineer at Air Apps responsible for optimizing release processes and collaborating with cross-functional teams. Focused on smooth, reliable, and efficient application delivery.
DevOps Engineer responsible for maintaining and optimizing infrastructure at Tenet3. Focused on security, automation, and technical operations within a collaborative team environment.
Site Reliability Engineer II at LexisNexis Risk Solutions building Terraform modules and CI/CD pipelines. Responsible for developing cloud infrastructure and ensuring reliability, security, and observability.
DevOps Engineer supporting cloud modernization for the Department of the Air Force on the Cloud One contract. Involved in systems analysis, security practices, and collaboration with engineering teams.
Journeyman Cloud Operations Engineer maintaining cloud infrastructure across DoD organizations. Supporting DevSecOps and ensuring compliance with security requirements in a high-visibility program.
DevOps Engineer managing cloud-native platforms for Capgemini. Collaborating with development, data/ML, and security teams to deliver scalable solutions on Azure.