Senior Site Reliability Engineer at iHeartMedia leading AWS and Kubernetes platforms. Responsible for direct team management and automation of infrastructure with a strong focus on innovative cloud solutions.
Responsibilities
Standardize and modernize Amazon EKS platforms and AWS serverless services, including the latest AWS managed services, in line with DevOps best practices.
Provide expertise and hands-on implementation of large-scale, mission-critical Kubernetes workloads with high resiliency and multi-region architecture.
Work collaboratively with 2 to 5 Site Reliability Engineers.
Champion accountability; take responsibility through actions & thoughts.
Design and implement end-to-end CI/CD pipelines with CDK and CodePipeline, including integration with source control, build tools, and deployment targets such as CloudFormation (CFT) stacks.
Prioritize and realign quickly to adapt to a demanding, fast-paced shift-left environment.
Maximize automation to improve speed and quality while relentlessly driving low-value, repetitive work out of our operational activities.
Work with our application delivery teams to design and build scalable and maintainable solutions for our customers.
Enforce a GitOps workflow in which Git is the source of truth for EKS cluster and application state in a multi-account, multi-region environment (FluxCD/ArgoCD).
Develop baselines for governance, consumption/cost, and performance to ensure that our elastic cloud-based applications operate efficiently, securely, and with zero downtime.
Run reliability incident management processes, including root cause analysis, runbook development, and self-healing architecture.
Instill standardization in DevOps processes across a wide range of applications.
Requirements
6+ years of hands-on experience in public cloud, specifically AWS
3+ years of leading SRE/DevOps teams across complex AWS ecosystems
Deep understanding of high-velocity SDLC best practices, along with CI/CD and application/infrastructure monitoring practices, to operate workloads at high scale
Expert proficiency in Kubernetes, Terraform, AWS CDK, Lambda, API Gateway, Route 53, S3, EC2, Load Balancing, DynamoDB, CloudWatch, IAM, Networking, IoT, SQS, EventBridge, etc.
Adept at troubleshooting and resolving issues in high-volume, distributed applications running on AWS
Demonstrated ability to design, build, and maintain AWS infrastructure using AWS CDK (TypeScript preferred) with strong modular patterns (multi-stack, multi-account, multi-region)
Strong understanding of GitOps methodologies, with experience implementing and managing multiple environments through declarative configuration versioned in Git repositories and applied via automated tools such as Flux or ArgoCD
Hands-on experience managing large-scale, production EKS clusters across multiple regions and AWS accounts
Deep knowledge of AWS cost optimization techniques such as Reserved Instances, Spot Instances, and lifecycle management
Proven ability to build highly secure AWS infrastructure with a security-first mindset
Proven ability to collaborate and build strong relationships with development teams, including resolving conflicts and driving decisions and initiatives
Strong software development background, including knowledge of microservices architecture and fluency in JavaScript, TypeScript, Node.js, or Python
At least one of the following certifications: AWS Solutions Architect Associate, AWS Solutions Architect Professional, AWS DevOps Associate, AWS DevOps Professional, or a professional Kubernetes certification
Benefits
Employer-sponsored medical, dental and vision with a variety of coverage options
Company-provided and supplemental life insurance
Paid vacation and sick time
Paid company holidays
A Spirit day to encourage and allow our employees to more easily volunteer in their community
A 401(k) plan
Employee Assistance Program (EAP) at no cost – services include telephonic counseling sessions, consultation on legal and financial matters, emotional well-being, family and caregiving
A range of additional voluntary programs, such as spending accounts, student loan refinancing, accident insurance and more!