Senior DevOps Engineer optimizing cloud infrastructure for a fast-growing digital classroom platform. Collaborate with teams to enhance applications' reliability, performance, and scalability while ensuring system stability.
Responsibilities
Analyze and optimize system reliability, performance, and resource utilization of cloud infrastructure.
Develop and maintain automation scripts for deployment, monitoring, and maintenance tasks.
Implement infrastructure as code (IaC) to automate the provisioning and configuration of infrastructure components.
Design and implement monitoring solutions to proactively identify and address issues.
Participate in on-call rotations and respond to incidents to ensure system stability and performance.
Conduct capacity planning to anticipate future resource needs and optimize infrastructure scalability.
Define and track reliability metrics to measure and improve system performance.
Prepare and present reports on system reliability and performance.
Work closely with software development teams to influence and improve the reliability and scalability of applications.
Conduct post-incident reviews to identify root causes and implement preventive measures.
Troubleshoot complex issues in a production environment.
Requirements
7+ years of experience in a DevOps, SRE or similar role
Bachelor's degree in Computer Science, Information Technology, or a related field.
Relevant experience in software engineering, systems administration, or a related field.
Proficiency in programming languages (e.g. Python, Go, Ruby)
Strong scripting skills for automation tasks (e.g. Bash, Python)
Hands-on experience and in-depth knowledge of cloud platforms (e.g. Google Cloud, AWS) and container orchestration tools (e.g. Kubernetes), including adherence to best practices and resource optimisation
A solid understanding of core networking concepts (e.g. TCP/IP, DNS, load balancing)
Familiarity with Infrastructure as Code (IaC) tools (e.g. Terraform) and/or configuration management tools (e.g. Ansible, Puppet, Chef)
Experience with infrastructure monitoring, logging and alerting tools (e.g. Datadog, Prometheus, Grafana, PagerDuty), and log analysis
Strong collaboration and communication skills to work effectively with cross-functional teams
Ability to analyze complex systems and troubleshoot issues effectively.
Benefits
A people-first employer that is on an inspiring mission to build the future of education while changing the lives of millions
High-calibre, diverse team ranging from successful startup veterans to Fortune 500 and big tech professionals
Continuous learning and development opportunities, including subsidised course fees, certifications, conferences, and free access to Udemy and more
A strong mission; the satisfaction of knowing you’re not only helping modern-day superheroes, aka teachers, but also helping them shape the minds of future generations all across the globe
Happy customers; helping thousands of schools worldwide through the digital transformation of education for the 21st century.
One of the most popular and fastest-growing EdTech platforms worldwide.