Site Reliability / DevOps Engineer developing Big Data platforms for clients in Telco and Retail industries. Focus on stability, scalability, and performance of large-scale data processing systems.
Responsibilities
Design, deploy, and maintain infrastructure for Big Data platforms, including Cloudera on-prem solutions and cloud-based environments on AWS or Azure
Implement, manage, and optimize monitoring solutions using Zabbix and Prometheus to ensure performance, availability, and reliability of data platforms
Troubleshoot and resolve platform-related issues, focusing on system performance, reliability, and scalability when processing large data sets
Use Terraform and Ansible for infrastructure automation and configuration management to ensure repeatable, scalable deployments
Implement and maintain CI/CD pipelines for seamless deployments and continuous integration
Deploy and manage Kubernetes clusters for orchestrating Big Data workloads and ensuring efficient resource utilization
Collaborate with network teams to ensure seamless communication between data services, optimize data traffic, and enhance security practices
Proactively identify and resolve performance bottlenecks within Big Data platforms, including resource management, cluster tuning, and workload optimization
Requirements
2+ years of hands-on experience in DevOps or SRE roles
Proficiency in monitoring systems with Zabbix or Prometheus, including metric tracking and alerting
Knowledge of Linux (RHEL) systems, including scripting, system administration, and troubleshooting
Hands-on experience with cloud environments, particularly AWS or Azure, including the deployment of cloud-native services and infrastructure
Expertise in deploying, managing, and scaling applications using Kubernetes
Experience with Terraform and Ansible for infrastructure automation and configuration management
Experience with CI/CD pipelines and tools such as Argo CD, Flux CD, or similar
Understanding of networking concepts, including security, VPNs, and performance tuning in hybrid environments
Analytical skills with experience in identifying and solving complex platform issues, particularly performance bottlenecks
Knowledge of best practices and tools for optimizing system and application performance in large-scale distributed environments
Benefits
Participation in the company’s stock options program
Flexible Benefits & Personal learning budget from day 1
10 Growth Days per year - dedicated time for learning and development
Ownership and dynamics in your role
All the support you need from our experienced team to become an even better professional
Hybrid work environment, preferably with at least 1 day per week in the office
DevOps Specialist creating and overseeing Azure hybrid cloud infrastructures for EVLO's battery energy storage solutions. Collaborating with teams to implement cutting-edge technologies in a dynamic environment.
Software Quality and Release Engineer developing and maintaining C++/Python software solutions for the aerospace and defense industry. Collaborating on CI/CD automation and feedback documentation.
Senior DevOps Engineer building and managing big data platforms for clients in telecommunications and finance industries. Ensuring stability, scalability, and performance across cloud and on-premise environments.
Site Reliability Engineer ensuring reliability, automation, and observability across cloud infrastructures for Diligent. Leading initiatives to improve performance in fast-paced environments.
Senior DevOps Engineer leading DevOps design and implementation for gaming projects at Stillfront. Collaborating with international teams to enhance gaming infrastructure and reduce costs.
Mainframe DevOps Engineer at Kyndryl enhancing mainframe delivery practices and migrating SCM to Azure DevOps. Requires extensive Mainframe development experience and DevOps skills.
DevOps/MLOps Engineer designing, automating, and maintaining scalable infrastructure for a federal client. Collaborating with software engineers and data scientists for resilient solutions.
Senior DevSecOps Engineer/Developer responsible for building Humana's software security platform. Modernizing architecture and managing CI/CD pipelines as part of the core engineering team.