DevOps Engineer developing and enhancing machine learning infrastructure. Collaborating with AI teams to support ML projects at an enterprise SaaS startup for contact centers.
Responsibilities
Design, build, and enhance the core components of state-of-the-art machine learning infrastructure (cloud and on-premise), and architect platforms to create, train, and deploy ML models.
Build operational dashboards and charts to track system errors and performance and to enable root cause analysis.
Identify gaps and evaluate relevant tools and technologies as needed to improve processes and systems, leveraging open-source and cloud computing technologies to build effective solutions.
Collaborate with the AI team to drive ML projects from conception to completion and production monitoring.
Requirements
Bachelor's degree or higher with a strong academic background.
2-4 years of meaningful work experience in DevOps handling complex services.
Strong troubleshooting skills to keep our services highly available.
Strong expertise and experience with Google Cloud Platform (GCP), Docker, Kubernetes, CI/CD, and Jenkins.
Extensive experience in designing, implementing, and maintaining infrastructure as code, preferably using Terraform.
Experience creating and maintaining deployment manifest files for microservices using Helm.
Having LLMOps or MLOps experience is a bonus.
Strong expertise in deploying at scale on Kubernetes clusters using the Horizontal Pod Autoscaler (HPA) is required.
Broad technical background and experience with the architecture, design, and operation of cloud solutions, including meeting security compliance requirements.
Experience monitoring system health and ensuring security, scalability, and reliability.
Experience designing, implementing, and maintaining observability, monitoring, logging, and alerting with tools such as Prometheus, Grafana, Promtail, Loki, and Datadog.
Benefits
Market-leading compensation, based on the candidate's skills and aptitude.
Senior Site Reliability Engineer managing the reliability and operational health of the Loan Origination System for a fintech company. Collaborating with engineering teams in Brazil and the US to improve system reliability.
Cloud Engineer working with Azure DevOps and digital transformation in a global team at EY. Collaborating on cloud engineering projects and supporting CI/CD pipeline development.
DevOps Engineer creating better conditions for developers in Saab's defence technology. Collaborating with developer teams for effective continuous development and delivery of software.
DevOps Infrastructure Engineer at Bull, strengthening the AdminLab team in Echirolles. Working on Linux infrastructure and automation practices in an HPC environment.
Product Quality & Reliability Engineer developing quality and reliability standards for Applied Materials. Designing methods for testing products and analyzing operational data in a supportive team environment.
DevOps System Engineer creating and managing infrastructure for ESET's global SaaS service. Collaborating with tech teams to maintain secure and stable operations.
Provides expertise in business applications design and functionality. Supports users and validates technical designs for alignment with business needs.
Senior Site Reliability Engineer supporting the reliability and performance of Broadridge’s fintech platform. Collaborating with senior engineers on automation, infrastructure, and production stability.
DevOps Engineer at Mindera focusing on Windows environments and Azure cloud solutions. Working on system modernization, automation, and migration projects with collaborative teams.
DevSecOps Lead supporting Synthesized's cloud automation strategy with a focus on security and compliance. Collaborating closely with development teams to shape cloud architecture and enhance deployment processes.