Design, implement, and maintain scalable cloud infrastructure for AI applications using AWS services, with a focus on ECS, S3, RDS, and Bedrock
Manage deployments on Kubernetes clusters, ensuring high availability and performance in line with our existing infrastructure best practices
Develop and maintain CI/CD pipelines for AI applications
Develop and expand our Infrastructure-as-Code (IaC) with Terraform to ensure consistent and reproducible deployments across multiple environments
Collaborate with Python developers to containerize AI applications and optimize them for production deployment
Design and implement cloud networking solutions that support secure and efficient communication between AI microservices
Maintain and monitor security best practices for AI applications, including secrets management, access controls, and compliance requirements
Troubleshoot and resolve infrastructure-related issues across the AI application stack, ensuring minimal disruption to business operations
Stay updated with emerging MLOps tools, cloud technologies, and industry best practices to continuously improve our deployment and infrastructure capabilities
Participate in the strategic planning of AI Innovation infrastructure, identifying opportunities for automation and process improvement
Requirements
Bachelor's degree in Computer Science, Software Engineering, DevOps, or a related technical field
3+ years of experience in DevOps, MLOps, or cloud infrastructure engineering
Proven experience with AWS cloud services, particularly ECS, S3, RDS, IAM, and networking components
Strong expertise in Infrastructure as Code tools, specifically Terraform for cloud resource management
Experience building and maintaining CI/CD pipelines for application deployment, preferably with ML/AI applications
Strong understanding of containerization technologies and best practices for production deployments
Experience with cloud networking, security, and compliance best practices
Basic Python and software development skills
Excellent communication and collaboration skills, with the ability to work effectively with development teams and stakeholders
Benefits
Competitive salary and performance-based bonuses
Comprehensive benefits package, including health, dental, and retirement plans
Opportunities for professional growth and career advancement
A collaborative work environment focused on innovation, learning, and excellence