Platform Engineer developing API and CI/CD environments for machine learning at CADDi, improving developer productivity and system reliability.
Responsibilities
Build API and batch execution environments for running machine learning model inference, as well as deployment environments using CI/CD.
Implement monitoring, performance tuning, and other improvements to enhance site reliability in production environments.
Optimize the cost of inference and training platforms.
Create deployment and operation processes based on input from the modeling and platform teams.
Actively experiment with new ML, MLOps, and infrastructure tools, quickly validating ideas through proof-of-concepts and applying what works to real company products.
Depending on your experience and preferences, you may be assigned to a team other than the one currently recruiting. (In that case, we would be happy to discuss this with you at the interview.)
After you join the company, your role may change as the organization grows or as your career goals evolve.
Requirements
5+ years of professional experience as a software engineer
Experience leading software development
We especially value experience leading design, development, operations, and the necessary communication involved in roles such as team lead or project driver, regardless of project size
At least 2 years of experience in ONE or more of the following:
Developing shared platforms or backend systems using cloud infrastructure.
Designing and operating Machine Learning systems (MLOps) with consideration for latency, cost, and non-functional requirements.
Developing Generative AI applications using LLMs, RAG architectures, and Vector Databases.
Hands-on experience with statically typed programming languages (such as TypeScript, Rust, Java/Kotlin, or Go)
General understanding of the core Computer Science concepts behind AI (such as Vector Space, Embeddings, or Inference) and the ability to leverage these principles to build and integrate AI-driven features into software platforms.
Experience in development using public cloud platforms such as AWS or Google Cloud.
Fluent business communication skills in English, able to complete daily tasks in English, including text communication and meetings (CEFR B1 or higher).
Must currently reside in Vietnam or have plans to relocate. Foreign nationals must also hold a valid Vietnam work permit or be legally eligible to work in Vietnam.
Experience developing machine learning pipelines using tools such as Vertex AI Pipelines, Kubeflow, Apache Beam, or Spark
Familiarity with at least one ML/AI framework such as scikit-learn, PyTorch, or TensorFlow.
Development experience related to MLOps or SRE
Experience collaborating with ML engineers to continuously improve and deliver machine learning and data science models
Experience building and operating systems such as Data Lakes or Feature Stores
Experience implementing initiatives to improve data quality for data-centric ML model improvement
Experience planning and driving data utilization initiatives—internally or externally—using tools such as BigQuery or Redash
Basic knowledge of algorithms related to machine learning, statistics, linear algebra, and computer science
Experience working with Scrum or Agile methodologies.
Conversational-level Japanese proficiency (Japanese Language Proficiency Test N2 or above is a guideline)
Benefits
Hybrid (in the office at least once a week)
Remote (decided case by case, and limited to those who can take business trips when the company requires it)
Office address:
HCMC: 7F, Gia Loc Building, No. 27-29 Nguyen Cuu Van Street, Ward 17, Binh Thanh District, HCMC
Hanoi: Unit 9.03, 9F, The West Building, 265 Cau Giay Street, Cau Giay Ward, Hanoi
Official full-time employee
Probation period: 2 months
Annual paid leave: 12 days
National holidays
Year-end holidays (December 31 to January 3)
Tet holidays
Others (following Labor Regulations)
13th month salary
Salary review: twice a year
100% of monthly basic salary and mandatory social insurance during the 2-month probation
Premium Health Insurance
Social insurance, health insurance, unemployment insurance, workers’ accident compensation insurance
Annual health check-up
Allowances such as child-care allowance, commuting allowance, and congratulatory gifts for life events
Growth support such as a server fee subsidy and support for attending external training courses
Intensive training program (external or internal training courses, workshops, etc.)