Platform Engineer developing API and CI/CD environments for machine learning at CADDi, improving developer productivity and system reliability.
Responsibilities
Build API and batch execution environments for running machine learning model inference, as well as deployment environments using CI/CD.
Implement monitoring, performance tuning, and other improvements to enhance site reliability in production environments.
Optimize the cost of inference and training platforms.
Create deployment and operation processes based on input from the modeling and platform teams.
Actively experiment with new ML, MLOps, and infrastructure tools, quickly validating ideas through proof-of-concepts and applying what works to real company products.
Depending on your experience and preferences, you may be assigned to a team other than the one this posting is recruiting for. (If so, we would be happy to discuss this with you at the interview.)
After you join the company, your role may change as the organization grows or as your career goals evolve.
Requirements
5+ years of professional experience as a software engineer
Experience leading software development
We especially value experience leading design, development, operations, and the necessary communication involved in roles such as team lead or project driver, regardless of project size
At least 2 years of experience in ONE or more of the following:
Developing shared platforms or backend systems using cloud infrastructure.
Designing and operating Machine Learning systems (MLOps) with consideration for latency, cost, and non-functional requirements.
Developing Generative AI applications using LLMs, RAG architectures, and Vector Databases.
Hands-on experience with statically typed programming languages (such as TypeScript, Rust, Java/Kotlin, or Go)
General understanding of the core Computer Science concepts behind AI (such as Vector Space, Embeddings, or Inference) and the ability to leverage these principles to build and integrate AI-driven features into software platforms.
Experience in development using public cloud platforms such as AWS or Google Cloud.
Fluent business communication skills in English, able to complete daily tasks in English, including text communication and meetings (CEFR B1 or higher).
Must currently reside in Vietnam or have plans to relocate. Foreign nationals must also hold a valid Vietnam work permit or be legally eligible to work in Vietnam.
Experience developing machine learning pipelines using tools such as Vertex AI Pipelines, Kubeflow, Apache Beam, or Spark
Familiarity with at least one ML/AI framework such as scikit-learn, PyTorch, or TensorFlow.
Development experience related to MLOps or SRE
Experience collaborating with ML engineers to continuously improve and deliver machine learning and data science models
Experience building and operating systems such as Data Lakes or Feature Stores
Experience implementing initiatives to improve data quality for data-centric ML model improvement
Experience planning and driving data utilization initiatives—internally or externally—using tools such as BigQuery or Redash
Basic knowledge of algorithms related to machine learning, statistics, linear algebra, and computer science
Experience working with Scrum or Agile methodologies.
Conversational-level Japanese proficiency (Japanese Language Proficiency Test N2 or above is a guideline)
Benefits
Hybrid (in the office at least once a week)
Remote (decided case by case, and limited to those who can travel on business when the company requires it)
Office address:
HCMC: 7F, Gia Loc Building, No. 27-29 Nguyen Cuu Van Street, Ward 17, Binh Thanh District, HCMC
Hanoi: Unit 9.03, 9F, The West Building, 265 Cau Giay Street, Cau Giay Ward, Hanoi
Official full-time employee
Probation period: 2 months
Annual paid leave: 12 days
National holidays
Year-end holidays (December 31 to January 3)
Tet holidays
Others (following Labor Regulations)
13th month salary
Salary review: twice a year
100% of monthly basic salary and mandatory social insurance during the 2-month probation
Premium Health Insurance
Social insurance, health insurance, unemployment insurance, workers’ accident compensation insurance
Annual health check-up
Allowances such as child-care allowance, commuting allowance, and life-event congratulatory gifts
Growth support such as server fee subsidies and support for attending external training courses
Intensive training programs (external or internal training courses, workshops, etc.)