DevOps Engineer optimizing CI/CD and security infrastructure at Helpshift. Ensuring reliability and scalability while mentoring junior team members in a hybrid workplace.
Responsibilities
Design, implement, and maintain secure CI/CD pipelines for automating deployment, configuration, and testing processes.
Own Helpshift production services, ensure complete monitoring coverage, and troubleshoot and fix production issues.
Build a seamless, zero-downtime process to upgrade our core infrastructure (ScyllaDB, Elasticsearch, Kafka, MongoDB, Redis).
Move our infrastructure to a new region with no downtime.
Build cloud infrastructure that is easy to move to a different cloud service provider.
Collaborate with development and operations teams to integrate security practices into the software development lifecycle.
Conduct regular security assessments, vulnerability scans, and penetration testing to identify and mitigate security risks.
Develop and maintain infrastructure as code (IaC) templates for provisioning and configuring cloud resources securely.
Monitor and respond to production incidents, including investigation, containment, and remediation activities.
Stay up-to-date with the latest security threats, vulnerabilities, and best practices, and make recommendations for continuous improvement.
You will play a pivotal role in ensuring the security, scalability, and reliability of our infrastructure and applications.
You will collaborate closely with cross-functional teams to implement security best practices throughout the development lifecycle, automate security processes, and enhance our overall DevSecOps capabilities.
Mentor junior team members.
Requirements
4+ years of relevant experience.
In-depth knowledge of running/managing UNIX-like operating systems (we use Ubuntu).
Strong knowledge of networking protocols, security architectures, and identity and access management (IAM) principles.
Experience with containerisation technologies (e.g., Docker, Kubernetes) and securing containerised environments.
Experience designing and building highly scalable, fault-tolerant, and cost-effective solutions.
Experience with various FOSS tools for monitoring, graphing, capacity planning, and logging.
Experience with IaC tools such as Ansible, Puppet, and Terraform.
Experience with cloud computing platforms such as Amazon Web Services (AWS), Google Cloud Platform, and Heroku.
Experience managing NoSQL and relational (RDBMS) databases.
Experience with queuing systems (Kafka, RabbitMQ) and big data platforms (Hadoop).
Good programming skills, with a focus on scripting (Python, Shell, Perl).
Ability to analyse architectural bottlenecks and debug issues quickly through to resolution.
An automation mindset and the ability to reason about and work with complex systems.
Excellent communication and documentation skills.
Quick learner and a good mentor for junior team members.
Principal Site Reliability Engineer at Early Warning designing performance and resiliency patterns for applications and infrastructure. Collaborating with development teams to improve systems and data integrity.
DevOps Engineer contributing to CI/CD setup and Azure services management. Collaborates with teams to ensure efficient project delivery in a hybrid environment.
IT DevOps Specialist at BMW responsible for analyzing requirements and implementing software solutions in AWS cloud environments. Collaborating internationally within agile teams for digital transformation projects.
DevOps Engineer at Vistra designing, implementing, and maintaining robust CI/CD pipelines and cloud infrastructure. Enabling software delivery across multiple technology stacks with a focus on AWS.
Manage complex customer rollouts and initial system deployments at Talex.ai. Bridging technical development with real-world application in robotics and AI systems.
Cloud Operations Engineer designing and implementing highly reliable cloud solutions. Leading cloud infrastructure initiatives for production operations and customer success in a growing team.
Quality Engineer supporting new product launches and reliability testing for SSDs at Micron in Malaysia. Responsible for coordinating test activities and conducting failure analysis.
Reliability Engineer ensuring operational readiness of data centers at Rowan Digital Infrastructure. Overseeing commissioning, operational standards, and transitioning facilities into live operations.
Manager of Mechanical Engineering ensuring high-availability mechanical systems in data centers. Collaborating on lifecycle management and performance evaluation across mission-critical facilities in a hybrid role.
DevOps Engineer developing reusable Ansible and Puppet modules and managing CI/CD for project teams. Join PLATH in Hamburg, focusing on crisis detection software development.