Lead SRE ensuring system reliability and scalability at Veepee, a leading e-commerce company. Collaborating with teams to automate processes and support technical challenges.
Responsibilities
Implementing tools and processes for deployment and industrialization (CI/CD, blue/green, canary, rollback, etc.);
Automating provisioning of a resilient infrastructure that meets the needs of products;
Working with development teams to facilitate regular releases;
Maintaining services in operational condition, and analyzing and resolving performance and scalability anomalies (load tests) in current and historical deployments;
Supervising the application portfolio in collaboration with the Network Operations Center (NOC), and managing access and security;
Participating in the evolution of the information system (migration from VMware to KVM, service offering) and in the reduction of technical debt;
Evangelizing DevOps good practices and participating in building a true cross-team SRE community within Veepee;
Sharing company information and communicating the team's activity;
Defining and running a clear and relevant organization within the team;
Developing the team without micromanaging.
Requirements
At least 3 years of experience in a similar function;
Knowledge of industrialization processes, agile methods, the Gitflow workflow, and DevOps practices in general, plus an understanding of the systems side;
Experience in maintaining high levels of availability;
On-call organization and incident response;
Good knowledge of Linux (knowledge of Windows would be a plus);
Proficiency with IaC: Packer, Terraform, Ansible, Puppet;
Monitoring: Icinga, ELK, Prometheus;
Hands-on experience with Docker, Kubernetes, Nomad, Consul;
Proficiency with different types of databases, such as PostgreSQL, MongoDB, Elasticsearch;
Strong verbal and written English skills;
Empathetic and open-minded.
Benefits
Dynamic and creative environment within international teams
A variety of self-study courses on our e-learning platform
Participation in meetups and conferences, locally and internationally