Site Reliability Engineer focused on system reliability and automation for high-performance production systems, based in Warsaw. Collaborates with engineering teams to support effective deployments and operational efficiency.
Responsibilities
Design, develop, test, and deploy high-quality solutions, automation scripts, and tools to improve system stability, performance, and operational efficiency.
Proactively automate manual processes and optimize production operations.
Support the release and deployment lifecycle of applications, ensuring smooth, stable, and reliable rollouts.
Work closely with software and infrastructure teams to resolve system issues, provide system design input, support platform management, and contribute to capacity planning.
Create and maintain clear technical documentation for SRE solutions, automation tools, and operational procedures.
Provide regular progress updates to engineering stakeholders.
Requirements
Minimum 1 year of proven experience in software development.
Strong proficiency in at least one of the following: Python, C++, Java, or shell scripting.
Experience with Linux operating system internals.
Solid understanding of networking concepts.
Experience with modern version control systems (e.g., Git).
Familiarity with monitoring, logging, and CI/CD tools (e.g., Prometheus, Grafana, Splunk, Jenkins, GitLab CI) is a strong advantage.
Ability to work independently, manage time effectively, and take ownership of tasks from concept to delivery.
Strong analytical and problem-solving skills, with a proactive approach in fast-paced production environments.
Excellent verbal and written communication skills.
Ability to quickly learn new technologies and adapt to evolving platform requirements.
Strong focus on delivering measurable outcomes related to system reliability and performance.
Benefits
Opportunity to work within our client’s engineering team on high-performance production systems.
Project-based engagement with clear, measurable deliverables.
Hybrid work model based in Warsaw.
Collaborative technical environment with strong ownership and impact.
Exposure to modern SRE and automation practices in a production environment.