Senior Lead Site Reliability Engineer overseeing the stability of critical systems and incident management. The role leads reliability for Java applications and supports a dynamic technology environment.
Responsibilities
Ensure stability, availability, resilience, and scalability of critical systems
Manage critical incidents and major problems
Oversee operation and reliability of batch jobs and process chains using Control-M
Continuously monitor critical transactional systems
Design, implement, and evolve Site Reliability Engineering practices
Act as the technical lead for the AMS service
Provide expert support and technical leadership in Java, Spring Boot, microservices, and REST/SOAP APIs
Oversee monitoring, alerting, and observability strategies
Maintain up-to-date technical and operational documentation
Requirements
8+ years of experience in IT
Strong focus on production support and AMS operations
Experience in Java development and architecture
Experience ensuring the reliability and availability of mission-critical systems
Bachelor's or Engineering degree in Systems Engineering, Computer Science, Information Technology, Software Engineering, or a related field
Java certifications preferred
ITIL / ITSM and Cloud certifications (Azure / GCP) preferred
DevOps / SRE certifications preferred
Spanish: Fluent
English: Basic to intermediate (technical)
Benefits
Flexible work environment
Professional development opportunities
Job title
Senior Lead – Site Reliability Engineer, Java / APIs