Site Reliability Engineer at AIG applying software engineering principles to IT operations and building resilient IT infrastructure while ensuring system stability and speed.
Responsibilities
Apply software engineering principles to IT operations
Build resilient, efficient, and scalable IT infrastructure
Prioritize automation, monitoring, and incident management
Define and meet Service Level Objectives (SLOs)
Manage error budgets
Conduct blameless postmortems for continuous improvement
Act as a bridge between development and operations teams
Maintain both software development speed and system stability
Requirements
Bachelor's degree in a related field
3+ years of relevant technology experience
Solid grasp of core technical areas such as programming (Python, Go, Java), system administration (Linux/Unix), networking, databases, and cloud computing platforms (like AWS, Azure, GCP)
Practical experience running production systems
Proficiency in scripting languages (e.g., Python, Bash)
Experience with Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible)
Experience implementing comprehensive monitoring solutions (e.g., Prometheus, Grafana, or ELK Stack)
Ability to quickly diagnose and resolve system incidents
Excellent communication skills
Proactive in learning new technologies
Benefits
Volunteer Time Off
Matching Grants Programs
Comprehensive benefits package focused on health, wellbeing and financial security
Professional development opportunities
Job title
Service Reliability Engineer, GI Application Management
Lead DevOps Engineer modernizing infrastructure and automation for Wells Fargo’s Consumer Technology platforms. Collaborating across teams to build scalable solutions and elevate engineering excellence.
Senior DevOps Engineer re-envisioning enterprise-level applications at Ryan. Designing and maintaining cloud infrastructure for optimal service delivery.
Reliability Engineer focusing on risk minimization and maintenance strategies in an innovative PEM electrolyzer company. Collaborating cross-functionally to enhance equipment and systems performance.
Principal Site Reliability Engineer at Red Hat managing the RHIVOS product SRE initiative. Focusing on infrastructure reliability and continuous improvement with deep technical expertise in engineering.
DevOps Azure Developer specializing in end-to-end application development at global healthcare leader Abbott. Engaging in CI/CD processes and building secure cloud applications using Azure and Python.
DevSecOps Engineer at Livingston ensuring security in CI/CD pipelines and building resilient systems. Collaborating with teams to integrate best practices in software development.
Reliability Engineer at LANXESS improving the reliability of fixed and rotating equipment. Partnering with Engineering and Operations to ensure asset safety and performance.
Cloud Engineer at Agility Technologies leading the design of scalable eLearning infrastructure. Collaborating on technical design and implementation involving cloud-based platforms and secure integrations.
Senior Hardware Reliability Engineer overseeing reliability testing and analysis of outdoor electronic assemblies at Gridware. Collaborating with mechanical engineers and contributing to product lifetime modeling.
Senior Manager leading SRE, Virtualization, Networking, and AI Infrastructure teams at F5. Overseeing mission-critical infrastructure and driving operational excellence across hybrid compute environments.