Lead Resiliency Engineer focusing on enhancing reliability and availability of technology platforms at American Family Insurance. Mentor teams and implement engineering practices for seamless operations.
Responsibilities
Lead and mentor a team in implementing solutions to enhance overall resiliency.
Help define, document, and champion comprehensive resiliency engineering principles and practices tailored to the organization’s environment.
Participate in the selection and implementation of tools for monitoring, alerting, chaos engineering, and automated recovery.
Guide the analysis of ITSM workflows (incident, change, problem) to identify and prioritize high-impact automation opportunities.
Architect and direct the development of automation solutions leveraging ITSM tools and integrating with other enterprise systems.
Ensure the deployment and integration of automated workflows, maintaining seamless data flow and effective reporting.
Establish and monitor metrics to track the effectiveness of resiliency and automation initiatives.
Collaborate with Enterprise DevOps, Integration Platform DevOps, and other teams to review CI/CD pipelines, identify bottlenecks, and drive improvements.
Participate in the development of a comprehensive automation strategy encompassing build, test, security scanning, and deployment processes to improve system resiliency.
Participate in defining and standardizing operating environments and infrastructure configurations, partnering with architecture teams to implement Infrastructure-as-Code (IaC) frameworks.
Implement and manage automated testing and rollback mechanisms for infrastructure and application changes.
Lead the design, development, enhancement, and maintenance of tools, systems, and software solutions to support resiliency objectives.
Direct incident triage efforts, determining scope, urgency, and potential impact, and coordinate effective response and recovery.
Lead technology evaluations and re-engineering activities to support strategic direction and continuous improvement.
Transform business requirements into technical specifications, ensuring alignment with resiliency and automation goals.
Foster cross-functional coordination and ensure a partnership-focused approach to balancing service priorities with business needs, managing risk-based exceptions as necessary.
Requirements
Experience in designing fault-tolerant architectures, including automated failover, multi-region redundancy, and graceful degradation strategies enabled by chaos engineering.
Deep understanding of complex distributed systems, such as microservices orchestration, service meshes, and eventual consistency models.
Demonstrated experience delivering customer-driven solutions, support, or service in high-availability environments.
Extensive knowledge of software engineering architectures, system/software design, and system deployments.
Proven experience across multiple IT domains, including development, testing, configuration, deployment, and monitoring.
Strong understanding of infrastructure technologies and application development methodologies.
Demonstrated experience in system administration activities (configuration, installations, patch management, server maintenance) and network management (firewalls, proxies, IP management, routing, DNS).
Experience in the utilization and support of integration and communication protocols between applications, databases, and technology platforms.
Solid foundation in building scalable frameworks and providing specifications for APIs that support enterprise fulfillment processes.
Strong analytical and problem-solving skills, with the ability to diagnose and resolve complex technical issues.
Proven ability to lead and mentor technical teams, driving continuous improvement and innovation in resiliency engineering.
Benefits
comprehensive medical, dental, vision and wellbeing benefits
competitive 401(k) contribution
pension plan
annual incentive
9 paid holidays
paid time off program (23 days accrued annually for full-time employees)
HVDC Valves Site Lead Engineer supervising a team for installation and maintenance of HVDC Valve Solutions. Requires mechanical engineering expertise and travel for GE projects.
Engineer for technical office in construction projects at Würth España. Collaborating on real technical projects impacting construction and clients in Spain.
Senior Performance Engineer providing technical expertise and mentoring within Energy Systems Group. Leading project development and energy savings strategies in Florida.
Telematics System SW Engineer at 42dot designing next-gen telematics systems. Collaborating with teams to implement stable telematics communications.
Forward Deployed Engineer integrating and scaling Firecrawl inside customer products using TypeScript/Node.js. Focused on delivery, debugging, and collaboration with customers and engineering.
Power Generation Engineer focusing on wind technology and wind farm projects at Aurecon. Collaborating closely with clients and multidisciplinary teams to deliver solutions throughout the project lifecycle.
Power Generation Engineer specializing in thermal generation projects for Aurecon, providing expert technical input across all project stages. Collaborating with multidisciplinary teams to deliver sustainable and effective solutions.
Solar Electrical Field Project Engineer at Moss engaging in administrative and technical management of large utility-scale solar projects. Assisting with contract administration, procurement, and scheduling.
Network Engineer responsible for designing and supporting ICNs in manufacturing facilities for Cisco. Requires knowledge of C++/C# and Cisco industrial infrastructure with hybrid work requirements.
System Validation Engineer validating NVIDIA DOCA BF-Bundles for BlueField DPUs. Focuses on system-level validation of software and firmware bundles for data centers.