Data Reliability Engineer on the Personal Investor Data & Analytics team, responsible for ensuring data reliability and efficiency. Collaborates with various teams to identify issues and implement solutions within data pipelines.
Responsibilities
Proactively analyzes data pipeline & platform logs and metrics to identify trends and potential issues.
Participates in special projects and performs other duties as assigned.
Gains insight into PI Data & Analytics operations, demonstrates and champions Reliability culture and practices, builds relationships, and promotes Reliability as a way of thinking.
Exhibits proficiency in data reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other Reliability best practices.
Communicates progress, issues, trends, and solutions to management and partner organizations.
Maintains proactive knowledge and understanding of pending elevations, enhancements, and infrastructure changes.
Proactively identifies potential failure points and designs strategies to ensure that failures remain localized, preventing widespread disruption and contagion.
Collaborates with internal teams to evaluate the health, stability, and reliability of systems/platforms.
Collaborates with product teams in triage and troubleshooting during client impacting incidents.
Participates in and/or facilitates post-incident reviews for any client-impacting events local to the Personal Investor Data & Analytics products.
Maintains centralized incident response playbook, in collaboration with DRE Champions on each product team.
Collaborates with DRE Champions and/or product team points of contact to ensure adherence to the common operating model and standard development playbooks.
Requirements
Minimum of five years of related experience, with at least two years of development experience.
Undergraduate degree or equivalent combination of training and experience.
1-3 years of Reliability Engineering experience.
2 years of DevOps experience.
Strong analytic and problem-solving skills.
Self-motivated individual with the ability to prioritize and manage changing priorities.
Experience with and understanding of AWS data engineering products, Python, and SQL.
Proficiency and experience with observability and telemetry tools such as Splunk, CloudWatch, Grafana, and Datadog.