Senior Engineer for the Serverless Cell Platform at CrowdStrike, a leader in cybersecurity. Monitoring large-scale distributed systems to ensure high performance and reliability.
Responsibilities
Monitor and maintain the health, performance, and reliability of our hyperscale cell infrastructure processing trillions of events daily
Lead incident response and problem management through established on-call rotations and structured feedback loops
Implement comprehensive monitoring with Service Level Indicators to enable proactive alerting and automated self-healing
Conduct capacity planning and forecasting based on ingest rates and query patterns to optimize resource utilization
Ensure data integrity and compliance across >100 PB of stored data through automated consistency checks and recovery testing
Manage access controls, certificate rotation, and vulnerability management across cell infrastructure according to defined SLAs
Provision and scale cell infrastructure (vertical/horizontal) based on demand and performance requirements
Develop microservices and automation tools for cell components, including ingest writers and management systems
Orchestrate version upgrades, patch management, and configuration changes with minimal customer impact
Perform load testing and performance benchmarking to validate scaling thresholds and optimize costs
Coordinate with fleet operations, product teams, and infrastructure teams on global changes and capacity planning
Create technical documentation, operational playbooks, and partner with teams to address customer-impacting issues
Work in a team of friendly, trustworthy, and knowledgeable colleagues
Build and maintain CI/CD pipelines for testing and releasing configuration and software
Troubleshoot complex issues across multiple large-scale distributed systems, including LogScale, Kafka, object storage systems, and related infrastructure
Work closely with Engineering and Customer Support to troubleshoot time-sensitive production issues, regardless of when they happen
Apply SRE best practices, including SLOs, error budgets, chaos engineering, and blameless post-mortems
Effectively utilize AI coding assistants (e.g., Anthropic Claude) to accelerate development and problem-solving
Requirements
Proven experience designing and implementing distributed systems with high scalability, availability, and performance optimization at enterprise scale
Experience providing broad technical leadership across products or services
A can-do attitude; you thrive collaborating in a team and are not afraid of taking on responsibilities
Several years' experience with large-scale, business-critical Linux-based environments
Solid grounding in the technology of at least one cloud environment (AWS, Azure, GCP)
Experience working with CI/CD tooling such as Jenkins, Git, Artifactory, and Bitbucket
Go (golang) programming experience in production environments
Some familiarity with Python programming
Experience with configuration management systems such as Chef or Ansible
Availability for on-call on a rotational basis
Bonus Points: Experience with Kafka
Bachelor's degree in an applicable field, such as Computer Science or Engineering
Benefits
Market leader in compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays to recharge
Paid parental and adoption leave
Professional development opportunities for all employees regardless of level or role
Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections