Senior Engineer for the Serverless Cell Platform at CrowdStrike, a leader in cybersecurity. Monitoring large-scale distributed systems to ensure high performance and reliability.
Responsibilities
Monitor and maintain the health, performance, and reliability of our hyperscale cell infrastructure processing trillions of events daily
Lead incident response and problem management through established on-call rotations and structured feedback loops
Implement comprehensive monitoring with Service Level Indicators to enable proactive alerting and automated self-healing
Conduct capacity planning and forecasting based on ingest rates and query patterns to optimize resource utilization
Ensure data integrity and compliance across >100 PB of stored data through automated consistency checks and recovery testing
Manage access controls, certificate rotation, and vulnerability management across cell infrastructure according to defined SLAs
Provision and scale cell infrastructure (vertical/horizontal) based on demand and performance requirements
Develop microservices and automation tools for cell components, including ingest writers and management systems
Orchestrate version upgrades, patch management, and configuration changes with minimal customer impact
Perform load testing and performance benchmarking to validate scaling thresholds and optimize costs
Coordinate with fleet operations, product teams, and infrastructure teams on global changes and capacity planning
Create technical documentation and operational playbooks, and partner with teams to address customer-impacting issues
Work in a team of friendly, trustworthy, and knowledgeable colleagues
Build and maintain CI/CD pipelines for testing and releasing configuration and software
Troubleshoot complex issues across multiple large-scale distributed systems, including LogScale, Kafka, object storage systems, and related infrastructure
Work closely with Engineering and Customer Support to troubleshoot time-sensitive production issues, regardless of when they happen
Apply SRE best practices, including SLOs, error budgets, chaos engineering, and blameless post-mortems
Use AI coding assistants (e.g., Anthropic Claude) effectively to accelerate development and problem-solving
Requirements
Proven experience designing and implementing distributed systems with high scalability, availability, and performance optimization at enterprise scale
Experience contributing to technical leadership across products or services
A can-do attitude; you thrive collaborating in a team and are not afraid of taking on responsibilities
Several years' experience with large-scale, business-critical Linux-based environments
Solid grounding in the technology of at least one cloud environment (AWS, Azure, GCP)
Experience working with CI/CD tooling such as Jenkins, Git, Artifactory, and Bitbucket
Go (golang) programming experience in production environments
Some familiarity with Python programming
Experience with configuration management systems such as Chef or Ansible
Availability for on-call on a rotational basis
Bonus points: experience with Kafka
Bachelor's degree in an applicable field, such as Computer Science or Engineering.
Benefits
Market leader in compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays to recharge
Paid parental and adoption leave
Professional development opportunities for all employees regardless of level or role
Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
Lead Software Engineer guiding teams in software design and implementation at Royal Caribbean Group. Responsible for delivering scalable solutions while collaborating globally with diverse teams.
Fullstack Developer for a tech company focused on scalable digital solutions. Collaborate in a technical team using modern technologies like Node.js and React.
Senior Electrical & Controls Engineer leading the design and development of control systems for automation projects in North America. Mentoring engineers and overseeing project execution in Richmond, BC.
Senior Software Developer building innovative solutions and automating security operations. Exploring and experimenting with security technologies in a creative engineering environment.
Senior Full Stack Engineer designing and maintaining financial management applications at AccountsIQ. Collaborating with Product, Engineering, and DevOps teams to implement scalable full-stack solutions.
Tech Lead overseeing enterprise cloud migration and architecture in a hybrid setup. Managing multidisciplinary squads and overseeing security and data strategies.
Staff Software Engineer at Walmart leading development of cloud-native platforms and AI-driven applications. Responsible for the full software lifecycle and technical leadership in data-driven projects.
Fullstack Developer participating in the development of applications around the AEB account. Collaborating in a team to integrate software solutions for secure cloud access.
Automotive Linux BSP Senior Engineer developing embedded software and applications. Responsible for design, testing, and documentation to meet customer requirements.
Staff Engineer on the Customer Order team at Grainger improving order searchability and data handling. Responsible for software maintenance, mentoring, and high-quality delivery of technology solutions.