Lead the SRE team in designing and operating scalable, secure cloud infrastructure for Instabase's AI platform. Manage CI/CD, Kubernetes, production reliability, and release processes.
Responsibilities
Define and steer the technical direction for the team, collaborating with cross-functional partners
Develop and execute comprehensive short- and long-term roadmaps balancing business needs, user experience, and technical foundations
Oversee cloud infrastructure and deployment automation to ensure efficient and reliable operations
Guarantee uptime and reliability for production systems through proactive monitoring and production support
Manage vulnerability assessments and facilitate prompt remediation
Maintain and enhance CI/CD and build infrastructure to support development workflows
Implement and optimize tools to enhance developer productivity
Drive improvements in release management processes and tooling to ensure smooth, reliable software delivery
Build scalable, distributed, and fault-tolerant systems integrating Software and Systems Engineering to drive performance, capacity, and reliability
Requirements
5+ years of experience in Site Reliability Engineering, Software Engineering, or Production Engineering
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
Proven track record of setting technical and cultural standards for engineering teams
Demonstrated experience in managing and sustaining SaaS production environments
Hands-on experience with major cloud providers such as AWS and Azure
Proficient in containerization technologies like Docker
Expertise in container orchestration platforms, especially Kubernetes
Skilled in overseeing and managing software release processes to ensure smooth deployments
Systematic approach to solving platform and production issues, strong problem-solving abilities, and a passion for automation
Benefits
Bonus
Equity
Benefits
Hybrid work
Offices in San Francisco, New York, London and Bengaluru