Principal Engineer defining and driving the reliability strategy for Saviynt's critical SaaS platform. Shaping infrastructure design and operations for reliability at scale.
Responsibilities
Define and drive the reliability strategy for our SaaS platform
Shape how Saviynt designs, operates, and measures reliability at scale
Play an instrumental role in designing, building, and maintaining shared infrastructure services and platforms
Create reusable, reliable, and scalable solutions that abstract away complexity
Design and build core platform components and shared infrastructure services
Architect, implement, and manage highly available and scalable Kubernetes platforms as a service
Develop robust, internal-facing tools and automation for infrastructure provisioning and management primarily using Go (Golang)
Architect and optimize foundational solutions within Cloud environments (AWS, Azure, etc.)
Design and implement shared Event-Driven Architecture components and messaging platforms
Develop and maintain robust CI/CD pipelines (e.g., GitLab CI and ArgoCD) as a service
Design and build resilient Distributed Systems components
Manage and optimize shared infrastructure across Multi-Region Cloud Environments
Establish and enhance centralized Observability and Monitoring platforms and tools
Define and implement clear, well-documented RESTful API designs
Implement and manage Service Mesh capabilities
Design, implement, and optimize highly available Relational Database services
Collaborate closely with product development teams
Participate in on-call rotations to support the critical shared infrastructure
Requirements
9+ years of experience in an Infrastructure Development, Platform Engineering, or Site Reliability Engineering role, with a strong focus on building tools and services for other engineers
Deep expertise with Kubernetes in production environments, particularly in providing it as a platform (i.e., single-tenant and multi-tenant deployment architectures)
Strong programming skills in Go (Golang) and Python, with experience building robust, maintainable backend services and automation
Extensive hands-on experience with at least one major Cloud Provider (AWS, GCP, or Azure); multi-cloud experience is a strong plus, especially in building abstractions over them
Proven experience designing and implementing Event-Driven Architecture and message queuing systems (e.g., Kafka, RabbitMQ, NATS) as shared services
Solid understanding and practical experience with CI/CD pipeline tools (especially GitLab CI) and experience establishing automated delivery processes for other teams
Demonstrable experience designing and operating Distributed Systems, with an understanding of patterns for creating reliable, shared components
Familiarity with Multi-Region Cloud Environments and strategies for building globally distributed and highly available platforms
Proficiency in establishing and utilizing comprehensive Observability and Monitoring platforms (e.g., Prometheus, Grafana, ELK stack, Datadog) for shared infrastructure
Strong experience with RESTful API design principles and building well-documented, consumable APIs
Knowledge of Service Mesh concepts and practical experience with solutions like Istio in a platform context
Hands-on experience with Relational Databases (e.g., MySQL, PostgreSQL), ideally in managing them as a service
Excellent communication skills and the ability to clearly articulate complex technical concepts to both technical and non-technical audiences
A strong customer-centric mindset, treating internal development teams as your primary customers
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical or military experience
Benefits
Competitive compensation, benefits, and growth opportunities
Fullstack Developer contributing to innovative digital products with a focus on collaboration. Combining front-end and back-end expertise to enhance user experiences in Québec, Canada.
Senior Director of Software Engineering leading HR Tech initiatives at Capital One. Focused on building best-in-class HR platforms and driving modernization for a Fortune 100 company.
Backend Software Engineer joining Abnormal Security to develop scalable infrastructure for cybersecurity. Building platforms that drive business growth and enhance development velocity.
Senior Software Engineer solving business challenges with technology and collaboration. Join a mission-driven organization as part of a passionate team in a hybrid workplace.
Lead Software Engineer developing custom solutions for enterprise-level applications. Focus on cloud technologies and delivering projects in an agile, people-first way.
Staff Software Engineer leading full-stack initiatives at TELUS Digital. Design, build, and maintain end-to-end features using modern technologies and collaborate with global teams.
Senior Fullstack Engineer driving features from idea to production at reteach. Collaborating with product and design while building scalable backend and frontend solutions.
Senior Engineer responsible for structural design and data migration of tower cranes. Collaborating with teams and ensuring compliance with engineering standards in Pune, India.
Engineering Lead for AlignOps, developing cloud-based solutions for the construction industry. Leading a team with strong expertise in Node.js, TypeScript, and cloud services.