Senior DevOps Engineer at Demandbase in Hyderabad, building and operating the developer platforms that let engineers ship reliably to production. The role focuses on improving developer experience and cloud infrastructure.
Responsibilities
Build and operate the platforms, tooling, and workflows that enable engineers to ship reliably to production.
Partner with software, data, and security engineering teams to identify friction across the software delivery lifecycle and address it through automation, platform abstractions, and improved workflows.
Design and evolve developer-facing platforms and tooling that standardize how services and pipelines are built, deployed, and operated.
Enable self-service workflows with opinionated defaults that improve reliability, security, and consistency without slowing teams down.
Use developer feedback, operational data, and production signals to prioritize and drive the DevEx roadmap.
Design, build, and maintain CI/CD orchestration that supports high release velocity, strong security guardrails, and local-to-production parity, preferably using GitLab CI/CD.
Standardize build, test, and deployment patterns across application and data workloads.
Support modern deployment strategies and GitOps-based workflows.
Build, operate, and evolve Kubernetes-based platforms across AWS and GCP, including EKS and GKE.
Enable teams to run workloads on Kubernetes by providing clear operational guardrails, platform defaults, and documented best practices.
Manage multi-account cloud environments with a focus on security, scalability, and ease of use.
Design and maintain infrastructure using Infrastructure as Code, including Terraform and Crossplane.
Build and operate internal platform components such as GitOps tooling, secret management systems, and service mesh infrastructure.
Operate and evolve observability platforms (e.g., Prometheus, Mimir, Thanos, Grafana, Datadog) to provide actionable signals for platform and application teams.
Define and apply SLIs, SLOs, alerting strategies, and incident response practices.
Lead and participate in blameless post-mortems, translating learnings into platform improvements and reduced operational toil.
Support engineering teams running data pipelines and batch workloads on platforms such as Airflow, EMR, and Dataproc.
Standardize deployment, observability, and operational patterns for data workloads.
Improve reliability and operability of data platforms through shared tooling and best practices.
Serve as a technical leader within DevEx, promoting best practices in platform engineering, reliability, and secure software delivery.
Mentor engineers and influence teams through strong technical design, documentation, and collaboration.
Drive adoption of internal platforms through strong defaults, clear documentation, and self-service tooling.
Requirements
8+ years of overall engineering experience, including hands-on software development and cloud infrastructure ownership.
Strong software engineering fundamentals with experience in at least one general-purpose programming language (e.g., Go, Python, Java).
5+ years of experience building and operating cloud infrastructure on AWS and/or GCP at scale.
Proven experience managing multi-account cloud environments, including IAM, networking, and security best practices.
Strong proficiency with Infrastructure as Code, preferably Terraform and Crossplane.
Extensive experience operating Kubernetes platforms in production, including EKS and/or GKE.
Experience managing multiple Kubernetes clusters, including upgrades, networking, and security.
Hands-on experience with service mesh technologies such as Istio in multi-cluster environments.
Deep experience designing and operating CI/CD systems that support high release velocity, preferably GitLab CI/CD.
Experience building developer-facing tooling that improves local-to-production parity and reduces cognitive load.
Familiarity with GitOps practices and modern deployment strategies.
Experience supporting data platforms such as Airflow, EMR, and Dataproc.
Strong experience building and operating observability platforms including Prometheus, Mimir, Thanos, Grafana, and Datadog.
Solid understanding of SLIs, SLOs, alerting, and incident response.
Demonstrated ability to partner with engineering teams to identify pain points and improve developer experience.
Strong communication skills, including experience participating in or leading blameless post-mortems.
Benefits
Group Medical
Personal Accident
Term Life Insurance
Preventive healthcare including dental, vision, and OPD needs