DevOps Engineer improving reliability and stability of cloud services at Madhive. Responsibilities include CI/CD tooling, monitoring, and cloud infrastructure management.
Responsibilities
Improve the reliability and stability of Madhive’s cloud services, which run on a mix of AWS and Google Cloud, primarily the latter.
Design, build, and maintain CI/CD tooling for Infrastructure as Code and internal services (GitHub workflows, Cloud Build).
Develop and support monitoring, alerting, and observability systems to ensure platform health.
Automate deployment and management of cloud infrastructure using Terraform, Helm, and other IaC tooling.
Administer, monitor, and optimize databases to ensure performance, reliability, and availability.
Implement database backup, recovery, and scaling strategies to support large-scale distributed systems.
Enforce cloud security best practices (IAM, permissions, policies).
Identify opportunities to optimize cloud services and databases for efficiency and cost control.
Collaborate with cross-functional guilds to establish operational standards and reduce risk.
Stay current on emerging cloud and database technologies, evaluating them for potential adoption.
Requirements
Strong understanding of cloud infrastructure, networking, containerization, and distributed systems (GCP preferred, AWS/Azure a plus).
Hands-on experience with Infrastructure as Code (Terraform or similar).
Proficiency with Bash and command-line utilities; Golang experience required (PHP/Python/JavaScript nice to have).
Experience with containerization and orchestration (Docker, Kubernetes).
Solid background in Database Administration: provisioning, scaling, tuning, monitoring, backup/recovery, and troubleshooting.
Familiarity with database performance optimization and observability tools.
Experience with monitoring systems (Google Cloud Monitoring Suite, Datadog, CloudWatch, etc.).
Strong troubleshooting and problem-solving skills, with a systematic approach.
Excellent written and verbal communication skills; able to document and share best practices.
Comfortable in a fast-paced environment with a growth mindset and eagerness to learn.
Benefits
We embrace our differences and believe they fuel our creativity.
We come from varied backgrounds and think that’s important.
We are all trail-blazing team players who think big and want to make an impact.
We are committed to cultivating a culture of inclusion and collaboration.
We welcome diversity in education, culture, opinions, race, ethnicity, gender identity, veteran status, religion, disability, sexual orientation, and beliefs.
Site Reliability Engineer contributing to platform reliability at Trainline, Europe's leading rail ticketing platform. Collaborating with product engineering to ensure operational readiness and incident response.
Senior DevOps Analyst at Stefanini managing Azure DevOps for build and deploy automation. Collaborating with development squads and ensuring code quality with validation tools.
Senior DevOps Engineer leading design and management of CI/CD pipelines at Neuron7.ai. Collaborating on cloud infrastructure for scalable applications in an innovative tech environment.
Backend Software Engineer responsible for building robust backend systems for AI and analytics products. Collaborating with various teams to enhance platform reliability and performance.
Senior DevOps Engineer responsible for cloud ecosystem architecture at a health-tech startup. Building HIPAA/GDPR-compliant foundations and mentoring developers.
Senior Backend Engineer building product features and maintaining infrastructure for an insurance platform. Employing tools like Terraform, Kafka, Datadog, and Qovery with a strong DevOps focus.
DevOps Systems Engineer supporting customer operations in Annapolis Junction, MD. Responsible for creating, sustaining, and troubleshooting complex operational data flows.
OpenShift Fresher assisting the Cloud team in managing containerized applications using Red Hat OpenShift. Supporting CI/CD, deployment automation, and cloud-native application environments.
Site Reliability Engineer for Leidos ensuring reliability, performance, and scalability of complex distributed systems for the Navy-Marine Corps Intranet. Collaborating with teams to maintain and optimize network operations and services.
DevOps Engineer evolving banking infrastructure for a fintech company. Focusing on observability, incident response, and platform automation in a hybrid work setup.