Site Reliability Engineer automating infrastructure and operations at DTEX Systems. Seeking candidates with a strong software engineering background and experience in cloud environments.
Responsibilities
Design, write, and maintain software, primarily in Python, to automate the provisioning, deployment, and configuration management of our infrastructure
Contribute to the adoption and maturation of Terraform, establishing and maintaining best practices for state management, modularization, and version control
Use Ansible and/or SaltStack to ensure consistency, repeatability, and standardization across all environments
Develop robust CI/CD pipelines for both infrastructure and application deployments, replacing manual processes
Implement and mature monitoring, logging, and alerting systems to proactively improve system reliability
Participate in a “follow the sun” on-call rotation, focusing on sustainable incident response, blameless postmortems, and driving continuous improvement
Champion SRE principles, automation, and coding best practices within the team and across the organization
Requirements
3+ years of hands-on experience managing production environments in AWS and/or GCP.
Strong proficiency in Python.
Demonstrated ability to write clean, maintainable, and testable code to solve infrastructure problems.
Experience with Terraform, including best practices for state management and modular design in complex environments.
Strong knowledge of Linux internals and high competency in Bash scripting and command-line operations.
Proficiency with Ansible and/or SaltStack as configuration management tools.
Expert-level understanding of Git and collaborative workflows, such as branching strategies and code review best practices.
MS/BS in Computer Science/Computer Engineering or related field of study (or equivalent experience).
Systems engineer focused on ML training infrastructure at OpenAI, building and maintaining large-scale model training systems. Collaborating with research teams to enable novel training approaches and improving infrastructure reliability.
Cloud Network Engineer focusing on Azure network architecture and Zscaler solutions at Packsize. Collaborating with IT and DevOps to enhance network security and performance.
Journeyman Infrastructure Engineer supporting the delivery and enhancement of enterprise data and analytics products. Working with government partners and teams on scalable, production-ready solutions.
Journeyman Infrastructure Engineer supporting a DoD enterprise data and analytics program. Collaborating with teams to deliver scalable, production-ready IT solutions for national security.
Public Cloud Infrastructure Engineer at Lloyds Banking Group focused on scalable cloud services for developers. Assist in building secure automated cloud platform capabilities using modern infrastructure practices.
Infrastructure Engineer focusing on automation and platform enablement for data protection within the DLM team. Involves designing automated pipelines and transitioning to policy-as-code models in a hybrid working environment.
Cloud Infrastructure Engineer at Lead Forensics managing AWS infrastructure and working on hybrid platforms. Supporting internal operations and customer-facing services with a focus on security and performance.
IT Infrastructure Engineer maintaining diverse infrastructure for Arden University. Delivering IT vision, supporting students and staff with a high-performing technology environment.
Cloud Infrastructure Engineer focusing on building and maintaining OCI environments for AI/ML-enabled programs. Collaborating with Army personnel to integrate AI models into operational architecture.
Cloud Infrastructure Engineer building and securing environments for AI/ML model testing in DoD settings. Requires extensive experience in cloud technologies and collaboration with government personnel.