Engineer focusing on site reliability and cloud operations for the leader in payment tokenization. Seeking motivated individuals to improve system reliability and infrastructure deployment.
Responsibilities
Architect and maintain scalable, reliable infrastructure: Design and optimize infrastructure for high availability, fault tolerance, and performance across distributed systems.
Lead incident management and root cause analysis: Own incident response processes, ensure swift resolution of issues, and drive post-incident improvements to prevent recurrences.
Service monitoring and automation: Build and maintain automated monitoring, alerting, and healing systems that improve system health, reduce manual intervention, and minimize downtime.
Performance tuning and capacity planning: Identify bottlenecks and optimization opportunities, and implement scaling strategies to handle traffic spikes and growing workloads efficiently.
Collaborate with cross-functional teams: Work closely with software engineers, product teams, and DevOps to enhance system reliability and delivery pipelines.
Improve operational processes: Champion continuous improvement initiatives in deployment, scaling, and performance testing, while advocating for the adoption of SRE best practices across the organization.
Mentorship and leadership: Provide technical mentorship to junior engineers, contribute to strategic decisions around infrastructure, and ensure best practices are implemented at scale.
Be proactive and innovative: we rely on your feedback to build a world-class product.
Be a part of a team that believes in the core values of transparency, collaboration, grit, and humility; in going above and beyond what is required to do the right thing for our customers and the company; and in having fun while doing all this!
Requirements
Proven experience in Infrastructure/SRE roles, with a track record of managing production systems in complex, large-scale environments.
Strong proficiency in AWS, including infrastructure-as-code (Terraform, CloudFormation, etc.).
Solid understanding of cloud-native architecture, Linux systems, microservices, infrastructure-as-code (Terraform, CloudFormation, CDK), CI/CD (CircleCI, GitHub Actions, Argo), GitOps, authentication and authorization, APIs and API Gateway, Docker, Kubernetes (EKS), Kafka (MSK), Java, Spring Framework, Python, and AWS services.
Strong plus if you are a database wiz.
Expertise in monitoring and observability tools such as Prometheus, Grafana, OpenTelemetry, or New Relic to measure system health and performance.
Programming and scripting experience in languages used to automate infrastructure, such as Python, Go, or Bash.
Solid understanding of networking, security, and load balancing in cloud-native environments.
Strong communication and collaboration skills, with the ability to lead cross-functional initiatives and mentor junior team members.
Experience with incident management and disaster recovery best practices.
Strong written and verbal communication skills.
Benefits
Flexible work hours and flexible PTO
Competitive health benefits
VGS stock options
401(k) plan with 4% employer matching and immediate vesting (available only for US employees)
Life & disability insurance
Pre-tax flexible spending accounts: dependent care and healthcare FSAs (available only for US employees)
Lead Infrastructure Engineer designing secure automation infrastructure for GE Vernova's digital transformation in utility operations. Collaborate with architects to develop reusable IT solutions.
Infrastructure Engineer managing VMware Server Infrastructure for CMA CGM in the UK. Providing L2/L3 support and ensuring smooth IT operations across client environments.
Infrastructure Engineer responsible for IT infrastructure maintenance and user support. Join One Beyond's innovative team to enhance system reliability and performance while working flexibly.
Infrastructure Engineer optimizing cloud infrastructure and costs for blockchain analytics. Join the Core Platform team at Elliptic driving efficiency and scalability.
Software Engineer building infrastructure for Benchling’s biotechnology R&D Cloud platform. Collaborate to enhance developer experience and ensure operational reliability in regulated environments.
Core Infrastructure Engineer building and maintaining core infrastructure for fintech startup YOVO. Engineering high-performance systems for checkout, subscription, and hosting platforms in a hybrid environment.
Senior/Staff Cloud Infrastructure Engineer designing and operating scalable AWS infrastructure for EVENTIM’s ticketing platform. Collaborating in an agile environment and applying modern DevOps practices.
IT Infrastructure Architect Team Leader at QUALCO supervising IT Architects and senior engineers. Overseeing design of IT infrastructures and managing 3rd level operations in financial solutions.
Infrastructure Engineer developing and maintaining AWS and SQL Server environments at an IT consulting firm. Working directly with clients to troubleshoot infrastructure issues and improve systems.