Monitor and maintain system performance to ensure the stability and reliability of applications and infrastructure.
Troubleshoot and resolve issues related to database performance, network connectivity, and deployment failures, including diagnosing problems at the underlying platform level (e.g., Kubernetes, virtual machines).
Ensure that issues are resolved within the stipulated Service Level Agreements (SLAs), maintaining high standards of service delivery.
Identify and address performance bottlenecks in applications and infrastructure.
Conduct root cause analysis for recurring incidents to develop long-term solutions.
Improve monitoring solutions to proactively identify and mitigate issues before they impact services.
Assist in the deployment and configuration of new applications and services, ensuring adherence to best practices.
Develop and maintain scripts for automation of routine tasks and monitoring processes.
Participate in on-call rotations and respond to critical incidents as they arise.
Analyze system logs and metrics to identify trends and potential areas for improvement.
Assist in capacity planning and performance tuning to ensure optimal resource utilization.
Requirements
Strong expertise in Linux system administration.
Proven experience in troubleshooting application support issues with a focus on performance and connectivity.
Solid understanding of database management and performance tuning.
Hands-on experience with Kubernetes and virtual machines.
Ability to diagnose and resolve complex technical issues across compute, storage, network, and database components.
Strong analytical skills and intellectual curiosity; able to question existing processes and understand their implications.
Self-motivated learner who can operate autonomously with minimal guidance.
Excellent problem-solving abilities and a proactive approach to identifying and addressing challenges.
DevOps Product Manager working on complex platform and infrastructure projects. Consulting on DevOps best practices and ensuring scalable, efficient digital ecosystems for clients.
Site Reliability Engineer optimizing large-scale Linux environments at Bumble Inc. Troubleshooting incidents and driving performance improvements on platforms such as Kafka and Kubernetes.
Senior DevOps Engineer at mylo, managing multi-cloud infrastructure and CI/CD pipelines. Promoting DevOps culture while ensuring compliance and automating system maintenance.
Lead Site Reliability Engineer at S&P Global's Cloud Engineering team. Responsible for designing and maintaining cloud infrastructure and ensuring the performance of cloud-based systems.
Site Reliability Engineer responsible for monitoring and improving the reliability of satellite operations infrastructure. Collaborating with teams to automate processes in a dynamic environment.
DevOps Analyst providing high-quality, reliable solutions within multifunctional teams at a technology-focused financial organization. Automating build and deployment solutions in a hybrid work environment.
Network & Datacenter Deployment Engineer at Cloudflare, focused on building and expanding the company's global network infrastructure in collaboration with multiple engineering teams and vendors.
Senior DevOps Engineer leading cloud-native solutions at Sparksoft Corporation. Driving automation and system reliability within a fast-paced Agile team.