Product Reliability Engineer ensuring the health and performance of services at Palantir. Responsible for troubleshooting key customer issues and improving service reliability.
Responsibilities
Continuously invest in documentation, metrics, monitors and other troubleshooting tools
Participate in on-call rotations during business hours and occasional weekends. This is a challenging yet rewarding opportunity to help remediate the most pressing issues across the Palantir fleet.
Diagnose, resolve, and prevent issues encountered in the field, and deliver end-to-end improvements to core products based on those issues.
Improve observability by refactoring codepaths and introducing telemetry
Identify and implement data-driven opportunities for improved service resilience
Develop strategic opinions on stability investments and inform the vision for long-term product stability
Requirements
Engineering background in Computer Science, Mathematics, Software Engineering, Physics or similar field
Ability to work with a high degree of ownership and a strong sense of urgency in a dynamic environment
Experience producing code in backend languages such as Java, as part of a past role or personal projects
Familiarity with storage and data processing systems and cloud infrastructure
Strong written and verbal communication and ability to iterate quickly with teammates and incorporate feedback
Eligibility and willingness to obtain a U.S. security clearance
Benefits
Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
Employees are automatically covered by Palantir’s basic life, AD&D and disability insurance
Commuter benefits
Take-what-you-need paid time off (not accrual-based)
2 weeks of paid time off built into the end of each year (subject to team and business needs)
10 paid holidays throughout the calendar year
Supportive leave of absence program including time off for military service and medical events
Paid leave for new parents and subsidized back-up care for all parents
Fertility and family building benefits including but not limited to adoption, surrogacy, and preservation
Stipend to help with expenses that come with a new child