Product Reliability Engineer at Palantir improving service stability and reliability through technical troubleshooting and solutions. Working on infrastructure migrations and core product enhancements.
Responsibilities
Continuously invest in documentation, metrics, monitors and other troubleshooting tools
Participate in on-call rotations during business hours and occasional weekends. This is a challenging yet rewarding opportunity to help remediate the most pressing issues across the Palantir fleet.
Diagnose, resolve, and prevent issues encountered in the field. Deliver end-to-end improvements to core products based on the issues you encounter.
Improve observability by refactoring codepaths and introducing telemetry
Identify and implement data-driven opportunities for improved service resilience
Develop strategic opinions on stability investments and inform the vision for long-term product stability
Requirements
Engineering background in Computer Science, Mathematics, Software Engineering, Physics or similar field
Ability to work with a high degree of ownership and a strong sense of urgency in a dynamic environment
Experience writing code in backend languages such as Java, whether in a past role or in personal projects
Familiarity with storage and data processing systems and cloud infrastructure
Strong written and verbal communication skills, and the ability to iterate quickly with teammates and incorporate feedback
Eligibility and willingness to obtain a U.S. security clearance
Benefits
Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
Employees are automatically covered by Palantir’s basic life, AD&D and disability insurance
Commuter benefits
Take-what-you-need paid time off (not accrual-based)
2 weeks paid time off built into the end of each year (subject to team and business needs)
10 paid holidays throughout the calendar year
Supportive leave of absence program including time off for military service and medical events
Paid leave for new parents and subsidized back-up care for all parents
Fertility and family building benefits including but not limited to adoption, surrogacy, and preservation
Stipend to help with expenses that come with a new child
Cloud Engineer at Agility Technologies leading the design of scalable eLearning infrastructure. Collaborating on technical design and implementation involving cloud-based platforms and secure integrations.
Senior Hardware Reliability Engineer overseeing reliability testing and analysis of outdoor electronic assemblies at Gridware. Collaborating with mechanical engineers and contributing to product lifetime modeling.
Senior Manager leading SRE, Virtualization, Networking, and AI Infrastructure teams at F5. Overseeing mission-critical infrastructure and driving operational excellence across hybrid compute environments.
Senior Software Release Engineer managing software release trains at GM. Owning integration activities and defining software release scopes with a focus on collaboration with suppliers.
Software Release Engineer managing VCU and CCU software release trains for automotive solutions. Overseeing release readiness, integration, and building processes for embedded software.
Senior DevOps Engineer at Broadridge developing fully automated pipelines for Python applications. Collaborating on LTX Trading applications with a focus on cloud infrastructure and deployment automation.
DevOps Azure Developer at Abbott specializing in end-to-end application development with Python, Azure, and CI/CD practices. The role involves working in collaborative environments and building secure cloud applications.
Release Engineer enhancing end-to-end build and deployment pipelines for Ironclad's AI contracting platform. Collaborating with Engineering, QE, and Product teams to manage releases and deployment processes.
DevOps Engineer focused on CI/CD and cloud operations for a leading financial services client. Ensuring high-quality, automated deployments and promoting DevOps practices within the team.
DevOps Engineer maintaining cloud infrastructure and automation for clinical trials at Teckro. Collaborating with development and operations teams to optimize performance and ensure system reliability.