Reliability Engineer ensuring the stability and reliability of mission-critical workflows built on Palantir software. Engaging with issues, driving product change, and refining operational processes.
Responsibilities
Ensure stability and reliability of mission-critical workflows built on Palantir software
Gather signal by going on call — resolving problems before the customer is impacted
Drive product change, shape internal tooling, and refine operational processes
Rapidly address issues as they arise with quick and effective solutions
Advocate for workflow or product improvements after immediate issues are resolved
Engage directly with problems, simplify, automate, and enhance system resilience
Synthesize learnings from support into best practices and clear documentation
Requirements
Background in Computer Science, Engineering, Information Systems, or another technical field
Ability to work independently and collaboratively to solve ambiguous technical and operational challenges
Excellent written and verbal communication skills, capable of interacting effectively with both technical and non-technical stakeholders
Proficiency in Python, Java, and SQL
Familiarity with parallel data processing and Spark job optimization
Strong organizational skills and attention to detail, with the ability to prioritize effectively
Resourcefulness and creativity in fast-paced, dynamic environments
Experience with root cause analysis and documenting solutions for broader impact
Enthusiasm for hands-on problem solving, continuous improvement, and knowledge sharing
Benefits
Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
Employees are automatically covered by Palantir’s basic life, AD&D, and disability insurance
Commuter benefits
Take-what-you-need paid time off, not accrual-based
2 weeks of paid time off built into the end of each year (subject to team and business needs)
10 paid holidays throughout the calendar year
Supportive leave-of-absence program, including time off for military service and medical events
Paid leave for new parents and subsidized back-up care for all parents
Fertility and family-building benefits, including but not limited to adoption, surrogacy, and preservation
Stipend to help with expenses that come with a new child