Senior Systems Engineer responsible for HPC cluster management and optimization at Rackspace, collaborating with scientists and providing technical support for high-performance computing.
Responsibilities
Install, configure, and maintain HPC clusters (hardware, software, operating systems).
Perform regular updates and patching, and manage user accounts and permissions.
Troubleshoot and resolve hardware and software issues.
Monitor and analyze system and application performance, identify bottlenecks, and implement tuning solutions.
Manage job scheduling and resource allocation with schedulers such as Slurm and LSF, and provision and manage clusters with tools such as Bright Cluster Manager, OpenHPC, and Warewulf.
Configure Linux networking (TCP/IP, DNS, routing) and HPC interconnects (InfiniBand, Ethernet).
Implement and maintain large-scale storage and parallel file systems (Lustre, Ceph, GPFS), ensuring data integrity and managing backups.
Implement security controls and manage authentication services like LDAP and Active Directory.
Automate deployments and system configurations using tools like Ansible, Terraform, Jenkins, and Git.
Provide technical support, documentation, and training to researchers, and collaborate with scientists and HPC architects.
Requirements
Bachelor’s degree in Computer Science, Engineering, or a related field (equivalent experience may substitute for degree).
Minimum of 10 years of systems experience, including at least 5 years working specifically with HPC.
Strong knowledge of Linux operating systems (e.g., Rocky Linux, Ubuntu) with a fundamental understanding of Linux internals, system administration, and performance tuning.
Experience building and managing RPM and DEB packages.
Experience with cluster management tools such as Bright Cluster Manager, OpenHPC stack, or Warewulf.
Proficiency with job schedulers and resource managers such as Slurm and LSF.
Strong understanding of Linux networking (e.g., TCP/IP, DNS, routing) and HPC interconnects (e.g., InfiniBand, Ethernet) including performance tuning.
Knowledge of parallel file systems such as Lustre, Ceph, or GPFS.
Working knowledge of Linux authentication and directory services such as LDAP and Active Directory.
Proficiency in scripting languages (e.g., Python, Bash, R); familiarity with MPI libraries for parallel and distributed computing is nice to have.
Strong experience with DevOps and configuration management tools, including Ansible, Terraform, Jenkins, and Git.
Knowledge of HPC in cloud environments (e.g., AWS, Azure, GCP HPC offerings) is a plus.
Strong knowledge of Linux security, compliance standards, and data protection best practices.
Excellent communication, interpersonal, and problem-solving skills.