Senior System Administrator at Mila, specializing in systems management and HPC in support of cutting-edge AI research infrastructure. The role involves collaborating with researchers and overseeing the operational readiness of computing systems.
**Your key responsibilities**
Ensure the operational readiness and ongoing maintenance of the compute infrastructure;
Identify and resolve performance and operational issues in hardware and software components;
Contribute to the development and evolution of infrastructure-as-code (IaC) and automation tools;
Participate in the architecture and implementation of infrastructure projects;
Serve as a technical authority on advanced High Performance Computing (HPC) topics;
Assist in producing technical specifications for tenders and sole-source procurements; evaluate submissions and recommend equipment selection.
**Qualifications - Requirements**
University degree in a relevant discipline;
Minimum of 10 years' experience in a system administrator role;
Experience managing compute clusters for High Performance Computing (HPC);
Strong expertise in Linux;
Experience in IT security;
Experience implementing automated server provisioning, performing security audits, and automating tasks with Ansible;
Expertise with the Slurm workload manager;
Expertise with parallel storage systems;
Expertise in networking for HPC, including InfiniBand;
Strong knowledge of compute hardware and GPU accelerators;
Good knowledge of version control tools (Git);
Experience implementing high-performance infrastructure solutions and managing projects with organization-wide impact;
Experience and knowledge of virtualization, backup systems, storage networking technologies, and network/server management and monitoring;
Experience managing data centers and implementing high-availability solutions;
Bilingual in French and English, required because of regular interactions with partners, stakeholders, or members of our anglophone academic community.
**Good reasons to work at Mila**
The opportunity to contribute to a unique mission with significant impact;
Comprehensive group insurance (health, dental, disability, life, travel insurance, and additional coverages);
Employee and family assistance program;
Access to telemedicine services;
Paid time off policy offering a base of 20 vacation days from hire;
Retirement savings plan with a 4% employer contribution;
A generous flexible benefits allowance that lets you tailor your benefits to what contributes to your well-being. You can choose and combine options such as lifestyle credits, enhanced insurance, additional vacation days, and an increased employer contribution to the retirement plan;
Flexible working hours, a summer schedule, and remote work options;
A workplace in the heart of Little Italy, in the trendy Mile-Ex district, close to public transit;
A team of passionate and engaging domain experts.