Head of Platform/AI Cluster Management - System Integrator Job at Hamilton Barnes Associates Limited, San Francisco, CA

  • Hamilton Barnes Associates Limited
  • San Francisco, CA

Job Description

Ready to lead innovation at the intersection of platforms and artificial intelligence?

Join a pioneering technology company driving advancements in cloud, AI, and data-driven solutions across global markets. The organization is recognized for fostering innovation, scalability, and collaboration through cutting-edge platforms that empower enterprises to evolve intelligently.

The team is hiring a Head of Platform/AI Cluster Management to oversee the strategic development, integration, and optimization of AI and platform initiatives. The role will focus on leading cross-functional teams, enhancing performance and scalability, and aligning technology strategy with long-term business goals.

Shape the future of intelligent platforms and transformative innovation. Apply now!

Responsibilities

  • Own the scheduler/runtime layer (Slurm, Kubernetes, Ray), including multi-tenancy, quotas, and GPU/host fleet management.
  • Lead cluster operations across images, CI/CD, repair/health, performance/telemetry, and incident response.
  • Deliver platform services that ensure workload SLOs and reliable runtime execution.
  • Define and implement namespace/tenancy design, node health automation, golden images, admission controls, on-call runbooks, and go-live gates.
  • Collaborate closely with infra, SRE, and network teams to optimize workload placement and cluster efficiency.
  • Provide hands-on expertise in NCCL behaviours, placement strategies, and congestion signal management.

Requirements

  • Deep expertise in cluster management, scheduling, and runtime environments for large-scale compute.
  • Hands-on background with Slurm, Kubernetes, Ray, or similar orchestration platforms.
  • Strong understanding of NCCL performance tuning, workload isolation, and congestion management.
  • Experience scaling multi-tenant, GPU-heavy clusters with strict SLOs.
  • Ability to thrive in a startup environment with full ownership over platform and cluster strategy.

Salary

  • $500,000 gross per year (Negotiable)
