# Connect to a managed Slurm cluster

> Launch a managed Slurm cluster on your reserved Ornn GPUs, connect over SSH with your account keys, and submit your first GPU job.

A managed Slurm cluster turns the GPU nodes you already reserved into a Slurm cluster you reach over SSH. Ornn runs the scheduler and login node; you submit batch and distributed jobs with the standard Slurm tools. Logins use the SSH keys saved on your account.

## When to use Slurm

* You run batch jobs or distributed training and want a queue and scheduler.
* Your team shares a pool of GPUs and needs fair scheduling across jobs.
* You already have `sbatch` job scripts.

<Note>
  Prefer containers and orchestration? Use [Kubernetes access](/guides/kubernetes-access) instead. Want a single host you SSH into? See [VM](/guides/vm-access) or [Bare Metal](/guides/bare-metal-access).
</Note>

## Prerequisites

* Your bid has been promoted to a confirmed reservation that appears in your portfolio.
* **Checkout and payment** for the reservation are complete.
* Your account has **at least one active SSH public key** registered — Slurm logins use your account keys. See [Manage SSH keys](/guides/ssh-keys).

## Launch the cluster

<Steps>
  <Step title="Open Clusters">
    From the console, go to **Clusters** (`/clusters`) and start a new cluster, or open your reservation in [Portfolio](/guides/portfolio-overview) and choose the **Slurm** access mode.
  </Step>

  <Step title="Choose Slurm and a network mode">
    Pick **Slurm**, then a network mode:

    * **Public** — the SSH login is reachable over the internet.
    * **Private** — the SSH login is reachable only over your reservation's WireGuard VPN.
  </Step>

  <Step title="Launch">
    Launch the cluster. The first launch after an enrollment takes a few extra minutes while the GPU images download. The scheduler starts once every node is GPU-ready; your SSH login details appear on the manage page once the scheduler is ready.
  </Step>
</Steps>

## Connect over SSH

When the scheduler is ready, the cluster panel shows your **SSH login** host and port, a ready-to-run **Connect** command, and the login **host key** so you can verify it on first connect. Copy those values and connect:

```bash
ssh <login-user>@<login-host> -p <login-port>
```
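To avoid retyping the host and port, you could store the login in your OpenSSH client config. The following is a minimal sketch, not an Ornn-provided tool: the values `alice`, `login.example.com`, and `2222`, the alias `ornn-slurm`, and the key path `~/.ssh/id_ed25519` are all hypothetical placeholders. Substitute the values from your cluster panel and the key you registered on your account.

```bash
# Hypothetical placeholders: copy the real values from the cluster panel.
LOGIN_USER="alice"
LOGIN_HOST="login.example.com"
LOGIN_PORT="2222"

# Print an OpenSSH client config stanza for the login node.
# "ornn-slurm" is an arbitrary alias; the IdentityFile path is an assumption.
ssh_config_stanza() {
  cat <<EOF
Host ornn-slurm
    HostName ${LOGIN_HOST}
    User ${LOGIN_USER}
    Port ${LOGIN_PORT}
    IdentityFile ~/.ssh/id_ed25519
    IdentitiesOnly yes
EOF
}

ssh_config_stanza   # review the output before appending it to ~/.ssh/config
```

Once the stanza is in `~/.ssh/config`, `ssh ornn-slurm` behaves like the full command above. `IdentitiesOnly yes` makes the client offer only the listed key, which helps when your agent holds many keys.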

## Submit a job

On the login node, inspect the cluster and claim GPUs with the standard Slurm tools:

```bash
sinfo                          # partitions and node states
sacct -n -X --format=JobID,State,Elapsed   # job history: one line per job, no header

srun -N1 hostname              # single-node smoke test; use -N<node-count> to span nodes
ssh <node-name> hostname       # pick a node name shown by sinfo or srun

srun --gpus=1 nvidia-smi       # grab 1 GPU and print the GPU table

sbatch my-job.sh               # submit a batch job
squeue --me                    # watch your queued and running jobs
```
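For scripted submissions, it can help to capture the job ID and query that one job directly. The sketch below relies on `sbatch --parsable`, which prints only the job ID (followed by `;cluster-name` on multi-cluster setups). The helper name `job_id_from_parsable` is made up for illustration, and `my-job.sh` is the same placeholder script name used above.

```bash
# Strip the optional ";cluster-name" suffix from `sbatch --parsable` output.
job_id_from_parsable() {
  printf '%s\n' "${1%%;*}"
}

# Only run the Slurm commands where they exist (that is, on the login node).
if command -v sbatch >/dev/null 2>&1 && [ -f my-job.sh ]; then
  job_id="$(job_id_from_parsable "$(sbatch --parsable my-job.sh)")"
  squeue -j "$job_id"                                  # pending or running?
  sacct -j "$job_id" -X --format=JobID,State,Elapsed   # accounting record
fi
```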

A minimal GPU `sbatch` script:

```bash
#!/bin/bash
#SBATCH --job-name=train
#SBATCH --gpus=1
#SBATCH --output=train-%j.out

srun python train.py
```
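To span nodes, the same script can request more than one node. The sketch below writes a hypothetical two-node variant to `train-2node.sh` and submits it if `sbatch` is available. The node count, the one-GPU-per-node request, and the file name are assumptions; adjust them to match your reservation.

```bash
# Write a two-node batch script. The quoted 'EOF' keeps $VARS and %j literal.
cat > train-2node.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=train-2node
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
# Assumption: one GPU per node. Raise this to match your nodes.
#SBATCH --gpus-per-node=1
#SBATCH --output=train-2node-%j.out

# List the nodes Slurm allocated to this job.
scontrol show hostnames "$SLURM_JOB_NODELIST"

# Launch one task per node; each task runs train.py.
srun python train.py
EOF

# Submit only where Slurm is installed (that is, on the login node).
if command -v sbatch >/dev/null 2>&1; then
  sbatch train-2node.sh
fi
```

This only shows the Slurm side. Whether the two `train.py` tasks coordinate with each other depends on your training framework.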

## Private clusters: bring up WireGuard first

Private clusters expose nothing publicly — the SSH login is reachable only over your reservation's WireGuard tunnel. Set it up once:

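```bash
# 1. Generate a WireGuard keypair locally. The umask keeps the private key
#    file readable only by you; the public key is printed to the terminal.
(umask 077; wg genkey | tee privatekey | wg pubkey)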

# 2. Paste the printed PUBLIC key into the cluster panel's Connect section and
#    generate a config.
# 3. Save the returned config (fill in your private key) and bring the tunnel up.
sudo wg-quick up ./ornn-wg.conf
```

Once the tunnel is up, the SSH login above works unchanged.
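The "fill in your private key" step can be done by hand in an editor, or scripted. The sketch below is one way to script it, under a loudly stated assumption: that the config returned by the panel contains a `PrivateKey = ...` placeholder line in its `[Interface]` section. The function name `fill_private_key` and the file name `downloaded.conf` are hypothetical; if your config looks different, edit it manually instead.

```bash
# Assumption: the config from the panel has a placeholder line of the form
#   PrivateKey = <something>
# in its [Interface] section. Adjust the pattern if your file differs.
fill_private_key() {
  conf="$1"       # config downloaded from the cluster panel
  key_file="$2"   # file written by `wg genkey | tee privatekey`
  key="$(cat "$key_file")"
  # '|' is the sed delimiter because base64 keys may contain '/' but never '|'.
  sed "s|^PrivateKey[[:space:]]*=.*|PrivateKey = ${key}|" "$conf"
}

# Example with hypothetical file names; the umask keeps the result private.
if [ -f ./downloaded.conf ] && [ -f ./privatekey ]; then
  (umask 077; fill_private_key ./downloaded.conf ./privatekey > ./ornn-wg.conf)
fi
```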

## Tear down

Tearing down stops the scheduler, revokes SSH login access, drains the workers, and returns the GPU nodes to your reservation. You can relaunch a torn-down or failed cluster in place.

<Warning>
  Teardown interrupts running jobs and can't be undone. Persist checkpoints and data to your own object storage first.
</Warning>

## Troubleshooting

<AccordionGroup>
  <Accordion title="Permission denied (publickey)">
    Slurm logins use your account SSH keys. Confirm the private key your SSH client uses matches a key registered on your account, and that the cluster has finished issuing login access (the panel shows the SSH login once it's ready).
  </Accordion>

  <Accordion title="No SSH login details yet">
    The scheduler starts only after every node reports GPU-ready. Wait for the manage page to show the **SSH login** host and port; the first launch takes a few extra minutes while the GPU images download.
  </Accordion>

  <Accordion title="Host key changed warning">
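    After a relaunch, the login node may present a new host key. Remove the old entry with `ssh-keygen -R <login-host>` (or `ssh-keygen -R "[<login-host>]:<login-port>"` if the login port isn't 22, because OpenSSH stores non-default ports in that form), then reconnect and verify the new key against the host key shown on the cluster panel.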
  </Accordion>

  <Accordion title="A private cluster won't connect">
    Make sure the WireGuard tunnel is up (`sudo wg show`) before connecting. Re-generate the peer config from the Connect section if your public key changed.
  </Accordion>
</AccordionGroup>
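For the `Permission denied (publickey)` case, comparing fingerprints is usually quicker than comparing key text. The sketch below lists the fingerprint of every public key in a directory using `ssh-keygen -lf`. The function name is made up for illustration, and whether your account page shows fingerprints in the same SHA256 format is an assumption.

```bash
# Print the fingerprint of each public key in a directory (default: ~/.ssh).
list_key_fingerprints() {
  dir="${1:-$HOME/.ssh}"
  for pub in "$dir"/*.pub; do
    [ -e "$pub" ] || continue     # no *.pub files: the glob stays literal
    ssh-keygen -lf "$pub"         # prints: bits, fingerprint, comment, key type
  done
}

# Usage:
#   list_key_fingerprints                 # keys in ~/.ssh
#   list_key_fingerprints /path/to/keys   # keys in another directory
```

If none of the listed keys is registered on your account, add one first. Running `ssh -v <login-user>@<login-host> -p <login-port>` also shows which keys your client actually offers.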

## What's next

<CardGroup cols={2}>
  <Card title="Kubernetes access" img="https://mintcdn.com/ornn/fNQRwmmR9H-yzza5/images/banners/access.jpg?fit=max&auto=format&n=fNQRwmmR9H-yzza5&q=85&s=d7f78f5776720abf9eadfbb2519503c6" href="/guides/kubernetes-access" width="1200" height="800" data-path="images/banners/access.jpg">
    Prefer container orchestration? Run a managed Kubernetes cluster on your reserved GPUs instead.
  </Card>

  <Card title="Manage SSH keys" img="https://mintcdn.com/ornn/fNQRwmmR9H-yzza5/images/banners/ssh-keys.jpg?fit=max&auto=format&n=fNQRwmmR9H-yzza5&q=85&s=f17bae5a05d5d6981e1cc7e88d4e664f" href="/guides/ssh-keys" width="1200" height="800" data-path="images/banners/ssh-keys.jpg">
    Register the account SSH keys your Slurm logins use.
  </Card>
</CardGroup>
