Machines are high-performance compute for scaling AI applications.
The NVIDIA H100 Tensor Core GPU is built on NVIDIA’s Hopper GPU architecture and is designed for large-scale AI and HPC workloads.
To learn more about the NVIDIA H100, read NVIDIA’s Hopper Architecture product outline.
Paperspace now supports the NVIDIA H100 both as a single GPU (NVIDIA H100x1) and as eight GPUs (NVIDIA H100x8), currently in the NYC2 datacenter.
Here are the machine details for NVIDIA H100.
Name | GPU Memory | vCPUs | CPU RAM | NVLink Support | GPU Interconnect Speed |
---|---|---|---|---|---|
NVIDIA H100x1 | 80 GB | 20 | 250 GB | No | N/A |
NVIDIA H100x8 | 640 GB | 128 | 2048 GB | Yes | 3.2 Tb/s |
For information about NVLink, see NVIDIA’s NVLink documentation.
NVIDIA H100s are available as on-demand compute, which means that if capacity is available, your NVIDIA H100s are immediately accessible once approved by Paperspace.
A VPC private network is required to start any NVIDIA H100 machine. This network additionally allows for multi-node training if you need it. If your work requires nodes to see a common file system, you need to provide access to shared drives.
Once you have access to your NVIDIA H100 GPU, follow the Deep Learning with ML in a Box tutorial to learn how to access a generic data science stack. When using the ML-in-a-Box template, you do not need to disable NVLink for H100x1 machines.
Not all libraries and versions work with NVIDIA H100s. If you change your CUDA version or add/remove libraries that differ from the ML-in-a-Box template, this may cause your NVIDIA H100s to not work correctly. See Software Included for the current versions used within ML-in-a-Box.
Ubuntu 20.04 works on all GPUs except H100s. A100-80Gs work with both Ubuntu 20.04 and 22.04, while H100s only work with Ubuntu 22.04.
For H100x1, you are required to install and properly configure NVIDIA drivers and CUDA without fabric-manager, the management software for NVLink. H100x1 requires you to disable NVLink both at the system level and in the RAM disk (initrd) the system uses to boot. This ensures that CUDA starts correctly. You can follow this guide to disable NVLink in order to successfully run CUDA with NVIDIA H100x1 machines on an Ubuntu 22.04 base image.
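As a rough illustration of the kind of change that guide walks through, one common approach on Ubuntu 22.04 is to pass the NVreg_NvLinkDisable option to the NVIDIA kernel module and rebuild the initrd. Treat this as a minimal sketch under those assumptions; the linked guide remains the reference:

```bash
# Sketch only: disable NVLink at the driver level, then bake the option into the initrd.
echo 'options nvidia NVreg_NvLinkDisable=1' | sudo tee /etc/modprobe.d/disable-nvlink.conf
sudo update-initramfs -u   # rebuild the RAM disk so the option is applied at boot
sudo reboot
# After rebooting, confirm that the driver loads and CUDA can see the GPU.
nvidia-smi
```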
For H100x8, you are required to install and properly configure NVIDIA drivers and CUDA with fabric-manager. H100x8 does not require you to disable NVLink with Ubuntu.
The following table displays the performance specifications for the NVIDIA H100. We’ve added the NVIDIA A100-80G for reference.
Name | Generation | Type | FP32 CUDA Cores | GPU Memory | Memory Bandwidth | FP64 Tensor Core or FP32 | TF32 Tensor Core | BFLOAT16 or FP16 Tensor Core | FP8 Tensor Core or INT8 Tensor Core |
---|---|---|---|---|---|---|---|---|---|
NVIDIA H100x1¹ | Hopper | SXM5 | 16,896 | 80 GB HBM3 | 3.35 TB/s | 67 TFLOPS | 989 TFLOPS | 1979 TFLOPS | 3958 TFLOPS/TOPS |
NVIDIA A100-80Gx1 | Ampere | SXM4 | 6,912 | 80 GB HBM2 | 1.555 TB/s | 19.5 TFLOPS | 312 TFLOPS | 624 TFLOPS | N/A / 1248 TOPS |
To learn more about how the NVIDIA H100 GPU compares to other machine types, read our machine types and their performance specs.
After you have created and connected to your NVIDIA H100 GPU, you can use the following commands to verify your GPUs’ state, such as checking whether they are accessible within the environment or whether NVLink support is activated.
nvidia-smi
: Checks if the GPUs are present. Use the PyTorch command python -m torch.utils.collect_env to get additional information about the environment.

nvidia-smi topo -m
: Checks if NVLink is available. If it is available, it outputs NV18 between all GPUs on NVIDIA H100.

When moving files, use a checksum (such as md5sum) to verify that the move was successful. Instead of using the mv command to move your files, we also recommend using the cp command to copy the files to the new location first and then the rm command to delete the old files after you’ve run a checksum on the copied files. This maintains an intact copy of the data that you can use in the event that the copied data was corrupted in the copy process. A sketch of this copy-verify-delete workflow is shown below.

When exposing services from Docker containers, we recommend using --network=host rather than --publish.

By default, Hugging Face models and datasets are cached in ~/.cache/huggingface.
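As a minimal sketch of the copy-verify-delete workflow recommended above, with placeholder paths standing in for your data directory and shared drive:

```bash
# Placeholder paths: ~/data is the source, /mnt/my_shared_drive/data the destination.
cp -r ~/data /mnt/my_shared_drive/data
md5sum ~/data/*                      # checksums of the original files
md5sum /mnt/my_shared_drive/data/*   # compare these against the originals
rm -r ~/data                         # delete the originals only once the checksums match
```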
We recommend using tmux while training, as it is a tool that allows you to access multiple terminal sessions at once. Since training can result in long computations, tmux is useful because it does not terminate a run if a network error breaks your connection to the terminal, such as when you receive a client_loop: send disconnect: Broken pipe error.

You can use tmux with iTerm2, a terminal replacement with tmux support. Once set up, run the command tmux -CC. After starting your run, close your tmux session using your Esc key from the original window to detach it. Later, you can reopen it with tmux -CC attach. When the process is done, close the window.

df -h
: Checks your machine’s available disk space. This returns an output like the following:

Filesystem Size Used Avail Use% Mounted on
tmpfs 25G 1.5M 25G 1% /run
/dev/mapper/ubuntu--vg-root 97G 28G 65G 31% /
tmpfs 123G 0 123G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/xvda2 1.7G 232M 1.4G 15% /boot
tmpfs 25G 92K 25G 1% /run/user/1000
The /dev/mapper/ubuntu--vg-root
line shows you how much disk space your machine has available.
If you use a Python virtual environment such as venv, tmux needs to start first, before you activate the environment. The Python version is 3.11.x, accessed via python or python3. H100 machines use 12.2 as the CUDA driver version and 12.1 as the CUDA runtime version, as shown in the output of nvidia-smi and nvcc --version:
nvidia-smi
...
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
...
nvcc --version
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
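For an end-to-end check that the framework can see the GPU and its CUDA runtime, a one-liner such as the following works, assuming PyTorch is installed as in the ML-in-a-Box template:

```bash
# Prints CUDA availability, the CUDA runtime PyTorch was built against, and the GPU name.
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda, torch.cuda.get_device_name(0))"
```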
On machines created after 17 January 2024, NCCL is configured via the /etc/nccl.conf file to enable optimal running on the new H100 GPU fabric. This is notable within the /etc/nccl.conf file on the following lines:

NCCL_TOPO_FILE=/etc/nccl/topo.xml
NCCL_IB_DISABLE=0
NCCL_IB_CUDA_SUPPORT=1
NCCL_IB_HCA=mlx5
NCCL_CROSS_NIC=0
NCCL_SOCKET_IFNAME=eth0
NCCL_IB_GID_INDEX=1
For machines created before 17 January 2024, users need to run the following command to create the /etc/nccl.conf
file:
sudo bash -c 'cat <<EOF > /etc/nccl.conf
NCCL_TOPO_FILE=/etc/nccl/topo.xml
NCCL_IB_DISABLE=0
NCCL_IB_CUDA_SUPPORT=1
NCCL_IB_HCA=mlx5
NCCL_CROSS_NIC=0
NCCL_SOCKET_IFNAME=eth0
NCCL_IB_GID_INDEX=1
EOF'
This enables the same optimal multi-node running that machines created after 17 January 2024 have by default.
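With /etc/nccl.conf in place on every node, you can launch multi-node training in the usual way. The following is only an illustrative sketch using PyTorch’s torchrun; the rendezvous endpoint, node rank, and training script are placeholders:

```bash
# Run on each H100x8 node, changing --node_rank per node (0 on the first node, 1 on the second).
# 10.0.0.2:29500 and train.py are placeholders.
torchrun --nnodes=2 --nproc_per_node=8 --node_rank=0 \
  --rdzv_backend=c10d --rdzv_endpoint=10.0.0.2:29500 \
  train.py
```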
Add the following devices and volumes to your docker run command to get the best performance when using multi-node on your H100s. These InfiniBand devices and volumes relate to the InfiniBand protocols in the upgraded infrastructure (a combined example is shown after the list):

--device /dev/infiniband/:/dev/infiniband/
--volume /dev/infiniband/:/dev/infiniband/
--volume /sys/class/infiniband/:/sys/class/infiniband/
--volume /etc/nccl/:/etc/nccl/
--volume /etc/nccl.conf:/etc/nccl.conf:ro
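Put together, a docker run command might look like the following sketch. The --gpus and --network flags and the image name are illustrative assumptions; the device and volume arguments are the ones listed above:

```bash
# Placeholder image; --network=host follows the earlier recommendation for containers.
docker run --rm -it --gpus all --network=host \
  --device /dev/infiniband/:/dev/infiniband/ \
  --volume /dev/infiniband/:/dev/infiniband/ \
  --volume /sys/class/infiniband/:/sys/class/infiniband/ \
  --volume /etc/nccl/:/etc/nccl/ \
  --volume /etc/nccl.conf:/etc/nccl.conf:ro \
  nvcr.io/nvidia/pytorch:24.01-py3
```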
NVLink is not supported on NVIDIA H100x1 machines, so it appears as N/A in nvidia-smi topo -m.
Shared drives do not support creating a symlink:

>>> ln -s test /mnt/my_shared_drive/test
ln: failed to create symbolic link '/mnt/my_shared_drive/test': Operation not supported
The NVIDIA H100 Tensor Core specifications for the TF32, BFLOAT16, FP16, FP8, and INT8 data types are with sparsity, meaning the data contains matrices that are mostly zeros. For more information on NVIDIA H100 Tensor Cores, visit NVIDIA’s Tensor Core GPU data sheet. ↩︎