Configuring specific local repositories
AMD GPU ROCm
To install ROCm, do the following:
Include the following line under softwares in input/software_config.json:
    {"name": "amdgpu", "version": "6.0"},
Add the following line below the softwares section:
    "amdgpu": [ {"name": "rocm", "version": "6.0" } ]
A sample format is available here.
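The two-step pattern above (an entry in the softwares list plus a subsection named after the software) can be sketched in Python. This is a minimal illustration only: the add_software helper is hypothetical and not part of Omnia, and a real input/software_config.json is assumed to contain other keys that are not shown here.

```python
import json


def add_software(config, name, version=None, components=None):
    """Add one entry to the "softwares" list and, optionally, its subsection.

    Hypothetical helper for illustration; Omnia does not ship this function.
    """
    entry = {"name": name}
    if version is not None:
        entry["version"] = version
    config.setdefault("softwares", []).append(entry)
    if components:
        # The subsection is a top-level key named after the software.
        config[name] = components
    return config


config = add_software(
    {},
    "amdgpu",
    version="6.0",
    components=[{"name": "rocm", "version": "6.0"}],
)
print(json.dumps(config, indent=4))
```

Software without a subsection, such as BeeGFS or FreeIPA, would only need the first step, so components can be left out.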
BeeGFS
To install BeeGFS, include the following line under softwares in input/software_config.json:
    {"name": "beegfs", "version": "7.4.2"},
For information on deploying BeeGFS after setting up the cluster, click here.
CUDA
To install CUDA, include the following line under softwares in input/software_config.json:
    {"name": "cuda", "version": "12.3.2"},
For a list of repositories (and their types) configured for CUDA, view the input/config/<cluster_os_type>/<cluster_os_version>/cuda.json file. To customize your CUDA installation, update the file. URLs for different versions can be found here.
For Ubuntu:
    {
        "cuda": {
            "cluster": [
                {
                    "package": "cuda",
                    "type": "iso",
                    "url": "https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda-repo-ubuntu2204-12-3-local_12.3.2-545.23.08-1_amd64.deb",
                    "path": ""
                }
            ]
        }
    }
For RHEL or Rocky Linux:
    {
        "cuda": {
            "cluster": [
                {
                    "package": "cuda",
                    "type": "iso",
                    "url": "https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda-repo-rhel8-12-3-local-12.3.2_545.23.08-1.x86_64.rpm",
                    "path": ""
                },
                {
                    "package": "dkms",
                    "type": "rpm",
                    "repo_name": "epel"
                }
            ]
        }
    }
If the package version is customized, ensure that the version value is updated in software_config.json.
If the target cluster runs on RHEL or Rocky Linux, ensure the “dkms” package is included in input/config/<cluster_os_type>/8.x/cuda.json as illustrated above.
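When the URL in cuda.json is customized, a quick consistency check against the version in software_config.json can catch a mismatch early. The sketch below assumes the installer URL contains the version string, which holds for the sample URLs above but is only a heuristic. The urls_missing_version function is a hypothetical helper, not an Omnia utility.

```python
import json

# The Ubuntu sample from this section, embedded so the sketch is self-contained.
CUDA_JSON = """
{ "cuda": { "cluster": [ { "package": "cuda", "type": "iso",
  "url": "https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda-repo-ubuntu2204-12-3-local_12.3.2-545.23.08-1_amd64.deb",
  "path": "" } ] } }
"""


def urls_missing_version(repo_config, software, version):
    """Return the URLs listed for `software` that do not contain `version`.

    Entries without a URL (for example, rpm packages pulled from a named
    repository) are skipped.
    """
    entries = repo_config[software]["cluster"]
    return [e["url"] for e in entries if e.get("url") and version not in e["url"]]


repo_config = json.loads(CUDA_JSON)
matching = urls_missing_version(repo_config, "cuda", "12.3.2")
mismatched = urls_missing_version(repo_config, "cuda", "12.4.0")
print(matching)    # empty list: the URL matches version 12.3.2
print(mismatched)  # the sample URL, because it does not mention 12.4.0
```

The same check could be pointed at other repository files that embed a version in the URL, such as ofed.json, by changing the software argument.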
BCM RoCE
To install RoCE, do the following:
Include the following line under softwares in input/software_config.json:
    {"name": "bcm_roce", "version": "229.2.61.0"}
Add the following line below the softwares section:
    "bcm_roce": [ {"name": "bcm_roce_libraries", "version": "229.2.61.0"} ],
A sample format is available here.
For a list of repositories (and their types) configured for RoCE, view the input/config/ubuntu/<cluster_os_version>/bcm_roce.json file:
    {
        "bcm_roce": {
            "cluster": [
                {
                    "package": "bcm_roce_driver_{{ bcm_roce_version }}",
                    "type": "tarball",
                    "url": "",
                    "path": ""
                }
            ]
        },
        "bcm_roce_libraries": {
            "cluster": [
                {
                    "package": "bcm_roce_source_{{ bcm_roce_libraries_version }}",
                    "type": "tarball",
                    "url": "",
                    "path": ""
                },
                {"package": "libelf-dev", "type": "deb", "repo_name": "jammy"},
                {"package": "gcc", "type": "deb", "repo_name": "jammy"},
                {"package": "make", "type": "deb", "repo_name": "jammy"},
                {"package": "libtool", "type": "deb", "repo_name": "jammy"},
                {"package": "autoconf", "type": "deb", "repo_name": "jammy"},
                {"package": "librdmacm-dev", "type": "deb", "repo_name": "jammy"},
                {"package": "rdmacm-utils", "type": "deb", "repo_name": "jammy"},
                {"package": "infiniband-diags", "type": "deb", "repo_name": "jammy"},
                {"package": "ibverbs-utils", "type": "deb", "repo_name": "jammy"},
                {"package": "perftest", "type": "deb", "repo_name": "jammy"},
                {"package": "ethtool", "type": "deb", "repo_name": "jammy"},
                {"package": "libibverbs-dev", "type": "deb", "repo_name": "jammy"},
                {"package": "rdma-core", "type": "deb", "repo_name": "jammy"},
                {"package": "strace", "type": "deb", "repo_name": "jammy"}
            ]
        }
    }
Note
The RoCE driver is only supported on Ubuntu clusters.
The only accepted URL for the RoCE driver is from the Dell support site.
Kubernetes plugin for the RoCE NIC
To install Kubernetes plugin for the RoCE NIC, do the following:
Include the following line under softwares in input/software_config.json:
    {"name": "roce_plugin"},
A sample format is available here.
Note
The RoCE plugin is only supported on Ubuntu clusters.
Custom repositories
Include the following line under softwares in input/software_config.json:
    {"name": "custom"},
Create a custom.json file in the following directory to define the repositories: input/config/<cluster_os_type>/<cluster_os_version>. For example, for a cluster running RHEL 8.8, go to input/config/rhel/8.8/ and create the file there. The file is a JSON list consisting of the package name, repository type, URL (optional), and version (optional). Below is a sample version of the file:
    {
        "custom": {
            "cluster": [
                {
                    "package": "ansible==5.3.2",
                    "type": "pip_module"
                },
                {
                    "package": "docker-ce-24.0.4",
                    "type": "rpm",
                    "repo_name": "docker-ce-repo"
                },
                {
                    "package": "gcc",
                    "type": "rpm",
                    "repo_name": "appstream"
                },
                {
                    "package": "community.general",
                    "type": "ansible_galaxy_collection",
                    "version": "4.4.0"
                },
                {
                    "package": "perl-Switch",
                    "type": "rpm",
                    "repo_name": "codeready-builder"
                },
                {
                    "package": "prometheus-slurm-exporter",
                    "type": "git",
                    "url": "https://github.com/vpenso/prometheus-slurm-exporter.git",
                    "version": "master"
                },
                {
                    "package": "ansible.utils",
                    "type": "ansible_galaxy_collection",
                    "version": "2.5.2"
                },
                {
                    "package": "prometheus-2.23.0.linux-amd64",
                    "type": "tarball",
                    "url": "https://github.com/prometheus/prometheus/releases/download/v2.23.0/prometheus-2.23.0.linux-amd64.tar.gz"
                },
                {
                    "package": "metallb-native",
                    "type": "manifest",
                    "url": "https://raw.githubusercontent.com/metallb/metallb/v0.13.4/config/manifests/metallb-native.yaml"
                },
                {
                    "package": "registry.k8s.io/pause",
                    "version": "3.9",
                    "type": "image"
                }
            ]
        }
    }
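Before running the playbooks, it may help to sanity-check a hand-written custom.json. The sketch below groups entries by repository type and reports entries that lack a package name or type. It assumes those two keys are mandatory, because the description above marks only URL and version as optional. The check_custom_entries function is hypothetical and is not an Omnia tool.

```python
from collections import defaultdict


def check_custom_entries(custom_config):
    """Group packages by "type" and report entries missing required keys.

    Returns (packages_by_type, problems). The required keys are an
    assumption based on the fields not marked as optional in the docs.
    """
    entries = custom_config["custom"]["cluster"]
    by_type = defaultdict(list)
    problems = []
    for index, entry in enumerate(entries):
        missing = [key for key in ("package", "type") if key not in entry]
        if missing:
            problems.append(f"entry {index}: missing {', '.join(missing)}")
            continue
        by_type[entry["type"]].append(entry["package"])
    return dict(by_type), problems


sample = {
    "custom": {
        "cluster": [
            {"package": "ansible==5.3.2", "type": "pip_module"},
            {"package": "gcc", "type": "rpm", "repo_name": "appstream"},
            {"package": "registry.k8s.io/pause", "version": "3.9", "type": "image"},
            {"type": "rpm", "repo_name": "appstream"},  # deliberately broken entry
        ]
    }
}

by_type, problems = check_custom_entries(sample)
print(by_type)
print(problems)
```

To check a real file, load it with json.load and pass the resulting dictionary to the function instead of the inline sample.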
FreeIPA
To install FreeIPA, include the following line under softwares in input/software_config.json:
    {"name": "freeipa"},
For more information on FreeIPA, click here.
JupyterHub
To install JupyterHub, include the following line under softwares in input/software_config.json:
    {"name": "jupyter"},
For information on deploying JupyterHub after setting up the cluster, click here.
KServe
To install KServe, include the following line under softwares in input/software_config.json:
    {"name": "kserve"},
For information on deploying KServe after setting up the cluster, click here.
Kubeflow
To install Kubeflow, include the following line under softwares in input/software_config.json:
    {"name": "kubeflow"},
For information on deploying Kubeflow after setting up the cluster, click here.
Kubernetes
To install Kubernetes, include the following line under softwares in input/software_config.json:
    {"name": "k8s", "version":"1.26.12"},
For more information about installing Kubernetes, click here.
Note
Omnia supports only the Kubernetes version listed above.
OFED
To install OFED, include the following line under softwares in input/software_config.json:
    {"name": "ofed", "version": "24.01-0.3.3.1"},
For a list of repositories (and their types) configured for OFED, view the input/config/<cluster_os_type>/<cluster_os_version>/ofed.json file. To customize your OFED installation, update the file.
For Ubuntu:
    {
        "ofed": {
            "cluster": [
                {
                    "package": "ofed",
                    "type": "iso",
                    "url": "https://content.mellanox.com/ofed/MLNX_OFED-24.01-0.3.3.1/MLNX_OFED_LINUX-24.01-0.3.3.1-ubuntu20.04-x86_64.iso",
                    "path": ""
                }
            ]
        }
    }
For RHEL or Rocky Linux:
    {
        "ofed": {
            "cluster": [
                {
                    "package": "ofed",
                    "type": "iso",
                    "url": "https://content.mellanox.com/ofed/MLNX_OFED-24.01-0.3.3.1/MLNX_OFED_LINUX-24.01-0.3.3.1-rhel8.7-x86_64.iso",
                    "path": ""
                }
            ]
        }
    }
Note
If the package version is customized, ensure that the version value is updated in software_config.json.
OpenLDAP
To install OpenLDAP, include the following line under softwares in input/software_config.json:
    {"name": "openldap"},
For more information on OpenLDAP, click here.
OpenMPI
To install OpenMPI, include the following line under softwares in input/software_config.json:
    {"name": "openmpi", "version":"4.1.6"},
OpenMPI is deployed on the cluster when the above configuration is complete and the omnia.yml playbook is executed.
For more information on OpenMPI configurations, click here.
Note
The default OpenMPI version for Omnia is 4.1.6. If you change the version in the software_config.json file, make sure to update it in the openmpi.json file in the input/config directory as well.
PyTorch
To install PyTorch, do the following:
Include the following line under softwares in input/software_config.json:
    {"name": "pytorch"},
Add the following line below the softwares section:
    "pytorch": [ {"name": "pytorch_cpu"}, {"name": "pytorch_amd"}, {"name": "pytorch_nvidia"} ],
A sample format is available here.
For information on deploying PyTorch after setting up the cluster, click here.
Secure Login Node
To secure the login node, include the following line under softwares in input/software_config.json:
    {"name": "secure_login_node"},
For more information on configuring login node security, click here.
TensorFlow
To install TensorFlow, do the following:
Include the following line under softwares in input/software_config.json:
    {"name": "tensorflow"},
Add the following line below the softwares section:
    "tensorflow": [ {"name": "tensorflow_cpu"}, {"name": "tensorflow_amd"}, {"name": "tensorflow_nvidia"} ]
A sample format is available here.
For information on deploying TensorFlow after setting up the cluster, click here.
Unified Communication X
To install UCX, include the following line under softwares in input/software_config.json:
    {"name": "ucx", "version":"1.15.0"},
UCX is deployed on the cluster when the local_repo.yml playbook is executed, followed by the omnia.yml playbook.
For more information on UCX configurations, click here.
vLLM
To install vLLM, do the following:
Include the following line under softwares in input/software_config.json:
    {"name": "vllm"},
Add the following line below the softwares section:
    "vllm": [ {"name": "vllm_amd"}, {"name": "vllm_nvidia"} ],
A sample format is available here.
For information on deploying vLLM after setting up the cluster, click here.
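Several entries on this page (ROCm, RoCE, PyTorch, TensorFlow, vLLM) pair a name in the softwares list with a subsection of the same name. The sketch below flags subsections whose key has no matching name in softwares. It assumes the match is exact and case-sensitive, and that every top-level list other than softwares is a subsection. Neither assumption is confirmed by this page, and find_unlisted_subsections is a hypothetical helper.

```python
def find_unlisted_subsections(config):
    """Return subsection keys that have no matching name in "softwares".

    A subsection is taken to be any top-level key, other than "softwares",
    whose value is a list (an assumption for this sketch).
    """
    listed = {entry["name"] for entry in config.get("softwares", [])}
    return sorted(
        key
        for key, value in config.items()
        if key != "softwares" and isinstance(value, list) and key not in listed
    )


# Illustrative config with a deliberate typo: "Tensorflow" instead of "tensorflow".
config = {
    "softwares": [{"name": "pytorch"}, {"name": "Tensorflow"}],
    "pytorch": [{"name": "pytorch_cpu"}, {"name": "pytorch_nvidia"}],
    "tensorflow": [{"name": "tensorflow_cpu"}, {"name": "tensorflow_nvidia"}],
}

unlisted = find_unlisted_subsections(config)
print(unlisted)  # ['tensorflow']
```

An empty result means every subsection has a corresponding entry in the softwares list.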
Intel benchmarks
To install Intel benchmarks, include the following line under softwares in input/software_config.json:
    {"name": "intel_benchmarks", "version": "2024.1.0"},
For more information on Intel benchmarks, click here.
AMD benchmarks
To install AMD benchmarks, include the following line under softwares in input/software_config.json:
    {"name": "amd_benchmarks"},
For more information on AMD benchmarks, click here.