Input parameters for Local Repositories

Input all required values in input/software_config.json.

Parameters for Software Configuration
Parameter	Details
cluster_os_type `string` Required	The operating system running on the cluster. Value: `ubuntu`.
cluster_os_version `string` Required	The OS version that will be provisioned on compute nodes. For Ubuntu, the accepted values are `22.04` and `24.04`. Note: Specify only the major version for the `cluster_os_version` parameter (for example, `22.04`); do not specify minor versions such as `22.04.5`. Default value: `22.04`
repo_config `string` Required	The type of offline configuration the user needs. When the value is set to `always`, Omnia creates a local repository/registry on the OIM hosting all the packages/images required for the cluster. When the value is set to `partial`, Omnia creates a local repository/registry on the OIM hosting all the packages/images except those listed in `user_repo_url`/`user_registry` in `input/local_repo_config.yml`. When the value is set to `never`, Omnia does not create a local repository/registry; all the packages/images are downloaded directly on the cluster. Note: After `local_repo.yml` has run, the value of `repo_config` in `input/software_config.json` cannot be updated without first running the `oim_cleanup.yml` script. Irrespective of the value of `repo_config`, all local repositories that are not available as images, Debian packages, or RPMs will be downloaded and configured locally on the OIM. Additionally, AMD GPU drivers, Intel Gaudi drivers, CUDA, and OFED are downloaded by default if they are mentioned in `input/software_config.json`. Accepted values: `always`, `partial` (default), `never`
softwares `JSON list` Required	A JSON list of required software and (optionally) the software version. The following software should be listed with a version: BeeGFS, AMD GPU, Intel Gaudi, Kubernetes, CUDA, OFED, UCX, and ROCm. At least one software must be provided in the list for `local_repo.yml` to execute correctly. The `<os_type>_software_config.json` contains the basic software. To add additional software stacks, add the software to `input/software_config.json`. For the list of all applicable software based on your `<cluster_os_type>`, refer to the templates at `examples/template_<os>_software_config.json`. For example, `examples/template_ubuntu_software_config.json`. Note: The accepted software names are taken from `input/config/<cluster_os_type>/<cluster_os_version>`.
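
The constraints in the table above can be checked before running `local_repo.yml`. The following is a minimal sketch and not part of Omnia: the function name `check_software_config` and the set `VERSIONED` are hypothetical, and the lowercase names in `VERSIONED` are assumed from the Ubuntu sample rather than taken from an official list.

```python
import json

ACCEPTED_OS_VERSIONS = {"22.04", "24.04"}
ACCEPTED_REPO_CONFIG = {"always", "partial", "never"}
# Assumption: JSON names for the software that the table says needs a version.
VERSIONED = {"beegfs", "amdgpu", "intelgaudi", "k8s", "cuda", "ofed", "ucx", "rocm"}


def check_software_config(config):
    """Return a list of human-readable problems found in a parsed config."""
    problems = []
    if config.get("cluster_os_type") != "ubuntu":
        problems.append("cluster_os_type must be 'ubuntu'")
    if config.get("cluster_os_version") not in ACCEPTED_OS_VERSIONS:
        problems.append("cluster_os_version must be '22.04' or '24.04'")
    if config.get("repo_config") not in ACCEPTED_REPO_CONFIG:
        problems.append("repo_config must be 'always', 'partial', or 'never'")
    softwares = config.get("softwares") or []
    if not softwares:
        problems.append("softwares must list at least one software")
    for entry in softwares:
        if entry.get("name") in VERSIONED and not entry.get("version"):
            problems.append(f"{entry.get('name')} should be listed with a version")
    return problems


# A deliberately flawed config: a minor version is given and cuda has no version.
example = json.loads("""
{
    "cluster_os_type": "ubuntu",
    "cluster_os_version": "22.04.5",
    "repo_config": "partial",
    "softwares": [{"name": "cuda"}, {"name": "nfs"}]
}
""")
for problem in check_software_config(example):
    print(problem)
```

An empty list from `check_software_config` only means these basic checks passed; Omnia performs its own validation when the playbook runs.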

Sample version for Ubuntu:

{
    "cluster_os_type": "ubuntu",
    "cluster_os_version": "22.04",
    "repo_config": "partial",
    "softwares": [
        {"name": "amdgpu", "version": "6.3.1"},
        {"name": "cuda", "version": "12.8.0"},
        {"name": "bcm_roce", "version": "232.1.133.2"},
        {"name": "ofed", "version": "24.01-0.3.3.1"},
        {"name": "openldap"},
        {"name": "secure_login_node"},
        {"name": "nfs"},
        {"name": "beegfs", "version": "7.4.5"},
        {"name": "k8s", "version":"1.31.4"},
        {"name": "roce_plugin"},
        {"name": "jupyter"},
        {"name": "kubeflow"},
        {"name": "kserve"},
        {"name": "pytorch"},
        {"name": "tensorflow"},
        {"name": "vllm"},
        {"name": "telemetry"},
        {"name": "ucx", "version": "1.15.0"},
        {"name": "openmpi", "version": "4.1.6"},
        {"name": "intelgaudi", "version": "1.19.2-32"},
        {"name": "csi_driver_powerscale", "version":"v2.13.0"},
        {"name": "intel_benchmarks", "version": "2024.1"},
        {"name": "amd_benchmarks"}
    ],

    "bcm_roce": [
        {"name": "bcm_roce_libraries", "version": "232.1.133.2"}
    ],
    "amdgpu": [
        {"name": "rocm", "version": "6.3.1" }
    ],
    "intelgaudi": [
        {"name": "intel"}
    ],
    "vllm": [
        {"name": "vllm_amd"},
        {"name": "vllm_nvidia"}
    ],
    "pytorch": [
        {"name": "pytorch_cpu"},
        {"name": "pytorch_amd"},
        {"name": "pytorch_nvidia"},
        {"name": "pytorch_gaudi"}
    ],
    "tensorflow": [
        {"name": "tensorflow_cpu"},
        {"name": "tensorflow_amd"},
        {"name": "tensorflow_nvidia"}
    ]
}
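
Because `input/software_config.json` is edited by hand, a missing comma or bracket is an easy mistake to make. As a minimal sketch (the helper name `find_json_error` is hypothetical and not part of Omnia), Python's standard `json` module can report where parsing fails:

```python
import json


def find_json_error(text):
    """Return None if text is valid JSON, else a 'line N, column M: msg' string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# The second entry is not separated from the first by a comma.
broken = '''{
    "softwares": [
        {"name": "nfs"}
        {"name": "jupyter"}
    ]
}'''
print(find_json_error(broken))
```

To check a real file, pass the result of reading `input/software_config.json` as text to the same function.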

For a list of accepted values in softwares, go to input/config/<cluster_os_type>/<cluster_os_version> and view the list of JSON files available. The filenames in this location (without the .json extension) are the accepted software names. The repositories to be downloaded for each software are listed in the corresponding JSON file. For example, for a cluster running Ubuntu 22.04, go to input/config/ubuntu/22.04/ and view the file list:

amdgpu.json
bcm_roce.json
beegfs.json
cuda.json
jupyter.json
k8s.json
kserve.json
kubeflow.json
roce_plugin.json
nfs.json
ofed.json
openldap.json
pytorch.json
tensorflow.json
vllm.json
intelgaudi.json

For a list of repositories (and their types) configured for AMD GPUs, view the amdgpu.json file:

{
  "amdgpu": {
    "cluster": [
        {"package": "linux-headers-$(uname -r)", "type": "deb", "repo_name": "jammy"},
        {"package": "linux-modules-extra-$(uname -r)", "type": "deb", "repo_name": "jammy"},
        {"package": "amdgpu-dkms", "type": "deb", "repo_name": "amdgpu"}
    ]
  },
  "rocm": {
    "cluster": [
      {"package": "rocm", "type": "deb", "repo_name": "rocm"},
      {"package": "rocm-validation-suite", "type": "deb", "repo_name": "rocm"}
    ]
  }
}
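
To see at a glance which packages come from which repository, the file can be grouped by type and repository name. The sketch below is not an Omnia tool; `packages_by_repo` is a hypothetical helper, and it assumes every entry carries the `package`, `type`, and `repo_name` keys seen in this example (entries of other types may use different keys).

```python
import json
from collections import defaultdict

# A copy of the amdgpu.json content shown above.
amdgpu_json = json.loads("""
{
  "amdgpu": {
    "cluster": [
        {"package": "linux-headers-$(uname -r)", "type": "deb", "repo_name": "jammy"},
        {"package": "linux-modules-extra-$(uname -r)", "type": "deb", "repo_name": "jammy"},
        {"package": "amdgpu-dkms", "type": "deb", "repo_name": "amdgpu"}
    ]
  },
  "rocm": {
    "cluster": [
      {"package": "rocm", "type": "deb", "repo_name": "rocm"},
      {"package": "rocm-validation-suite", "type": "deb", "repo_name": "rocm"}
    ]
  }
}
""")


def packages_by_repo(software_json):
    """Map (type, repo_name) to the package names found under every top-level key."""
    grouped = defaultdict(list)
    for section in software_json.values():
        for item in section.get("cluster", []):
            key = (item.get("type"), item.get("repo_name"))
            grouped[key].append(item.get("package"))
    return dict(grouped)


for (pkg_type, repo), packages in packages_by_repo(amdgpu_json).items():
    print(f"{pkg_type} from {repo}: {', '.join(packages)}")
```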

Note

To configure a locally available repository that does not have a pre-defined JSON file, click here.

Input the required values in input/local_repo_config.yml.

Parameters for Local Repository Configuration
Parameter	Details
repo_store_path `string` Required	The intended file path for offline repository data. Ensure the disk partition has enough space. If you intend to use an NFS share mount for `repo_store_path`, ensure that 755 permissions are set on it. Default value: `"/opt/omnia_repo"`
user_repo_url `JSON List` Optional	The URLs of the user's repositories that contain the packages required for the cluster. When `repo_config` is `always`, the given list is configured on the OIM and the packages required for the cluster are downloaded into a local repository. When `repo_config` is `partial`, a local repository is created on the OIM containing the packages that are not part of the user's repository. When `repo_config` is `never`, no local repository is created and packages are downloaded on all cluster nodes. `url` defines the base URL for the repository. `gpgkey` defines the GPG key for the repository. If `gpgkey` is omitted, `gpgcheck=0` is set for that repository. Sample value: `- {url: "http://crb.com/CRB/x86_64/os/",gpgkey: "http://crb.com/CRB/x86_64/os/RPM-GPG-KEY"}`
user_registry `JSON List` Optional	The URLs (with port) of the user's registries that contain the images required for the cluster. When `repo_config` is `always`, the list given in `user_registry` is configured on the OIM and the packages required for the cluster are downloaded into a local repository. If the same repository is available in both `user_repo_url` and `user_registry`, the repository is configured using the values in `user_registry`. When `repo_config` is `partial`, a local registry is created on the OIM containing the packages that are not part of the `user_registry`. Images listed in `user_registry` are directly configured as a mirror on compute nodes. Compute nodes are expected to connect to the URLs in `user_registry` via `http_proxy`. When `repo_config` is `never`, no local registry is created and packages/images are downloaded on all cluster nodes. `host` defines the URL and path to the registry. `cert_path` defines the absolute path to the security certificate for each registry. If this path is not provided, insecure registries are configured. Sample value: `- { host: 10.11.0.100:5001, cert_path: "/home/ca.crt" }` `- { host: registryhostname.registry.test, cert_path: "" }`
ubuntu_os_url `string` Required	Mandatory when `cluster_os_type` is `ubuntu` in `software_config.json`. This variable defines the repositories to be configured on all the compute nodes. Whether `repo_config` is `always`, `partial`, or `never`, the given `ubuntu_os_url` is configured via proxy on the compute nodes. The online `ubuntu_os_url` for Ubuntu 22.04 or 24.04 is `http://in.archive.ubuntu.com/ubuntu`. Example: For a cluster running Ubuntu 22.04, `ubuntu_os_url` should be `http://in.archive.ubuntu.com/ubuntu`
omnia_repo_url_ubuntu `JSON List` Required	A list of all the repository URLs from which deb packages will be downloaded for Omnia features on Ubuntu clusters. `url` defines the base URL for the repository. `gpgkey` defines the GPG key for the repository. If `gpgkey` is omitted, the repository will be marked as "trusted". On clusters running Ubuntu, if GPG keys are not available, public keys are accepted in place of GPG keys. However, the `publickey` field cannot be left blank. This value is not validated by Omnia, and any errors can cause Omnia to fail. Ensure that all URLs listed below are reachable from the OIM. Default value: - { url: "https://download.docker.com/linux/ubuntu {{ os_release }} stable", gpgkey: "https://download.docker.com/linux/ubuntu/gpg" } - { url: "https://repo.radeon.com/rocm/apt/{{ rocm_version }} {{ os_release }} main", gpgkey: "https://repo.radeon.com/rocm/rocm.gpg.key" } - { url: "https://www.beegfs.io/release/beegfs_{{beegfs_version}} {{ os_release }} non-free", gpgkey: "https://www.beegfs.io/release/beegfs_{{beegfs_version}}/gpg/GPG-KEY-beegfs" } - { url: "https://repo.radeon.com/amdgpu/{{ amdgpu_version }}/ubuntu {{ os_release }} main", gpgkey: "https://repo.radeon.com/rocm/rocm.gpg.key" } - { url: "https://ltb-project.org/debian/openldap25/jammy jammy main", publickey: "https://ltb-project.org/documentation/_static/RPM-GPG-KEY-LTB-project" } - { url: "https://nvidia.github.io/libnvidia-container/stable/deb/amd64 /", gpgkey: "https://nvidia.github.io/libnvidia-container/gpgkey" } - { url: "http://ppa.launchpad.net/deadsnakes/ppa/ubuntu {{ os_release }} main", gpgkey: "" } - { url: "https://a2o.github.io/snoopy-packages/repo/ubuntu {{ os_release }} stable", publickey: "https://a2o.github.io/snoopy-packages/snoopy-packages-key.pub" } - { url: "https://vault.habana.ai/artifactory/debian {{ os_release }} main", publickey: "https://vault.habana.ai/artifactory/api/gpg/key/public" }
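
The key and certificate rules in the table above can be summarized in code. The following is only a sketch of the documented behavior, not Omnia's implementation: `registry_mode` and `repo_key_mode` are hypothetical helpers, and the entries are written as Python dictionaries mirroring the YAML sample values.

```python
def registry_mode(entry):
    """Return 'secure' when cert_path is set; otherwise the registry is insecure."""
    return "secure" if entry.get("cert_path") else "insecure"


def repo_key_mode(entry):
    """Describe how an omnia_repo_url_ubuntu entry is verified."""
    if entry.get("gpgkey"):
        return "gpgkey"
    if entry.get("publickey"):
        return "publickey"
    # No usable key given: the table says the repository is marked as trusted.
    return "trusted"


# Entries copied from the sample and default values in the table.
registries = [
    {"host": "10.11.0.100:5001", "cert_path": "/home/ca.crt"},
    {"host": "registryhostname.registry.test", "cert_path": ""},
]
repos = [
    {"url": "https://download.docker.com/linux/ubuntu {{ os_release }} stable",
     "gpgkey": "https://download.docker.com/linux/ubuntu/gpg"},
    {"url": "http://ppa.launchpad.net/deadsnakes/ppa/ubuntu {{ os_release }} main",
     "gpgkey": ""},
    {"url": "https://vault.habana.ai/artifactory/debian {{ os_release }} main",
     "publickey": "https://vault.habana.ai/artifactory/api/gpg/key/public"},
]

for registry in registries:
    print(registry["host"], "->", registry_mode(registry))
for repo in repos:
    print(repo["url"].split()[0], "->", repo_key_mode(repo))
```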

Input docker_username and docker_password in input/provision_config_credentials.yml to avoid image pullback errors.

If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.