Building clusters

  1. Provide the required details in input/omnia_config.yml, input/security_config.yml, input/telemetry_config.yml and, optionally, input/storage_config.yml.

  2. Create an inventory file in the omnia folder. Check out the sample inventory for more information. If a hostname is used to refer to the target nodes, ensure that the domain name is included in the entry. IP addresses are also accepted in the inventory file.

Hostname requirements
  • The hostname must not contain any of the following characters: , (comma), . (period) or _ (underscore). The domain name, however, may contain commas and periods.

  • The hostname cannot start or end with a hyphen (-).

  • No upper case characters are allowed in the hostname.

  • The hostname cannot start with a number.

  • The hostname and the domain name together (that is, hostname00000x.domain.xxx) cannot exceed 64 characters. For example, if the node_name provided in input/provision_config.yml is ‘node’, and the domain_name provided is ‘omnia.test’, Omnia will set the hostname of a target cluster node to ‘node000001.omnia.test’. Omnia appends 6 digits to the node_name so that each target node gets a unique hostname.
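
The hostname rules above can be checked before provisioning with a small shell helper. The sketch below is illustrative only and is not part of Omnia: build_fqdn and check_fqdn are hypothetical names, and the naming scheme (node_name, six zero-padded digits, then domain_name) is taken from the ‘node000001.omnia.test’ example above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Omnia) that mirror the hostname
# requirements listed above.

# Build the name a node would receive: node_name + 6-digit index + domain_name.
build_fqdn() {
    printf '%s%06d.%s\n' "$1" "$2" "$3"
}

# Print "ok" and return 0 if the name satisfies the rules, otherwise print
# the first violated rule and return 1.
check_fqdn() {
    local fqdn="$1"
    # The hostname is the text before the first period, so it cannot
    # contain a period by construction.
    local host="${fqdn%%.*}"

    case "$host" in
        '')            echo "invalid: empty hostname"; return 1 ;;
        *[,_]*)        echo "invalid: hostname contains a comma or underscore"; return 1 ;;
        -*|*-)         echo "invalid: hostname starts or ends with a hyphen"; return 1 ;;
        *[[:upper:]]*) echo "invalid: hostname contains upper case characters"; return 1 ;;
        [0-9]*)        echo "invalid: hostname starts with a number"; return 1 ;;
    esac

    # Hostname and domain name together must not exceed 64 characters.
    if [ "${#fqdn}" -gt 64 ]; then
        echo "invalid: name exceeds 64 characters"
        return 1
    fi

    echo "ok"
}

# Usage: check the name from the example above.
check_fqdn "$(build_fqdn node 1 omnia.test)"   # prints: ok
```

The same check_fqdn call can be applied to each hostname entry in the inventory file, since those entries are expected to include the domain name.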

Note

  • Omnia writes a log file to /var/log/omnia.log.

  • If only Slurm is being installed on the cluster, Docker credentials are not required.

  3. omnia.yml is a wrapper playbook comprising:

    1. security.yml: This playbook sets up centralized authentication (LDAP/FreeIPA) on the cluster.

    2. storage.yml: This playbook sets up storage tools like BeeGFS and NFS.

    3. scheduler.yml: This playbook sets up job schedulers (Slurm or Kubernetes) on the cluster.

    4. telemetry.yml: This playbook sets up Omnia telemetry and/or iDRAC telemetry. It also installs Grafana and Loki as Kubernetes pods.

    5. rocm_installation.yml: This playbook sets up the ROCm platform for AMD GPU accelerators.

To run omnia.yml:

ansible-playbook omnia.yml -i inventory

Note

  • To view or edit the omnia_config.yml file, use one of the following commands:

    • ansible-vault view omnia_config.yml --vault-password-file .omnia_vault_key – To view the file.

    • ansible-vault edit omnia_config.yml --vault-password-file .omnia_vault_key – To edit the file.

  • Use the ansible-vault view or edit commands rather than the ansible-vault decrypt or encrypt commands. If you have already used ansible-vault decrypt or encrypt, set the permissions of the parameter files to 644.
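
As a minimal sketch of the permission fix described above, the following helper resets the mode of the parameter files to 644. It assumes that the parameter files are the four input files named at the start of this section and that they live in the input directory; reset_param_perms is a hypothetical name, not an Omnia command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Omnia): set mode 644 on the
# parameter files after ansible-vault decrypt or encrypt has been used.
# Assumption: the parameter files are the four input files listed below.
reset_param_perms() {
    local input_dir="${1:-input}"
    local f
    for f in omnia_config.yml security_config.yml telemetry_config.yml storage_config.yml; do
        # Skip files that are absent (storage_config.yml is optional).
        if [ -f "$input_dir/$f" ]; then
            chmod 644 "$input_dir/$f"
        fi
    done
}
```

Running reset_param_perms from the omnia folder uses the default input directory; a different directory can be passed as the first argument.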

  4. Once the omnia.yml playbook has run successfully, the cluster is up and running with the required application stack. You can now install AI tools or use the cluster to run jobs.
