Cluster formation
In the input/omnia_config.yml, input/security_config.yml, and input/storage_config.yml files, provide the required details. For input/telemetry_config.yml, the details can be found here.
Create an inventory file in the omnia folder. Check out the sample inventory for more information. If a hostname is used to refer to the target nodes, ensure that the domain name is included in the entry. IP addresses are also accepted in the inventory file.
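The hostname rule above can be checked before any playbook is run. The following is a minimal sketch, not part of Omnia: check_inventory_fqdn is a hypothetical helper, and it assumes an INI-style inventory with one host per line and no [group:vars] sections. It reports every entry that is neither an IPv4 address nor a hostname containing a domain.

```shell
# Hypothetical helper (not shipped with Omnia): flag inventory hosts
# that are bare hostnames without a domain name.
# Assumes an INI-style inventory, one host per line, no [group:vars] sections.
check_inventory_fqdn() {
    inventory_file="$1"
    status=0
    # read splits off the first field, so trailing host variables are ignored.
    while read -r host _rest || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*|';'*|'['*) continue ;;   # blank lines, comments, [group] headers
        esac
        case "$host" in
            *[!0-9.]*) ;;                    # contains non-digit characters: a hostname
            *) continue ;;                   # digits and dots only: treated as an IPv4 address
        esac
        case "$host" in
            *.*) ;;                          # at least one dot: domain name present
            *) echo "missing domain name: $host"; status=1 ;;
        esac
    done < "$inventory_file"
    return "$status"
}
```

For example, check_inventory_fqdn inventory prints one line per offending entry and returns a non-zero status if any are found.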
Note
Omnia creates a log file which is available at: /var/log/omnia.log.
omnia.yml is a wrapper playbook and achieves the following tasks:
security.yml: This playbook sets up centralized authentication (OpenLDAP) on the cluster. For more information, click here.
storage.yml: This playbook sets up storage tools such as NFS.
scheduler.yml: This playbook sets up the (Kubernetes) job scheduler on the cluster.
telemetry.yml: This playbook sets up Omnia telemetry and/or iDRAC telemetry. It also installs Grafana and Loki as Kubernetes pods.
rocm_installation.yml: This playbook sets up the ROCm platform for AMD GPU accelerators.
performance_profile.yml: This playbook is located in the utils/performance_profile directory and it enables you to optimize system performance for specific workloads. For more information, see Performance profile configuration.
Note
To run the scheduler.yml, security.yml, telemetry.yml, or storage.yml playbook independently from the omnia.yml playbook on Intel Gaudi nodes, start by executing the performance_profile.yml playbook. Once that’s done, you can run the respective playbooks separately.
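The ordering described in this note can be sketched as a small wrapper. This is an illustration under assumptions, not an Omnia utility: run_on_gaudi and the RUNNER variable are hypothetical names, the commands are assumed to be run from the omnia folder with an inventory file named inventory, and the path of the playbook you pass in depends on where it lives in your checkout.

```shell
# Hypothetical wrapper (not shipped with Omnia): run performance_profile.yml
# first, then the requested playbook, stopping if the first run fails.
# Assumes the current directory is the omnia folder.
run_on_gaudi() {
    playbook="$1"                      # path to scheduler.yml, security.yml, etc.
    inventory_file="${2:-inventory}"   # defaults to a file named "inventory"
    # RUNNER is an optional command prefix; RUNNER=echo gives a dry run.
    ${RUNNER:-} ansible-playbook utils/performance_profile/performance_profile.yml -i "$inventory_file" &&
    ${RUNNER:-} ansible-playbook "$playbook" -i "$inventory_file"
}
```

Setting RUNNER=echo before calling run_on_gaudi prints the two commands in order without executing them, which is a convenient way to confirm the paths first.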
To run omnia.yml:
ansible-playbook omnia.yml -i inventory
Note
If you want to view or edit the omnia_config.yml file, run the following commands:
To view the file:
ansible-vault view omnia_config.yml --vault-password-file .omnia_vault_key
To edit the file:
ansible-vault edit omnia_config.yml --vault-password-file .omnia_vault_key
Use the ansible-vault view or edit commands, not the ansible-vault decrypt or encrypt commands. If you have already used the ansible-vault decrypt or encrypt commands, set the permissions of the parameter files to 644.
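If the permissions do need to be restored, a minimal sketch could look like the following. restore_param_permissions is a hypothetical helper name, and which files count as parameter files in your setup is an assumption. The example path comes from the input files named earlier in this section.

```shell
# Hypothetical helper (not shipped with Omnia): set mode 644 (rw-r--r--)
# on each parameter file passed as an argument.
restore_param_permissions() {
    chmod 644 "$@"
}

# Example, run from the omnia folder:
#   restore_param_permissions input/omnia_config.yml
```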
Once the omnia.yml playbook is successfully executed, the cluster is up and running with the required application stack. You can now install AI tools or use the cluster for job execution.
If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.