Configuring UCX and OpenMPI on the cluster
Prerequisites
Ensure that the ucx and openmpi entries are present in the softwares list in software_config.json, as shown below:
    "softwares": [
        {"name": "ucx", "version": "1.15.0"},
        {"name": "openmpi", "version": "4.1.6"}
    ]
Run local_repo.yml with the ucx and openmpi entries present in software_config.json to download all required UCX and OpenMPI packages.
To install any benchmarking software like UCX or OpenMPI, ensure that k8s_share is set to true in storage_config.yml for one of the entries in nfs_client_params. If both k8s_share and slurm_share are set to true, slurm_share takes precedence.
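The first prerequisite can be checked before running any playbook. The following is a minimal sketch, not part of Omnia: the helper name missing_software is hypothetical, and the inline sample contains only the softwares key shown above (a real software_config.json holds additional keys).

```python
import json

REQUIRED = ("ucx", "openmpi")

def missing_software(config, required=REQUIRED):
    """Return the required names that are absent from the "softwares" list."""
    present = {item.get("name") for item in config.get("softwares", [])}
    return [name for name in required if name not in present]

# Inline sample for illustration; in practice, load the real file with json.load().
sample = json.loads('''
{
  "softwares": [
    {"name": "ucx", "version": "1.15.0"},
    {"name": "openmpi", "version": "4.1.6"}
  ]
}
''')

print(missing_software(sample))  # prints []
```

An empty list means both entries are present; any name in the returned list still needs to be added before running local_repo.yml.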
Inventory details
For UCX and OpenMPI, the applicable inventory groups are slurm_control_node and kube_control_plane.
The inventory file must contain exactly one slurm_control_node, exactly one kube_control_plane, or one of each.
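The inventory rule above can be sketched as a small check. This is an illustrative helper, not an Omnia tool: it assumes an INI-style Ansible inventory in which every host sits under a group header, and the function names and sample IP addresses are hypothetical.

```python
import configparser

CONTROL_GROUPS = ("slurm_control_node", "kube_control_plane")

def control_group_counts(inventory_text):
    """Count the hosts listed under each control group that is present."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # host lines have no "= value" part
        delimiters=("=",),     # keep ":" inside host entries intact
        interpolation=None,
    )
    parser.optionxform = str   # preserve host name case
    parser.read_string(inventory_text)
    return {
        group: len(parser.options(group))
        for group in CONTROL_GROUPS
        if parser.has_section(group)
    }

def is_valid(counts):
    """At least one control group exists, and each one lists exactly one host."""
    return bool(counts) and all(n == 1 for n in counts.values())

# Hypothetical inventory content; the addresses are placeholders.
sample_inventory = """
[slurm_control_node]
10.5.1.101

[kube_control_plane]
10.5.1.102
"""

counts = control_group_counts(sample_inventory)
print(is_valid(counts))  # prints True
```

Because configparser is stricter than Ansible's own parser, ungrouped hosts at the top of the file would raise an error here; treat the sketch as a quick sanity check rather than a full inventory validator.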
To install UCX and OpenMPI
UCX is compiled and installed on the NFS share (based on the client_share_path provided in the nfs_client_params in input/storage_config.yml).
If the cluster uses Slurm and UCX, OpenMPI is configured to compile with UCX and Slurm on the NFS share (based on the client_share_path provided in the nfs_client_params in input/storage_config.yml).
Run either of the following commands:
    ansible-playbook omnia.yml -i inventory
    ansible-playbook scheduler.yml -i inventory
Note
All compiled UCX and OpenMPI files are saved to the <client_share_path>/compile directory on the NFS share.
All UCX and OpenMPI executables are saved to the <client_share_path>/benchmarks/ directory on the NFS share.
The default OpenMPI version for Omnia is 4.1.6. If you change the version in the software_config.json file, make sure to update it in the openmpi.json file in the input/config directory as well.
To add new nodes to an existing cluster, click here.
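To see where the compiled files and executables end up for a given configuration, the share selection and output paths can be sketched as follows. This is an illustration under stated assumptions, not Omnia's implementation: it assumes nfs_client_params has already been parsed from storage_config.yml into a list of dictionaries, it interprets the precedence rule as "prefer an entry with slurm_share set to true, otherwise use one with k8s_share set to true", and the function names and mount paths are hypothetical.

```python
from pathlib import PurePosixPath

def pick_share_path(nfs_client_params):
    """Return the client_share_path of the preferred share, or None if none qualifies."""
    # slurm_share is checked first because it is given higher precedence.
    for flag in ("slurm_share", "k8s_share"):
        for entry in nfs_client_params:
            if entry.get(flag) is True:
                return entry["client_share_path"]
    return None

def output_dirs(client_share_path):
    """Build the compile and benchmarks directories under the share path."""
    root = PurePosixPath(client_share_path)
    return {
        "compile": str(root / "compile"),
        "benchmarks": str(root / "benchmarks"),
    }

# Hypothetical entries; only the keys named in this document are used.
params = [
    {"client_share_path": "/mnt/k8s", "k8s_share": True, "slurm_share": False},
    {"client_share_path": "/mnt/slurm", "k8s_share": False, "slurm_share": True},
]

share = pick_share_path(params)
print(output_dirs(share))
```

With the sample entries, the Slurm share is selected, so the directories resolve under /mnt/slurm.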
If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.