Automate installation of oneAPI on Intel processors for MPI jobs
This topic explains how to automatically install oneAPI on the cluster's servers for running MPI jobs.
Caution
oneAPI is not supported on Ubuntu clusters.
Prerequisites

- discovery_provision.yml has been executed.
- The cluster has been set up with Kubernetes.
- An Omnia slurm cluster has been set up by omnia.yml and is running with at least 2 nodes: 1 slurm_control_node and 1 slurm_node.
- A local repository has been set up by listing {"name": "intel_benchmarks"}, in input/software_config.json and running local_repo.yml. For more information, click here.
- Verify that the target nodes are in the booted state. For more information, click here.
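A quick way to confirm that the intel_benchmarks entry is present before running local_repo.yml is to search for it in the configuration file (a minimal check; the path is relative to the Omnia source directory):

grep -n '"intel_benchmarks"' input/software_config.json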
To run the playbook:
cd benchmarks
ansible-playbook intel_benchmark.yml -i inventory
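A minimal sketch of the inventory file passed with -i, assuming the group names match the slurm_control_node and slurm_node roles from the prerequisites; the hostnames are placeholders reused from the batch script later in this topic:

cat > inventory <<'EOF'
[slurm_control_node]
node00004.omnia.test
[slurm_node]
node00005.omnia.test
EOF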
To execute multi-node jobs:
- Ensure that NFS shares are available on each node.
- Copy the slurm script to the NFS share and execute it from there.
- Load all the necessary modules using module load:
module load mpi
module load pmi/pmix-x86_64
module load mkl
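If the mpi or mkl modules are not found, the oneAPI modulefiles directory may need to be added to the module search path first, as is done in the batch script below (the module avail line is only a visibility check and can be omitted):

module use /opt/intel/oneapi/modulefiles
module avail mkl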
If the commands or batch script are to be run over TCP instead of InfiniBand, include the line below:
export FI_PROVIDER=tcp
Job execution can now be initiated.
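For example, a quick sanity check (a hypothetical command, not part of the benchmark) confirms that two nodes can be reached before launching the full run:

export FI_PROVIDER=tcp   # only needed when running over TCP, as noted above
srun -N 2 hostname       # should print the hostname of each allocated node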
Note
Ensure that runme_intel64_dynamic is downloaded before running this command.
srun -N 2 /mnt/nfs_shares/appshare/mkl/2023.0.0/benchmarks/mp_linpack/runme_intel64_dynamic
For a batch job using the same parameters, the script would be:
#!/bin/bash
#SBATCH --job-name=testMPI
#SBATCH --output=output.txt
#SBATCH --partition=normal
#SBATCH --nodelist=node00004.omnia.test,node00005.omnia.test
pwd; hostname; date
export FI_PROVIDER=tcp
module load pmi/pmix-x86_64
module use /opt/intel/oneapi/modulefiles
module load mkl
module load mpi
srun /mnt/appshare/benchmarks/mp_linpack/runme_intel64_dynamic
date
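Assuming the script above is saved on the NFS share as job.sh (an illustrative name), it can be submitted and monitored with the standard Slurm commands:

sbatch job.sh        # submit the batch job
squeue -u $USER      # watch the job in the queue
cat output.txt       # output file named by --output above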
If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.