Automate oneAPI installation on Intel processors for MPI jobs

This topic explains how to automatically set up Intel oneAPI on cluster servers to run MPI jobs.

Caution

oneAPI is not supported on Ubuntu clusters.

Prerequisites

  • discovery_provision.yml has been executed.

  • The cluster has been set up with Kubernetes.

  • An Omnia Slurm cluster has been set up by running omnia.yml, with at least two nodes: one slurm_control_node and one slurm_node.

  • A local repository has been set up by listing {"name": "intel_benchmarks"} in input/software_config.json and running local_repo.yml (a minimal example follows this list). For more information, see the local repository documentation.

  • Verify that the target nodes are in the booted state. For more information, see the provisioning documentation.
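
For reference, a minimal input/software_config.json might look like the sketch below. Only the intel_benchmarks entry is required by this topic; the surrounding fields are illustrative and their values depend on your Omnia release and cluster operating system:

{
    "cluster_os_type": "rhel",
    "cluster_os_version": "8.8",
    "repo_config": "partial",
    "softwares": [
        {"name": "intel_benchmarks"}
    ]
}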

To run the playbook:

cd benchmarks
ansible-playbook intel_benchmark.yml -i inventory
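
For reference, a two-node inventory might look like the following sketch. The group names assume the slurm_control_node/slurm_node convention from the prerequisites, and the hostnames are placeholders (matching the nodelist used in the batch script later in this topic):

[slurm_control_node]
node00004.omnia.test

[slurm_node]
node00005.omnia.test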

To execute multi-node jobs:

  • Ensure that NFS shares are available on each node.

  • Copy the Slurm script to the NFS share and execute it from there.

  • Load all the necessary modules using module load:

    module load mpi
    module load pmi/pmix-x86_64
    module load mkl
    
  • If the commands or batch script are to be run over TCP instead of InfiniBand, include the line below (a quick sanity check is shown after this list):

    export FI_PROVIDER=tcp
    

Job execution can now be initiated.
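
Before launching a job, a quick sanity check can confirm that the modules resolved and that the TCP provider is available. This is an optional sketch; fi_info ships with libfabric, and its availability may vary by installation:

module list        # confirm mpi, pmi/pmix-x86_64, and mkl are loaded
fi_info -p tcp     # confirm the libfabric tcp provider is present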

Note

Ensure that runme_intel64_dynamic is downloaded before running the following command.

srun -N 2 /mnt/nfs_shares/appshare/mkl/2023.0.0/benchmarks/mp_linpack/runme_intel64_dynamic

For a batch job using the same parameters, the script would be:

#!/bin/bash
#SBATCH --job-name=testMPI
#SBATCH --output=output.txt
#SBATCH --partition=normal
#SBATCH --nodelist=node00004.omnia.test,node00005.omnia.test

pwd; hostname; date

# Run over TCP and load the required oneAPI modules
export FI_PROVIDER=tcp
module load pmi/pmix-x86_64
module use /opt/intel/oneapi/modulefiles
module load mkl
module load mpi

# Launch the HPL (mp_linpack) benchmark on the allocated nodes
srun /mnt/appshare/benchmarks/mp_linpack/runme_intel64_dynamic
date
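
Assuming the script above is saved on the NFS share as, for example, mpi_job.sh (a placeholder name), it can be submitted and monitored with standard Slurm commands:

sbatch /mnt/nfs_shares/appshare/mpi_job.sh   # submit the batch job
squeue -u $USER                              # check its status in the queue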

If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.