Can't run icc on my compute nodes

DARDO ARIEL VIÑAS VISCARDI
 

I had a problem when I tried to run a test domain in WRF on my cluster. 

 [prun] Error: Expected Job launcher mpiexec.hydra not found for impi

So I ssh to my node and try to run the commands myself (after loading the intel and impi modules):
 
icc
-bash: icc: command not found
mpiexec.hydra
-bash: mpiexec.hydra: command not found
mpirun
-bash: mpirun: command not found

Any idea why this happens? On my master I can find the commands after loading the modules (my master isn't acting as a compute node).

This is my slurm.conf config for the nodes:

# COMPUTE NODES
# OpenHPC default configuration
PropagateResourceLimitsExcept=MEMLOCK
AccountingStorageType=accounting_storage/filetxt
Epilog=/etc/slurm/slurm.epilog.clean
NodeName=yaku04 Weight=100 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
NodeName=yaku03 Weight=100 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
NodeName=yaku02 Weight=100 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
NodeName=yaku01 Weight=100 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
NodeName=yaku Weight=10 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
PartitionName=normal Nodes=yaku0[1-4] Default=YES MaxTime=24:00:00 State=UP PriorityTier=1
PartitionName=mono Nodes=yaku01 Default=NO MaxTime=4:00:00 State=UP PriorityTier=1
PartitionName=intensiva Nodes=yaku0[1-4] Default=NO MaxTime=UNLIMITED State=UP PriorityTier=1 PreemptMode=requeue
PartitionName=hipri Nodes=yaku0[1-4] Default=NO MaxTime=UNLIMITED State=UP PriorityTier=2 PreemptMode=off
PartitionName=Infiniband Nodes=yaku0[2-4] Default=NO MaxTime=UNLIMITED State=UP PriorityTier=2 PreemptMode=off
ReturnToService=1

Simba Nyamudzanga
 

The error you are getting might be because the path to the executables is not set. To check whether the path is set, use:

which mpirun
which icc
which mpiexec.hydra

If this does not show the path to the respective executable, try adding the executable's directory to your PATH:

export PATH=$PATH:/path/to/icc/executable/directory

Do the same for mpirun and mpiexec.hydra.
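
On an OpenHPC system, the intel and impi modulefiles normally prepend these directories to PATH for you, so it is also worth checking what the modules actually set (a minimal sketch; the exact module names may differ on your install):

module load intel impi
module show impi
echo $PATH | tr ':' '\n' | grep -i intel

If the modules load cleanly but the directories they point to do not exist on the node, the problem is that the Intel installation itself is not visible there, rather than a missing PATH setting.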


Karl W. Schulz
 

Did you install the parallel studio package on the head node in the default path, or put it in a path that is already visible to the compute nodes (like /opt/ohpc/pub/intel)? If you chose the default (which is likely /opt/intel), you will want to make sure to export that path to your compute nodes (so, update /etc/exports on head node and /etc/fstab on computes) if you haven’t already.
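
One quick way to verify the export from a compute node once it is in place (a minimal sketch; it assumes the head node answers on the cluster-internal address, which is 10.0.1.1 in the fstab entries later in this thread):

showmount -e 10.0.1.1
mount | grep /opt/intel
ls /opt/intel

showmount lists what the server is exporting, and the mount and ls checks confirm that the node actually mounted it and sees the expected install tree.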

-k

DARDO ARIEL VIÑAS VISCARDI
 

You know, I realized that, so I added the folder on the master to /etc/exports with the others:

/home *(rw,no_subtree_check,fsid=10,no_root_squash)
/opt/ohpc/pub *(ro,no_subtree_check,fsid=11)
/opt/intel *(ro,no_subtree_check,fsid=11)

And in the provisioning image, in $CHROOT/etc/fstab:

[root@n2 ~]# cat /etc/fstab  
tmpfs / tmpfs rw,relatime,mode=555 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
10.0.1.1:/home /home nfs nfsvers=3,nodev,nosuid,noatime 0 0
10.0.1.1:/opt/ohpc/pub /opt/ohpc/pub nfs nfsvers=3,nodev,noatime 0 0
10.0.1.1:/opt/intel /opt/intel nfs nfsvers=3,nodev,noatime 0 0

I ran the command "exportfs -a".

But still, after rebuilding everything, rebooting the nodes, everything... the folder /opt/intel is showing the contents of /opt/ohpc/pub...

Any ideas why this could be happening?
 

Karl W. Schulz
 

It might be due to the fact that you are using the same fsid in the /etc/exports file. Can you try making them unique (e.g. change the last line to have fsid=12) and see if that helps?
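
For what it's worth, the fsid value is how the NFS server identifies an exported filesystem to its clients, so two exports sharing fsid=11 get treated as the same filesystem, which would explain /opt/intel showing the contents of /opt/ohpc/pub. A corrected /etc/exports based on the entries you posted would look like:

/home *(rw,no_subtree_check,fsid=10,no_root_squash)
/opt/ohpc/pub *(ro,no_subtree_check,fsid=11)
/opt/intel *(ro,no_subtree_check,fsid=12)

followed by exportfs -ra on the head node and a remount (or reboot) of the compute nodes.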

-k

DARDO ARIEL VIÑAS VISCARDI
 

Yup! You were right! Thank you very much for all your help, Karl.

Patrick Goetz
 

I'm pretty sure you're not supposed to have two exports with the same fsid. Check to see if /opt/intel is even being mounted.
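
To check that on one of the nodes (a minimal sketch; findmnt is part of util-linux and shows the source of whatever is mounted at a path):

findmnt /opt/intel
mount -t nfs

Note that with a duplicate fsid the mount can succeed but hand back the wrong filesystem, so it is worth checking the mounted contents as well as the mount itself.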
