internal error in: mpi.F at line: 898

Posted: Thu Dec 07, 2023 8:56 am
by scanmat_centre
We installed VASP 6.4.0 without any warnings or errors, using the nvdia_hpc_sdk/23.9 kit on our GPU machine.
We were able to run calculations for a few days, but we are now encountering an internal error in mpi.F.

The detailed error is given below for your reference.

Local host: scanmatdgx1
--------------------------------------------------------------------------
 running    1 mpi-ranks, on    1 nodes
 distrk:  each k-point on    1 cores,    1 groups
 distr:  one band on    1 cores,    1 groups
 OpenACC runtime initialized ...    1 GPUs detected
 -----------------------------------------------------------------------------
|                     _     ____    _    _    _____     _                     |
|                    | |   |  _ \  | |  | |  / ____|   | |                    |
|                    | |   | |_) | | |  | | | |  __    | |                    |
|                    |_|   |  _ <  | |  | | | | |_ |   |_|                    |
|                     _    | |_) | | |__| | | |__| |    _                     |
|                    (_)   |____/   \____/   \_____|   (_)                    |
|                                                                             |
|     internal error in: mpi.F  at line: 898                                  |
|                                                                             |
|     M_init_nccl: Error in ncclCommInitRank                                  |
|                                                                             |
|     If you are not a developer, you should not encounter this problem.      |
|     Please submit a bug report.                                             |
|                                                                             |
 -----------------------------------------------------------------------------

Warning: ieee_inexact is signaling
    1
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.

Please help us to resolve the issue.
Thanks in advance.
SCANMAT.

Re: internal error in: mpi.F at line: 898

Posted: Thu Dec 07, 2023 9:03 am
by jonathan_lahnsteiner2
Dear scanmat_centre,

Would it be possible for you to update to the latest version of VASP? You can download vasp.6.4.2 from the VASP Portal.
Please check if you still get the same issue.

All the best,
Jonathan
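
To check whether the same issue still appears after an update, one option is to scan the captured job output for the boxed internal-error banner. The following is only a minimal sketch: the helper name `find_internal_errors` is hypothetical, and it assumes the banner keeps the layout shown in the log above (a header line of the form `internal error in: <file> at line: <n>`, followed by the message on the next non-empty boxed line).

```python
import re

# Header line printed inside the error box, e.g.
# "|     internal error in: mpi.F  at line: 898      |"
# (layout assumed from the log quoted in this thread)
HEADER = re.compile(r"internal error in:\s*(\S+)\s+at line:\s*(\d+)")


def find_internal_errors(text):
    """Return a list of (source_file, line_number, message) tuples.

    Hypothetical helper for illustration; not part of VASP.
    """
    errors = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = HEADER.search(line)
        if not match:
            continue
        # Take the first non-empty boxed line after the header as the message.
        message = ""
        for follow in lines[i + 1:]:
            content = follow.strip().strip("|").strip()
            if content:
                message = content
                break
        errors.append((match.group(1), int(match.group(2)), message))
    return errors


if __name__ == "__main__":
    sample = """\
|     internal error in: mpi.F  at line: 898                                  |
|                                                                             |
|     M_init_nccl: Error in ncclCommInitRank                                  |
"""
    for source_file, line_number, message in find_internal_errors(sample):
        print(f"{source_file}:{line_number}: {message}")
```

Running the same check on the output of the old and the new build makes it easy to see whether the error in `M_init_nccl` is still reported at the same place.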