If I comment out torch.no_grad() in my testing code, everything works fine, but with torch.no_grad() in place the error below occurs.
In addition, when the error happens, nvidia-smi reports Volatile GPU-Util at 100%.
To be honest, I have not been able to reproduce this error in a small standalone file. The error output is below:
python(PyEval_EvalCode+0x1c) [0x564c01d613ec]
python(+0x22f874) [0x564c01e79874]
python(PyRun_FileExFlags+0xa1) [0x564c01e83b81]
python(PyRun_SimpleFileExFlags+0x1c3) [0x564c01e83d73]
python(+0x23ae5f) [0x564c01e84e5f]
python(_Py_UnixMain+0x3c) [0x564c01e84f7c]
/lib/x86_64-linux-gnu/libc.so.6: __libc_start_main
python(+0x1e0122) [0x564c01e2a122]
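
For context, the pattern I am toggling looks roughly like the sketch below. This is only an illustration: the model, the input tensor, and the evaluate helper are hypothetical stand-ins rather than my real code, and it runs on the CPU. A small version like this is not expected to trigger the crash by itself, which matches the fact that the error does not reproduce in a simple file.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model and data; the real network is not shown in this post.
model = nn.Linear(4, 2)
inputs = torch.randn(8, 4)


def evaluate(model, inputs, use_no_grad=True):
    """Run one forward pass in eval mode, optionally under torch.no_grad()."""
    model.eval()
    if use_no_grad:
        # Variant that fails in the full project: autograd tracking disabled.
        with torch.no_grad():
            return model(inputs)
    # Variant that works: same forward pass with autograd tracking left on.
    return model(inputs)


out_no_grad = evaluate(model, inputs, use_no_grad=True)
out_grad = evaluate(model, inputs, use_no_grad=False)

# The only intended difference between the two variants is gradient tracking.
print(out_no_grad.requires_grad)
print(out_grad.requires_grad)
print(torch.allclose(out_no_grad, out_grad))
```

Both variants should produce the same values; the difference is that the output computed under torch.no_grad() is not attached to the autograd graph.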