Problem encountered while trying sparse computation

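I've been trying to add sparse computation support to my earlier Moxi (墨戏) engine (here is the modified code), but I can't get it to work. I also wrote a small demo to test the idea, and the approach seems sound. In Moxi, however, it keeps failing with this error: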

[Taichi] version 0.8.3, llvm 10.0.0, commit 021af5d2, win, python 3.9.6
[TaiGLSL] version 0.0.11
[Taichi] Starting on arch=cuda
[E 10/15/21 23:51:08.419 28968] [taichi/backends/cuda/cuda_driver.h:taichi::lang::CUDADriverFunction<void * *,char const *,unsigned int,unsigned int *,void * *>::operator ()@86] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling module_load_data_ex (cuModuleLoadDataEx)


***********************************
* Taichi Compiler Stack Traceback *
***********************************
0x7ffff364f81a: taichi::print_traceback in taichi_core.pyd
0x7ffff351ee79: PyInit_taichi_core in taichi_core.pyd
0x7ffff378d6cb: taichi::print_traceback in taichi_core.pyd
0x7ffff377938d: taichi::print_traceback in taichi_core.pyd
0x7ffff34f9798: PyInit_taichi_core in taichi_core.pyd
0x7ffff377919a: taichi::print_traceback in taichi_core.pyd
0x7ffff35b5db6: PyInit_taichi_core in taichi_core.pyd
0x7ffff35e9c68: PyInit_taichi_core in taichi_core.pyd
0x7ffff35d6275: PyInit_taichi_core in taichi_core.pyd
0x7ffff35d5c68: PyInit_taichi_core in taichi_core.pyd
0x7ffff3668505: taichi::print_traceback in taichi_core.pyd
0x7ffff35285bb: PyInit_taichi_core in taichi_core.pyd
0x7ffff3589239: PyInit_taichi_core in taichi_core.pyd
0x7ffff35285bb: PyInit_taichi_core in taichi_core.pyd
0x7ffff3668201: taichi::print_traceback in taichi_core.pyd
0x7ffff36679a5: taichi::print_traceback in taichi_core.pyd
0x7ffff36ac4f0: taichi::print_traceback in taichi_core.pyd
0x7ffff36665f3: taichi::print_traceback in taichi_core.pyd
0x7ffff36649c0: taichi::print_traceback in taichi_core.pyd
0x7ffff35d6d99: PyInit_taichi_core in taichi_core.pyd
0x7ffff35b5d91: PyInit_taichi_core in taichi_core.pyd
0x7ffff35e9c68: PyInit_taichi_core in taichi_core.pyd
0x7ffff35d6275: PyInit_taichi_core in taichi_core.pyd
0x7ffff35d5c68: PyInit_taichi_core in taichi_core.pyd
0x7ffff3473d7e: PyInit_taichi_core in taichi_core.pyd
0x7ffff3406126: PyInit_taichi_core in taichi_core.pyd
0x7ffff33d6ccb: pybind11::error_already_set::discard_as_unraisable in taichi_core.pyd
0x7ff8223987d2: PyArg_ParseTuple_SizeT in python39.dll
0x7ff8223f4cbc: PyObject_MakeTpCall in python39.dll
0x7ff8224ca420: Py_gitversion in python39.dll
0x7ff8223807dc: PyVectorcall_Call in python39.dll
0x7ff8223805d7: PyObject_Call in python39.dll
0x7ff8224fb937: Py_gitversion in python39.dll
0x7ff8223f4cbc: PyObject_MakeTpCall in python39.dll
0x7ff822459627: Py_DecodeUTF8Ex in python39.dll
0x7ff822560fa3: PyEval_ThreadsInitialized in python39.dll
0x7ff822561f12: Py_FatalError_TstateNULL in python39.dll
0x7ff8224ce87f: Py_gitversion in python39.dll
0x7ff82239e0a3: PyObject_GC_Del in python39.dll
0x7ff82239fa67: PyFunction_Vectorcall in python39.dll
0x7ff8223807dc: PyVectorcall_Call in python39.dll
0x7ff8223806ca: PyObject_Call in python39.dll
0x7ff8223a6c06: PyEval_EvalFrameDefault in python39.dll
0x7ff82239e0a3: PyObject_GC_Del in python39.dll
0x7ff82239fa67: PyFunction_Vectorcall in python39.dll
0x7ff8223f64db: PyObject_FastCallDictTstate in python39.dll
0x7ff822462b5c: PyObject_Call_Prepend in python39.dll
0x7ff822462ab8: PyArg_ParseStack_SizeT in python39.dll
0x7ff82238072c: PyObject_Call in python39.dll
0x7ff8223a6c06: PyEval_EvalFrameDefault in python39.dll
0x7ff82239e0a3: PyObject_GC_Del in python39.dll
0x7ff82239fa67: PyFunction_Vectorcall in python39.dll
0x7ff8223f64db: PyObject_FastCallDictTstate in python39.dll
0x7ff822462b5c: PyObject_Call_Prepend in python39.dll
0x7ff822462ab8: PyArg_ParseStack_SizeT in python39.dll
0x7ff8223f4cbc: PyObject_MakeTpCall in python39.dll
0x7ff822459627: Py_DecodeUTF8Ex in python39.dll
0x7ff822560fa3: PyEval_ThreadsInitialized in python39.dll
0x7ff822561f12: Py_FatalError_TstateNULL in python39.dll
0x7ff8224cdd41: Py_gitversion in python39.dll
0x7ff82239e0a3: PyObject_GC_Del in python39.dll
0x7ff8223f68fd: PyEval_EvalCodeWithName in python39.dll
0x7ff8223ee9d3: PyEval_EvalCodeEx in python39.dll
0x7ff8223ee931: PyEval_EvalCode in python39.dll
0x7ff8223ee7b2: PyMemoryView_FromObject in python39.dll
0x7ff8223ee6bb: PyMemoryView_FromObject in python39.dll
0x7ff8223dc97b: PyObject_GetBuffer in python39.dll
0x7ff8224595ec: Py_DecodeUTF8Ex in python39.dll
0x7ff822560fa3: PyEval_ThreadsInitialized in python39.dll
0x7ff822561fca: Py_FatalError_TstateNULL in python39.dll
0x7ff8224ce87f: Py_gitversion in python39.dll
0x7ff82239e0a3: PyObject_GC_Del in python39.dll
0x7ff82239fa67: PyFunction_Vectorcall in python39.dll
0x7ff8224595ec: Py_DecodeUTF8Ex in python39.dll
0x7ff822560fa3: PyEval_ThreadsInitialized in python39.dll
0x7ff822561f12: Py_FatalError_TstateNULL in python39.dll
0x7ff8224ce87f: Py_gitversion in python39.dll
0x7ff82239e0a3: PyObject_GC_Del in python39.dll
0x7ff82239fa67: PyFunction_Vectorcall in python39.dll
0x7ff8224595ec: Py_DecodeUTF8Ex in python39.dll
0x7ff822560fa3: PyEval_ThreadsInitialized in python39.dll
0x7ff822561f12: Py_FatalError_TstateNULL in python39.dll
0x7ff8224cef8a: Py_gitversion in python39.dll
0x7ff82239e0a3: PyObject_GC_Del in python39.dll
0x7ff82239fa67: PyFunction_Vectorcall in python39.dll
0x7ff8224595ec: Py_DecodeUTF8Ex in python39.dll
0x7ff822560fa3: PyEval_ThreadsInitialized in python39.dll
0x7ff822561f12: Py_FatalError_TstateNULL in python39.dll
0x7ff8224cef8a: Py_gitversion in python39.dll
0x7ff8223a2fa2: PyEval_EvalFrameDefault in python39.dll
0x7ff82239f984: PyFunction_Vectorcall in python39.dll
0x7ff8223a2883: PyEval_EvalFrameDefault in python39.dll
0x7ff82239e0a3: PyObject_GC_Del in python39.dll
0x7ff8223f68fd: PyEval_EvalCodeWithName in python39.dll
0x7ff8223ee9d3: PyEval_EvalCodeEx in python39.dll
0x7ff8223ee931: PyEval_EvalCode in python39.dll
0x7ff8223ee7b2: PyMemoryView_FromObject in python39.dll
0x7ff8223ee6bb: PyMemoryView_FromObject in python39.dll
0x7ff8223a1e1f: PyEval_EvalFrameDefault in python39.dll
0x7ff82239e0a3: PyObject_GC_Del in python39.dll
0x7ff8223a4904: PyEval_EvalFrameDefault in python39.dll
0x7ff82239e0a3: PyObject_GC_Del in python39.dll
0x7ff82239fa67: PyFunction_Vectorcall in python39.dll
0x7ff8223807dc: PyVectorcall_Call in python39.dll
0x7ff8223805d7: PyObject_Call in python39.dll
0x7ff82243ea52: Py_MakePendingCalls in python39.dll
0x7ff8224022c6: Py_RunMain in python39.dll
0x7ff8224021d1: Py_RunMain in python39.dll
0x7ff8223fe3f1: Py_Main in python39.dll
0x7ff64df01254: Unknown Function in python.exe
0x7ff8ada97034: BaseThreadInitThunk in KERNEL32.DLL
0x7ff8ade62651: RtlUserThreadStart in ntdll.dll

Internal error occurred. Check out this page for possible solutions:
https://docs.taichi.graphics/lang/articles/misc/install
Backend Qt5Agg is interactive backend. Turning interactive mode on.
========== Taichi Stack Traceback ==========
In _run_module_as_main() at C:\Users\Vineyo\AppData\Local\Programs\Python\Python39\lib\runpy.py:197:
--------------------------------------------
        sys.exit(msg)
    main_globals = sys.modules["__main__"].__dict__
    if alter_argv:
        sys.argv[0] = mod_spec.origin
    return _run_code(code, main_globals, None,  <--
                     "__main__", mod_spec)

--------------------------------------------
In _run_code() at C:\Users\Vineyo\AppData\Local\Programs\Python\Python39\lib\runpy.py:87:
--------------------------------------------
                       __doc__ = None,
                       __loader__ = loader,
                       __package__ = pkg_name,
                       __spec__ = mod_spec)
    exec(code, run_globals)  <--
    return run_globals

--------------------------------------------
In <module>() at c:\Users\Vineyo\.vscode\extensions\ms-python.python-2021.10.1317843341\pythonFiles\lib\python\debugpy\__main__.py:45:       
--------------------------------------------
        del sys.path[0]

    from debugpy.server import cli

    cli.main()  <--
--------------------------------------------
In main() at c:\Users\Vineyo\.vscode\extensions\ms-python.python-2021.10.1317843341\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py:444:
--------------------------------------------
            "module": run_module,
            "code": run_code,
            "pid": attach_to_pid,
        }[options.target_kind]
        run()  <--
    except SystemExit as exc:
        log.reraise_exception(
--------------------------------------------
In run_file() at c:\Users\Vineyo\.vscode\extensions\ms-python.python-2021.10.1317843341\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py:285:
--------------------------------------------

    log.describe_environment("Pre-launch environment:")

    log.info("Running file {0!r}", target)
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))  <--


--------------------------------------------
In run_path() at C:\Users\Vineyo\AppData\Local\Programs\Python\Python39\lib\runpy.py:268:
--------------------------------------------
    if isinstance(importer, type(None)) or is_NullImporter:
        # Not a valid sys.path entry, so run the code directly
        # execfile() doesn't help as we want to allow compiled files
        code, fname = _get_code_from_file(run_name, path_name)
        return _run_module_code(code, init_globals, run_name,  <--
                                pkg_name=pkg_name, script_name=fname)
    else:
--------------------------------------------
In _run_module_code() at C:\Users\Vineyo\AppData\Local\Programs\Python\Python39\lib\runpy.py:97:
--------------------------------------------
    """Helper to run code in new namespace with sys modified"""
    fname = script_name if mod_spec is None else mod_spec.origin
    with _TempModule(mod_name) as temp_module, _ModifiedArgv0(fname):
        mod_globals = temp_module.module.__dict__
        _run_code(code, mod_globals, init_globals,  <--
                  mod_name, mod_spec, pkg_name, script_name)
    # Copy the globals of the temporary module, as they
--------------------------------------------
In _run_code() at C:\Users\Vineyo\AppData\Local\Programs\Python\Python39\lib\runpy.py:87:
--------------------------------------------
                       __doc__ = None,
                       __loader__ = loader,
                       __package__ = pkg_name,
                       __spec__ = mod_spec)
    exec(code, run_globals)  <--
    return run_globals

--------------------------------------------
In <module>() at c:\Users\Vineyo\Desktop\Ti-Moxi\demo.py:25:
--------------------------------------------
    if(gui.is_pressed(ti.GUI.LMB)):
        moxi.setCursor(gui.get_cursor_pos()[0],gui.get_cursor_pos()[1])
        moxi.drawStrok()
    moxi.update()
    moxi.render()  <--
    gui.set_image(moxi.FrameBuffer)
    gui.show()
--------------------------------------------
In func__() at c:\pyvenv\math\lib\site-packages\taichi\lang\kernel_impl.py:595:
--------------------------------------------
            # gradient. For class kernels, args[0] is always the kernel owner.
            if not self.is_grad and self.runtime.target_tape and not self.runtime.grad_replaced:
                self.runtime.target_tape.insert(self, args)

            t_kernel(launch_ctx)  <--

            ret = None
--------------------------------------------
RuntimeError: [taichi/backends/cuda/cuda_driver.h:taichi::lang::CUDADriverFunction<void * *,char const *,unsigned int,unsigned int *,void * *>::operator ()@86] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling module_load_data_ex (cuModuleLoadDataEx)
[E 10/15/21 23:51:16.007 28968] [taichi/backends/cuda/cuda_driver.h:taichi::lang::CUDADriverFunction<void *>::operator ()@86] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)


***********************************
* Taichi Compiler Stack Traceback *
***********************************
0x7ffff364f81a: ?print_traceback@taichi@@YAXXZ in taichi_core.pyd
0x7ffff351ee79: PyInit_taichi_core in taichi_core.pyd
0x7ffff35b38e3: PyInit_taichi_core in taichi_core.pyd
0x7ffff378ac0e: ?print_traceback@taichi@@YAXXZ in taichi_core.pyd
0x7ffff35ec8d6: PyInit_taichi_core in taichi_core.pyd
0x7ffff35ea4cc: PyInit_taichi_core in taichi_core.pyd
0x7ffff35e756b: PyInit_taichi_core in taichi_core.pyd
0x7ffff347a638: PyInit_taichi_core in taichi_core.pyd
0x7ffff33d4e0e: ?clear@error_already_set@pybind11@@QEAAXXZ in taichi_core.pyd
0x7ffff33dca62: PyInit_taichi_core in taichi_core.pyd
0x7ff8223f58a9: _PyObject_GetDictPtr in python39.dll
0x7ff82236ef84: _PyGC_CollectNoFail in python39.dll
0x7ff82234f2be: PyCapsule_GetPointer in python39.dll
0x7ff8223eec59: PyEval_EvalCodeEx in python39.dll
0x7ff82236eecb: _PyGC_CollectNoFail in python39.dll
0x7ff822379b05: PyModule_GetDict in python39.dll
0x7ff8223b858a: Py_FinalizeEx in python39.dll
0x7ff8224021d6: Py_RunMain in python39.dll
0x7ff8223fe3f1: Py_Main in python39.dll
0x7ff64df01254: Unknown Function in python.exe
0x7ff8ada97034: BaseThreadInitThunk in KERNEL32.DLL
0x7ff8ade62651: RtlUserThreadStart in ntdll.dll

Internal error occurred. Check out this page for possible solutions:
https://docs.taichi.graphics/lang/articles/misc/install
[E 10/15/21 23:51:16.211 28968] Received signal 22 (SIGABRT)


***********************************
* Taichi Compiler Stack Traceback *
***********************************
0x7ffff364f81a: ?print_traceback@taichi@@YAXXZ in taichi_core.pyd
0x7ffff351ee79: PyInit_taichi_core in taichi_core.pyd
0x7ffff36399c8: PyInit_taichi_core in taichi_core.pyd
0x7ff8ab8a1881: raise in ucrtbase.dll
0x7ff8ab8a2851: abort in ucrtbase.dll
0x7ff8ab8a1f9f: terminate in ucrtbase.dll
0x7ff892311aab: __NLG_Return2 in VCRUNTIME140_1.dll
0x7ff892312317: __NLG_Return2 in VCRUNTIME140_1.dll
0x7ff8923140d9: __CxxFrameHandler4 in VCRUNTIME140_1.dll
0x7ff8adeb217f: __chkstk in ntdll.dll
0x7ff8ade61454: RtlRaiseException in ntdll.dll
0x7ff8ade611a5: RtlRaiseException in ntdll.dll
0x7ff8aba74ed9: RaiseException in KERNELBASE.dll
0x7ff892fe6480: _CxxThrowException in VCRUNTIME140.dll
0x7ffff351eec0: PyInit_taichi_core in taichi_core.pyd
0x7ffff35b38e3: PyInit_taichi_core in taichi_core.pyd
0x7ffff378ac0e: ?print_traceback@taichi@@YAXXZ in taichi_core.pyd
0x7ffff35ec8d6: PyInit_taichi_core in taichi_core.pyd
0x7ffff35ea4cc: PyInit_taichi_core in taichi_core.pyd
0x7ffff35e756b: PyInit_taichi_core in taichi_core.pyd
0x7ffff347a638: PyInit_taichi_core in taichi_core.pyd
0x7ffff33d4e0e: ?clear@error_already_set@pybind11@@QEAAXXZ in taichi_core.pyd
0x7ffff33dca62: PyInit_taichi_core in taichi_core.pyd
0x7ff8223f58a9: _PyObject_GetDictPtr in python39.dll
0x7ff82236ef84: _PyGC_CollectNoFail in python39.dll
0x7ff82234f2be: PyCapsule_GetPointer in python39.dll
0x7ff8223eec59: PyEval_EvalCodeEx in python39.dll
0x7ff82236eecb: _PyGC_CollectNoFail in python39.dll
0x7ff822379b05: PyModule_GetDict in python39.dll
0x7ff8223b858a: Py_FinalizeEx in python39.dll
0x7ff8224021d6: Py_RunMain in python39.dll
0x7ff8223fe3f1: Py_Main in python39.dll
0x7ff64df01254: Unknown Function in python.exe
0x7ff8ada97034: BaseThreadInitThunk in KERNEL32.DLL
0x7ff8ade62651: RtlUserThreadStart in ntdll.dll

Internal error occurred. Check out this page for possible solutions:
https://docs.taichi.graphics/lang/articles/misc/install

The key lines are these two:

RuntimeError: [taichi/backends/cuda/cuda_driver.h:taichi::lang::CUDADriverFunction<void * *,char const *,unsigned int,unsigned int *,void * *>::operator ()@86] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling module_load_data_ex (cuModuleLoadDataEx)
[E 10/15/21 23:51:16.007 28968] [taichi/backends/cuda/cuda_driver.h:taichi::lang::CUDADriverFunction<void *>::operator ()@86] CUDA Error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered while calling stream_synchronize (cuStreamSynchronize)

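Most of the functionality in the original code is commented out, so what remains should be roughly the same as the small demo. I can't tell what is causing this error.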


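Hi! Thank you very much for reporting this. According to our tests, this is a bug triggered by dynamic SNode allocation. @AmesingFlank has already fixed it at lightning speed: https://github.com/taichi-dev/taichi/pull/3205 . The next stable release, v0.8.4, should no longer have this problem.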

The bug is triggered here: https://github.com/Vineyo/Taichi-Moxi/blob/d1add7f7212d3252a5121db57b38fdd6cabee07c/Taichi_Moxi.py#L13-L25 . Because self.e is a Taichi field, assigning to self.e actually generates and invokes a series of Taichi kernels. Once a Taichi kernel has been invoked, fields defined before and after that call are attached under different ti.root instances (this is how dynamic SNode allocation works).

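So to avoid this bug in the current version, v0.8.3, you'll need to modify your code so that all assignments to and uses of Taichi fields come after all field definitions (that is, define first, compute afterwards). Thanks again for your feedback! :smile: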


Thank you so much! I thought my code was too messy for anyone to read 😂 Thanks to the developers for the hard work.

I've fixed it and it works now, and performance is much better! Thanks a lot!
