**Describe the bug**
Running any example with the CUDA backend causes `RuntimeError:… [cuda_driver.h:operator()@68] CUDA Error CUDA_ERROR_NOT_FOUND: named symbol not found while calling module_get_function (cuModuleGetFunction)`.
The same examples work fine when I specify `TI_ARCH=opengl`, so the error is not thrown from `with_cuda`.
**Log/Screenshots**
```
[bate@archit ~]$ TI_ARCH=cuda ti example simple_uv
[Taichi] mode=release
[Taichi] version 0.6.11, supported archs: [cpu, cuda, opengl], commit 762aca58, python 3.8.2
*******************************************
** Taichi Programming Language **
*******************************************
Running example simple_uv ...
Following TI_ARCH setting up for arch=cuda
[E 06/17/20 14:42:17.324] [cuda_driver.h:operator()@68] CUDA Error CUDA_ERROR_NOT_FOUND: named symbol not found while calling module_get_function (cuModuleGetFunction)
***********************************
* Taichi Compiler Stack Traceback *
***********************************
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::Logger::error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::CUDADriverFunction<void**, void*, char const*>::operator()(void**, void*, char const*)
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::JITModuleCUDA::lookup_function(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::JITModuleCUDA::launch(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, unsigned long, std::vector<void*, std::allocator<void*> > const&)
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: void taichi::lang::JITModule::call<void*, void*>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void*, void*)
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::Program::initialize_runtime_system(taichi::lang::StructCompiler*)
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::Program::materialize_layout()
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::layout(std::function<void ()> const&)
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so(+0x7f2639) [0x7f0c47428639]
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so(+0x6e5d70) [0x7f0c4731bd70]
/usr/lib/libpython3.8.so.1.0: PyCFunction_Call
/usr/lib/libpython3.8.so.1.0: _PyObject_MakeTpCall
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0(+0x1e0902) [0x7f0c575af902]
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: _PyObject_FastCallDict
/usr/lib/libpython3.8.so.1.0: _PyObject_Call_Prepend
/usr/lib/libpython3.8.so.1.0(+0x23d0e9) [0x7f0c5760c0e9]
/usr/lib/libpython3.8.so.1.0: PyObject_Call
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: PyEval_EvalCode
/usr/lib/libpython3.8.so.1.0(+0x26d3ed) [0x7f0c5763c3ed]
/usr/lib/libpython3.8.so.1.0(+0x141e67) [0x7f0c57510e67]
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0(+0x1e0902) [0x7f0c575af902]
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: PyObject_Call
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: _PyObject_FastCallDict
/usr/lib/libpython3.8.so.1.0: _PyObject_Call_Prepend
/usr/lib/libpython3.8.so.1.0(+0x23d0e9) [0x7f0c5760c0e9]
/usr/lib/libpython3.8.so.1.0: _PyObject_MakeTpCall
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: PyEval_EvalCode
/usr/lib/libpython3.8.so.1.0(+0x2668c8) [0x7f0c576358c8]
/usr/lib/libpython3.8.so.1.0(+0x26aba3) [0x7f0c57639ba3]
/usr/lib/libpython3.8.so.1.0: PyRun_FileExFlags
/usr/lib/libpython3.8.so.1.0: PyRun_SimpleFileExFlags
/usr/lib/libpython3.8.so.1.0: Py_RunMain
/usr/lib/libpython3.8.so.1.0: Py_BytesMain
/usr/lib/libc.so.6: __libc_start_main
/usr/bin/python: _start
Internal Error occurred, check this page for possible solutions:
https://taichi.readthedocs.io/en/stable/install.html#troubleshooting
Traceback (most recent call last):
File "/home/bate/.local/bin/ti", line 8, in <module>
sys.exit(main())
File "/home/bate/.local/lib/python3.8/site-packages/taichi/main.py", line 948, in main
return cli()
File "/home/bate/.local/lib/python3.8/site-packages/taichi/main.py", line 27, in wrapper
result = func(*args, **kwargs)
File "/home/bate/.local/lib/python3.8/site-packages/taichi/main.py", line 95, in __call__
return getattr(self, args.command)(sys.argv[2:])
File "/home/bate/.local/lib/python3.8/site-packages/taichi/main.py", line 210, in example
runpy.run_path(target, run_name='__main__')
File "/usr/lib/python3.8/runpy.py", line 263, in run_path
return _run_module_code(code, init_globals, run_name,
File "/usr/lib/python3.8/runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/lib/python3.8/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/bate/.local/lib/python3.8/site-packages/taichi/examples/simple_uv.py", line 20, in <module>
paint()
File "/home/bate/.local/lib/python3.8/site-packages/taichi/lang/kernel.py", line 527, in wrapped
return primal(*args, **kwargs)
File "/home/bate/.local/lib/python3.8/site-packages/taichi/lang/kernel.py", line 458, in __call__
self.materialize(key=key, args=args, arg_features=arg_features)
File "/home/bate/.local/lib/python3.8/site-packages/taichi/lang/kernel.py", line 251, in materialize
self.runtime.materialize()
File "/home/bate/.local/lib/python3.8/site-packages/taichi/lang/impl.py", line 177, in materialize
taichi_lang_core.layout(layout)
RuntimeError: [cuda_driver.h:operator()@68] CUDA Error CUDA_ERROR_NOT_FOUND: named symbol not found while calling module_get_function (cuModuleGetFunction)
[bate@archit ~]$ nvidia-smi
Wed Jun 17 14:42:28 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce 940MX Off | 00000000:02:00.0 Off | N/A |
| N/A 40C P0 N/A / N/A | 0MiB / 2004MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
**To Reproduce**
Run any code with the CUDA backend, e.g. `TI_ARCH=cuda ti example simple_uv`.
Setting `TI_USE_UNIFIED_MEMORY=0` does not fix this.
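
In case it helps with triage, here is a rough sketch of how the backend comparison above could be scripted. The helper names `build_env` and `run_example` are made up for illustration and are not part of Taichi; the script only assumes that the `ti` CLI is on `PATH` and that `TI_ARCH` / `TI_USE_UNIFIED_MEMORY` are read from the environment, as in the log above.

```python
import os
import subprocess


def build_env(arch, unified_memory=None, base=None):
    """Return a copy of the environment with the Taichi backend variables set.

    `base` defaults to the current process environment; it is a parameter
    only so the helper can be exercised without touching os.environ.
    """
    env = dict(os.environ if base is None else base)
    env["TI_ARCH"] = arch
    if unified_memory is not None:
        # TI_USE_UNIFIED_MEMORY is passed as "0" or "1"
        env["TI_USE_UNIFIED_MEMORY"] = str(int(unified_memory))
    return env


def run_example(arch, example="simple_uv", unified_memory=None, timeout=30):
    """Run `ti example <example>` under the given backend.

    Returns the process exit code, or None if the example was still
    running when the timeout expired (GUI examples loop until closed,
    so a timeout here means "did not crash at startup").
    """
    try:
        completed = subprocess.run(
            ["ti", "example", example],
            env=build_env(arch, unified_memory),
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

On my machine I would expect `run_example("cuda")` and `run_example("cuda", unified_memory=False)` to return a non-zero exit code, and `run_example("opengl")` to return `None` because the window stays open.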
My CUDA version is 10.2. Is `cuModuleGetFunction` deprecated there?
Could you make the Jenkins CUDA CI test against different cards (with and without unified memory) and different CUDA versions (10.0~10.2)? @yuanming-hu