"out of memory while calling launch_kernel" error in a Taichi autodiff program

WHI · March 24, 2025, 09:35

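I implemented an autodiff program with Taichi and use the resulting gradients to optimize parameters. The program runs fine on Windows with a GTX 1650 GPU, but on an RTX 4090D it fails with CUDA Error CUDA_ERROR_OUT_OF_MEMORY: out of memory while calling launch_kernel (cuLaunchKernel).
When running on the 4090D, SWP and MEM fill up right after the program starts, and then it suddenly crashes, as shown in the figure below:

The error message is as follows: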

Traceback (most recent call last):
  File "testTC.py", line 1541, in <module>
    TestDiff(i, upx, upz)
  File "testTC.py", line 1446, in TestDiff
    loss[None] = tm.sqrt(loss[None])/(fluxsum[None]+Eps)
  File "G:\anaconda3\envs\heilo\lib\site-packages\taichi\ad\_ad.py", line 232, in __exit__
    self.grad()
  File "G:\anaconda3\envs\heilo\lib\site-packages\taichi\ad\_ad.py", line 269, in grad
    func.grad(*args)
  File "G:\anaconda3\envs\heilo\lib\site-packages\taichi\lang\kernel_impl.py", line 1045, in __call__
    return self.launch_kernel(kernel_cpp, *args)
  File "G:\anaconda3\envs\heilo\lib\site-packages\taichi\lang\kernel_impl.py", line 976, in launch_kernel
    raise e from None
  File "G:\anaconda3\envs\heilo\lib\site-packages\taichi\lang\kernel_impl.py", line 971, in launch_kernel
    prog.launch_kernel(compiled_kernel_data, launch_ctx)
RuntimeError: [taichi/rhi/cuda/cuda_driver.h:taichi::lang::CUDADriverFunction<void *,unsigned int,unsigned int,unsigned int,unsigned int,unsigned int,unsigned int,unsigned int,void *,void * *,void * *>::operator ()@92] CUDA Error CUDA_ERROR_OUT_OF_MEMORY: out of memory while calling launch_kernel (cuLaunchKernel)

The parameters to be optimized are defined as global fields:

p0 = ti.field(real, shape=(), needs_grad=True)
p1 = ti.field(real, shape=(), needs_grad=True)
p2 = ti.field(real, shape=(), needs_grad=True)
p3 = ti.field(real, shape=(), needs_grad=True)
p4 = ti.field(real, shape=(), needs_grad=True)
p5 = ti.field(real, shape=(), needs_grad=True)
p6 = ti.field(real, shape=(), needs_grad=True)

The autodiff loop is structured as follows:

loss[None] = 0
with ti.ad.Tape(loss):
    run()
    loss_func2()
    loss[None] = tm.sqrt(loss[None])/(res_sum[None]+Eps)
update(learn_rate)

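run() contains many loops and branches, so its performance is fairly poor as well. What I don't understand is why it runs on the 1650 but hits a memory problem on the 4090D.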