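I implemented an automatic differentiation program with Taichi that optimizes parameters using the computed gradients. The program runs correctly on Windows with a GTX 1650, but on an RTX 4090D it fails with CUDA Error CUDA_ERROR_OUT_OF_MEMORY: out of memory while calling launch_kernel (cuLaunchKernel).
When run on the 4090D, the program fills up both SWP (swap) and MEM (memory) right after it starts and then crashes abruptly, as shown in the screenshot below:
The error message is: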
Traceback (most recent call last):
  File "testTC.py", line 1541, in <module>
    TestDiff(i, upx, upz)
  File "testTC.py", line 1446, in TestDiff
    loss[None] = tm.sqrt(loss[None])/(fluxsum[None]+Eps)
  File "G:\anaconda3\envs\heilo\lib\site-packages\taichi\ad\_ad.py", line 232, in __exit__
    self.grad()
  File "G:\anaconda3\envs\heilo\lib\site-packages\taichi\ad\_ad.py", line 269, in grad
    func.grad(*args)
  File "G:\anaconda3\envs\heilo\lib\site-packages\taichi\lang\kernel_impl.py", line 1045, in __call__
    return self.launch_kernel(kernel_cpp, *args)
  File "G:\anaconda3\envs\heilo\lib\site-packages\taichi\lang\kernel_impl.py", line 976, in launch_kernel
    raise e from None
  File "G:\anaconda3\envs\heilo\lib\site-packages\taichi\lang\kernel_impl.py", line 971, in launch_kernel
    prog.launch_kernel(compiled_kernel_data, launch_ctx)
RuntimeError: [taichi/rhi/cuda/cuda_driver.h:taichi::lang::CUDADriverFunction<void *,unsigned int,unsigned int,unsigned int,unsigned int,unsigned int,unsigned int,unsigned int,void *,void * *,void * *>::operator ()@92] CUDA Error CUDA_ERROR_OUT_OF_MEMORY: out of memory while calling launch_kernel (cuLaunchKernel)
The parameters to be optimized are defined as global fields:
p0 = ti.field(real, shape=(), needs_grad=True)
p1 = ti.field(real, shape=(), needs_grad=True)
p2 = ti.field(real, shape=(), needs_grad=True)
p3 = ti.field(real, shape=(), needs_grad=True)
p4 = ti.field(real, shape=(), needs_grad=True)
p5 = ti.field(real, shape=(), needs_grad=True)
p6 = ti.field(real, shape=(), needs_grad=True)
The automatic differentiation loop is structured as follows:
loss[None] = 0
with ti.ad.Tape(loss):
    run()
    loss_func2()
    loss[None] = tm.sqrt(loss[None])/(res_sum[None]+Eps)
update(learn_rate)
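run() contains many loops and conditionals, so its performance is also fairly poor, but I don't understand why it runs on the 1650 while the 4090D runs into a memory problem.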