F.cross_entropy() raises an error when using CUDA

qhz991029 · September 29, 2022, 15:24

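Hi!
I'd like to use Taichi to speed up two functions that gather statistics, so that it's easier to evaluate how a deep learning model performs after training.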

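```python
import taichi as ti

@ti.kernel
def type_count(types: ti.types.ndarray(), pos: ti.types.ndarray()):
    # Count how many samples fall into each type index
    for i in range(pos.shape[0]):
        types[int(pos[i])] += 1  # += inside a parallel loop is atomic in Taichi

@ti.kernel
def type_err_count(err_count: ti.types.ndarray(), if_correct: ti.types.ndarray(), op_type: ti.types.ndarray()):
    # Count wrong predictions (if_correct == 0) per op type
    for i in range(if_correct.shape[0]):
        if if_correct[i] == 0:
            err_count[int(op_type[i])] += 1
```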
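To cross-check the counts these two kernels produce (for example, comparing the results under `ti.cpu` and `ti.cuda`), a plain NumPy version of the same counting logic can serve as a reference. This is only a sketch: the `*_ref` helper names and the `num_types` parameter are hypothetical, and it assumes every index is a non-negative integer smaller than `num_types` (the Taichi kernels would write out of bounds otherwise).

```python
import numpy as np


def type_count_ref(pos, num_types):
    """NumPy equivalent of type_count: how many entries of pos equal each type index."""
    idx = np.asarray(pos).astype(np.int64)  # truncates toward zero, like int() in the kernel
    return np.bincount(idx, minlength=num_types)


def type_err_count_ref(if_correct, op_type, num_types):
    """NumPy equivalent of type_err_count: per-op-type count where if_correct == 0."""
    wrong = np.asarray(if_correct) == 0
    idx = np.asarray(op_type).astype(np.int64)[wrong]
    return np.bincount(idx, minlength=num_types)


# Example: 5 samples, 4 types
print(type_count_ref([0, 2, 2, 1, 2], 4))                       # [1 1 3 0]
print(type_err_count_ref([1, 0, 0, 1, 0], [0, 2, 2, 1, 3], 4))  # [0 0 2 1]
```

Comparing these arrays with the `types` / `err_count` buffers filled by the Taichi kernels (e.g. with `np.array_equal`) makes a backend-specific discrepancy easy to spot.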

However, after I call ti.init(arch=ti.cuda), the loss function raises an error:

```
Traceback (most recent call last):
  File "/home/huaizhi_qu/workspace/Occupancy_prediction/dnnperf_code/dnnperf.py", line 353, in <module>
    model = model_train(args, trainset, validset, testset)
  File "/home/huaizhi_qu/workspace/Occupancy_prediction/dnnperf_code/train_def.py", line 157, in model_train
    loss_mask = loss_func_mask(mask, label_mask)
  File "/home/huaizhi_qu/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/huaizhi_qu/anaconda3/lib/python3.9/site-packages/torch/nn/modules/loss.py", line 1164, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/home/huaizhi_qu/anaconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 3014, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: CUDA error: invalid argument
```

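I'm sure that both arguments I pass to cross_entropy meet its requirements. Also, when Taichi runs on the CPU, or when I don't use Taichi at all, I get correct results; the error only occurs when Taichi uses CUDA.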

YuPeng · September 29, 2022, 16:34

It looks like your error message has nothing to do with the two kernel functions you posted?

mzhang · September 30, 2022, 03:15

Did you run into this while using PyTorch and Taichi together? Do you have code that reproduces the problem?
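A minimal standalone repro of the kind asked for here might look like the following sketch. It is hypothetical throughout: the `reproduce` function, `NUM_CLASSES`, the tensor shapes, and the random data are stand-ins for the real model and dataset (which the thread does not include). It assumes Taichi and PyTorch are installed with a working CUDA device, and otherwise only prints a skip message. It runs one of the counting kernels from the question on CUDA tensors after `ti.init(arch=ti.cuda)`, then calls `F.cross_entropy` on CUDA, which is the sequence under which `CUDA error: invalid argument` was reported.

```python
# Hypothetical minimal repro sketch; data and sizes are made up for illustration.
from importlib.metadata import version

try:
    import taichi as ti
    import torch
    import torch.nn.functional as F
except ImportError:
    ti = None  # Taichi and/or PyTorch not installed; reproduce() cannot run

NUM_CLASSES = 4  # hypothetical number of classes / op types


def reproduce(num_samples: int = 1024) -> float:
    """Run a Taichi kernel on CUDA, then torch cross_entropy on CUDA."""
    ti.init(arch=ti.cuda)  # the configuration under which the error appeared

    @ti.kernel
    def type_count(types: ti.types.ndarray(), pos: ti.types.ndarray()):
        for i in range(pos.shape[0]):
            types[int(pos[i])] += 1

    # Call the kernel on CUDA tensors, as a training/eval loop might
    pos = torch.randint(0, NUM_CLASSES, (num_samples,), device="cuda", dtype=torch.int32)
    types = torch.zeros(NUM_CLASSES, device="cuda", dtype=torch.int32)
    type_count(types, pos)
    assert int(types.sum()) == num_samples  # every sample counted exactly once

    # Then compute the loss on CUDA, where the RuntimeError was raised
    logits = torch.randn(num_samples, NUM_CLASSES, device="cuda")
    target = torch.randint(0, NUM_CLASSES, (num_samples,), device="cuda")  # int64 class indices
    return F.cross_entropy(logits, target).item()


if __name__ == "__main__":
    if ti is None or not torch.cuda.is_available():
        print("Taichi/PyTorch with CUDA not available; skipping repro")
    else:
        print("taichi", version("taichi"), "torch", torch.__version__)
        print("loss:", reproduce())
```

Printing both library versions up front matters here, since the outcome may depend on the installed Taichi release.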

qhz991029 · September 30, 2022, 08:04

After updating to 1.1.3, the problem was resolved.