Error when explicitly narrowing to int8 and accumulating into a field

I need to narrow an int32 value down to int8 and add it to the field's existing values, but doing so produces the following error:

[Taichi] version 1.2.1, llvm 10.0.0, commit 12ab828a, linux, python 3.10.4
[Taichi] Starting on arch=cuda
[E 11/13/22 11:55:59.749 242178] [llvm_context.cpp:operator()@78] LLVM Fatal Error: Cannot select: t31: i16,ch = AtomicLoadAdd<(load store seq_cst 1 on %ir.32)> t29:1, t11, t30
  t11: i64,ch = CopyFromReg t0, Register:i64 %6
    t10: i64 = Register %6
  t30: i16 = truncate t8
    t8: i32,ch = CopyFromReg t0, Register:i32 %33
      t7: i32 = Register %33
In function: L0andL1Estimate_c84_0_kernel_0_range_for

The fields I pass in:

t1_0 = ti.field(dtype = ti.uint32, shape = t_0_shape)
t2_0 = ti.field(dtype = ti.uint32, shape = t_0_shape)
u0 = ti.field(dtype = ti.int8, shape = [height // 2 + MEDIAN_KERNEL_SIZE_LEVEL_1 - 1, width // 2 + MEDIAN_KERNEL_SIZE_LEVEL_1 - 1], \
        offset=( -(MEDIAN_KERNEL_SIZE_LEVEL_1 // 2), -(MEDIAN_KERNEL_SIZE_LEVEL_1 // 2)))
v0 = ti.field(dtype = ti.int8, shape = [height // 2 + MEDIAN_KERNEL_SIZE_LEVEL_1 - 1, width // 2 + MEDIAN_KERNEL_SIZE_LEVEL_1 - 1], \
        offset=( -(MEDIAN_KERNEL_SIZE_LEVEL_1 // 2), -(MEDIAN_KERNEL_SIZE_LEVEL_1 // 2)))

The kernel is called as:

L0andL1Estimate(t1_0, t2_0, v0, u0)

The kernel:

@ti.kernel
def L0andL1Estimate(t1 : ti.template(), t2 : ti.template(), v : ti.template(), u : ti.template()):
    channels_in, height_in, width_in = t1.shape
    flow_height, flow_width = u.shape

    for x, y in ti.ndrange(flow_height - MEDIAN_KERNEL_SIZE_LEVEL_1 + 1, flow_width - MEDIAN_KERNEL_SIZE_LEVEL_1 + 1):
        block_x, block_y = x + PADDING_SIZE, y + PADDING_SIZE
        
        ans_x, ans_y, ans_hm = PADDING_SIZE, PADDING_SIZE, BIT_WIDTH * channels_in

        for win_x in range(-PADDING_SIZE, PADDING_SIZE + 1):
            for win_y in range(-PADDING_SIZE, PADDING_SIZE + 1):
                hanmingDistance = ti.int32(0)
                for k in range(channels_in):
                    img1_feature = t1[k, block_x, block_y]
                    tx = block_x + v[x, y] + win_x
                    ty = block_y + u[x, y] + win_y
                    # Runtime bounds check on the displaced position: ti.static() needs a
                    # compile-time value, and `a in [0, n]` tests list membership, not a range.
                    if 0 <= tx < height_in and 0 <= ty < width_in:
                        img2_feature = t2[k, tx, ty]
                        hanmingDistance += getHammingDistance(img1_feature, img2_feature)
                    else:
                        hanmingDistance += 99999
                
                if hanmingDistance < ans_hm or (hanmingDistance == ans_hm and (win_x ** 2 + win_y ** 2 < ans_x ** 2 + ans_y ** 2)):
                    ans_x, ans_y, ans_hm = win_x, win_y, hanmingDistance

        v[x, y] += ti.int8(ans_x)
        u[x, y] += ti.int8(ans_y)

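The error comes from the last two lines: if I replace them with a direct assignment, it works. The generated IR contains AtomicLoadAdd, but I don't need data-race protection for this field. Is there a way to turn the atomic off?
Also, if I leave out the explicit cast, the accumulation works, but I suspect it still uses an atomic operation. Is there a way to turn that off too?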
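In this kernel each `(x, y)` is written by exactly one loop iteration, so the update is race-free and an atomic is not needed for correctness. Since direct assignment compiles, one workaround to try (untested here, and assuming that `v[x, y] = v[x, y] + ...` is lowered to an ordinary load and store rather than an atomic op, as the observation above suggests) is `v[x, y] = ti.cast(v[x, y] + ans_x, ti.int8)`. The plain-Python model below sketches that pattern on a hypothetical 9x9 int8 buffer with offset `(-1, -1)` updated over a 7x7 range. It checks that no cell is written twice and wraps each sum into the int8 range, assuming narrowing behaves like two's-complement truncation. `to_int8` and `run_without_atomics` are illustrative names, not Taichi APIs.

```python
from collections import Counter


def to_int8(value: int) -> int:
    """Wrap an integer into [-128, 127], mimicking two's-complement truncation to int8."""
    return (value + 128) % 256 - 128


def run_without_atomics(n: int = 7) -> dict:
    # Hypothetical model of a 9x9 int8 field with offset (-1, -1):
    # a dict keyed by the field's offset indices, all initialised to 0.
    buffer = {(i, j): 0 for i in range(-1, 8) for j in range(-1, 8)}
    writes = Counter()

    # Serial stand-in for `for i, j in ti.ndrange(n, n)`.
    for i in range(n):
        for j in range(n):
            # Plain read-modify-write: load, add as a wider integer, narrow, store.
            buffer[(i, j)] = to_int8(buffer[(i, j)] + (i + j))
            writes[(i, j)] += 1

    # Each cell is written by at most one iteration, so there is no data race.
    assert max(writes.values()) == 1
    return buffer


if __name__ == "__main__":
    result = run_without_atomics()
    print(result[(6, 6)], result[(-1, -1)])
```

Because the loop index uniquely determines the written cell, the load/add/store sequence cannot interleave with another iteration's write to the same element; the narrowing step only matters once a sum leaves the int8 range.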

Could you edit your question and provide a smaller reproducible example?
The question is a bit hard to follow as it stands.

Does something like this reproduce it?


import taichi as ti

ti.init(arch=ti.gpu, debug = False)

buffer = ti.field(dtype = ti.int8, shape = (9, 9),offset=(-1, -1))

@ti.kernel
def func(input : ti.template()):
    for i, j in ti.ndrange(7, 7):
        buffer[i, j] += ti.int8(i+j)

func(buffer)

Error message:

[Taichi] version 1.2.1, llvm 10.0.0, commit 12ab828a, linux, python 3.10.4
[Taichi] Starting on arch=cuda
[E 11/14/22 06:20:57.914 311058] [llvm_context.cpp:operator()@78] LLVM Fatal Error: Cannot select: t90: i16,ch = AtomicLoadAdd<(load store seq_cst 1 on %ir.28)> t26:1, t37, t88
  t37: i64 = add t26, t36
    t26: i64,ch = load<(load 8 from %ir.20, addrspace 1)> t0, t23, undef:i64
      t23: i64 = add nuw t21, Constant:i64<72>
        t21: i64 = addrspacecast[0 -> 1] t20
          t20: i64,ch = CopyFromReg t0, Register:i64 %0
            t19: i64 = Register %0
        t22: i64 = Constant<72>
      t25: i64 = undef
    t36: i64 = zero_extend t35
      t35: i32 = or t28, t34
        t28: i32 = and t18, Constant:i32<15>
          t18: i32 = add t10, Constant:i32<1>
            t10: i32 = sub t9, t7
              t9: i32,ch = CopyFromReg t0, Register:i32 %7
                t8: i32 = Register %7
              t7: i32 = mul t69, Constant:i32<7>
                t69: i32 = srl t68, Constant:i32<2>
                  t68: i32 = add t67, t76


                  t64: i32 = Constant<2>
                t5: i32 = Constant<7>
            t17: i32 = Constant<1>
          t27: i32 = Constant<15>
        t34: i32 = and t32, Constant:i32<240>
          t32: i32 = add t111, Constant:i32<16>
            t111: i32 = any_extend t110
              t110: i16 = shl t99, Constant:i32<4>
                t99: i16 = srl t98, Constant:i32<2>
                  t98: i16 = add t97, t107


                  t64: i32 = Constant<2>
                t29: i32 = Constant<4>
            t31: i32 = Constant<16>
          t33: i32 = Constant<240>
  t88: i16 = add nsw t99, t83
    t99: i16 = srl t98, Constant:i32<2>
      t98: i16 = add t97, t107
        t97: i16 = srl t96, Constant:i32<1>
          t96: i16 = sub t86, t107
            t86: i16 = and t84, Constant:i16<255>
              t84: i16 = truncate t9
                t9: i32,ch = CopyFromReg t0, Register:i32 %7
                  t8: i32 = Register %7
              t85: i16 = Constant<255>
            t107: i16 = truncate t106
              t106: i32 = srl t109, Constant:i32<16>
                t109: i32 = NVPTXISD::MUL_WIDE_UNSIGNED t86, Constant:i16<9363>
                  t86: i16 = and t84, Constant:i16<255>


                  t108: i16 = Constant<9363>
                t31: i32 = Constant<16>
          t17: i32 = Constant<1>
        t107: i16 = truncate t106
          t106: i32 = srl t109, Constant:i32<16>
            t109: i32 = NVPTXISD::MUL_WIDE_UNSIGNED t86, Constant:i16<9363>
              t86: i16 = and t84, Constant:i16<255>
                t84: i16 = truncate t9
                  t9: i32,ch = CopyFromReg t0, Register:i32 %7

                t85: i16 = Constant<255>
              t108: i16 = Constant<9363>
            t31: i32 = Constant<16>
      t64: i32 = Constant<2>
    t83: i16 = truncate t10
      t10: i32 = sub t9, t7
        t9: i32,ch = CopyFromReg t0, Register:i32 %7
          t8: i32 = Register %7
        t7: i32 = mul t69, Constant:i32<7>
          t69: i32 = srl t68, Constant:i32<2>
            t68: i32 = add t67, t76
              t67: i32 = srl t66, Constant:i32<1>
                t66: i32 = sub t81, t76
                  t81: i32 = and t79, Constant:i32<255>


                  t76: i32 = truncate t75

                t17: i32 = Constant<1>
              t76: i32 = truncate t75
                t75: i64 = srl t78, Constant:i32<32>
                  t78: i64 = NVPTXISD::MUL_WIDE_UNSIGNED t81, Constant:i32<613566757>


                  t112: i32 = Constant<32>
            t64: i32 = Constant<2>
          t5: i32 = Constant<7>
In function: func_c76_0_kernel_0_range_for


python: /home/leon/repos/llvm-10.0.0.src/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1039: llvm::SelectionDAG::~SelectionDAG(): Assertion `!UpdateListeners && "Dangling registered DAGUpdateListeners"' failed.
Aborted (core dumped)

Hmm, I can't reproduce this error on v1.3.0 with either the cpu or the vulkan backend. Could you try a different backend, or Taichi's nightly build?

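Switching backends does fix it: both cpu and vulkan work. With 1.3.0, though, the cuda backend still fails the same way on my machine. Maybe my driver isn't installed correctly: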

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
[Taichi] version 1.3.0, llvm 10.0.0, commit 38b8ef76, linux, python 3.10.4
[Taichi] Starting on arch=cuda
[E 11/14/22 12:45:47.730 368410] [llvm_context.cpp:operator()@79] LLVM Fatal Error: Cannot select: t90: i16,ch = AtomicLoadAdd<(load store seq_cst 1 on %ir.28)> t26:1, t37, t88
  t37: i64 = add t26, t36
    t26: i64,ch = load<(load 8 from %ir.20, addrspace 1)> t0, t23, undef:i64
      t23: i64 = add nuw t21, Constant:i64<72>
        t21: i64 = addrspacecast[0 -> 1] t20
          t20: i64,ch = CopyFromReg t0, Register:i64 %0
            t19: i64 = Register %0
        t22: i64 = Constant<72>
      t25: i64 = undef
    t36: i64 = zero_extend t35
      t35: i32 = or t28, t34
        t28: i32 = and t18, Constant:i32<15>
          t18: i32 = add t10, Constant:i32<1>
            t10: i32 = sub t9, t7
              t9: i32,ch = CopyFromReg t0, Register:i32 %7
                t8: i32 = Register %7
              t7: i32 = mul t69, Constant:i32<7>
                t69: i32 = srl t68, Constant:i32<2>
                  t68: i32 = add t67, t76


                  t64: i32 = Constant<2>
                t5: i32 = Constant<7>
            t17: i32 = Constant<1>
          t27: i32 = Constant<15>
        t34: i32 = and t32, Constant:i32<240>
          t32: i32 = add t111, Constant:i32<16>
            t111: i32 = any_extend t110
              t110: i16 = shl t99, Constant:i32<4>
                t99: i16 = srl t98, Constant:i32<2>
                  t98: i16 = add t97, t107


                  t64: i32 = Constant<2>
                t29: i32 = Constant<4>
            t31: i32 = Constant<16>
          t33: i32 = Constant<240>
  t88: i16 = add nsw t99, t83
    t99: i16 = srl t98, Constant:i32<2>
      t98: i16 = add t97, t107
        t97: i16 = srl t96, Constant:i32<1>
          t96: i16 = sub t86, t107
            t86: i16 = and t84, Constant:i16<255>
              t84: i16 = truncate t9
                t9: i32,ch = CopyFromReg t0, Register:i32 %7
                  t8: i32 = Register %7
              t85: i16 = Constant<255>
            t107: i16 = truncate t106
              t106: i32 = srl t109, Constant:i32<16>
                t109: i32 = NVPTXISD::MUL_WIDE_UNSIGNED t86, Constant:i16<9363>
                  t86: i16 = and t84, Constant:i16<255>


                  t108: i16 = Constant<9363>
                t31: i32 = Constant<16>
          t17: i32 = Constant<1>
        t107: i16 = truncate t106
          t106: i32 = srl t109, Constant:i32<16>
            t109: i32 = NVPTXISD::MUL_WIDE_UNSIGNED t86, Constant:i16<9363>
              t86: i16 = and t84, Constant:i16<255>
                t84: i16 = truncate t9
                  t9: i32,ch = CopyFromReg t0, Register:i32 %7

                t85: i16 = Constant<255>
              t108: i16 = Constant<9363>
            t31: i32 = Constant<16>
      t64: i32 = Constant<2>
    t83: i16 = truncate t10
      t10: i32 = sub t9, t7
        t9: i32,ch = CopyFromReg t0, Register:i32 %7
          t8: i32 = Register %7
        t7: i32 = mul t69, Constant:i32<7>
          t69: i32 = srl t68, Constant:i32<2>
            t68: i32 = add t67, t76
              t67: i32 = srl t66, Constant:i32<1>
                t66: i32 = sub t81, t76
                  t81: i32 = and t79, Constant:i32<255>


                  t76: i32 = truncate t75

                t17: i32 = Constant<1>
              t76: i32 = truncate t75
                t75: i64 = srl t78, Constant:i32<32>
                  t78: i64 = NVPTXISD::MUL_WIDE_UNSIGNED t81, Constant:i32<613566757>


                  t112: i32 = Constant<32>
            t64: i32 = Constant<2>
          t5: i32 = Constant<7>
In function: func_c76_0_kernel_0_range_for


python: /home/leon/repos/llvm-10.0.0.src/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1039: llvm::SelectionDAG::~SelectionDAG(): Assertion `!UpdateListeners && "Dangling registered DAGUpdateListeners"' failed.
Aborted (core dumped)

This may be a bug. Could you open an issue on the repo? Someone will take a look this week.
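For context, a possible reading of the log (not a confirmed diagnosis): CUDA's `atomicAdd` offers 32- and 64-bit integer overloads but none for 8- or 16-bit integers, which is consistent with LLVM 10's NVPTX backend failing to select the `i16` `AtomicLoadAdd` above (the 1-byte add appears to have been promoted to `i16`). A common way to emulate a sub-word atomic add is a compare-and-swap loop on the aligned 32-bit word that contains the byte. The single-threaded plain-Python model below shows that loop; `cas32`, `atomic_add_u8_via_cas32` and `read_int8` are hypothetical names, the little-endian lane order is an assumption, and this is not Taichi's implementation or fix.

```python
def _as_int8(byte: int) -> int:
    """Interpret an unsigned byte (0..255) as a signed int8."""
    return byte - 256 if byte >= 128 else byte


def cas32(words: list, idx: int, expected: int, desired: int) -> int:
    """Stand-in for a hardware 32-bit compare-and-swap: returns the previous value."""
    current = words[idx]
    if current == expected:
        words[idx] = desired
    return current


def atomic_add_u8_via_cas32(words: list, byte_index: int, delta: int) -> int:
    """Add `delta` to one byte packed in 32-bit words; returns the old byte as int8."""
    word_idx, lane = divmod(byte_index, 4)
    shift = 8 * lane  # assumed little-endian: byte 0 is the lowest 8 bits
    while True:
        old = words[word_idx]
        old_byte = (old >> shift) & 0xFF
        new_byte = (old_byte + delta) & 0xFF  # wraps like an 8-bit add
        new = (old & ~(0xFF << shift)) | (new_byte << shift)
        # On real hardware another thread may change the word between the read
        # and the CAS; the loop then retries. In this serial model it never does.
        if cas32(words, word_idx, old, new) == old:
            return _as_int8(old_byte)


def read_int8(words: list, byte_index: int) -> int:
    word_idx, lane = divmod(byte_index, 4)
    return _as_int8((words[word_idx] >> (8 * lane)) & 0xFF)


if __name__ == "__main__":
    mem = [0, 0x11223344]
    atomic_add_u8_via_cas32(mem, 5, 1)
    print(hex(mem[1]))
```

Only the target byte changes; the other three bytes in the same word are written back with their old values, which is why the CAS must compare the whole 32-bit word.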