I need to compress int32 data down to int8 and accumulate it onto the values already stored in a field. Doing so produces the following error:
[Taichi] version 1.2.1, llvm 10.0.0, commit 12ab828a, linux, python 3.10.4
[Taichi] Starting on arch=cuda
[E 11/13/22 11:55:59.749 242178] [llvm_context.cpp:operator()@78] LLVM Fatal Error: Cannot select: t31: i16,ch = AtomicLoadAdd<(load store seq_cst 1 on %ir.32)> t29:1, t11, t30
t11: i64,ch = CopyFromReg t0, Register:i64 %6
t10: i64 = Register %6
t30: i16 = truncate t8
t8: i32,ch = CopyFromReg t0, Register:i32 %33
t7: i32 = Register %33
In function: L0andL1Estimate_c84_0_kernel_0_range_for
The fields I pass in are as follows:
t1_0 = ti.field(dtype = ti.uint32, shape = t_0_shape)
t2_0 = ti.field(dtype = ti.uint32, shape = t_0_shape)
u0 = ti.field(dtype = ti.int8, shape = [height // 2 + MEDIAN_KERNEL_SIZE_LEVEL_1 - 1, width // 2 + MEDIAN_KERNEL_SIZE_LEVEL_1 - 1], \
offset=( -(MEDIAN_KERNEL_SIZE_LEVEL_1 // 2), -(MEDIAN_KERNEL_SIZE_LEVEL_1 // 2)))
v0 = ti.field(dtype = ti.int8, shape = [height // 2 + MEDIAN_KERNEL_SIZE_LEVEL_1 - 1, width // 2 + MEDIAN_KERNEL_SIZE_LEVEL_1 - 1], \
offset=( -(MEDIAN_KERNEL_SIZE_LEVEL_1 // 2), -(MEDIAN_KERNEL_SIZE_LEVEL_1 // 2)))
The function call is:
L0andL1Estimate(t1_0, t2_0, v0, u0)
The kernel is as follows:
@ti.kernel
def L0andL1Estimate(t1 : ti.template(), t2 : ti.template(), v : ti.template(), u : ti.template()):
    channels_in, height_in, width_in = t1.shape
    flow_height, flow_width = u.shape
    for x, y in ti.ndrange(flow_height - MEDIAN_KERNEL_SIZE_LEVEL_1 + 1, flow_width - MEDIAN_KERNEL_SIZE_LEVEL_1 + 1):
        block_x, block_y = x + PADDING_SIZE, y + PADDING_SIZE
        ans_x, ans_y, ans_hm = PADDING_SIZE, PADDING_SIZE, BIT_WIDTH * channels_in
        for win_x in range(-PADDING_SIZE, PADDING_SIZE + 1):
            for win_y in range(-PADDING_SIZE, PADDING_SIZE + 1):
                hanmingDistance = ti.int32(0)
                for k in range(channels_in):
                    img1_feature = t1[k, block_x, block_y]
                    if ti.static(block_x + v[x, y] + win_x in [0, height_in]) and ti.static(block_y + u[x, y] + win_y in [0, width_in]):
                        img2_feature = t2[k, block_x + v[x, y] + win_x, block_y + u[x, y] + win_y]
                        hanmingDistance += getHammingDistance(img1_feature, img2_feature)
                    else:
                        hanmingDistance += 99999
                if hanmingDistance < ans_hm or (hanmingDistance == ans_hm and (win_x ** 2 + win_y ** 2 < ans_x ** 2 + ans_y ** 2)):
                    ans_x, ans_y, ans_hm = win_x, win_y, hanmingDistance
        v[x, y] += ti.int8(ans_x)
        u[x, y] += ti.int8(ans_y)
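The error is caused mainly by the last two lines: if I replace them with a plain assignment, there is no problem. I can see AtomicLoadAdd in the generated IR, but I do not need data-race protection for this field. Is there a way to turn it off?
Also, if I drop the explicit type cast, the accumulation works, but I suspect it still uses an atomic operation. Is there a way to turn that off?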