I am trying to implement a Kogge-Stone network to compute the prefix sum of an array. My code looks roughly like this:
    @ti.data_oriented
    class KoggeStone:
        def __init__(self):
            # double buffer, swapped after every pass
            # (n is a placeholder for the array length)
            self.input = ti.field(ti.i32, shape=n)
            self.output = ti.field(ti.i32, shape=n)

        @ti.kernel
        def scan(self):
            # every pass prints the same values; I expected the previous pass's output
            print(self.input)
            for i in self.input:
                self.output[i] = do_something(self.input, i)

        def add(self):
            # run multiple passes
            for i in range(n_pass):
                self.scan()
                # swap the double buffer
                tmp = self.input
                self.input = self.output
                self.output = tmp
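I found that input and output are not actually swapped between passes: the kernel keeps reading the same buffer.
However, when I rewrote scan to receive the two fields as ti.template() arguments, I got the expected result: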
    @ti.data_oriented
    class KoggeStone:
        def __init__(self):
            # double buffer, swapped after every pass
            # (n is a placeholder for the array length)
            self.input = ti.field(ti.i32, shape=n)
            self.output = ti.field(ti.i32, shape=n)

        @ti.kernel
        def scan(self, input: ti.template(), output: ti.template()):
            # this gives the expected result
            print(input)
            for i in input:
                output[i] = do_something(input, i)

        def add(self):
            # run multiple passes
            for i in range(n_pass):
                self.scan(self.input, self.output)
                # swap the double buffer
                tmp = self.input
                self.input = self.output
                self.output = tmp
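I have two questions:
- What causes this behavior?
- As far as I understand, a template kernel is instantiated on each call. How much extra performance overhead does my second implementation add? Is there a better way to implement a double buffer?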
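For reference, here is the behavior I expect from the passes, written as a minimal plain-Python sketch without Taichi. It assumes that `do_something` is the usual Kogge-Stone step (add the element `stride` positions back) and that the stride doubles each pass; the function name `kogge_stone_scan` and the local names `src`, `dst`, and `stride` are only illustrative and do not appear in my real code.

```python
def kogge_stone_scan(values):
    """Inclusive prefix sum using Kogge-Stone passes and two swapped buffers."""
    n = len(values)
    src = list(values)  # plays the role of self.input
    dst = [0] * n       # plays the role of self.output
    stride = 1
    while stride < n:
        for i in range(n):
            # Assumed body of do_something: add the element `stride` slots back,
            # or copy the value unchanged when there is no such element.
            dst[i] = src[i] + src[i - stride] if i >= stride else src[i]
        # Swap the buffers so the next pass reads what this pass wrote.
        src, dst = dst, src
        stride *= 2
    return src


print(kogge_stone_scan([1, 2, 3, 4, 5]))  # [1, 3, 6, 10, 15]
```

In this sketch each pass reads the previous pass's output, which is what I expected the swap in `add` to achieve in the first Taichi version.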