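I am trying to use Taichi to accelerate the following problem:
For each character in a batch, detect interpenetration (mesh clipping) between the hands and the other body parts (the parts are distinguished by joint).
- Can a Taichi kernel only parallelize the outermost for-loop of a computation? (According to the docs, a kernel cannot call another kernel.) In the pseudocode below, a call to detect_batch_collision runs through several nested for-loops, and in principle every level could benefit from parallelization (for example, the inner node_to_node_overlap_test could be parallelized as well).
- If Taichi cannot parallelize multiple levels of for-loops, are there other tools or approaches you would recommend (CUDA, ways of structuring the code, etc.)?
- Even if only the outermost loop is parallelized, do I need to convert all of the computation code called from the kernel into Taichi functions in order to keep it fast (that is, to reduce CPU-GPU overhead)?
Code: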
def detect_batch_collision(batch_character_bvh):
    for hand_bvh, body_bvh in batch_character_bvh:
        detect_character_collision(hand_bvh, body_bvh)


def detect_character_collision(hand_bvh, body_bvh):
    for body_part_bvh in body_bvh:
        detect_bvh_node_collision(hand_bvh, body_part_bvh)


def detect_bvh_node_collision(b1, b2):
    # TODO: use stack to replace recursion
    if not aabb_to_aabb_overlap_test(b1.aabb, b2.aabb):
        return False
    if b1.is_leaf and b2.is_leaf:
        return node_to_node_overlap_test(b1, b2)
    elif b1.is_leaf:
        return detect_bvh_node_collision(b1, b2.left) or \
            detect_bvh_node_collision(b1, b2.right)
    elif b2.is_leaf:
        return detect_bvh_node_collision(b1.left, b2) or \
            detect_bvh_node_collision(b1.right, b2)
    else:
        for b1_child in (b1.left, b1.right):
            for b2_child in (b2.left, b2.right):
                if detect_bvh_node_collision(b1_child, b2_child):
                    return True
        return False


def node_to_node_overlap_test(a, b):
    for a_tri in a.triangles:
        for b_tri in b.triangles:
            if tri_tri_overlap_test_3d(a_tri, b_tri):
                return True
    return False
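
To make the question more concrete, here is a rough plain-Python sketch of the shape I imagine the code would need if only one outermost loop can be parallelized. It does two things: it flattens the two outer loops into a single list of (character, body part) work items, and it replaces the recursion in detect_bvh_node_collision with an explicit stack (the TODO above). This is not Taichi code. The names AABB, Node, flatten_work_items, detect_bvh_node_collision_iterative and detect_batch_collision_flat are hypothetical, the leaf test is a placeholder, every internal node is assumed to have exactly two children, and returning one flag per character is my own assumption about the desired output.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AABB:
    lo: Tuple[float, float, float]  # minimum corner
    hi: Tuple[float, float, float]  # maximum corner


@dataclass
class Node:
    # Hypothetical BVH node; internal nodes are assumed to have two children.
    aabb: AABB
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def aabb_to_aabb_overlap_test(a: AABB, b: AABB) -> bool:
    # Two boxes overlap only if their intervals overlap on all three axes.
    return all(a.lo[i] <= b.hi[i] and b.lo[i] <= a.hi[i] for i in range(3))


def node_to_node_overlap_test(a: Node, b: Node) -> bool:
    # Placeholder: the real version would loop over triangle pairs.
    return True


def detect_bvh_node_collision_iterative(b1: Node, b2: Node) -> bool:
    # Same logic as the recursive version, with an explicit stack of node pairs.
    stack = [(b1, b2)]
    while stack:
        n1, n2 = stack.pop()
        if not aabb_to_aabb_overlap_test(n1.aabb, n2.aabb):
            continue  # prune this pair and everything below it
        if n1.is_leaf and n2.is_leaf:
            if node_to_node_overlap_test(n1, n2):
                return True
            continue
        # A leaf stays as it is; an internal node is replaced by its children.
        kids1 = (n1,) if n1.is_leaf else (n1.left, n1.right)
        kids2 = (n2,) if n2.is_leaf else (n2.left, n2.right)
        stack.extend((c1, c2) for c1 in kids1 for c2 in kids2)
    return False


def flatten_work_items(batch_character_bvh):
    # One (character index, hand BVH, body-part BVH) tuple per pair, so the
    # two outer loops collapse into a single flat loop.
    return [
        (i, hand_bvh, body_part_bvh)
        for i, (hand_bvh, body_bvh) in enumerate(batch_character_bvh)
        for body_part_bvh in body_bvh
    ]


def detect_batch_collision_flat(batch_character_bvh) -> List[bool]:
    hits = [False] * len(batch_character_bvh)
    # Iterations are independent apart from the write to hits[i]; this single
    # loop is the one I would hope to parallelize.
    for i, hand_bvh, body_part_bvh in flatten_work_items(batch_character_bvh):
        if detect_bvh_node_collision_iterative(hand_bvh, body_part_bvh):
            hits[i] = True
    return hits
```

With this layout the amount of parallel work is the number of (character, body part) pairs rather than the number of characters. Whether the inner traversal and the triangle tests can be parallelized further is exactly what I am unsure about.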