如何在太极中使用稀疏网格

beidou · 2023 年4 月 6 日 07:31

为啥要用稀疏网格？

大家好，我们来讲解一下如何在太极中使用稀疏网格。稀疏网格的作用是节约内存。为什么要特意地去节约内存呢？主要有两个原因：

GPU上显存寸土寸金，目前4090才24GB显存。
三维网格这种数据结构如果不是稀疏的，很容易有很大的内存开销。

为了说明第二点，我们来粗略地估计一下网格的内存开销有多大。假设我们有一个三维网格，它的大小是1000x1000x1000，每个网格中仅仅存一个32位vec3的数据（比如说就是pos），那么这个网格的内存开销是多少呢？我们来算一下：

1000x1000x1000x4x3 = 12GB

也就是说，我们仅仅存了一个vec3的数据，就需要12GB的内存。这个内存开销是非常大的。

这种情况几乎只出现在三维网格上。如果是存粒子，基本不需要担心内存开销。比如我们要用100万个粒子，也就存一个pos。那么这个内存开销是多少呢？我们来算一下：

100万x4x3 = 12MB

也就是，即使存100万个粒子，也只需要12MB的内存。这个内存开销是非常小的。

并且，对于二维的网格，内存开销也比较小的。同样是分辨率为1000的网格，那么内存开销仅为：

1000x1000x4x2 = 8MB

也就是说，对于二维网格只需要8MB的内存。这个内存开销同样是非常小的。

所以我们说，稀疏数据结构几乎只在1.三维 2.网格这种数据结构上需要。并且一旦模拟的规模上去了，很容易出现显存不够的情况。所以我们需要稀疏网格。（当然为了简便后面的例子实际用的是二维网格）

接下来我们言归正传，介绍如何在太极中使用稀疏网格。

struct field

首先我们从文档中的最简单的struct field开始，代码来自于官方文档，我们这里只是复制粘贴一下：

import taichi as ti
ti.init()

# vec3 is a built-in vector type suppied in the `taichi.math` module
vec3 = ti.math.vec3
n = 10
# Declares a struct comprising three vectors and one floating-point number
particle = ti.types.struct(
  pos=vec3, vel=vec3, acc=vec3, mass=float,
)
# Declares a 1D field of the struct particle by calling field()
particle_field = particle.field(shape=(n,))

particle_field[0].pos = vec3(1,2,3) 
print(particle_field[0].pos)

总共有三步

用 particle = ti.types.struct定义类型(这里particle是一个struct)
用particle.field(shape=(n,))定义field一个sturct field
用particle_field[0].pos = vec3(1,2,3) 来使用这个sturct field

封装一下struct field

接下来我们仅仅把这个东西封装为类，其他的都不变。

import taichi as ti
ti.init()

vec3 = ti.math.vec3
n = 10
struct_type = ti.types.struct(
  pos=vec3, vel=vec3, acc=vec3, mass=float,
)

@ti.data_oriented
class Particle():
        def __init__(self, struct_type, shape):
                self.field = struct_type.field(shape=shape)
p = Particle(struct_type, (n,))
p.field[0].pos = vec3(1,2,3)
print(p.field[0].pos)

这个类的实例化需要传入两个参数：struct_type和shape，其中struct_type就是刚才的particle。

封装之后，我们在使用的时候需要两步：

通过p = Particle(struct_type, (n,)) 来实例化一个Particle类的对象p
通过p.field[0].pos = vec3(1,2,3)来访问field中的元素

其中struct_type仍然是 struct_type = ti.types.struct(pos=vec3, vel=vec3, acc=vec3, mass=float,)

pbf2d的网格封装为struct field

接下来我们试试用这个类来实现官方example中pbf2d.py中的网格数据结构：

一个网格内有两个数据：

一个是grid_num_particles，代表当前这个网格里面有几个粒子
一个是grid2particles，代表当前这个网格里面的粒子的索引

import taichi as ti
ti.init()

vec3 = ti.math.vec3
grid_size = (10,10)
max_num_particles_per_cell = 5
struct_type = ti.types.struct(
  grid_num_particles=ti.i32,
  grid2particles=ti.types.vector(max_num_particles_per_cell, ti.i32)
)

@ti.data_oriented
class Grid():
        def __init__(self, struct_type, shape):
                self.field = struct_type.field(shape=(shape))
g = Grid(struct_type, grid_size)
g.field[0,0].grid_num_particles = 2
g.field[0,0].grid2particles[0] = 17
print(g.field[0,0].grid_num_particles)
print(g.field[0,0].grid2particles[0])

我们还可以稍微改一下写法，用一个dict设定struct的类型，然后用ti.Struct.field()来创建struct field。这两种是完全一样的，只不过写法不同。

import taichi as ti
ti.init()

vec3 = ti.math.vec3
grid_size = (10,10)
max_num_particles_per_cell = 5
type_dict = {
        "grid_num_particles": ti.i32,
        "grid2particles": ti.types.vector(max_num_particles_per_cell, ti.i32)
}

@ti.data_oriented
class Grid():
        def __init__(self, type_dict, shape):
                self.field = ti.Struct.field(type_dict, shape=(shape))
g = Grid(type_dict, grid_size)
g.field[0,0].grid_num_particles = 2
g.field[0,0].grid2particles[0] = 17
print(g.field[0,0].grid_num_particles)
print(g.field[0,0].grid2particles[0])

我们只不过把struct_type换为了type_dict，并且把struct_type.field(shape=(shape))换为了ti.Struct.field(type_dict, shape=(shape))。其他的都一样。

稀疏网格（重点）

接下来，我们尝试将Grid改为SparseGrid。也就是利用taichi的sparse data structure来实现。

sparse data structure的用法大致如下：

首先设定snode, 就是设定内存的排布方式。
挂载实际的field。

import taichi as ti
ti.init()
max_num_particles_per_cell = 5
grid_size = (10,10)
type_dict = {
        "grid_num_particles": ti.i32,
        "grid2particles": ti.types.vector(max_num_particles_per_cell, ti.i32)
}
@ti.data_oriented
class SparseGrid():
    def __init__(self, type_dict, shape):
        self.field = ti.Struct.field(type_dict)
        self.snode = ti.root.bitmasked(ti.ij, shape)
        self.snode.place(self.field)

sp = SparseGrid(type_dict=type_dict, shape=grid_size)
grid = sp.field
grid[0,0].grid_num_particles = 2
grid[0,0].grid2particles[0] = 17
print("grid[0,0].grid_num_particles=", grid[0,0].grid_num_particles)
print("grid[0,0].grid2particles[0]=", grid[0,0].grid2particles[0])

在上面例子的__init__函数中，我们增加了两行

self.snode = ti.root.bitmasked(ti.ij, shape)
self.snode.place(self.field)

首先我们建立一个snode。我们指定了bitmasked这种方式，这代表taichi会为我们分配一个bit来判断这个网格是否被使用。其他方式还有dense，pointer和dynamic。这里我们就只用bitmasked。然后ti.ij表示了我们的内存的排布方式。因为我们的网格是二维的，所以我们用ti.ij来表示。其他的还包括ti.i， ti.j， ti.ijk等等。

然后在snode上面挂载field数据。我们只需要用place这个API即可。请记住field是一个struct field。

最后我们给sp.field 起了个别名叫grid。这样，在使用的时候，我们只需要用grid[0,0].grid_num_particles = 2这样的写法来访问其中的数据。这种写法非常符合我们的直觉。

看一下使用率

当然，我们使用sparse data的主要目的就是为了节约内存。那么有没有方法查看网格的使用率呢？为此，我们需要增加一个usage函数。在这个函数中，我们用struct-for遍历所有的网格，然后用is_active()函数来判断当前网格是否被使用。

import taichi as ti
ti.init()
max_num_particles_per_cell = 5
grid_size = (10,10)
type_dict = {
        "grid_num_particles": ti.i32,
        "grid2particles": ti.types.vector(max_num_particles_per_cell, ti.i32)
}
@ti.data_oriented
class SparseGrid():
    def __init__(self, type_dict, shape):
        self.field = ti.Struct.field(type_dict)
        self.snode = ti.root.bitmasked(ti.ij, shape) # grid_snode
        self.snode.place(self.field)
        self.shape = shape
    
    @ti.kernel
    def usage(self)->ti.f32:
        cnt = 0
        for I in ti.grouped(self.snode):
            if ti.is_active(self.snode, I):
                cnt+=1
        usage =  cnt/(self.shape[0]*self.shape[1])
        print("Grid usage: ", usage)
        return usage

sp = SparseGrid(type_dict=type_dict, shape=grid_size)
grid = sp.field
grid[0,0].grid_num_particles = 2
grid[0,0].grid2particles[0] = 17
print("grid[0,0].grid_num_particles=", grid[0,0].grid_num_particles)
print("grid[0,0].grid2particles[0]=", grid[0,0].grid2particles[0])
sp.usage()

在这部分代码里，我们只是增加了usage这个成员函数。用struct-for访问的时候，会自动跳过没有激活的网格。而range-for则会激活所有的网格。我们遍历并加和所有激活的网格，然后除以总的网格数，就可以得到使用率了。

我们这里可以看到使用率是0.01左右，也就是说，我们的网格只有1%的网格被使用。因为我们只激活了一个网格，所以使用率是1/100。

于是我们节约了剩下的99%的内存。但是由于taichi中所有内存都是预分配的。一旦ti.init()，所有内存都已经被分配完毕。所以从系统占用率上来看，我们看不出节约了内存。但是程序内部可以使用节约下来的内存。

但是我们的程序还有一点美中不足，那就是一旦激活了网格，就一直激活。当然，大部分情况下我们只需要这样就行了。因为去激活和熄灭网格是需要额外的计算量的。

但是假如你希望比如隔个500步就去重新熄灭再激活一下，也是可以的。只需要使用sp.snode.deactivate_all()即可。这样会熄灭所有这个snode的网格。然后再重新激活需要的网格即可。我们这里就不再赘述了。

用稀疏网格改写一下完整的pbf2d

接下来我们就去用上面学到的技巧去改写pbf2d这个taichi的官方example。我们啥都不改，除了把网格的数据结构改成了sparse data structure。也就是grid_num_particles和grid2particles这两个数据结构。

# Macklin, M. and Müller, M., 2013. Position based fluids. ACM Transactions on Graphics (TOG), 32(4), p.104.
# Taichi implementation by Ye Kuang (k-ye)

import math

import numpy as np

import taichi as ti

ti.init(arch=ti.gpu)

screen_res = (800, 400)
screen_to_world_ratio = 10.0
boundary = (screen_res[0] / screen_to_world_ratio,
            screen_res[1] / screen_to_world_ratio)
cell_size = 2.51
cell_recpr = 1.0 / cell_size


def round_up(f, s):
    return (math.floor(f * cell_recpr / s) + 1) * s


grid_size = (round_up(boundary[0], 1), round_up(boundary[1], 1))

dim = 2
bg_color = 0x112f41
particle_color = 0x068587
boundary_color = 0xebaca2
num_particles_x = 60
num_particles = num_particles_x * 20
max_num_particles_per_cell = 100
max_num_neighbors = 100
time_delta = 1.0 / 20.0
epsilon = 1e-5
particle_radius = 3.0
particle_radius_in_world = particle_radius / screen_to_world_ratio

# PBF params
h_ = 1.1
mass = 1.0
rho0 = 1.0
lambda_epsilon = 100.0
pbf_num_iters = 5
corr_deltaQ_coeff = 0.3
corrK = 0.001
# Need ti.pow()
# corrN = 4.0
neighbor_radius = h_ * 1.05

poly6_factor = 315.0 / 64.0 / math.pi
spiky_grad_factor = -45.0 / math.pi

old_positions = ti.Vector.field(dim, float)
positions = ti.Vector.field(dim, float)
velocities = ti.Vector.field(dim, float)
# grid_num_particles = ti.field(int)
# grid2particles = ti.field(int)
particle_num_neighbors = ti.field(int)
particle_neighbors = ti.field(int)
lambdas = ti.field(float)
position_deltas = ti.Vector.field(dim, float)
# 0: x-pos, 1: timestep in sin()
board_states = ti.Vector.field(2, float)

ti.root.dense(ti.i, num_particles).place(old_positions, positions, velocities)
# grid_snode = ti.root.dense(ti.ij, grid_size)
# grid_snode.place(grid_num_particles)
# grid_snode.dense(ti.k, max_num_particles_per_cell).place(grid2particles)
nb_node = ti.root.dense(ti.i, num_particles)
nb_node.place(particle_num_neighbors)
nb_node.dense(ti.j, max_num_neighbors).place(particle_neighbors)
ti.root.dense(ti.i, num_particles).place(lambdas, position_deltas)
ti.root.place(board_states)


type_dict = {
        "grid_num_particles": ti.i32,
        "grid2particles": ti.types.vector(max_num_particles_per_cell, ti.i32)
}
@ti.data_oriented
class SparseGrid():
    def __init__(self, type_dict, shape):
        self.field = ti.Struct.field(type_dict)
        self.snode = ti.root.bitmasked(ti.ij, shape) # grid_snode
        self.snode.place(self.field)
        self.shape = shape
    
    @ti.kernel
    def usage(self)->ti.f32:
        cnt = 0
        for I in ti.grouped(self.snode):
            if ti.is_active(self.snode, I):
                cnt+=1
        usage =  cnt/(self.shape[0]*self.shape[1])
        print("Grid usage: ", usage)
        return usage

sp = SparseGrid(type_dict=type_dict, shape=grid_size)
grid = sp.field


@ti.func
def poly6_value(s, h):
    result = 0.0
    if 0 < s and s < h:
        x = (h * h - s * s) / (h * h * h)
        result = poly6_factor * x * x * x
    return result


@ti.func
def spiky_gradient(r, h):
    result = ti.Vector([0.0, 0.0])
    r_len = r.norm()
    if 0 < r_len and r_len < h:
        x = (h - r_len) / (h * h * h)
        g_factor = spiky_grad_factor * x * x
        result = r * g_factor / r_len
    return result


@ti.func
def compute_scorr(pos_ji):
    # Eq (13)
    x = poly6_value(pos_ji.norm(), h_) / poly6_value(corr_deltaQ_coeff * h_,
                                                     h_)
    # pow(x, 4)
    x = x * x
    x = x * x
    return (-corrK) * x


@ti.func
def get_cell(pos):
    return int(pos * cell_recpr)


@ti.func
def is_in_grid(c):
    # @c: Vector(i32)
    return 0 <= c[0] and c[0] < grid_size[0] and 0 <= c[1] and c[
        1] < grid_size[1]


@ti.func
def confine_position_to_boundary(p):
    bmin = particle_radius_in_world
    bmax = ti.Vector([board_states[None][0], boundary[1]
                      ]) - particle_radius_in_world
    for i in ti.static(range(dim)):
        # Use randomness to prevent particles from sticking into each other after clamping
        if p[i] <= bmin:
            p[i] = bmin + epsilon * ti.random()
        elif bmax[i] <= p[i]:
            p[i] = bmax[i] - epsilon * ti.random()
    return p


@ti.kernel
def move_board():
    # probably more accurate to exert force on particles according to hooke's law.
    b = board_states[None]
    b[1] += 1.0
    period = 90
    vel_strength = 8.0
    if b[1] >= 2 * period:
        b[1] = 0
    b[0] += -ti.sin(b[1] * np.pi / period) * vel_strength * time_delta
    board_states[None] = b


@ti.kernel
def prologue():
    # save old positions
    for i in positions:
        old_positions[i] = positions[i]
    # apply gravity within boundary
    for i in positions:
        g = ti.Vector([0.0, -9.8])
        pos, vel = positions[i], velocities[i]
        vel += g * time_delta
        pos += vel * time_delta
        positions[i] = confine_position_to_boundary(pos)

    # clear neighbor lookup table
    for I in ti.grouped(grid):
        grid[I].grid_num_particles = 0
    for I in ti.grouped(particle_neighbors):
        particle_neighbors[I] = -1

    # update grid
    for p_i in positions:
        cell = get_cell(positions[p_i])
        # ti.Vector doesn't seem to support unpacking yet
        # but we can directly use int Vectors as indices
        offs = ti.atomic_add(grid[cell].grid_num_particles, 1)
        grid[cell].grid2particles[offs] = p_i
    # find particle neighbors
    for p_i in positions:
        pos_i = positions[p_i]
        cell = get_cell(pos_i)
        nb_i = 0
        for offs in ti.static(ti.grouped(ti.ndrange((-1, 2), (-1, 2)))):
            cell_to_check = cell + offs
            if is_in_grid(cell_to_check):
                for j in range(grid[cell_to_check].grid_num_particles):
                    # p_j = grid2particles[cell_to_check, j]
                    p_j = grid[cell_to_check].grid2particles[j]
                    if nb_i < max_num_neighbors and p_j != p_i and (
                            pos_i - positions[p_j]).norm() < neighbor_radius:
                        particle_neighbors[p_i, nb_i] = p_j
                        nb_i += 1
        particle_num_neighbors[p_i] = nb_i


@ti.kernel
def substep():
    # compute lambdas
    # Eq (8) ~ (11)
    for p_i in positions:
        pos_i = positions[p_i]

        grad_i = ti.Vector([0.0, 0.0])
        sum_gradient_sqr = 0.0
        density_constraint = 0.0

        for j in range(particle_num_neighbors[p_i]):
            p_j = particle_neighbors[p_i, j]
            if p_j < 0:
                break
            pos_ji = pos_i - positions[p_j]
            grad_j = spiky_gradient(pos_ji, h_)
            grad_i += grad_j
            sum_gradient_sqr += grad_j.dot(grad_j)
            # Eq(2)
            density_constraint += poly6_value(pos_ji.norm(), h_)

        # Eq(1)
        density_constraint = (mass * density_constraint / rho0) - 1.0

        sum_gradient_sqr += grad_i.dot(grad_i)
        lambdas[p_i] = (-density_constraint) / (sum_gradient_sqr +
                                                lambda_epsilon)
    # compute position deltas
    # Eq(12), (14)
    for p_i in positions:
        pos_i = positions[p_i]
        lambda_i = lambdas[p_i]

        pos_delta_i = ti.Vector([0.0, 0.0])
        for j in range(particle_num_neighbors[p_i]):
            p_j = particle_neighbors[p_i, j]
            if p_j < 0:
                break
            lambda_j = lambdas[p_j]
            pos_ji = pos_i - positions[p_j]
            scorr_ij = compute_scorr(pos_ji)
            pos_delta_i += (lambda_i + lambda_j + scorr_ij) * \
                spiky_gradient(pos_ji, h_)

        pos_delta_i /= rho0
        position_deltas[p_i] = pos_delta_i
    # apply position deltas
    for i in positions:
        positions[i] += position_deltas[i]


@ti.kernel
def epilogue():
    # confine to boundary
    for i in positions:
        pos = positions[i]
        positions[i] = confine_position_to_boundary(pos)
    # update velocities
    for i in positions:
        velocities[i] = (positions[i] - old_positions[i]) / time_delta
    # no vorticity/xsph because we cannot do cross product in 2D...


def run_pbf():
    prologue()
    for _ in range(pbf_num_iters):
        substep()
    epilogue()


def render(gui):
    gui.clear(bg_color)
    pos_np = positions.to_numpy()
    for j in range(dim):
        pos_np[:, j] *= screen_to_world_ratio / screen_res[j]
    gui.circles(pos_np, radius=particle_radius, color=particle_color)
    gui.rect((0, 0), (board_states[None][0] / boundary[0], 1),
             radius=1.5,
             color=boundary_color)
    gui.show()


@ti.kernel
def init_particles():
    for i in range(num_particles):
        delta = h_ * 0.8
        offs = ti.Vector([(boundary[0] - delta * num_particles_x) * 0.5,
                          boundary[1] * 0.02])
        positions[i] = ti.Vector([i % num_particles_x, i // num_particles_x
                                  ]) * delta + offs
        for c in ti.static(range(dim)):
            velocities[i][c] = (ti.random() - 0.5) * 4
    board_states[None] = ti.Vector([boundary[0] - epsilon, -0.0])


def print_stats():
    print('PBF stats:')
    # num = grid_num_particles.to_numpy()
    # avg, max_ = np.mean(num), np.max(num)
    # print(f'  #particles per cell: avg={avg:.2f} max={max_}')
    num = particle_num_neighbors.to_numpy()
    avg, max_ = np.mean(num), np.max(num)
    print(f'  #neighbors per particle: avg={avg:.2f} max={max_}')
    print(sp.usage())


def main():
    init_particles()
    print(f'boundary={boundary} grid={grid_size} cell_size={cell_size}')
    gui = ti.GUI('PBF2D', screen_res)
    while gui.running and not gui.get_event(gui.ESCAPE):
        move_board()
        run_pbf()
        if gui.frame % 20 == 1:
            print_stats()
        render(gui)


if __name__ == '__main__':
    main()

上面的代码和原始的pbf2d相比，只是将 grid_num_particles和grid2particles改为了我们改写API。对于grid_num_particles[i,j]，我们将其改为了grid[i,j].cell_num_particles，对于grid2particles，我们将其改为了grid[i,j].cell2particles[offs]。其他的都没有改变。

不用struct field但使用sparse grid的pbf2d

实际上，假如不使用struct field的话，我们还可以连 grid_num_particles和 grid2particles这两个API都不用改变，比如下面的写法更加灵活：

import taichi as ti
ti.init()
max_num_particles_per_cell = 5
grid_size = (10,10)
@ti.data_oriented
class SparseGrid():
    def __init__(self, grid_size):
        self.grid_num_particles = ti.field(int)
        self.grid2particles = ti.field(int)
        self.snode = ti.root.bitmasked(ti.ij, grid_size)
        self.snode.place(self.grid_num_particles)
        self.snode.bitmasked(ti.k, max_num_particles_per_cell).place(self.grid2particles)

    @ti.kernel
    def usage(self):
        cnt = 0
        for I in ti.grouped(self.snode):
            if ti.is_active(self.snode, I):
                cnt+=1
        usage =  cnt/(grid_size[0]*grid_size[1])
        print("Grid usage: ", usage)

sp = SparseGrid(grid_size=grid_size)
grid_num_particles = sp.grid_num_particles
grid2particles = sp.grid2particles

只要把原本的grid_num_particles和 grid2particles数据结构改成这样，就可以使用稀疏数据结构了。后面的计算完全不改变任何代码。

总结

总结一下：

我们可以对稀疏数据结构进行如下封装

@ti.data_oriented
class SparseGrid():
    def __init__(self, type_dict, shape):
        self.struct_field = ti.Struct.field(type_dict)
        self.snode = ti.root.bitmasked(ti.ij, shape)
        self.snode.place(self.struct_field)
        self.shape = shape
    
    @ti.kernel
    def usage(self)->ti.f32:
        cnt = 0
        for I in ti.grouped(self.snode):
            if ti.is_active(self.snode, I):
                cnt+=1
        usage =  cnt/(self.shape[0]*self.shape[1])
        print("Grid usage: ", usage)
        return usage

经过封装之后的SparseGrid类，可以用来实现稀疏数据结构。使用方法如下：

实例化：

sp = SparseGrid({
                "pos": ti.math.vec3,
                "mass": ti.f32
                }, shape=(10,10))
grid = sp.struct_field

使用grid[1,2].pos = ti.math.vec3(1,2,3) 来访问其中的数据。
使用sp.usage()来查看使用率。

以上所有内容的jupyter notebook:

sparse_grid_tut.ipynb (36.2 KB)

Otis · 2023 年4 月 6 日 14:00

采用稀疏结构计算效率会不会变慢很多