为啥要用稀疏网格?
大家好,我们来讲解一下如何在太极中使用稀疏网格。稀疏网格的作用是节约内存。为什么要特意地去节约内存呢?主要有两个原因:
-
GPU上显存寸土寸金,目前4090才24GB显存。
-
三维网格这种数据结构如果不是稀疏的,很容易有很大的内存开销。
为了说明第二点,我们来粗略地估计一下网格的内存开销有多大。假设我们有一个三维网格,它的大小是1000x1000x1000,每个网格中仅仅存一个32位vec3的数据(比如说就是pos),那么这个网格的内存开销是多少呢?我们来算一下:
1000x1000x1000x4x3 = 12GB
也就是说,我们仅仅存了一个vec3的数据,就需要12GB的内存。这个内存开销是非常大的。
这种情况几乎只出现在三维网格上。如果是存粒子,基本不需要担心内存开销。比如我们要用100万个粒子,也就存一个pos。那么这个内存开销是多少呢?我们来算一下:
100万x4x3 = 12MB
也就是,即使存100万个粒子,也只需要12MB的内存。这个内存开销是非常小的。
并且,对于二维的网格,内存开销也比较小的。同样是分辨率为1000的网格,那么内存开销仅为:
1000x1000x4x2 = 8MB
也就是说,对于二维网格只需要8MB的内存。这个内存开销同样是非常小的。
所以我们说,稀疏数据结构几乎只在1.三维 2.网格 这种数据结构上需要。并且一旦模拟的规模上去了,很容易出现显存不够的情况。所以我们需要稀疏网格。(当然为了简便后面的例子实际用的是二维网格)
接下来我们言归正传,介绍如何在太极中使用稀疏网格。
struct field
首先我们从文档中的最简单的struct field开始,代码来自于官方文档,我们这里只是复制粘贴一下:
import taichi as ti
ti.init()
# vec3 is a built-in vector type suppied in the `taichi.math` module
vec3 = ti.math.vec3
n = 10
# Declares a struct comprising three vectors and one floating-point number
particle = ti.types.struct(
pos=vec3, vel=vec3, acc=vec3, mass=float,
)
# Declares a 1D field of the struct particle by calling field()
particle_field = particle.field(shape=(n,))
particle_field[0].pos = vec3(1,2,3)
print(particle_field[0].pos)
总共有三步
-
用 particle = ti.types.struct定义类型(这里particle是一个struct)
-
用particle.field(shape=(n,))定义field一个sturct field
-
用particle_field[0].pos = vec3(1,2,3) 来使用这个sturct field
封装一下struct field
接下来我们仅仅把这个东西封装为类,其他的都不变。
import taichi as ti
ti.init()
vec3 = ti.math.vec3
n = 10
struct_type = ti.types.struct(
pos=vec3, vel=vec3, acc=vec3, mass=float,
)
@ti.data_oriented
class Particle():
def __init__(self, struct_type, shape):
self.field = struct_type.field(shape=shape)
p = Particle(struct_type, (n,))
p.field[0].pos = vec3(1,2,3)
print(p.field[0].pos)
这个类的实例化需要传入两个参数:struct_type和shape,其中struct_type就是刚才的particle。
封装之后,我们在使用的时候需要两步:
-
通过p = Particle(struct_type, (n,)) 来实例化一个Particle类的对象p
-
通过p.field[0].pos = vec3(1,2,3)来访问field中的元素
其中struct_type仍然是 struct_type = ti.types.struct(pos=vec3, vel=vec3, acc=vec3, mass=float,)
pbf2d的网格封装为struct field
接下来我们试试用这个类来实现官方example中pbf2d.py中的网格数据结构:
一个网格内有两个数据:
-
一个是grid_num_particles,代表当前这个网格里面有几个粒子
-
一个是grid2particles,代表当前这个网格里面的粒子的索引
import taichi as ti
ti.init()
vec3 = ti.math.vec3
grid_size = (10,10)
max_num_particles_per_cell = 5
struct_type = ti.types.struct(
grid_num_particles=ti.i32,
grid2particles=ti.types.vector(max_num_particles_per_cell, ti.i32)
)
@ti.data_oriented
class Grid():
def __init__(self, struct_type, shape):
self.field = struct_type.field(shape=(shape))
g = Grid(struct_type, grid_size)
g.field[0,0].grid_num_particles = 2
g.field[0,0].grid2particles[0] = 17
print(g.field[0,0].grid_num_particles)
print(g.field[0,0].grid2particles[0])
我们还可以稍微改一下写法,用一个dict设定struct的类型,然后用ti.Struct.field()来创建struct field。这两种是完全一样的,只不过写法不同。
import taichi as ti
ti.init()
vec3 = ti.math.vec3
grid_size = (10,10)
max_num_particles_per_cell = 5
type_dict = {
"grid_num_particles": ti.i32,
"grid2particles": ti.types.vector(max_num_particles_per_cell, ti.i32)
}
@ti.data_oriented
class Grid():
def __init__(self, type_dict, shape):
self.field = ti.Struct.field(type_dict, shape=(shape))
g = Grid(type_dict, grid_size)
g.field[0,0].grid_num_particles = 2
g.field[0,0].grid2particles[0] = 17
print(g.field[0,0].grid_num_particles)
print(g.field[0,0].grid2particles[0])
我们只不过把struct_type换为了type_dict,并且把struct_type.field(shape=(shape))换为了ti.Struct.field(type_dict, shape=(shape))。其他的都一样。
稀疏网格(重点)
接下来,我们尝试将Grid改为SparseGrid。也就是利用taichi的sparse data structure来实现。
sparse data structure的用法大致如下:
-
首先设定snode, 就是设定内存的排布方式。
-
挂载实际的field。
import taichi as ti
ti.init()
max_num_particles_per_cell = 5
grid_size = (10,10)
type_dict = {
"grid_num_particles": ti.i32,
"grid2particles": ti.types.vector(max_num_particles_per_cell, ti.i32)
}
@ti.data_oriented
class SparseGrid():
def __init__(self, type_dict, shape):
self.field = ti.Struct.field(type_dict)
self.snode = ti.root.bitmasked(ti.ij, shape)
self.snode.place(self.field)
sp = SparseGrid(type_dict=type_dict, shape=grid_size)
grid = sp.field
grid[0,0].grid_num_particles = 2
grid[0,0].grid2particles[0] = 17
print("grid[0,0].grid_num_particles=", grid[0,0].grid_num_particles)
print("grid[0,0].grid2particles[0]=", grid[0,0].grid2particles[0])
在上面例子的__init__函数中,我们增加了两行
self.snode = ti.root.bitmasked(ti.ij, shape)
self.snode.place(self.field)
首先我们建立一个snode。我们指定了bitmasked这种方式,这代表taichi会为我们分配一个bit来判断这个网格是否被使用。其他方式还有dense,pointer和dynamic。这里我们就只用bitmasked。然后ti.ij表示了我们的内存的排布方式。因为我们的网格是二维的,所以我们用ti.ij来表示。其他的还包括ti.i, ti.j, ti.ijk等等。
然后在snode上面挂载field数据。我们只需要用place这个API即可。请记住field是一个struct field。
最后我们给sp.field 起了个别名叫grid。这样,在使用的时候,我们只需要用grid[0,0].grid_num_particles = 2
这样的写法来访问其中的数据。这种写法非常符合我们的直觉。
看一下使用率
当然,我们使用sparse data的主要目的就是为了节约内存。那么有没有方法查看网格的使用率呢?为此,我们需要增加一个usage函数。在这个函数中,我们用struct-for遍历所有的网格,然后用is_active()函数来判断当前网格是否被使用。
import taichi as ti
ti.init()
max_num_particles_per_cell = 5
grid_size = (10,10)
type_dict = {
"grid_num_particles": ti.i32,
"grid2particles": ti.types.vector(max_num_particles_per_cell, ti.i32)
}
@ti.data_oriented
class SparseGrid():
def __init__(self, type_dict, shape):
self.field = ti.Struct.field(type_dict)
self.snode = ti.root.bitmasked(ti.ij, shape) # grid_snode
self.snode.place(self.field)
self.shape = shape
@ti.kernel
def usage(self)->ti.f32:
cnt = 0
for I in ti.grouped(self.snode):
if ti.is_active(self.snode, I):
cnt+=1
usage = cnt/(self.shape[0]*self.shape[1])
print("Grid usage: ", usage)
return usage
sp = SparseGrid(type_dict=type_dict, shape=grid_size)
grid = sp.field
grid[0,0].grid_num_particles = 2
grid[0,0].grid2particles[0] = 17
print("grid[0,0].grid_num_particles=", grid[0,0].grid_num_particles)
print("grid[0,0].grid2particles[0]=", grid[0,0].grid2particles[0])
sp.usage()
在这部分代码里,我们只是增加了usage这个成员函数。用struct-for访问的时候,会自动跳过没有激活的网格。而range-for则会激活所有的网格。我们遍历并加和所有激活的网格,然后除以总的网格数,就可以得到使用率了。
我们这里可以看到使用率是0.01左右,也就是说,我们的网格只有1%的网格被使用。因为我们只激活了一个网格,所以使用率是1/100。
于是我们节约了剩下的99%的内存。但是由于taichi中所有内存都是预分配的。一旦ti.init(),所有内存都已经被分配完毕。所以从系统占用率上来看,我们看不出节约了内存。但是程序内部可以使用节约下来的内存。
但是我们的程序还有一点美中不足,那就是一旦激活了网格,就一直激活。当然,大部分情况下我们只需要这样就行了。因为去激活和熄灭网格是需要额外的计算量的。
但是假如你希望比如隔个500步就去重新熄灭再激活一下,也是可以的。只需要使用sp.snode.deactivate_all()
即可。这样会熄灭所有这个snode的网格。然后再重新激活需要的网格即可。我们这里就不再赘述了。
用稀疏网格改写一下完整的pbf2d
接下来我们就去用上面学到的技巧去改写pbf2d这个taichi的官方example。我们啥都不改,除了把网格的数据结构改成了sparse data structure。也就是grid_num_particles和grid2particles这两个数据结构。
# Macklin, M. and Müller, M., 2013. Position based fluids. ACM Transactions on Graphics (TOG), 32(4), p.104.
# Taichi implementation by Ye Kuang (k-ye)
import math
import numpy as np
import taichi as ti
ti.init(arch=ti.gpu)
screen_res = (800, 400)
screen_to_world_ratio = 10.0
boundary = (screen_res[0] / screen_to_world_ratio,
screen_res[1] / screen_to_world_ratio)
cell_size = 2.51
cell_recpr = 1.0 / cell_size
def round_up(f, s):
return (math.floor(f * cell_recpr / s) + 1) * s
grid_size = (round_up(boundary[0], 1), round_up(boundary[1], 1))
dim = 2
bg_color = 0x112f41
particle_color = 0x068587
boundary_color = 0xebaca2
num_particles_x = 60
num_particles = num_particles_x * 20
max_num_particles_per_cell = 100
max_num_neighbors = 100
time_delta = 1.0 / 20.0
epsilon = 1e-5
particle_radius = 3.0
particle_radius_in_world = particle_radius / screen_to_world_ratio
# PBF params
h_ = 1.1
mass = 1.0
rho0 = 1.0
lambda_epsilon = 100.0
pbf_num_iters = 5
corr_deltaQ_coeff = 0.3
corrK = 0.001
# Need ti.pow()
# corrN = 4.0
neighbor_radius = h_ * 1.05
poly6_factor = 315.0 / 64.0 / math.pi
spiky_grad_factor = -45.0 / math.pi
old_positions = ti.Vector.field(dim, float)
positions = ti.Vector.field(dim, float)
velocities = ti.Vector.field(dim, float)
# grid_num_particles = ti.field(int)
# grid2particles = ti.field(int)
particle_num_neighbors = ti.field(int)
particle_neighbors = ti.field(int)
lambdas = ti.field(float)
position_deltas = ti.Vector.field(dim, float)
# 0: x-pos, 1: timestep in sin()
board_states = ti.Vector.field(2, float)
ti.root.dense(ti.i, num_particles).place(old_positions, positions, velocities)
# grid_snode = ti.root.dense(ti.ij, grid_size)
# grid_snode.place(grid_num_particles)
# grid_snode.dense(ti.k, max_num_particles_per_cell).place(grid2particles)
nb_node = ti.root.dense(ti.i, num_particles)
nb_node.place(particle_num_neighbors)
nb_node.dense(ti.j, max_num_neighbors).place(particle_neighbors)
ti.root.dense(ti.i, num_particles).place(lambdas, position_deltas)
ti.root.place(board_states)
type_dict = {
"grid_num_particles": ti.i32,
"grid2particles": ti.types.vector(max_num_particles_per_cell, ti.i32)
}
@ti.data_oriented
class SparseGrid():
def __init__(self, type_dict, shape):
self.field = ti.Struct.field(type_dict)
self.snode = ti.root.bitmasked(ti.ij, shape) # grid_snode
self.snode.place(self.field)
self.shape = shape
@ti.kernel
def usage(self)->ti.f32:
cnt = 0
for I in ti.grouped(self.snode):
if ti.is_active(self.snode, I):
cnt+=1
usage = cnt/(self.shape[0]*self.shape[1])
print("Grid usage: ", usage)
return usage
sp = SparseGrid(type_dict=type_dict, shape=grid_size)
grid = sp.field
@ti.func
def poly6_value(s, h):
result = 0.0
if 0 < s and s < h:
x = (h * h - s * s) / (h * h * h)
result = poly6_factor * x * x * x
return result
@ti.func
def spiky_gradient(r, h):
result = ti.Vector([0.0, 0.0])
r_len = r.norm()
if 0 < r_len and r_len < h:
x = (h - r_len) / (h * h * h)
g_factor = spiky_grad_factor * x * x
result = r * g_factor / r_len
return result
@ti.func
def compute_scorr(pos_ji):
# Eq (13)
x = poly6_value(pos_ji.norm(), h_) / poly6_value(corr_deltaQ_coeff * h_,
h_)
# pow(x, 4)
x = x * x
x = x * x
return (-corrK) * x
@ti.func
def get_cell(pos):
return int(pos * cell_recpr)
@ti.func
def is_in_grid(c):
# @c: Vector(i32)
return 0 <= c[0] and c[0] < grid_size[0] and 0 <= c[1] and c[
1] < grid_size[1]
@ti.func
def confine_position_to_boundary(p):
bmin = particle_radius_in_world
bmax = ti.Vector([board_states[None][0], boundary[1]
]) - particle_radius_in_world
for i in ti.static(range(dim)):
# Use randomness to prevent particles from sticking into each other after clamping
if p[i] <= bmin:
p[i] = bmin + epsilon * ti.random()
elif bmax[i] <= p[i]:
p[i] = bmax[i] - epsilon * ti.random()
return p
@ti.kernel
def move_board():
# probably more accurate to exert force on particles according to hooke's law.
b = board_states[None]
b[1] += 1.0
period = 90
vel_strength = 8.0
if b[1] >= 2 * period:
b[1] = 0
b[0] += -ti.sin(b[1] * np.pi / period) * vel_strength * time_delta
board_states[None] = b
@ti.kernel
def prologue():
# save old positions
for i in positions:
old_positions[i] = positions[i]
# apply gravity within boundary
for i in positions:
g = ti.Vector([0.0, -9.8])
pos, vel = positions[i], velocities[i]
vel += g * time_delta
pos += vel * time_delta
positions[i] = confine_position_to_boundary(pos)
# clear neighbor lookup table
for I in ti.grouped(grid):
grid[I].grid_num_particles = 0
for I in ti.grouped(particle_neighbors):
particle_neighbors[I] = -1
# update grid
for p_i in positions:
cell = get_cell(positions[p_i])
# ti.Vector doesn't seem to support unpacking yet
# but we can directly use int Vectors as indices
offs = ti.atomic_add(grid[cell].grid_num_particles, 1)
grid[cell].grid2particles[offs] = p_i
# find particle neighbors
for p_i in positions:
pos_i = positions[p_i]
cell = get_cell(pos_i)
nb_i = 0
for offs in ti.static(ti.grouped(ti.ndrange((-1, 2), (-1, 2)))):
cell_to_check = cell + offs
if is_in_grid(cell_to_check):
for j in range(grid[cell_to_check].grid_num_particles):
# p_j = grid2particles[cell_to_check, j]
p_j = grid[cell_to_check].grid2particles[j]
if nb_i < max_num_neighbors and p_j != p_i and (
pos_i - positions[p_j]).norm() < neighbor_radius:
particle_neighbors[p_i, nb_i] = p_j
nb_i += 1
particle_num_neighbors[p_i] = nb_i
@ti.kernel
def substep():
# compute lambdas
# Eq (8) ~ (11)
for p_i in positions:
pos_i = positions[p_i]
grad_i = ti.Vector([0.0, 0.0])
sum_gradient_sqr = 0.0
density_constraint = 0.0
for j in range(particle_num_neighbors[p_i]):
p_j = particle_neighbors[p_i, j]
if p_j < 0:
break
pos_ji = pos_i - positions[p_j]
grad_j = spiky_gradient(pos_ji, h_)
grad_i += grad_j
sum_gradient_sqr += grad_j.dot(grad_j)
# Eq(2)
density_constraint += poly6_value(pos_ji.norm(), h_)
# Eq(1)
density_constraint = (mass * density_constraint / rho0) - 1.0
sum_gradient_sqr += grad_i.dot(grad_i)
lambdas[p_i] = (-density_constraint) / (sum_gradient_sqr +
lambda_epsilon)
# compute position deltas
# Eq(12), (14)
for p_i in positions:
pos_i = positions[p_i]
lambda_i = lambdas[p_i]
pos_delta_i = ti.Vector([0.0, 0.0])
for j in range(particle_num_neighbors[p_i]):
p_j = particle_neighbors[p_i, j]
if p_j < 0:
break
lambda_j = lambdas[p_j]
pos_ji = pos_i - positions[p_j]
scorr_ij = compute_scorr(pos_ji)
pos_delta_i += (lambda_i + lambda_j + scorr_ij) * \
spiky_gradient(pos_ji, h_)
pos_delta_i /= rho0
position_deltas[p_i] = pos_delta_i
# apply position deltas
for i in positions:
positions[i] += position_deltas[i]
@ti.kernel
def epilogue():
# confine to boundary
for i in positions:
pos = positions[i]
positions[i] = confine_position_to_boundary(pos)
# update velocities
for i in positions:
velocities[i] = (positions[i] - old_positions[i]) / time_delta
# no vorticity/xsph because we cannot do cross product in 2D...
def run_pbf():
prologue()
for _ in range(pbf_num_iters):
substep()
epilogue()
def render(gui):
gui.clear(bg_color)
pos_np = positions.to_numpy()
for j in range(dim):
pos_np[:, j] *= screen_to_world_ratio / screen_res[j]
gui.circles(pos_np, radius=particle_radius, color=particle_color)
gui.rect((0, 0), (board_states[None][0] / boundary[0], 1),
radius=1.5,
color=boundary_color)
gui.show()
@ti.kernel
def init_particles():
for i in range(num_particles):
delta = h_ * 0.8
offs = ti.Vector([(boundary[0] - delta * num_particles_x) * 0.5,
boundary[1] * 0.02])
positions[i] = ti.Vector([i % num_particles_x, i // num_particles_x
]) * delta + offs
for c in ti.static(range(dim)):
velocities[i][c] = (ti.random() - 0.5) * 4
board_states[None] = ti.Vector([boundary[0] - epsilon, -0.0])
def print_stats():
print('PBF stats:')
# num = grid_num_particles.to_numpy()
# avg, max_ = np.mean(num), np.max(num)
# print(f' #particles per cell: avg={avg:.2f} max={max_}')
num = particle_num_neighbors.to_numpy()
avg, max_ = np.mean(num), np.max(num)
print(f' #neighbors per particle: avg={avg:.2f} max={max_}')
print(sp.usage())
def main():
init_particles()
print(f'boundary={boundary} grid={grid_size} cell_size={cell_size}')
gui = ti.GUI('PBF2D', screen_res)
while gui.running and not gui.get_event(gui.ESCAPE):
move_board()
run_pbf()
if gui.frame % 20 == 1:
print_stats()
render(gui)
if __name__ == '__main__':
main()
上面的代码和原始的pbf2d相比,只是将 grid_num_particles和grid2particles改为了我们改写API。对于grid_num_particles[i,j],我们将其改为了grid[i,j].cell_num_particles,对于grid2particles,我们将其改为了grid[i,j].cell2particles[offs]。其他的都没有改变。
不用struct field但使用sparse grid的pbf2d
实际上,假如不使用struct field的话,我们还可以连 grid_num_particles和 grid2particles这两个API都不用改变,比如下面的写法更加灵活:
import taichi as ti
ti.init()
max_num_particles_per_cell = 5
grid_size = (10,10)
@ti.data_oriented
class SparseGrid():
def __init__(self, grid_size):
self.grid_num_particles = ti.field(int)
self.grid2particles = ti.field(int)
self.snode = ti.root.bitmasked(ti.ij, grid_size)
self.snode.place(self.grid_num_particles)
self.snode.bitmasked(ti.k, max_num_particles_per_cell).place(self.grid2particles)
@ti.kernel
def usage(self):
cnt = 0
for I in ti.grouped(self.snode):
if ti.is_active(self.snode, I):
cnt+=1
usage = cnt/(grid_size[0]*grid_size[1])
print("Grid usage: ", usage)
sp = SparseGrid(grid_size=grid_size)
grid_num_particles = sp.grid_num_particles
grid2particles = sp.grid2particles
只要把原本的grid_num_particles和 grid2particles数据结构改成这样,就可以使用稀疏数据结构了。 后面的计算完全不改变任何代码。
总结
总结一下:
我们可以对稀疏数据结构进行如下封装
@ti.data_oriented
class SparseGrid():
def __init__(self, type_dict, shape):
self.struct_field = ti.Struct.field(type_dict)
self.snode = ti.root.bitmasked(ti.ij, shape)
self.snode.place(self.struct_field)
self.shape = shape
@ti.kernel
def usage(self)->ti.f32:
cnt = 0
for I in ti.grouped(self.snode):
if ti.is_active(self.snode, I):
cnt+=1
usage = cnt/(self.shape[0]*self.shape[1])
print("Grid usage: ", usage)
return usage
经过封装之后的SparseGrid类,可以用来实现稀疏数据结构。使用方法如下:
- 实例化:
sp = SparseGrid({
"pos": ti.math.vec3,
"mass": ti.f32
}, shape=(10,10))
grid = sp.struct_field
- 使用
grid[1,2].pos = ti.math.vec3(1,2,3)
来访问其中的数据。 - 使用
sp.usage()
来查看使用率。
以上所有内容的jupyter notebook:
sparse_grid_tut.ipynb (36.2 KB)