Differentiable Direct Volume Renderer Gradient Issues

Hi guys! I’m now experimenting Taichi to implement a differentiable DVR renderer, but it seems I cannot get gradients right. Can you help me find out where I break the computation graph? Thanks a lot!

For those of you who don’t know DVR, here is a brief introduction:

  1. Imagine you are shooting many rays from your view point through pixels. And there is a 3D semi-transparent box in the space. So, some of the rays will go through the box, during which the color of a ray is accumulated somehow(details later). So, the big picture is, for each pixel, there is a corresponding ray, and the color the a pixel is the color of the corresponding ray that is calculated if the ray intersects with the 3D box. The general idea of DVR is similar to ray tracing.
  2. So, how is the color of a ray calculated? First, if a ray does not intersect with the box, then the color of it is a predefined value(e.g., black). If a ray does intersect with the box, then we calculate the entry point and the exit point. From the entry point to the exit point, a ray steps into the box incrementally with a fixed step size. For each step in this process, we have a ray’s position, and based on the position, we “query” the box to see what is the value for that position. The values in the box come from simulations or real measurements, but it does not matter much here. Besides, we have a lookup table that is called transfer function, which maps a value to a RGBA color. So, the calculation for one step is first we get the position, query the box to get a value and then lookup the transfer function to get a color. Now, we have a color for one step, so then we can accumulate colors in many steps, then the accumulated color is the final color of a ray, which is the color of the corresponding pixel.

In my simple settings, the values in the box are scalars and the transfer function is just a 1D array. Now, what I want to do is to use back propagation to “learn” a transfer function. The experiment is to first render reference images given a predefined transfer function, then initialize another random transfer function and use this one to render “predicted” images. The loss is just L2 loss between reference images and predicted images. So, the logic is straightforward. Here is my kernel that “predicts” images with a randomly initialized transfer function.

def sample_transfer_function_differender(scalar: float):
    length = ti.static(differender_transfer_function.shape[0])
    # index into the 1D array
    val = length * scalar
    low, high, frac = low_high_frac(val)
    val_low = differender_transfer_function[low]
    val_high = differender_transfer_function[high]
    # linear interpolation
    return tl.mix(val_low, val_high, frac)

def DVR_differender_pass():
    # static preparations
    I_ambient = ti.static(tl.vec3(ambient))
    I_diffuse = ti.static(tl.vec3(diffuse))
    I_specular = ti.static(tl.vec3(specular))
    shape = ti.static(tl.vec3(data_field.shape[0], data_field.shape[1], data_field.shape[2]))
    for x, y in differender_pixels:
        for step in range(max_marching_step_field[x, y]):
            if compositing_color_field[x, y].w > opacity_threshold:  # for early stopping
                tex_position = tex_position_field[x, y]
                ray_direction = ray_dir_field[x, y]
                unnormalized_tex_pos = tex_position * shape
                # get the scalar value for one position
                scalar = sample_volume_trilinear(unnormalized_tex_pos.x, unnormalized_tex_pos.y,
                # "lookup" the transfer function
                src_color = sample_transfer_function_differender(scalar)
                opacity = src_color.w
                # some weighting, from (R, G, B, A) to (R*A, G*A, B*A, A)
                new_src = tl.vec4(src_color.xyz * opacity, opacity)
                # shading, can be consider as calculating a weighting term for new_src
                normal = get_normal(unnormalized_tex_pos.x, unnormalized_tex_pos.y, unnormalized_tex_pos.z)
                dir_dot_norm = ray_direction.dot(normal)
                diffuse_color = max(dir_dot_norm, 0.0) * I_diffuse
                v = -tex_position.normalized()
                r = tl.reflect(-ray_direction, normal)
                r_dot_v = max(r.dot(v), 0.0)
                pf = pow(r_dot_v, shininess)
                specular_color = I_specular * pf
                # weighting the new_src by a shading color
                shading_color = tl.vec4(I_ambient + diffuse_color + specular_color, 1.0) * new_src
                # compositing, accumulating the color and store it in compositing_color_field
                compositing_color_field[x, y] = (1.0 - compositing_color_field[x, y].w) * shading_color + \
                                                compositing_color_field[x, y]
                # advance into the box, one more step
                tex_position_field[x, y] += ray_direction * step_size
        # calculate the L2 loss
        l2loss[None] += (compositing_color_field[x, y].xyz - reference_pixels[x, y]).norm_sqr()
        # assign colors to pixels
        differender_pixels[x, y] = compositing_color_field[x, y].xyz

It seems to me that the computation graph is intact, but I got all zeros when I print the gradients.
For the full source code, you can see it here. It’s straightforward if you understand the basics of DVR.