告别理论猜想:用实际代码推导Gaussian Splatting的2D协方差与3σ渲染原理
从数学到代码:深度解析Gaussian Splatting的2D协方差与3σ渲染原理
1. 三维高斯分布的屏幕空间投影
在计算机图形学中,将三维高斯分布投影到二维屏幕空间是一个关键步骤。这个过程涉及多个坐标系的转换和矩阵运算,我们首先需要理解基础数学原理。
1.1 三维高斯分布的表示
一个三维高斯分布可以用均值向量μ和协方差矩阵Σ来描述:
import torch import numpy as np # 三维高斯参数示例 mu = torch.tensor([1.0, 2.0, 3.0]) # 均值向量 scale = torch.tensor([0.5, 1.0, 1.5]) # 缩放参数 rotation = torch.tensor([1.0, 0.0, 0.0, 0.0]) # 旋转四元数 # 从缩放和旋转计算3D协方差矩阵 def compute_cov3d(scale, rotation): # 构建缩放矩阵 S = torch.diag(scale) # 四元数转旋转矩阵 q = rotation / torch.norm(rotation) r, x, y, z = q[0], q[1], q[2], q[3] R = torch.tensor([ [1-2*(y*y+z*z), 2*(x*y-r*z), 2*(x*z+r*y)], [2*(x*y+r*z), 1-2*(x*x+z*z), 2*(y*z-r*x)], [2*(x*z-r*y), 2*(y*z+r*x), 1-2*(x*x+y*y)] ]) # 构建3D协方差矩阵 M = S @ R Sigma = M.T @ M return Sigma cov3d = compute_cov3d(scale, rotation)1.2 相机投影与坐标变换
将三维点投影到二维屏幕空间需要经过视图变换和投影变换:
def world_to_camera(points, view_matrix): """世界坐标到相机坐标变换""" homogeneous = torch.cat([points, torch.ones(points.shape[0], 1)], dim=1) return (homogeneous @ view_matrix.T)[:, :3] def project_to_screen(points, proj_matrix): """相机坐标到屏幕坐标投影""" homogeneous = torch.cat([points, torch.ones(points.shape[0], 1)], dim=1) projected = homogeneous @ proj_matrix.T projected = projected / projected[:, 3:4] # 齐次坐标归一化 return projected[:, :2]2. 2D协方差矩阵的计算
2.1 EWA Splatting理论基础
根据Zwicker等人的EWA Splatting理论,2D协方差矩阵可以通过以下步骤计算:
- 计算视图变换后的点坐标
- 构建雅可比矩阵J
- 计算最终的2D协方差矩阵
def compute_cov2d(mean, focal_x, focal_y, tan_fovx, tan_fovy, cov3d, view_matrix): # 视图变换 t = world_to_camera(mean.unsqueeze(0), view_matrix).squeeze(0) # 处理极端情况 limx = 1.3 * tan_fovx limy = 1.3 * tan_fovy txtz = t[0] / t[2] tytz = t[1] / t[2] t_x = min(limx, max(-limx, txtz)) * t[2] t_y = min(limy, max(-limy, tytz)) * t[2] # 构建雅可比矩阵 J = torch.tensor([ [focal_x/t[2], 0, -(focal_x*t_x)/(t[2]**2)], [0, focal_y/t[2], -(focal_y*t_y)/(t[2]**2)], [0, 0, 0] ]) # 视图矩阵的旋转部分 W = view_matrix[:3, :3] # 完整变换矩阵 T = W @ J # 3D协方差矩阵 Vrk = torch.tensor([ [cov3d[0], cov3d[1], cov3d[2]], [cov3d[1], cov3d[3], cov3d[4]], [cov3d[2], cov3d[4], cov3d[5]] ]) # 2D协方差矩阵计算 cov2d = T.T @ Vrk @ T cov2d[:2, :2] += 0.3 # 低通滤波 return cov2d[:2, :2]2.2 特征值与渲染半径
计算协方差矩阵的特征值可以确定高斯分布的伸展方向和程度:
def compute_radii(cov2d): # 计算特征值 a = cov2d[0, 0] b = cov2d[0, 1] c = cov2d[1, 1] mid = 0.5 * (a + c) det = a * c - b * b lambda1 = mid + torch.sqrt(torch.clamp(mid * mid - det, min=0.1)) lambda2 = mid - torch.sqrt(torch.clamp(mid * mid - det, min=0.1)) # 3σ原则确定渲染半径 radius = torch.ceil(3 * torch.sqrt(torch.max(lambda1, lambda2))) return radius3. 代码实现与优化
3.1 CUDA核心实现
在实际的Gaussian Splatting实现中,上述计算会在CUDA内核中高效执行:
__device__ float3 computeCov2D( const float3& mean, float focal_x, float focal_y, float tan_fovx, float tan_fovy, const float* cov3D, const float* viewmatrix) { // 视图变换 float3 t = transformPoint4x3(mean, viewmatrix); // 处理极端情况 const float limx = 1.3f * tan_fovx; const float limy = 1.3f * tan_fovy; const float txtz = t.x / t.z; const float tytz = t.y / t.z; t.x = min(limx, max(-limx, txtz)) * t.z; t.y = min(limy, max(-limy, tytz)) * t.z; // 雅可比矩阵 glm::mat3 J = glm::mat3( focal_x / t.z, 0.0f, -(focal_x * t.x) / (t.z * t.z), 0.0f, focal_y / t.z, -(focal_y * t.y) / (t.z * t.z), 0, 0, 0); // 视图矩阵旋转部分 glm::mat3 W = glm::mat3( viewmatrix[0], viewmatrix[4], viewmatrix[8], viewmatrix[1], viewmatrix[5], viewmatrix[9], viewmatrix[2], viewmatrix[6], viewmatrix[10]); // 完整变换 glm::mat3 T = W * J; // 3D协方差矩阵 glm::mat3 Vrk = glm::mat3( cov3D[0], cov3D[1], cov3D[2], cov3D[1], cov3D[3], cov3D[4], cov3D[2], cov3D[4], cov3D[5]); // 2D协方差 glm::mat3 cov = glm::transpose(T) * Vrk * T; // 低通滤波 cov[0][0] += 0.3f; cov[1][1] += 0.3f; return {cov[0][0], cov[0][1], cov[1][1]}; }3.2 渲染优化技巧
在实际渲染中,我们使用多个优化策略:
- 平铺渲染:将屏幕划分为16×16的瓦片(tile)
- 视锥剔除:提前剔除不可见的高斯分布
- 深度排序:按深度对高斯分布进行排序
- 提前终止:当像素的不透明度接近1时停止计算
__global__ void renderCUDA( const uint2* ranges, const uint32_t* point_list, int W, int H, const float2* means2D, const float* colors, const float4* conic_opacity, float* final_T, uint32_t* n_contrib, const float* bg_color, float* out_color) { // 确定当前线程处理的像素 uint32_t pix_id = blockIdx.y * gridDim.x * BLOCK_SIZE + blockIdx.x * BLOCK_SIZE + threadIdx.y * BLOCK_X + threadIdx.x; // 初始化变量 float T = 1.0f; // 透射率 float3 C = {0, 0, 0}; // 累积颜色 // 处理分配给当前tile的高斯分布 for (int i = ranges[blockIdx.x + blockIdx.y * gridDim.x].x; i < ranges[blockIdx.x + blockIdx.y * gridDim.x].y; i++) { const int gaussian_id = point_list[i]; // 计算当前高斯对像素的贡献 float2 xy = means2D[gaussian_id]; float2 d = {xy.x - pix.x, xy.y - pix.y}; float4 con_o = conic_opacity[gaussian_id]; // 计算高斯权重 float power = -0.5f * (con_o.x * d.x * d.x + con_o.z * d.y * d.y) - con_o.y * d.x * d.y; if (power > 0.0f) continue; float alpha = min(0.99f, con_o.w * exp(power)); if (alpha < 1.0f/255.0f) continue; // 更新透射率和颜色 float test_T = T * (1 - alpha); if (test_T < 0.0001f) break; for (int ch = 0; ch < 3; ch++) C[ch] += colors[gaussian_id * 3 + ch] * alpha * T; T = test_T; } // 写入最终结果 if (pix.x < W && pix.y < H) { out_color[pix.y * W + pix.x] = C.x + T * bg_color[0]; out_color[pix.y * W + pix.x + W*H] = C.y + T * bg_color[1]; out_color[pix.y * W + pix.x + 2*W*H] = C.z + T * bg_color[2]; } }4. 调试与可视化
4.1 中间结果可视化
在开发过程中,可视化中间结果对于调试至关重要:
def visualize_covariance(cov2d, mean, size=100): """可视化2D协方差矩阵""" # 生成网格 x, y = np.mgrid[-size:size+1, -size:size+1] pos = np.dstack((x, y)) # 计算高斯分布 inv_cov = np.linalg.inv(cov2d) z = np.exp(-0.5 * np.einsum('...k,kl,...l->...', pos-mean, inv_cov, pos-mean)) # 绘制 plt.imshow(z, cmap='viridis') plt.contour(x, y, z, 8, colors='w', alpha=0.3) plt.show()4.2 常见问题排查
- 协方差矩阵不对称:检查矩阵计算中的转置操作
- 渲染出现空洞:确认3σ半径计算是否正确
- 性能瓶颈:使用NVIDIA Nsight工具分析CUDA内核
提示:在调试时,可以逐步验证每个变换步骤的结果,确保从3D到2D的投影过程符合预期。
5. 数学推导与代码对应
5.1 雅可比矩阵的物理意义
雅可比矩阵J描述了屏幕空间坐标对相机空间坐标的变化率:
J = ∂(u,v)/∂(X,Y,Z) = [f_x/Z, 0, -f_x·X/Z²] [ 0, f_y/Z, -f_y·Y/Z²]其中f_x和f_y是焦距,Z是深度值。
5.2 3σ原则的实现
在代码中,我们通过计算特征值来确定高斯分布的伸展程度:
λ = (a + c ± √((a - c)² + 4b²)) / 2 radius = 3 * √max(λ1, λ2)这个半径确保了高斯分布的99.7%能量被包含在渲染区域内。
6. 性能优化实践
6.1 内存访问优化
// 使用共享内存减少全局内存访问 __shared__ float2 shared_means[BLOCK_SIZE]; __shared__ float4 shared_conic_opacity[BLOCK_SIZE]; // 协作加载数据 if (threadIdx.x < BLOCK_SIZE) { shared_means[threadIdx.x] = means2D[point_list[range.x + i * BLOCK_SIZE + threadIdx.x]]; shared_conic_opacity[threadIdx.x] = conic_opacity[point_list[range.x + i * BLOCK_SIZE + threadIdx.x]]]; } __syncthreads();6.2 并行排序策略
# 使用CUB库进行高效排序 from cub import DeviceRadixSort sorter = DeviceRadixSort() d_keys_out = torch.empty_like(d_keys_in) d_values_out = torch.empty_like(d_values_in) # 按tile ID和深度排序 sorter.sort_pairs(d_keys_in, d_keys_out, d_values_in, d_values_out)7. 扩展与应用
7.1 动态细节级别
根据视图距离动态调整高斯分布密度:
def adjust_lod(gaussians, distance): # 根据距离调整细节级别 lod_factor = torch.clamp(distance / 10.0, 0.1, 1.0) return gaussians[::int(1/lod_factor)]7.2 实时编辑支持
通过修改协方差矩阵实现形状编辑:
def stretch_gaussian(cov3d, direction, factor): # 沿特定方向拉伸高斯分布 stretch = torch.outer(direction, direction) * (factor - 1) return cov3d + stretch