I came up with a pretty cool idea.
I render captures of a Voxels parcel (albedo and depth map) from a bunch of viewpoints.
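
One way to pick the viewpoints, as a rough sketch: put the cameras on a ring around the parcel, all looking at its centre. The function name `orbit_cameras`, the Y-up convention, and the ring layout are assumptions for illustration, not something the renderer dictates.

```python
import math

def orbit_cameras(center, radius, height, count):
    """Return (position, forward) pairs evenly spaced on a ring around `center`.

    Assumes a Y-up world; `forward` is the unit vector from the camera
    towards the parcel centre. Requires radius > 0 or height != 0.
    """
    cams = []
    for i in range(count):
        angle = 2 * math.pi * i / count
        pos = (
            center[0] + radius * math.cos(angle),
            center[1] + height,
            center[2] + radius * math.sin(angle),
        )
        # Direction from the camera to the centre, normalised to length 1.
        d = tuple(c - p for c, p in zip(center, pos))
        length = math.sqrt(sum(x * x for x in d))
        forward = tuple(x / length for x in d)
        cams.append((pos, forward))
    return cams
```

Something like `orbit_cameras((0, 0, 0), 10, 5, 8)` gives eight cameras; a second ring at a different height would probably help cover roofs and overhangs.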

I restyle the images using Stable Diffusion: "make it look cyberpunk and shit but don't move anything".

Then I convert the images (which I have the camera intrinsics for) back into a Gaussian field. Boom - Gaussian splat voxels.com.
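
A minimal sketch of the "back into a Gaussian field" step, under some assumptions: the depth map stores z-depth along the camera axis (not ray distance), the camera looks down +z, and each view has a 4x4 row-major camera-to-world matrix alongside the intrinsics. It only unprojects pixels into coloured 3D points that could seed the Gaussians; the actual splat optimisation is a separate step. `Intrinsics`, `unproject`, and the `z / fx` starting radius are hypothetical names and choices, and a real version would likely be vectorised with NumPy.

```python
from dataclasses import dataclass

@dataclass
class Intrinsics:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point (x)
    cy: float  # principal point (y)

def unproject(depth, colors, K, cam_to_world):
    """Turn one restyled view into seed points for Gaussians.

    depth:  rows of z-depth values (<= 0 means "no geometry here")
    colors: rows of (r, g, b) tuples from the restyled image
    Returns a list of (world_xyz, rgb, radius) tuples.
    """
    seeds = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip sky / background pixels
            # Pinhole model: pixel -> camera-space point at depth z.
            cam = ((u - K.cx) / K.fx * z, (v - K.cy) / K.fy * z, z, 1.0)
            # Apply the camera-to-world transform (top three rows of the 4x4).
            world = tuple(
                sum(cam_to_world[r][c] * cam[c] for c in range(4))
                for r in range(3)
            )
            # Assumed starting size: roughly one pixel's footprint at depth z.
            radius = z / K.fx
            seeds.append((world, colors[v][u], radius))
    return seeds
```

Because the restyle is told not to move anything, the original depth maps should still line up with the new colours, which is what makes this unprojection usable.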