Neat. I've been pursuing the idea of software occlusion culling for a while now. I started a library called SoftOcclude, based on Intel's work, to do exactly this. Repo link: https://github.com/andr3wmac/SoftOcclude
In the library, occluders are software rasterized to build a depth buffer entirely on the CPU, without stalling the GPU. Then the bounding boxes of the occludees are rasterized and tested against that depth buffer, also on the CPU (the same method as his). Really, the only difference is where the depth buffer you test against comes from. It would be trivial to adapt the library to test against a supplied depth buffer instead of rasterizing its own.
If anyone is interested in pursuing this on the T3D side, let me know; I can help. I'm writing the library for Torque 6, but it's very general-purpose. Eventually it will be SSE-optimized and multithreaded, which should be able to produce even better numbers than he's getting, even if you use the previous frame's depth buffer instead of rasterizing a new one.