Master's Project
Parallel volume rendering using multiple VolumePro boards
Student: Abhijeet Ghosh
Advisor: Prof. Arie E. Kaufman
Aim: - Implementing distributed volume rendering using multiple VolumePro 500 boards over a high-speed Myrinet interconnect and parallel rendering using multiple VolumePro 1000 boards on a single PC.
Motivation: -
Distributed Rendering:
A single VolumePro 500 board can render only 256^3 voxels in real time (30 frames per second). To obtain real-time frame rates for larger volumes, experiments have been carried out using multiple VP 500 boards in parallel PCI slots of a single PC. However, the board drivers are unable to distribute the load properly across the boards, so the obtained speedup does not scale with the number of boards. The programming API for the VP 500, the Volume Library Interface (VLI 2.0), does not allow the programmer to access each board individually for proper load distribution. The motivation behind this project is to explicitly distribute the rendering tasks to boards on 2 separate PCs in order to obtain better speedup. We use a high-speed point-to-point Myrinet interconnect (100 MB/s) for this purpose to achieve high frame rates over the network.
Parallel Rendering:
The VolumePro 1000 has been commercially available since October 2001. This new board has better support for multi-board rendering, with its new XY image order rendering algorithm. Based on the results of the previous attempts at multi-board rendering, the new VLI 3.0 has been redesigned to give the programmer a way to specify a particular board for rendering a sub-volume. We utilize this feature to partition the image plane and render using multiple VP 1000 boards in parallel PCI slots on 1 PC.
Resources: -
H/W: - VolumePro 500 & 1000, Pentium III PC with 512 MB RAM, Myrinet LANai interconnect.
S/W: - Volume Library Interface (VLI 2.0, 3.0), Visual C++ 6.0, OpenGL 1.2, Myrinet GM API 1.3.
VolumePro 500
Figures: VolumePro 500 PCI board; 4 VP500 boards in parallel PCI slots of 1 machine.
Distributed Rendering Algorithm
We have one PC set up as the master node and another as the slave. The basic procedure is as follows: -
- Each rendering node (master & slave) has VolumePro board(s) in its PCI slots.
- The volume is divided into 2 equal halves at the master node.
- One half is transferred to the remote node (slave).
- The rendering context (model view matrix, lighting information, color LUTs) is transferred to the slave once per frame.
- The post-warped image of the half volume is transferred to the master once per frame.
- Hexagonal textures of the baseplanes (the silhouette of a cubic volume is a hexagon) representing the corresponding halves are composited at the master node using alpha blending by the texture mapping graphics hardware.
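The split and composite steps above can be sketched in software as follows. This is only an illustration, not the project's code: the `Volume` and `Rgba` types, the z-major voxel layout, the choice of z as the split axis, and the premultiplied-alpha "over" operator are all assumptions made for the example. In the actual set-up the compositing is done by the texture mapping hardware, and the volumes are handled through VLI rather than plain arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit scalar volume, stored slice by slice (x fastest, z slowest).
struct Volume {
    std::size_t nx, ny, nz;
    std::vector<std::uint8_t> voxels;  // nx * ny * nz entries
};

// Split a volume into two halves along z. With z-major storage each half is a
// contiguous block; if nz is odd, the second half receives the extra slice.
void splitAlongZ(const Volume& v, Volume& first, Volume& second) {
    const std::size_t sliceSize = v.nx * v.ny;
    const std::size_t nzFirst = v.nz / 2;
    const std::size_t cut = nzFirst * sliceSize;

    first.nx = v.nx;  first.ny = v.ny;  first.nz = nzFirst;
    first.voxels.assign(v.voxels.begin(), v.voxels.begin() + cut);

    second.nx = v.nx; second.ny = v.ny; second.nz = v.nz - nzFirst;
    second.voxels.assign(v.voxels.begin() + cut, v.voxels.end());
}

// Pixel with premultiplied alpha; all components assumed to lie in [0, 1].
struct Rgba { float r, g, b, a; };

// "Over" operator: the front pixel covers the back pixel by its opacity.
Rgba over(const Rgba& front, const Rgba& back) {
    const float t = 1.0f - front.a;  // fraction of the back pixel still visible
    return Rgba{front.r + t * back.r, front.g + t * back.g,
                front.b + t * back.b, front.a + t * back.a};
}

// Composite the image of the front half over the image of the back half.
std::vector<Rgba> compositeImages(const std::vector<Rgba>& frontImg,
                                  const std::vector<Rgba>& backImg) {
    assert(frontImg.size() == backImg.size());
    std::vector<Rgba> out(frontImg.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = over(frontImg[i], backImg[i]);
    }
    return out;
}
```

Which half counts as "front" depends on the view direction, so a real implementation would pick the compositing order per frame from the model view matrix.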
Important - The application (volume) has to be large enough to justify rendering over this networked set-up. The intuition is that, for certain applications, distributed rendering will be faster than rendering on a single machine even when network transmission times and synchronization delays are included.
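A rough per-frame cost model makes this trade-off concrete. The sketch below is an assumption-laden illustration, not a measured model: the `FrameCost` fields and both functions are hypothetical names, the transfers are assumed not to overlap with rendering, and any numbers fed into it are placeholders rather than results from this project.

```cpp
#include <cassert>

// Hypothetical per-frame costs for the two-node set-up (times in milliseconds).
struct FrameCost {
    double renderFullMs;    // rendering the whole volume on a single machine
    double renderHalfMs;    // rendering one half volume on one node
    double contextBytes;    // rendering context sent master -> slave each frame
    double imageBytes;      // post-warped image sent slave -> master each frame
    double linkBytesPerMs;  // sustained link bandwidth (100 MB/s = 1e5 bytes/ms)
    double syncMs;          // synchronization and compositing overhead
};

// Frame time of the distributed path. Both nodes render their half concurrently,
// but the slave's result is only available after the context has arrived and the
// image has been sent back, so the transfers add to the half-volume render time.
double distributedFrameMs(const FrameCost& c) {
    assert(c.linkBytesPerMs > 0.0);
    const double transferMs = (c.contextBytes + c.imageBytes) / c.linkBytesPerMs;
    return c.renderHalfMs + transferMs + c.syncMs;
}

// True when distributing the frame beats rendering it on one machine.
bool distributedWins(const FrameCost& c) {
    return distributedFrameMs(c) < c.renderFullMs;
}
```

Under this model, a 512x512 RGBA image (about 1 MB) costs roughly 10 ms on a 100 MB/s link, so distribution only pays off when halving the render time saves more than that plus the synchronization overhead.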
CT_UNC_head rendered with distributed rendering - compositing of textures leads to a 'visible crack' at the seams.
CT_UNC_head dataset (256^3)
VolumePro 1000 Rendering
- 1000 million tri-linearly interpolated, Phong-shaded samples/second.
- Ability to embed opaque and translucent surfaces.
- 8-, 16-, and 32-bit voxels with up to four customizable fields.
- XY image order rendering.
- Multi-pass volume rendering.
- Classification and interpolation in either order.
- Up to 8192 (8K) voxels in any dimension in one pass.
- Support for super volumes and multi-board rendering.
- Space-leaping and early-ray termination.
High Quality Visualization with XY image ordered Rendering Algorithm


CT_CAROTID_MGH 12 bit dataset rendered using VP 1000
Parallel Volume Rendering with multiple VP 1000 boards
- First create separate volume objects (of the same volume), one for every available board, and use these to load the volume onto each board.
- Create image & depth buffers for every volume object.
- Partition the image buffer in application memory (to be displayed by OpenGL) into equal ranges, one range for every board.
- For every frame generated, call the Render function with the appropriate image & depth buffers for every volume object.
- For every frame generated, call the Unload function to copy pixel values from each board into its range of the image buffer.
- Call glDrawPixels() to draw the image with OpenGL.
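The image-buffer partitioning in these steps can be sketched as below. This is a minimal illustration under stated assumptions: `RowRange`, `partitionRows`, `renderAndUnload` and `renderFrame` are hypothetical names, the split is assumed to be by image rows, and `renderAndUnload` is a stand-in that merely tags pixels with the board index where the real code would call the VLI Render and Unload functions for that board.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open range of image rows [begin, end) assigned to one board.
struct RowRange { std::size_t begin, end; };

// Divide `height` rows among `boards` so that range sizes differ by at most one row.
std::vector<RowRange> partitionRows(std::size_t height, std::size_t boards) {
    assert(boards > 0);
    std::vector<RowRange> ranges;
    std::size_t start = 0;
    for (std::size_t b = 0; b < boards; ++b) {
        // The first (height % boards) boards take one extra row.
        const std::size_t rows = height / boards + (b < height % boards ? 1 : 0);
        ranges.push_back({start, start + rows});
        start += rows;
    }
    return ranges;
}

// Stand-in for the per-board render + unload step: writes an opaque pixel whose
// low bits hold the board index into that board's rows of the shared image.
void renderAndUnload(std::size_t board, const RowRange& r, std::size_t width,
                     std::vector<std::uint32_t>& image) {
    const std::uint32_t pixel = 0xFF000000u | static_cast<std::uint32_t>(board);
    for (std::size_t y = r.begin; y < r.end; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            image[y * width + x] = pixel;
        }
    }
}

// Assemble one frame in application memory from all boards.
std::vector<std::uint32_t> renderFrame(std::size_t width, std::size_t height,
                                       std::size_t boards) {
    std::vector<std::uint32_t> image(width * height, 0u);
    const std::vector<RowRange> ranges = partitionRows(height, boards);
    for (std::size_t b = 0; b < boards; ++b) {
        renderAndUnload(b, ranges[b], width, image);
    }
    return image;  // this buffer is what would be handed to glDrawPixels()
}
```

Because every board writes to a disjoint row range of the same buffer, no compositing step is needed before the image is drawn.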

F-15 (213x300x82): Top half (in red) rendered by board 1 and bottom half (in black) by board 2.
Bibliography: -
H. Pfister, J. Hardenbergh, J. Knittel, H. Lauer, and L. Seiler, The VolumePro Real-Time Ray-Casting System, Proceedings of SIGGRAPH 99.
Volume Library Interface User’s Guide – Copyright RTViz.
VolumePro 1000 Programmer’s Guide – Copyright RTViz.
The GM Message Passing System – Copyright Myricom, Inc.
OpenGL Programming Guide – OpenGL ARB.