A. Joshi
Dept. of Electrical & Computer Engineering
The University of the West Indies
St. Augustine, Trinidad & Tobago
Ajay.Joshi@sta.uwi.edu

S. Ismail
Faculty of Engineering
Multimedia University
Cyberjaya, Malaysia

L. S. Ng
Department of Artificial Intelligence
University of Malaya, Malaysia

A. Taqa
Computer Science Department
Mosul University, Iraq
Abstract— The goal of the project is to create a multiprocessor system capable of rendering a 3D model into an MPEG-4 stream. This paper outlines the design, software architecture and hardware setup for the system. Preliminary success with the previous setup [1] gave us experience as well as motivation for this highly optimized and more powerful second version.

Keywords— hpc; gpu; cluster; parallel system; model

The MPEG-4 standard allows for features like 3D objects and positional sound. It is starting to gain acceptance in the industry and is the basis for the widely used DivX codec. During the course of the research, a multiprocessor system will be created that can render a 3D model into an MPEG-4 stream. It is expected that applying parallel computing principles will speed up rendering, thus improving the usefulness and efficiency of the MPEG-4 standard.

The authors are interested in systems that have practical, user-centered applications. Since MPEG-4 is a streaming format, it allows for interactivity, such as interaction through input devices, input from sensors and input from data files. Interactivity will allow the system to function as more than a static visualization tool; it can potentially be used for applications such as simulation, modeling and game-playing.

Various software development tools and compilers can be employed depending on the need. For the current project, more detail is given in the software setup section.

I. HARDWARE SETUP

The design of the cluster is planned to be scalable upwards or downwards depending on the processing need, which helps in making efficient use of power. All nodes can be used as standalone systems. The four nodes will have different processing capabilities, in the sense that individual nodes will be specialized in handling different processing requirements, but at the same time each can easily contribute to the massive processing required by certain applications when used as a cluster node.
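The node-specialization idea described above can be sketched as a small scheduling table: each node advertises its capabilities, and a job runs either on one matching standalone node or fans out across the whole cluster. The node names, core counts and role tags below are illustrative assumptions, not the project's actual configuration.

```python
# Hypothetical inventory of the four specialized nodes (assumed values).
NODES = {
    "node0": {"cpu_cores": 16, "gpus": 2, "role": "master"},
    "node1": {"cpu_cores": 10, "gpus": 1, "role": "render"},
    "node2": {"cpu_cores": 10, "gpus": 1, "role": "render"},
    "node3": {"cpu_cores": 10, "gpus": 0, "role": "encode"},
}

def select_nodes(job, nodes=NODES):
    """Return the node names that should run `job`.

    A 'cluster' job is fanned out to every node; a 'standalone'
    job runs only on the node(s) matching its role tag.
    """
    if job["mode"] == "cluster":
        return sorted(nodes)
    return [name for name, spec in sorted(nodes.items())
            if spec["role"] == job["role"]]
```

A real scheduler would also weigh load and memory, but this captures the standalone-versus-cluster duality the design aims for.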
The system is designed to operate in two modes: one under Linux with OpenMosix and OpenGL or Nvidia's Compute Unified Device Architecture (CUDA) for GPUs, and the other under Windows 7 Ultimate 64-bit edition with OpenGL or the CUDA toolkit.

A. System Architecture

As mentioned earlier, the system is planned to be scalable. Each node will have its own resources as far as CPU, RAM, storage etc. are concerned. In addition, each node will employ multiple extremely high-end graphics processors. The project was able to jump-start quickly thanks to Nvidia Corporation supplying ultra high-end graphics hardware. The entire system with 4 nodes will have a total of 46 CPU cores and 4320 GPU cores, giving a combined single-precision peak performance in excess of 24 TFlops. Technically there will be no limit to further expansion if needed.

B. Physical setup of cluster

At the hardware level, the whole system is configured as a small cluster; currently four specialized nodes will be used. As is common with networks and clusters, the nodes are connected to a Gigabit switch. The switch is capable of connecting eight nodes, so the existing setup allows 4 more nodes to be added. This is acceptable for a small cluster; if resources become available for a large cluster, the switch will have to be upgraded.

For this initial phase, these nodes can be deployed in a variety of manners depending on the needs of the software architecture of the system. One of the most common setups would be to use one node as a master with the other three as slaves. The current setup has more than the necessary processing power for interactive rendering of high-polygon models.
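Aggregate figures like the ones above come from the usual back-of-the-envelope rule: theoretical peak equals cores × clock × FLOPs issued per core per cycle. The sketch below applies that rule; the clock rates and issue widths are assumptions for illustration, not the cluster's measured specifications.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    # Theoretical peak throughput in GFLOPS:
    # cores * clock (GHz) * single-precision FLOPs per core per cycle.
    return cores * clock_ghz * flops_per_cycle

# Illustrative, assumed figures (the paper only gives the totals:
# 46 CPU cores and 4320 GPU cores across 4 nodes).
cpu_gflops = peak_gflops(46, 3.0, 8)    # e.g. 8 SP FLOPs/cycle via SSE
gpu_gflops = peak_gflops(4320, 1.5, 3)  # e.g. MAD+MUL dual issue per core
total_tflops = (cpu_gflops + gpu_gflops) / 1000.0
```

With different (real) clock rates and issue widths the total shifts accordingly; the point is simply that the GPU cores dominate the aggregate peak by more than an order of magnitude.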
II. SOFTWARE SETUP

Given the hardware setup, there are many software approaches that can be tried. These approaches differ in the level of implementation effort, performance profile, effectiveness and need, but they share some common elements. As much as possible, existing libraries and source code will be used for parallelization and rendering. The system will be based on two options, one of which is open standards and established protocols with Linux as the base system; these choices help keep the cost down and allow the project to move faster by legally taking advantage of existing work. All of the approaches generate output in the form of an MPEG-4 stream.

A. Parallelization through animation frames

Parallelization through animation frames is achieved through the use of an OpenMosix [3] cluster. OpenMosix is an application that automatically parallelizes applications at the process level. Suppose a rendering application is run on the cluster: if the application is multi-threaded and spawns multiple processes, OpenMosix will assign those processes to different machines in the cluster, performing load balancing to make sure that each machine is used optimally.

The approach uses Blender's existing animation rendering engine. The engine renders Blender files into animation frames; it is multi-threaded and is therefore ideal for parallelization on an OpenMosix cluster [4].

The idea is to schedule the rendering engine to run at fixed intervals. At the beginning of each interval, any input from user interaction is captured and the appropriate changes are made to the animation sequence. The rendering engine is then called to generate the sequence; it spawns processes which in turn are fed to OpenMosix, which assigns them to the various machines in the cluster. At the end of the interval, all the frames are collected and fed to an MPEG-4 component to be turned into an MPEG-4 stream (see "Figure 1. Example of OpenMosix cluster").

This form of parallelization, however, only provides performance benefits for animation sequences of more than 1000 frames. With a frame rate of 50 to 60 frames per second (standard for interactive games), this means that 15 to 20 seconds worth of animation need to be rendered at one time, so the system can only respond to input 3 or 4 times a minute. This is acceptable for applications with low interactivity, but unacceptable where high user interactivity is required. The frame-by-frame approach also limits the system's ability to utilize the full potential of the MPEG-4 standard.

B. Parallelized rendering library

This software approach works by replacing the underlying 3D rendering library with a parallelized version. The technique works because a large number of applications do not use their own rendering engines; instead they rely on generic rendering libraries like DirectX, OpenGL or CUDA.

OpenGL is of primary interest to this project because it is an open standard with open source implementations. It has strong industry support, with many popular graphics cards supporting it in hardware, and the Blender GUI and rendering engine utilize OpenGL.

The WireGL project [5], at the Stanford University Computer Graphics Lab, is a good example of a parallelized rendering library. WireGL implements exactly the same API calls as OpenGL, so any program designed for OpenGL can run in a distributed environment using WireGL. WireGL uses a sort-first parallelization algorithm along with some other optimizations for memory and bandwidth management.
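A minimal sketch of the sort-first idea: the screen is split into fixed tiles, each owned by one render node, and every primitive is routed to the node(s) whose tiles its screen-space bounding box overlaps. The tile size and triangle representation here are assumptions for illustration; WireGL's real implementation adds the memory and bandwidth optimizations mentioned above.

```python
def tiles_for_triangle(tri, tile_w, tile_h):
    """Return the (col, row) tiles overlapped by a 2D triangle's
    bounding box.  `tri` is three (x, y) screen-space vertices."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    c0, c1 = int(min(xs) // tile_w), int(max(xs) // tile_w)
    r0, r1 = int(min(ys) // tile_h), int(max(ys) // tile_h)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

def sort_first(triangles, tile_w=256, tile_h=256):
    """Bucket primitives per tile: one work list per tile-owning node."""
    buckets = {}
    for tri in triangles:
        for tile in tiles_for_triangle(tri, tile_w, tile_h):
            buckets.setdefault(tile, []).append(tri)
    return buckets
```

Primitives straddling a tile boundary are deliberately sent to every overlapped tile, which is the main source of redundant work in sort-first schemes.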
However, this approach still does not fully utilize the capabilities of MPEG-4. It is important to note that the current work will also permit Matlab and the related HPC toolkit to be used with high efficiency for processing and simulation.

C. Custom rendering engine

A custom rendering engine is the end goal of this project. The basic structure of the system will involve a PVM-driven, Linux-based cluster, with interactivity driven by Python scripts embedded in the Blender files.

Parallel Virtual Machine (PVM) [6] is a software package that permits a heterogeneous collection of networked computers to be used as a single large parallel computer. At its heart, it is a C API that can be used to simplify distributed computing.

Python [7] is an interpreted, interactive, object-oriented programming language. Python's interpreted nature makes it a good match for Blender, which provides a growing collection of Python APIs for a variety of purposes. Python scripts are available for controlling animation, textures and most other aspects of the 3D model; there is even an API for game logic.

The goals of the custom rendering engine should be consistent with the goals of the project. Therefore, the focus here is on creating a stable source code base that can be used for developing and testing new rendering algorithms (see "Figure 2. Architecture of custom rendering engine").

The engine will use Single Instruction Multiple Data parallelization, with each node running its own set of data. A polygon-based rendering approach will be used, as opposed to ray tracing.

At this point, it is planned that a rendering library approach similar to WireGL will be used. The Mesa [8] implementation of OpenGL will be used as the initial code base; Mesa is open source, mature and well documented, and is therefore an ideal starting point.

Unlike WireGL, the system will not use camera space partitioning; a sort-middle algorithm is proposed for the initial implementation. Since the focus is on rendering and not on network management, much of the 3D modeling data will be stored on each individual machine and its GPU hardware, distributed before rendering begins. This will reduce network traffic, although the approach assumes that each machine has a reasonable amount of memory and sufficient polygon processing power. It will be very interesting to see the performance boost, if achieved, from the application of multi-core GPU hardware.

One interesting aspect of this setup will be experimenting with how best to utilize the newer features of the MPEG-4 standard, mainly the ability to specify 3D objects. It will be possible to study the balance between performing work on the cluster and on the output device, and it may be possible to attempt different divisions of labor for different output devices.
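The proposed sort-middle flow can be sketched as two stages: geometry work is split across nodes (SIMD-style: same code, each node's own data), and the transformed triangles are then redistributed to the node owning the screen region they fall in. The trivial transform and the strip-ownership rule below are placeholder assumptions, not the engine's actual design.

```python
def geometry_stage(triangles, n_nodes, transform):
    # Round-robin distribution of raw triangles; each node transforms
    # its own share independently.
    shares = [triangles[i::n_nodes] for i in range(n_nodes)]
    return [[[transform(v) for v in tri] for tri in share]
            for share in shares]

def redistribute(transformed_shares, n_nodes, screen_w):
    # The "middle" sort: each node owns a vertical strip of the screen;
    # route each transformed triangle by the strip containing its first
    # vertex (a deliberately crude placeholder rule).
    strip_w = screen_w / n_nodes
    buckets = [[] for _ in range(n_nodes)]
    for share in transformed_shares:
        for tri in share:
            owner = min(int(tri[0][0] // strip_w), n_nodes - 1)
            buckets[owner].append(tri)
    return buckets
```

Unlike the sort-first case, redistribution here happens after transformation, which is why pre-distributing the model data to every node (as proposed above) keeps the remaining network traffic small.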
III. CONCLUSION
REFERENCES

[2] http://www.chiariglione.org/mpeg/standards/mpeg-4/mpeg-4.htm
[3] http://www.blender.org
[4] http://openmosix.sourceforge.net/
[5] http://spot.river-styx.com/viewarticle.php?id=12
[6] G. Humphreys, I. Buck, M. Eldridge, and P. Hanrahan, "Distributed rendering for scalable displays", SC2000: High Performance Networking and Computing, ACM Press and IEEE Computer Society Press, Dallas Convention Center, Dallas, TX, USA, November 4–10.
[14] S. Upstill, The Renderman Companion, Addison-Wesley, Reading, MA, 1989.
[15] B. Schneider, "Parallel Rendering on PC Workstations", International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA '98), Las Vegas, NV, 1998.
[16] M. Berekovic and P. Pirsch, "An Array Processor Architecture with Parallel Data Cache for Image Rendering and Compositing", Computer Graphics International 1998 (CGI '98), p. 411, 1998.