A. Joshi
Dept. of Electrical & Computer Engineering
The University of the West Indies
St. Augustine, Trinidad & Tobago
Ajay.Joshi@sta.uwi.edu

S. Ismail
Faculty of Engineering
Multimedia University
Cyberjaya, Malaysia

L. S. Ng
Department of Artificial Intelligence
University of Malaya, Malaysia

A. Taqa
Computer Science Department
Mosul University, Iraq
Abstract— The goal of the project is to create a multiprocessor system capable of rendering a 3D model into an MPEG-4 stream. This paper outlines the design, software architecture and hardware setup for the system. Preliminary success with the previous setup [1] gave us experience as well as motivation for this highly optimized and more powerful second version.

Keywords— hpc; gpu; cluster; parallel system; model

The MPEG-4 standard allows for features like 3D objects and positional sound. It is starting to gain acceptance in the industry and is the basis for the widely used DivX codec. During the course of the research, a multiprocessor system will be created that can render a 3D model into an MPEG-4 stream. It is expected that applying parallel computing principles will speed up rendering, thus improving the usefulness and efficiency of the MPEG-4 standard.

The authors are interested in systems that have practical, user-centered applications. Since MPEG-4 is a streaming format, it allows for interactivity, such as interaction through input devices, input from sensors and input from data files. Interactivity will allow the system to function as more than a static visualization tool; it can potentially be used for applications such as simulation, modeling and game-playing.

Various software development tools and compilers can be employed depending on the need. For the current project, more detail is given in the software setup section.

I. HARDWARE SETUP

The design of the cluster is planned to be scalable upwards or downwards depending on the processing need, which helps in making efficient use of power. All nodes can be used as standalone systems. The four nodes will have different processing capabilities, in the sense that individual nodes will be specialized in handling different processing requirements, but at the same time each can easily contribute to the massive processing required by certain applications when used as a cluster node.
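The node-specialization idea described above can be sketched as a small scheduling table: each node advertises its capabilities, and a job runs either on one matching standalone node or fans out across the whole cluster. The node names, core counts and role tags below are illustrative assumptions, not the project's actual configuration.

```python
# Hypothetical inventory of the four specialized nodes (assumed values).
NODES = {
    "node0": {"cpu_cores": 16, "gpus": 2, "role": "master"},
    "node1": {"cpu_cores": 10, "gpus": 1, "role": "render"},
    "node2": {"cpu_cores": 10, "gpus": 1, "role": "render"},
    "node3": {"cpu_cores": 10, "gpus": 0, "role": "encode"},
}

def select_nodes(job, nodes=NODES):
    """Return the node names that should run `job`.

    A 'cluster' job is fanned out to every node; a 'standalone'
    job runs only on the node(s) matching its role tag.
    """
    if job["mode"] == "cluster":
        return sorted(nodes)
    return [name for name, spec in sorted(nodes.items())
            if spec["role"] == job["role"]]
```

A real scheduler would also weigh load and memory, but this captures the standalone-versus-cluster duality the design aims for.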
The system is designed to operate in two modes: one under Linux with OpenMosix and OpenGL or Nvidia's Compute Unified Device Architecture (CUDA) for GPUs, and the other under Windows 7 Ultimate 64-bit edition with OpenGL or the CUDA toolkit.

A. System Architecture

As mentioned earlier, the system is planned to be scalable. Each node will have its own resources as far as CPU, RAM, storage etc. are concerned. In addition, each node will employ multiple extremely high-end graphics processors. The project was able to jump-start quickly thanks to Nvidia Corporation supplying ultra high-end graphics hardware. The entire system with 4 nodes will have a total of 46 CPU cores and 4320 GPU cores, giving a combined single-precision peak performance in excess of 24 TFlops. Technically there will be no limit to further expansion if needed.

B. Physical setup of cluster

At the hardware level, the whole system is configured as a small cluster; currently four specialized nodes will be used. As is common with networks and clusters, the nodes are connected to a Gigabit switch. The switch is capable of connecting eight nodes, so the existing setup allows 4 more nodes to be added. This is acceptable for a small cluster; if resources become available for a large cluster, the switch will have to be upgraded.

For this initial phase, these nodes can be deployed in a variety of manners depending on the needs of the software architecture of the system. One of the most common setups would be to use one node as a master with the other three as slaves. The current setup has more than the necessary processing power for interactive rendering of high-polygon models.
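Aggregate figures like the ones above come from the usual back-of-the-envelope rule: theoretical peak equals cores × clock × FLOPs issued per core per cycle. The sketch below applies that rule; the clock rates and issue widths are assumptions for illustration, not the cluster's measured specifications.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    # Theoretical peak throughput in GFLOPS:
    # cores * clock (GHz) * single-precision FLOPs per core per cycle.
    return cores * clock_ghz * flops_per_cycle

# Illustrative, assumed figures (the paper only gives the totals:
# 46 CPU cores and 4320 GPU cores across 4 nodes).
cpu_gflops = peak_gflops(46, 3.0, 8)    # e.g. 8 SP FLOPs/cycle via SSE
gpu_gflops = peak_gflops(4320, 1.5, 3)  # e.g. MAD+MUL dual issue per core
total_tflops = (cpu_gflops + gpu_gflops) / 1000.0
```

With different (real) clock rates and issue widths the total shifts accordingly; the point is simply that the GPU cores dominate the aggregate peak by more than an order of magnitude.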
II. SOFTWARE SETUP

Given the hardware setup, there are many software approaches that can be tried. These approaches differ in the level of implementation effort, performance profile, effectiveness and need, but they share some common elements. As much as possible, existing libraries and source code will be used for parallelization and rendering. The system will be based on two options, one of which is open standards and established protocols with Linux as the base system; these choices help keep the cost down and allow the project to move faster by legally taking advantage of existing work. All of the approaches generate output in the form of an MPEG-4 stream.

A. Parallelization through animation frames

Parallelization through animation frames is achieved through the use of an OpenMosix [3] cluster. OpenMosix is an application that automatically parallelizes applications at the process level. Suppose a rendering application is run on the cluster: if the application is multi-threaded and spawns multiple processes, OpenMosix will assign those processes to different machines in the cluster, performing load balancing to make sure that each machine is used optimally.

The approach uses Blender's existing animation rendering engine. The engine renders Blender files into animation frames; it is multi-threaded and is therefore ideal for parallelization on an OpenMosix cluster [4].

The idea is to schedule the rendering engine to run at fixed intervals. At the beginning of each interval, any input from user interaction is captured and the appropriate changes are made to the animation sequence. The rendering engine is then called to generate the sequence; it spawns processes which in turn are fed to OpenMosix, which assigns them to the various machines in the cluster. At the end of the interval, all the frames are collected and fed to an MPEG-4 component to be turned into an MPEG-4 stream (see "Figure 1. Example of OpenMosix cluster").

This form of parallelization, however, only provides performance benefits for animation sequences of more than 1000 frames. With a frame rate of 50 to 60 frames per second (standard for interactive games), this means that 15 to 20 seconds worth of animation need to be rendered at one time, so the system can only respond to input 3 or 4 times a minute. This is acceptable for applications with low interactivity, but unacceptable where high user interactivity is required. The frame-by-frame approach also limits the system's ability to utilize the full potential of the MPEG-4 standard.

B. Parallelized rendering library

This software approach works by replacing the underlying 3D rendering library with a parallelized version. The technique works because a large number of applications do not use their own rendering engines; instead they rely on generic rendering libraries like DirectX, OpenGL or CUDA.

OpenGL is of primary interest to this project because it is an open standard with open source implementations. It has strong industry support, with many popular graphics cards supporting it in hardware, and the Blender GUI and rendering engine utilize OpenGL.

The WireGL project [5], at the Stanford University Computer Graphics Lab, is a good example of a parallelized rendering library. WireGL implements exactly the same API calls as OpenGL, so any program designed for OpenGL can run in a distributed environment using WireGL. WireGL uses a sort-first parallelization algorithm along with some other optimizations for memory and bandwidth management.
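A minimal sketch of the sort-first idea: the screen is split into fixed tiles, each owned by one render node, and every primitive is routed to the node(s) whose tiles its screen-space bounding box overlaps. The tile size and triangle representation here are assumptions for illustration; WireGL's real implementation adds the memory and bandwidth optimizations mentioned above.

```python
def tiles_for_triangle(tri, tile_w, tile_h):
    """Return the (col, row) tiles overlapped by a 2D triangle's
    bounding box.  `tri` is three (x, y) screen-space vertices."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    c0, c1 = int(min(xs) // tile_w), int(max(xs) // tile_w)
    r0, r1 = int(min(ys) // tile_h), int(max(ys) // tile_h)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

def sort_first(triangles, tile_w=256, tile_h=256):
    """Bucket primitives per tile: one work list per tile-owning node."""
    buckets = {}
    for tri in triangles:
        for tile in tiles_for_triangle(tri, tile_w, tile_h):
            buckets.setdefault(tile, []).append(tri)
    return buckets
```

Primitives straddling a tile boundary are deliberately sent to every overlapped tile, which is the main source of redundant work in sort-first schemes.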
However, this approach still does not fully utilize the capabilities of MPEG-4. It is important to note that the current work will also permit Matlab and the related HPC toolkit to be used with high efficiency for processing and simulation.

C. Custom rendering engine

A custom rendering engine is the end goal of this project. The basic structure of the system will involve a PVM-driven, Linux-based cluster, with interactivity driven by Python scripts embedded in the Blender files.

Parallel Virtual Machine (PVM) [6] is a software package that permits a heterogeneous collection of networked computers to be used as a single large parallel computer. At its heart, it is a C API that can be used to simplify distributed computing.

Python [7] is an interpreted, interactive, object-oriented programming language. Python's interpreted nature makes it a good match for Blender, which provides a growing collection of Python APIs for a variety of purposes. Python scripts are available for controlling animation, textures and most other aspects of the 3D model; there is even an API for game logic.

The goals of the custom rendering engine should be consistent with the goals of the project. Therefore, the focus here is on creating a stable source code base that can be used for developing and testing new rendering algorithms (see "Figure 2. Architecture of custom rendering engine").

The engine will use Single Instruction Multiple Data parallelization, with each node running its own set of data. A polygon-based rendering approach will be used, as opposed to ray tracing.

At this point, it is planned that a rendering library approach similar to WireGL will be used. The Mesa [8] implementation of OpenGL will be used as the initial code base; Mesa is open source, mature and well documented, and is therefore an ideal starting point.

Unlike WireGL, the system will not use camera space partitioning; a sort-middle algorithm is proposed for the initial implementation. Since the focus is on rendering and not on network management, much of the 3D modeling data will be stored on each individual machine and its GPU hardware, distributed before rendering begins. This will reduce network traffic, although the approach assumes that each machine has a reasonable amount of memory and sufficient polygon processing power. It will be very interesting to see the performance boost, if achieved, from the application of multi-core GPU hardware.

One interesting aspect of this setup will be experimenting with how best to utilize the newer features of the MPEG-4 standard, mainly the ability to specify 3D objects. It will be possible to study the balance between performing work on the cluster and on the output device, and it may be possible to attempt different divisions of labor for different output devices.
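The proposed sort-middle flow can be sketched as two stages: geometry work is split across nodes (SIMD-style: same code, each node's own data), and the transformed triangles are then redistributed to the node owning the screen region they fall in. The trivial transform and the strip-ownership rule below are placeholder assumptions, not the engine's actual design.

```python
def geometry_stage(triangles, n_nodes, transform):
    # Round-robin distribution of raw triangles; each node transforms
    # its own share independently.
    shares = [triangles[i::n_nodes] for i in range(n_nodes)]
    return [[[transform(v) for v in tri] for tri in share]
            for share in shares]

def redistribute(transformed_shares, n_nodes, screen_w):
    # The "middle" sort: each node owns a vertical strip of the screen;
    # route each transformed triangle by the strip containing its first
    # vertex (a deliberately crude placeholder rule).
    strip_w = screen_w / n_nodes
    buckets = [[] for _ in range(n_nodes)]
    for share in transformed_shares:
        for tri in share:
            owner = min(int(tri[0][0] // strip_w), n_nodes - 1)
            buckets[owner].append(tri)
    return buckets
```

Unlike the sort-first case, redistribution here happens after transformation, which is why pre-distributing the model data to every node (as proposed above) keeps the remaining network traffic small.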
III. CONCLUSION
REFERENCES

[2] http://www.chiariglione.org/mpeg/standards/mpeg-4/mpeg-4.htm
[3] http://www.blender.org
[4] http://openmosix.sourceforge.net/
[5] http://spot.river-styx.com/viewarticle.php?id=12
[6] G. Humphreys, I. Buck, M. Eldridge, and P. Hanrahan, "Distributed rendering for scalable displays", SC2000: High Performance Networking and Computing, ACM Press and IEEE Computer Society Press, Dallas Convention Center, Dallas, TX, USA, November 4–10.
[14] S. Upstill, The Renderman Companion, Addison-Wesley, Reading, MA, 1989.
[15] B. Schneider, "Parallel Rendering on PC Workstations", International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA '98), Las Vegas, NV, 1998.
[16] M. Berekovic and P. Pirsch, "An Array Processor Architecture with Parallel Data Cache for Image Rendering and Compositing", Computer Graphics International 1998 (CGI '98), p. 411, 1998.