Electrical and Computer Engineering Dept.
Computing Architectures for
Virtual Reality
Computer
(rendering
pipeline)
System architecture
Computing Architectures
The VR Engine
Definition:
A key component of the VR system which reads its input devices,
accesses task-dependent databases, updates the state of the
virtual world and feeds the results to the output displays.
It is an abstraction – it can mean one computer, several
co-located cores in one computer, several co-located computers,
or many remote computers collaborating in a distributed simulation.
Computing Architectures
The real-time nature of VR requires a powerful VR engine
in order to assure:
fast graphics and haptics refresh rates (30 fps for graphics and
hundreds of Hz for haptics);
low latencies (<100 ms to avoid simulation sickness);
At the core of such an architecture is the rendering pipeline;
within the scope of this course, rendering is extended to include
haptics.
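The rate and latency targets above translate directly into per-update time budgets. A minimal sketch, assuming the figures quoted in the slides (30 fps graphics, ~1000 Hz haptics, <100 ms latency); the helper functions themselves are illustrative, not from the course material:

```python
def frame_budget_ms(rate_hz: float) -> float:
    """Time available per update at a given refresh rate, in milliseconds."""
    return 1000.0 / rate_hz

graphics_budget = frame_budget_ms(30)    # ~33.3 ms per rendered frame
haptics_budget = frame_budget_ms(1000)   # 1 ms per force update

def within_latency_limit(latency_ms: float, limit_ms: float = 100.0) -> bool:
    """Check a measured latency against the simulation-sickness threshold."""
    return latency_ms < limit_ms
```

The three-orders-of-magnitude gap between the graphics and haptics budgets is why the two pipelines are decoupled later in this chapter.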
Computing Architectures
The Graphics Rendering Pipeline
The process of creating a 2-D image from a 3-D model is
called “rendering.” The rendering pipeline has three
functional stages. The speed of the pipeline is that of its
slowest stage.
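The "slowest stage" rule can be stated in one line of code. An illustrative sketch (the stage rates below are hypothetical numbers, not measurements from the slides):

```python
def pipeline_fps(stage_rates_fps: dict) -> float:
    """The pipeline's overall output rate is that of its slowest stage."""
    return min(stage_rates_fps.values())

# Hypothetical per-stage throughput, in frames per second:
rates = {"application": 90.0, "geometry": 60.0, "rasterizer": 75.0}
assert pipeline_fps(rates) == 60.0   # geometry is the bottleneck here
```

This is the reasoning behind the bottleneck tests ("CPU-limited", "transform-limited", "fill-limited") later in the chapter.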
Application Geometry Rasterizer
The Graphics Rendering Pipeline
Old rendering pipelines were done in software (slow)
Modern pipeline architecture uses parallelism
and buffers. The application stage is implemented in software,
while the other stages are hardware-accelerated.
Modern pipelines also do anti-aliasing for points, lines,
or the whole scene;
Aliased polygons (jagged edges) vs. anti-aliased polygons
How is anti-aliasing done? Each pixel is subdivided
(sub-sampled) into n regions, and each sub-pixel has a color;
The anti-aliased pixel is given a shade of green-blue
(5/16 blue + 11/16 green). Without sub-sampling the
pixel would have been entirely green – the color of
the center of the pixel (from Wildcat manual)
More samples produce better anti-aliasing;
8 sub-samples/pixel
16 sub-samples/pixel
From Wildcat “SuperScene” manual
http://62.189.42.82/product/technology/superscene_antialiasing.htm
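The sub-sampling scheme above reduces to averaging the sub-pixel colors. A minimal sketch reproducing the Wildcat example, assuming 16 sub-samples per pixel of which 5 fall on blue and 11 on green:

```python
def antialias(sub_samples):
    """Average a list of RGB sub-sample colors into one pixel color."""
    n = len(sub_samples)
    return tuple(sum(color[i] for color in sub_samples) / n for i in range(3))

GREEN, BLUE = (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
pixel = antialias([BLUE] * 5 + [GREEN] * 11)
# pixel == (0.0, 0.6875, 0.3125): 11/16 green + 5/16 blue,
# instead of the pure green of the pixel center
```

With more sub-samples the average approaches the true coverage of each primitive, which is why 16 sub-samples/pixel looks smoother than 8.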
Ideal vs. real pipeline output (fps) vs. scene complexity
(Influence of pipeline bottlenecks)
HP 9000 workstation
Computing Architectures
The Rendering Pipeline
Application Geometry Rasterizer
The application stage
Is done entirely in software by the CPU;
It reads input devices (such as gloves, mouse);
It changes the coordinates of the virtual camera;
It performs collision detection and collision
response (based on object properties) for haptics;
One form of collision response is force feedback.
Application stage optimization…
Reduce model complexity (models with fewer polygons –
less to feed down the pipe);
Low-resolution model (~600 polygons) vs. higher-resolution model (134,754 polygons)
Application stage optimization…
Reduce floating-point precision (single precision
instead of double precision);
Minimize the number of divisions;
Since all is done by the CPU, a multi-core
architecture is recommended to increase speed.
Computing Architectures
The Rendering Pipeline
Application Geometry Rasterizer
Rendering pipeline
The geometry stage
Is done in hardware;
Consists first of model and view transforms
(to be discussed in Chapter 5);
Next the scene is shaded based on light models;
Finally the scene is projected, clipped, and
mapped to the screen coordinates.
The lighting sub-stage
It calculates the surface color based on:
type and number of simulated light sources;
the lighting model;
the reflective surface properties;
atmospheric effects such as fog or smoke.
Lighting results in object shading which makes
the scene more realistic.
Computing architectures
Iλ = Iaλ Ka Odλ +
fatt Ipλ [Kd Odλ cos θ + Ks Osλ cos^n α]
where: Iλ is the intensity of light of wavelength λ;
Iaλ is the intensity of ambient light;
Ka is the surface ambient reflection coefficient;
Odλ is the object diffuse color;
fatt is the atmospheric attenuation factor;
Ipλ is the intensity of the point light source at
wavelength λ;
Kd is the diffuse reflection coefficient;
Ks is the specular reflection coefficient;
Osλ is the object specular color;
θ is the angle between the surface normal and the light direction;
α is the angle between the reflection and view directions;
n is the specular exponent.
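The illumination model above can be evaluated per wavelength with a few lines of code. A sketch in the Foley & van Dam notation used by the slides; all parameter values below are made up for illustration:

```python
import math

def intensity(Ia, Ka, Od, f_att, Ip, Kd, Ks, Os, theta, alpha, n):
    """I = Ia*Ka*Od + f_att*Ip*(Kd*Od*cos(theta) + Ks*Os*cos(alpha)**n)"""
    diffuse = Kd * Od * math.cos(theta)
    specular = Ks * Os * math.cos(alpha) ** n
    return Ia * Ka * Od + f_att * Ip * (diffuse + specular)

# Light grazing the surface (theta = 90 deg): only the ambient term survives.
I = intensity(Ia=0.2, Ka=0.5, Od=1.0, f_att=1.0, Ip=0.8,
              Kd=0.6, Ks=0.4, Os=1.0,
              theta=math.pi / 2, alpha=math.pi / 2, n=20)
```

In a real renderer this is evaluated once per color channel (or per sampled wavelength) and, for Phong shading, once per pixel with interpolated normals.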
The lighting sub-stage optimization…
It takes less computation for fewer lights
in the scene;
The simpler the shading model, the fewer
computations (and the less realism):
Wire-frame models;
Flat shaded models;
Gouraud shaded;
Phong shaded.
The lighting models
Wire-frame is simplest – only shows polygon
visible edges;
The flat shaded model assigns same color to all
pixels on a polygon (or side) of the object;
Gouraud or smooth shading interpolates colors
inside the polygon based on the colors computed at its vertices;
Phong shading interpolates the vertex normals
before calculating the light intensity based on the
model described – most realistic shading model.
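The difference between flat and Gouraud shading is easiest to see along one scanline. A toy sketch with scalar intensities (the values are hypothetical; real pipelines interpolate per color channel):

```python
def flat_shade(face_intensity, num_pixels):
    """Flat shading: every pixel of the polygon gets the same intensity."""
    return [face_intensity] * num_pixels

def gouraud_scanline(i_left, i_right, num_pixels):
    """Gouraud shading: linearly interpolate edge intensities across the span."""
    if num_pixels == 1:
        return [i_left]
    step = (i_right - i_left) / (num_pixels - 1)
    return [i_left + k * step for k in range(num_pixels)]

assert flat_shade(0.5, 4) == [0.5, 0.5, 0.5, 0.5]
assert gouraud_scanline(0.0, 1.0, 5) == [0.0, 0.25, 0.5, 0.75, 1.0]
```

Phong shading would instead interpolate the vertex *normals* across the span and run the lighting equation per pixel, which is why it is the most expensive and the most realistic of the three.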
Computing architectures
Wire-frame model
Flat shading model
Gouraud shading model
The rendering speed vs. surface polygon type
The way surfaces are described influences rendering speed.
If surfaces are described by triangle meshes, the rendering will
be faster than for the same object described by independent
quadrangles or higher-order polygons. This is due to the
graphics board architecture which may be optimized to render
triangles.
Example: the rendering speed of the SGI Reality Engine.
SGI Onyx 2 with Infinite Reality
Computing Architectures
The Rendering Pipeline
Application Geometry Rasterizer
The Rasterizer Stage
Performs operations in hardware for speed;
Converts the 2-D vertex information from the
geometry stage (x, y, z, color, texture) into pixel
information on the screen;
The pixel color information is stored in the color buffer;
The pixel z-value is stored in the Z-buffer (which has
the same size as the color buffer);
Assures that the primitives that are visible from
the point of view of the camera are displayed.
The Rasterizer Stage - continued
The scene is rendered in the back buffer;
It is then swapped with the front buffer which
stores the current image being displayed;
This process eliminates flicker and is called
“double buffering”;
All the buffers on the system are grouped into the
frame buffer.
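Double buffering in miniature, as described above: render into the back buffer, then swap it with the front buffer being displayed (an illustrative sketch, not an actual graphics API):

```python
class DoubleBuffer:
    def __init__(self):
        self.front = []   # image currently displayed
        self.back = []    # image being rendered off-screen

    def render(self, scene):
        self.back = list(scene)           # draw the new frame into the back buffer

    def swap(self):
        self.front, self.back = self.back, self.front

db = DoubleBuffer()
db.render(["triangle", "cube"])
db.swap()                                 # the displayed image changes all at once
assert db.front == ["triangle", "cube"]   # no partially-drawn frame is ever shown
```

The swap is the step that must be synchronized across rendering pipes in the multi-display configurations discussed later in this chapter.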
Testing for pipeline bottlenecks
If the CPU operates at 100% – then the pipeline is
“CPU-limited” (bottleneck in the application stage);
If performance increases when all light
sources are removed, then the pipeline is
“transform-limited” (bottleneck in the geometry stage);
If performance increases when the resolution
or size of the display window is reduced,
then the pipeline is “fill-limited” (bottleneck in the
rasterizer stage).
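The three tests above can be folded into one diagnostic helper. An illustrative sketch; the boolean inputs would come from the experiments the text describes (measuring CPU load, removing lights, shrinking the window), and the function itself is hypothetical:

```python
def diagnose_bottleneck(cpu_at_100: bool,
                        faster_without_lights: bool,
                        faster_at_low_res: bool) -> str:
    """Map the three experiments to the bottlenecked pipeline stage."""
    if cpu_at_100:
        return "CPU-limited (application stage)"
    if faster_without_lights:
        return "transform-limited (geometry stage)"
    if faster_at_low_res:
        return "fill-limited (rasterizer stage)"
    return "balanced"

assert diagnose_bottleneck(False, False, True) == "fill-limited (rasterizer stage)"
```

The checks are ordered because an overloaded CPU masks the downstream tests: a CPU-limited pipeline may also appear to speed up when the window shrinks.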
Transform-limited
(reduce level of detail)
Fill-limited
(increase realism)
The Pipeline Balancing
Single buffering
Application Geometry Rasterizer
(75%) (75%) (100%)
Double buffering, balanced pipeline
Application Geometry Rasterizer
(90%) (95%) (100%)
Computing Architectures
The Haptics Rendering Pipeline
The process of computing the forces and mechanical textures
associated with haptic feedback. It is done in software and in
hardware, and it too has three stages.
PC graphics architecture – PC is King!
Went from 66 MHz Intel 486 in 1994 to 3.6 GHz Pentium IV
today;
Newer PC CPUs are dual (or quad) core – improves performance
by 50%
Went from 7,000 Gouraud-shaded polygons/sec (Spea Fire board) in 1994 to
27 million Gouraud-shaded polygons/sec (Fire GL 2, which used to be in our lab);
Today PCs are used for single or multiple users, single
or tiled displays;
Intensely competitive industry.
PC bus architecture – just as important
Went from the 33 MHz “Peripheral Component Interconnect”
(PCI) bus to the 266 MHz “Accelerated Graphics Port”
(AGP 4x) bus, and doubled again with AGP 8x;
Larger throughput and lower latency, since the address bus
lines are decoupled from the data lines – AGP uses “sideband” address lines.
Intel 820/850 chipset
Graphics accelerator (memory + processors)
AGP 8x rate ~2 GB/s unidirectional (533 MHz × 32 bits)
PCI transfer rate ~133 MB/s (33 MHz × 32 bits)
PCI Express rate ~4 GB/s bidirectional
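The bus figures above follow from throughput = clock rate × bus width. A quick sanity check (the clocks and widths are the ones quoted in the slides):

```python
def throughput_MBps(clock_MHz: float, width_bits: int) -> float:
    """Peak transfer rate in MB/s for a bus of the given clock and width."""
    return clock_MHz * 1e6 * width_bits / 8 / 1e6   # bits -> bytes, Hz -> per-second

pci = throughput_MBps(33, 32)      # ~132 MB/s, quoted as ~133 MBps
agp8x = throughput_MBps(533, 32)   # ~2132 MB/s, i.e. ~2 GB/s unidirectional
```

These are peak rates; sustained throughput is lower once arbitration and protocol overhead are included.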
yesterday’s PC system architecture
PC system architecture for the VR Teaching Lab
PC system architecture for VR Teaching Lab
Product | Port | % of budget
PC 1.7 GHz (Fire GL2) | NA | 48%
Polhemus Fastrak | Com 1 | 37%
5DT glove | Com 2 | 10%
Stereo glasses | Fire GL2 | 3%
FF joystick | USB | 2%
Java/Java3D | NA | 0%
VRML | NA | 0%
Stereo glasses Fire GL 2
connector
Passive coolers
AGP bus connector
Fire GL 2 architecture
Fire GL 2 features:
27 million Gouraud-shaded, non-textured
polygons/sec;
Fill rate is 410 M Pixels/sec.;
supports up to 16 light sources;
has a 300 MHz D/A converter
Stereo glasses
connector Fire GL X3 256
Passive coolers
DVI-I video output AGP bus connector
Fire GL X3-256 architecture
24-bit pixel processing, 12 pixel pipes
dual 10-bit DAC and dual DVI-I connections
does not have Genlock
anti-aliased points and lines
quad-buffered stereo 3D support (2 front and 2 back buffers)
NVIDIA Quadro FX 4000
500 MHz DDR
Memory
Graphics processor
Unit (GPU)
NVIDIA Quadro FX 4000 architecture
dual DVI-I connections
32-bit pixel processing, 16 pixel pipes
has Genlock
anti-aliased points and lines
quad-buffered stereo 3D support
FireGL X3-256 vs. NVIDIA
Quadro vs 3DLabs
CPU Evolution to Multi-Core
Places several processors on a single chip;
faster communication between cores than between separate
processors;
each core has its own resources (L1 and L2 caches), unlike
multiple threads on a single core;
more energy efficient, resulting in higher performance.
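Why two cores do not simply double performance (a hedged aside, not from the slides): Amdahl's law bounds the speedup by the fraction of work that can run in parallel.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup on n cores when fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

# The ~50% dual-core improvement quoted earlier in this chapter corresponds to
# roughly two thirds of the workload being parallelizable:
assert abs(amdahl_speedup(2 / 3, 2) - 1.5) < 1e-9
assert amdahl_speedup(1.0, 2) == 2.0   # only a fully parallel workload doubles
```

For the application stage this matters because collision detection parallelizes well across object pairs, while the serial simulation loop limits the achievable gain.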
Multi-core details
AMD64 x2 Architecture
Guts of Native Quad Core (Next Gen)
Graphics Benchmarks
Benchmarks established by an independent
organization;
Allow comparison of graphics card
performance based on standardized
application cases.
Can be application-specific like
SPECapc (Application Performance
Characterization)
Or general-purpose, like SPECviewperf
for OpenGL-based systems.
Accelerator boards viewperf 10.0 benchmark
SPECviewperf™ is a portable OpenGL performance
benchmark;
the benchmark program is written in C; SPECviewperf reports performance
in frames per second.
There are 8 tests done at a frame resolution of 1280x1024:
3ds Max – modeling, simulation and rendering;
CATIA (DX) – CAD design application;
EnSight (DRV) – a 3-D visualization package;
Maya – an animation application;
Pro/ENGINEER – CAD software;
TCVIS – radiosity application for large data sets;
SolidWorks – 3-D CAD design;
Unigraphics – digital electronic and mechanical engineering design.
Accelerator boards viewperf comparison
Updated regularly at www.spec.org
SPECviewperf™ uses a weighted geometric mean to
determine scores:
Geometric mean (fps) = (test1 ^ weight1) × (test2 ^ weight2) × … × (testN ^ weightN)
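The weighted geometric mean can be computed in a few lines. A sketch of the formula above; the test scores and weights below are made-up numbers, not actual SPECviewperf weights:

```python
def weighted_geomean(scores_fps, weights):
    """Product of each test score raised to its weight (weights sum to 1)."""
    assert abs(sum(weights) - 1.0) < 1e-9
    result = 1.0
    for fps, w in zip(scores_fps, weights):
        result *= fps ** w
    return result

score = weighted_geomean([30.0, 30.0, 30.0], [0.5, 0.25, 0.25])
```

A geometric mean is used instead of an arithmetic one so that a single very high frame rate on one test cannot dominate the composite score.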
Boards Viewperf 10.0 comparison
Introduction to Water cooling
www.tomshardware.com
State of Watercooling
• Water cooling has been performed by
consumers as early as 2006.
• While niche, the field is popular enough for
parts to be mass-produced.
• Parts are usually bought off specialty websites.
• Becoming more popular as more people
construct their own systems
• Self-contained pre-made water systems are
sold retail. (Corsair H70 and Domino ALC)
Waterblock
(EK Supreme High Flow – Full Copper; GTX 580 with MSI waterblock)
• Transfers heat from CPU/GPU to water
• Usually made of copper, which has better heat
transfer characteristics
• Contact surfaces machined to mm precision to
maximize contact with the CPU/GPU
• Sometimes nickel-plated to protect against
corrosion
Radiator
(HW Labs Black Ice Xtreme GTX 360; XSPC RX 240)
• Conducts heat out of the loop
• Multiple sizes, usually in increments of 120 mm
• Usually copper cores surrounded by large fin
assemblies
• Designed to work with certain types of fans,
either performance or quiet fans
Reservoir and Pump
(Swiftech MCP 655 pump; EK Spin Bay reservoir)
• Pump moves water through the loop
• Can be fitted with aftermarket “tops” that offer
more fitting options and higher flow rate
• Reservoir provides an easy way to measure the
water level and to fill and empty the loop
• Can be replaced by a T-line
Coolant, Tubing, Kill coils and Fans
• Coolant of choice is distilled water. Dyes in colored
coolants often separate from liquid and collect in
waterblocks and radiators.
• Tubing is laboratory grade, meant to bend at sharp angles.
Often coated with anti-bacterial compounds.
• Kill coils are anti-microbial strips of pure silver
• Fans are conventional 120mm fans, ranging from 30 to 260
cubic feet per minute, with accompanying increase in noise
levels (14dBA- 60dBA)
• Push-pull configuration - radiator sandwiched by fans
facing the same direction; gives a 3-4 °C decrease in temperature.
Results
Cooler | Clock Speed | Idle Temperature
Stock | 2.66 GHz | 34 °C
TRUE* (air) | 2.66 GHz | 31 °C
TRUE* (air) | 3.8 GHz | 42 °C
EK HF Supreme (water) | 2.66 GHz | 28 °C
EK HF Supreme (water) | 4.5 GHz | 38 °C
*Thermalright Ultra-120 eXtreme
Problems
• Mixing metals (aluminum and copper) in the loop will
cause galvanic corrosion
• Fungal growth in the loop
• Leaks
• Expensive ($400+ for a high-quality loop)
• Space
The Nintendo Wii
•First to introduce a form of player interaction –
accelerometer and IR tracking
•Contains solid-state accelerometers and
gyroscopes.
• Tilting and rotation up and down, left and right
and along the main axis (as with a screwdriver).
• Acceleration up /down, left /right, toward the
screen and away.
•Dramatically improved interface for video
games.
• Innovative controller, integrates vibration
feedback.
•Uses Bluetooth technology, 30 foot range.
•Can send a signal up to 15 feet away. Up to 4
Wii Remotes connected at once.
Wii Remote
• It uses two batteries, and has to be worn with a wrist strap.
• It is wireless, which is unencumbering for the patient
• Needs to be turned on and set to a “neutral position” – adapts
to the patient’s ROM (range of motion)
Wii Remote - Nunchuk
• It plugs into the Wii remote extension port
• Allows bi-manual interaction, but is wired to the remote
(reduces arm movement range) and poses safety hazards
New wireless Nunchuk (analog control stick, wireless transmitter)
Wii Zapper
•Wii remote and Nunchuk connected in one “gun-like”
frame.
•For shooting games, makes aiming more accurate, but
slower
Wii Zapper (use of shoulder instead of wrist): Nunchuk and remote in a plastic frame
The Wii Motion Plus
Has a pass-through expansion port, allowing other
expansions such as the Nunchuk or Classic Controller to be
used simultaneously with the device.
Wii Motion Plus was released in 2009 with the Wii Sports
Resort games
The Wii Motion Plus
It incorporates a pair of resonating gyroscopes which
measure rotation. The sensor is an InvenSense IDG-1004
integrated dual-axis gyroscope (www.invensense.com)
PS3 Specs
PS3 CPU: Cell Processor
- Developed by IBM.
- Cell Processor
- PowerPC-based core @ 3.2 GHz
- 1 VMX vector unit per core
- 512 KB L2 cache
- 7 × SPEs @ 3.2 GHz
- 128 × 128-bit SIMD GPRs per SPE
- 256 KB SRAM local store per SPE
- 1 of 8 SPEs reserved for redundancy
- total floating-point performance: 218 GFLOPS
Cell Processor Architecture
The PowerPC core present in the system is a general-
purpose 64-bit PowerPC processor that handles the Cell
BE's general-purpose workload (or, the operating
system) and manages special-purpose workloads for the
SPEs.
PlayStation 3 use of the multi-core processor
(IEEE Spectrum 2006)
Screenshot -Gran Turismo
Madden Nextgen Demo
PlayStation Move
PlayStation Eye camera,
charging bay, gyroscope,
accelerometer, Bluetooth
transmitter, vibration motor, and
MEMS compass.
The X-Box 360
Aims at a balance between hardware,
software and service;
Has a flexible design, abandoning the
nVidia-only deal of the original Xbox;
Uses a multi-core design – like having
three PowerPC CPUs running at 3.2 GHz;
Each of the three cores can process two
threads at a time (like 6 conventional
processors);
Each core has a SIMD unit - exploits real-
time graphics data parallelism
The X-Box 360
The GPU has a Unified Shader Architecture, meaning one type of unit
handles both the geometry and rasterization stages (vs. separate vertex
and pixel shaders)
The Arbiter retrieves commands from the Reservation Stations
and delivers them to the appropriate Processing Engine
The xBox 360 has several Arbiters and 48 ALUs
The X-Box 360
The GPU has embedded 10 MB DRAM for use as a frame buffer
Resolution up to 1920x1080 with full-screen anti-aliasing
The GPU has the memory controller connecting to the 3 cores at
22 GB/sec
Renders 500 million triangles/sec and fill rate of 16 Gsamples/sec
Kinect Background
Kinect uses three cameras to capture objects in 3D
space
– Infrared light emitter
– Depth Sensor
– RGB camera, 640×480
Array of four microphones as input
– Designed to find the location of a voice
– 16 kHz sampling rate
USB connector with additional power supply.
Sensor Range limit of 1.2-3.9 m
Background cont. – Uses
Angular field of view: 57° horizontally, 43°
vertically
– motorized to pivot up and down
Play area of 6 square meters
Video output of sensors 30 Hz frame rate
Cost $150
Benefits
Gesture recognition
User awareness
–facial recognition
Voice recognition
Independent of lighting
conditions.
Multiple users
Limitations
Blind spots
-sitting down issues
-only in front of the user
30 Hz frame rate
- may not be fast enough for some applications.
Space limitation
-Passive user cap
-Movement
Useful only for users who have full range of motion.
Feature | Xbox 360 | PlayStation 3 | Nintendo Wii
Processor | 3.2 GHz PowerPC with 3 dual-thread cores | 3.2 GHz Cell with 8 cores | 729 MHz IBM with 5 execution units
GPU | ATI 500 MHz | NVIDIA 550 MHz | ATI 243 MHz
Video memory bandwidth | 21.6 GBps | 22.4 GBps | 3.9 GBps
HDTV output | yes | yes | no
Hard drive | 20 GB | 20-60 GB | none (512 MB flash included)
Ethernet | 100 Mbps | 1 Gbps | none
http://www.winsupersite.com/showcase/xbox360_vs_ps3.asp
Distributed VR architectures
Single-user systems:
multiple side-by-side displays;
multiple LAN-networked computers;
Multi-user systems:
client-server systems;
peer-to-peer systems;
hybrid systems;
Single-user, multiple displays
(3DLabs Inc.)
Side-by-side displays.
Used in VR workstations (desktop), or in large
volume displays (CAVE or the “Wall”);
One solution is to use one PC with graphics
accelerator for every projector;
This results in a “rack-mounted” architecture,
such as the MetaVR “Channel Surfer” used in
flight simulators or the Princeton Display Wall
Side-by-side displays.
Another (cheaper) solution is to use one PC
only, with several graphics accelerator cards
(one for every monitor). Windows 2000 allows
this option, while Windows NT allowed only
one accelerator per system;
Accelerators need to be installed on a PCI bus;
Genlock..
If the output of two or more graphics pipes is
used to drive monitors placed side-by-side, then
the display channels need to be synchronized
pixel-by-pixel;
Moreover, the edges have to be blended, by
creating a region of overlap.
(Courtesy of Quantum3D Inc.)
Problems with non-synchronized displays...
CRTs that are side-by-side induce fields in
each other, resulting in electronic beam
distortion and flickers – need to be shielded;
Image artifacts reduce simulation realism,
increase latencies, and induce “simulation
sickness.”
Problems with non-synchronized CRT displays...
(Courtesy of Quantum3D Inc.)
Synchronization of displays:
software synchronized – the system commands that frame
processing starts at the same time on the different rendering pipes;
does not work if one pipe is overloaded – one image finishes
first.
CRT
Buffer
Application Geometry Rasterizer
Synchronization command CRT
Buffer
Application Geometry Rasterizer
Synchronization of displays:
frame buffer synchronized – the system commands that frame
buffer swapping starts at the same time on the different rendering pipes;
does not work because swapping depends on the electron gun
refresh – one buffer will swap up to 1/72 sec before the other.
CRT
Buffer
Application Geometry Rasterizer
Synchronization command CRT
Application Geometry Rasterizer
Buffer
Synchronization of displays:
video synchronized – the system commands that the CRT vertical
beam retrace starts at the same time; one CRT becomes the “master”;
does not work if the horizontal beam is not synchronized too (one
line too many or too few).
Master CRT
Buffer
Application Geometry Rasterizer
Synchronization command
Application Geometry Rasterizer
Buffer Slave CRT
Synchronization of displays:
Best method is to have software + buffer + video
synchronization of the two (or more) rendering pipes
Master CRT
Buffer
Application Geometry Rasterizer
Synchronization command
Synchronization command Synchronization command
Application Geometry Rasterizer
Buffer Slave CRT
Synchronization of displays:
Best method is to have software + buffer + video
synchronization of the two (or more) rendering pipes
Master Display
Buffer
Geometry+Rasterizer
Application
GPU
Synchronization command
Synchronization command Synchronization command
Geometry+Rasterizer
Application GPU
Buffer Slave Display
Video synchronized displays (three PCs)
release done
(Digital Video Interface- Video out)
Wildcat 4210
(Courtesy of Quantum3D Inc.)
Genlock
Used to synchronize output of graphics card and
connected displays to external synchronization
source.
Ex: Used to
synchronize cameras
with CRT displays so
that scan lines would
not be visible.
Option card for NVIDIA Quadro FX
graphics cards to add multiple display
synchronization.
Frame Synchronization
Frame synchronization is the process of synchronizing
display pixel scanning to a synchronization source. When
several systems are connected, a sync signal is fed from a
master system to the other systems in the network, and
the displays are synchronized with each other.
For proper display synchronization you need:
– Frame Lock Synchronization
– Swap Synchronization
Frame Lock Synchronization
Uses hardware to synchronize the frames on each display;
very critical for stereo viewing.
Swap Synchronization
Synchronizes application buffer swaps across multiple
systems.
Stereo Image Unsynchronized
How is the refresh rate controlled?
DDC/CI (Display Data Channel / Command Interface)
– Transmits data about monitor specifications to graphics
hardware and allows hardware to switch monitor
settings.
Vertical blanking signal
– Controls the vertical blanking interval: the time between
the end of one line and the beginning of the next, and between
the end of one image and the beginning of the next.
– Controls the monitor refresh rate.
Graphics hardware is still designed with CRT
characteristics in mind.
Until the hardware architecture is redesigned for modern
displays (OLEDs and LEDs), it still works the way
CRTs did when it comes to synchronization.
http://www.nvidia.com/page/quadrofx_gsync.html
http://www.techpubs.sgi.com
Graphics and Haptics Pipeline Synchronization:
Has to be done at the application stage to allow decoupling of
the rendering stages (they have vastly different output rates)
Graphics pipe and haptics pipe run on a dual-processor
(Pentium II) host computer;
the haptic interface and its controller (embedded Pentium)
close the force-feedback loop.
Physics Processing Unit (PPU)
First PPU made by Ageia Inc., called “PhysX”;
PhysX is available as an add-on card (see above);
Helps the CPU do computations related to material
properties (elasticity, friction, density);
Better fog effects and more realistic clothing simulation;
Better fluid dynamics simulation and collision effects;
Cost $160
Co-located Rendering Pipelines (older design)
Another, cheaper, solution is to use a single
multi-pipe graphics accelerator:
one output channel for every monitor;
separate geometry and rasterizer chips.
Wildcat II 5110
Wildcat II 5110
Co-located Rendering Pipelines (newer design)
Wildcat4 7210 features:
38 million Gouraud-shaded, Z-buffered triangles/sec;
400 Megapixel/sec texture fill rate;
32 light sources in hardware;
Independent dual display support, 2 GPUs;
1529x856 frame-sequential stereo @ 120 Hz.
Nvidia GTX 590:
has two GPUs
Can drive three side-by-side
monitors (DVI ports)
Has a mini-display port
PCI-express 2.0 motherboard
Computing architectures
PC Clusters
multiple LAN-networked computers;
used for multiple-PC video output;
used for multiple computer collaboration
(when computing power is insufficient on a
single machine) – older approach.
32 rendering PCs
Chromium networking
architecture
Frame refresh rate
comparison
Princeton display wall using eight LCD rear projectors
Princeton display wall: eight 4-way Pentium-Pro SMPs with
E&S graphics accelerators. They drive 8 Proxima 9200
LCD projectors. (1998)
Clusters of PlayStations
200 PS3s used by hackers to crack SSL
VRX Rack – Ciara Technologies:
256 Xeon processors and terabytes of DDR memory;
best price/performance ratio; Linux and Windows OS
Ciara VRX
Google Server Farms
Massive computations done on “server farms”
Google owns about 1 Million servers around the world
Massive installation requiring massive energy (including
for cooling the servers)
That is why server farms are located close to water
sources, and cheaper energy sources.
Cloud Computing in the 21st century
Instead of installing all application software locally, and
worrying about backup, updates, slow graphics, etc., the
only thing the user would need on the local machine is a
web browser.
Everything (graphics rendering, databases, security and
encoding) is done on servers, and much less powerful and
cheaper client computers will be needed.
the AMD Fusion Render Cloud
Drawbacks of cloud computing
More expensive data storage – a cloud server will have at
least twice the storage capacity of the corresponding
individual machines it replaces
The communication is not instantaneous. It is affected by
network quality, the time of day (affects network
congestion), the Internet communication protocol used,
and of course the amount of data transmitted.
Due to network delays, the simulation response is not
instantaneous, and may affect interaction and immersion
into the virtual world, depending on user and application.
Supercomputers
• A supercomputer is a computer which aims to optimize
processing power beyond the scope of commercial PCs
• Modern supercomputers are made by forming
“computer clusters”.
• Supercomputers are used for calculation intensive tasks
such as nuclear detonation simulations, quantum
physics, and weather forecasting.
• The fastest supercomputer was the “Jaguar” until the
“Tianhe-I” edged in front of it in 2010.
The RoadRunner by IBM. The first supercomputer to break the
Petaflop (10^15 floating-point operations per second) barrier, on May 28th 2008
No. 3 The Jaguar by Cray. The fastest supercomputer of
2009 and 2010 until succeeded by the Tianhe-I
No. 2 The "Tianhe-I" ( 天河一号 , meaning “River in the Sky”) by
NUDT. 14,336 central processing units, several thousand
nVidia GPUs.
In Nov 2011 the fastest supercomputer became the K computer
(Japan):
•it is more than 96,000 times faster than your PC (Intel Core i7)
•it comprises 864 computer racks, 88,000 CPUs
•it remains surprisingly energy efficient
In June 2012 the fastest supercomputer became the IBM
Sequoia computer (US):
•it is 1.55 times faster than K, and does 16 petaflops
•it has 96 racks with 1.5 million CPU cores, compared to K’s 864 racks
and 88,000 CPUs
•it is the most energy efficient
•http://postbulletin.com/news/stories/display.php?id=1500158
Cheap Supercomputers
• Having the most powerful computer as a PC would be
nice, but it’s not feasible, so we try to make powerful
computers at an affordable cost
• Form a computer cluster with multiple “off the shelf”
devices
• A famous example of this was the US Air Force combining
1,760 PS3s (168 separate graphical processing units and
84 coordinating servers) to form a supercomputer.
• Known as the Condor Cluster, it cost only 5-10% as much as
an equally powerful system.
• It also uses 10% of the power of a comparable system.
The “Condor Cluster” by the US Air Force.
Capable of 500 TeraFLOPS.
Computing architectures
Multi-User distributed remote system
architecture:
Multiple modem-networked computers;
multiple LAN-networked computers;
multiple WAN-networked computers;
what is the network topology and influence on
number of users?
IBM RoadRunner
Cost: $133 million
Power: 2.35 MW
Speed: 1.042 PetaFLOPS
444.94 MegaFLOPS/W
Cost per hour at $0.11/kWh: $258
Cost per year: $2.2 million
Cray Jaguar
Cost: $104 million
Power: 7.6 MW
Speed: 2.33 PetaFLOPS (peak)
306 MegaFLOPS/W
Cost per hour at $0.11/kWh: $836
Cost per year: $7.2 million
Tianhe-1A
Cost: $88 million
Power: 4.04 MW
Speed: 2.56 PetaFLOPS
640 MegaFLOPS/W
Cost per hour at $0.11/kWh: $440
Cost per year: $3.8 million
K
Cost: unavailable
Power: 9.89 MW
Speed: 10.51 PetaFLOPS
1062.6 MegaFLOPS/W
Cost per hour at $0.11/kWh: $1087
Cost per year: $9.5 million
Condor Cluster (limited public information)
Cost: $2 million
Power: ~10% of a comparable system
Speed: 500 TeraFLOPS
Cost per hour at $0.11/kWh: ~$25
Cost per year: ~$200,000
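The electricity-cost figures above follow directly from the stated power draws at $0.11/kWh. A quick check (power and cost numbers are the ones quoted in the slides):

```python
RATE = 0.11  # dollars per kWh

def cost_per_hour(power_MW: float) -> float:
    """Electricity cost of one hour of operation at the given power draw."""
    return power_MW * 1000 * RATE   # MW -> kW, times one hour

roadrunner = cost_per_hour(2.35)   # ~$258.5/hour, quoted as $258
jaguar = cost_per_hour(7.60)       # ~$836/hour
k_computer = cost_per_hour(9.89)   # ~$1088/hour, quoted as $1087
```

Multiplying by roughly 8,766 hours per year reproduces the annual figures; note these cover electricity only, not cooling infrastructure or staff.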
Comparison
Power efficiency comparison (MegaFLOPS/W) of IBM RoadRunner,
Cray Jaguar, Tianhe-1A, and K (bar chart).
Server-mediated
communication;
Unicast mode;
The server is a bottleneck
Server
on the allowable number
of clients
Client 1 Client 2 … Client n
(adapted from “Networked Virtual Environments”
Singhal and Zyda, 1999)
Server-mediated
Client 2,1 Client 2,2 Client 2,n
communication;
Allows more clients to be
networked over LANs;
Server 2
LAN LAN
Server 1 Server 3
LAN
Client 1,1 Client 1,2 Client 1,n Client 3,1 Client 3,2 Client 3,n
(adapted from “Networked Virtual Environments”
Singhal and Zyda, 1999)
Peer-to-peer communication;
Allows more clients to be networked over LANs;
Can use broadcast or multicast;
Reduces network traffic, BUT is more vulnerable to
viruses, and does not work well over WANs.
LAN
Multicast packets
Area of interest
management
AOIM 1 AOIM 2 AOIM 3 AOIM n
User 1 User 2 User 3 User n
(adapted from “Networked Virtual Environments”
Singhal and Zyda, 1999)
Hybrid network using multiple servers communicating through
multicast – allows deployment over WAN - no broadcasting allowed
WAN
Unicast packets Unicast packets
Proxy Proxy Proxy Proxy
Server 1 Server 2 Server 3 Server n
LAN Multicast packets Multicast packets LAN
User 1,1 User 1,2 User 1,n
User n,1 User n,2 User n,n
For very large DVEs: current WANs do not support multicasting
(adapted from “Avatars in Networked Virtual Environments”
Chapin, Pandzic, Magnenat-Thalman and Thalman, 1999)
Example of a distributed Virtual Environment: Cybertennis
(connection between Geneva and Lausanne in Switzerland)