Back
in 1996, a new company known as 3dfx changed the
world when it introduced the Voodoo1 Graphics
chip, one of the first add-in hardware graphics
accelerators for the personal computer. Graphics
on the PC would never be the same, as the dedicated
graphics processor gave developers their first
real 3D rendering tools, and gave birth to modern
rendering techniques that all of us now take for
granted.
With
hardware acceleration, game authors could increase
resolution, while inserting new lighting effects,
textures, shadows and reflectivity, all of which
were shown off to an amazed gaming world in GLQuake.
Hardcore gamers rushed to buy a pair of Voodoo2
cards (the first SLI solution), and enjoyed doubled
frame rates and a maximum resolution increase from
800x600 to 1024x768. Soon, those without hardware acceleration
were left behind, as OpenGL and DirectX 5 imposed
new hardware requirements, and presented developers
and gamers with even more to play with.
But this is 2006! I was in diapers when GLQuake
was released. What does this have to do with multi-core
technology?
Well, folks, hold onto your hats, because on Wednesday
Valve’s Gabe Newell likened the arrival
of dual- and quad-core CPUs to the release of the first
GPU, and stated quite clearly his company’s
commitment to leveraging the new technology to
its fullest, a process Newell called “painful
and expensive” but also “critical.”
As we saw during Valve’s presentation, that
process is already well underway at Valve’s
Bellevue, Washington facility.
The Present State of Gaming
We
have all borne witness to the breakneck pace at
which hardware vendors have been pushing graphics
cards to the limits. Just over three years have
passed since cards like the ATi 9800 Pro debuted,
but here we are today with triple the fill rate
and memory bandwidth of that solution, and six
times the number of parallel shaders, to say nothing
of the multi-GPU solutions now becoming commonplace.
However,
as games have become prettier and prettier, the
behind-the-scenes operations that add to realism
and immersion have lagged behind. As Brian Jacobson,
Valve Senior Software Engineer and one of our
hosts, put it, “I can render a photorealistic
person, but I can’t make it act like a person.”
Serial Code Paths
For single-threaded applications, game code runs “in
serial,” meaning that a series of queries
and calculations takes place before each frame
is rendered. Each query or calculation in the
series depends on input from the prior one, and
therefore must wait its turn. An example might be
as follows:
- Build asset lists (textures, sprites, lighting sources)
- Build object lists (crates, doors, walls, ammo packs)
- Update animation (from physics modeling, user input, NPC actions, environmental actions and changes)
- Compute shadows
- Draw frame
These calculations take up substantial processing power
to complete, and must be completed three times
for a single scene you might see in Half-Life
2: once to create the player’s point
of view, once for the world as it is reflected
in water, and once again for the view through
any video monitors on the level! This sequence
represents a ton of operations, and any inefficiency
in the process can produce lag in the game.
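To make the serial dependency concrete, here is a minimal Python sketch of the sequence above. It is not Valve's code: the stage functions, the `FrameState` class and the three view names are hypothetical stand-ins, and each stage simply records that it ran. The point is only that every stage receives the output of the one before it, and that the whole chain repeats once per view.

```python
from dataclasses import dataclass, field


@dataclass
class FrameState:
    """Hypothetical per-view state handed from one stage to the next."""
    view: str
    log: list = field(default_factory=list)


def build_asset_list(state):
    state.log.append("assets")
    return state


def build_object_list(state):
    state.log.append("objects")
    return state


def update_animation(state):
    state.log.append("animation")
    return state


def compute_shadows(state):
    state.log.append("shadows")
    return state


def draw_frame(state):
    state.log.append("draw")
    return state


# Order matters: each stage needs the result of the previous one.
STAGES = [build_asset_list, build_object_list, update_animation,
          compute_shadows, draw_frame]


def render_view(view):
    """Run every stage in order for a single view."""
    state = FrameState(view)
    for stage in STAGES:
        state = stage(state)  # the next stage waits for this one
    return state


def render_scene():
    """One scene means three full passes: player, water, monitors."""
    views = ("player", "water_reflection", "monitor")
    return [render_view(v) for v in views]
```

On a single core, nothing in `render_scene()` can overlap, so the cost of a frame is the sum of all fifteen stage calls.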
The rendering speed and detail level available in
today’s powerhouse GPUs can make the world
within the game look real with ever-increasing
ease, but what developers want is to find ways
to make the world ACT real. With the visual component
nearing its peak, the simulation component is,
according to Valve software engineers, what will
take games to the next level, allowing greater
interactivity with the environment (imagine leaving
accurate footprints in dew-covered grass), better
AI (elimination of unnatural NPC behaviors), and
more realistic particle effects (imagine smoke
swirling and eddying behind an object passing
through it). All these things are easy to draw,
but the CPU power needed to make the necessary
calculations simply isn’t available on a
single-core machine.
Multi-Threaded Strategies
The first problem faced by those entering the world
of multi-core optimized code is that a decade
of effort has been spent writing serial code for
single cores. As
some of our recent Kentsfield benchmarks here at
DH showed, software only realizes
gains from multiple cores when it is written to
efficiently execute multiple threads. The basic
theory of multiple threads across multiple cores
is obvious: a large task can be divided into smaller
ones, and multiple cores can run those pieces
in parallel. Less obvious, from the programmer’s
standpoint, is how to divide that work.
Coarse Threading
“Coarse
Threading” has one significant advantage
over other multi-threaded alternatives, and that
is simplicity. Under this model, each subsystem
within the game runs on its own core. For example,
rendering, AI and sound subsystems would each
have their own core. The threads would have to
be synchronized of course, adding to the potential
for delay and game lag, but that is an issue faced
anytime parallel instructions are executed.
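As a rough illustration of one thread per subsystem, the Python sketch below gives each of three subsystems its own thread and makes them meet at a barrier at the end of every frame. The subsystem names come from the example above; `run_coarse`, the frame count and the use of a barrier as the synchronization point are assumptions made for the sketch, not a description of how Source does it. Note too that CPython threads do not run CPU-bound work truly in parallel, so this shows the structure rather than the speedup.

```python
import threading

FRAMES = 3
SUBSYSTEMS = ("rendering", "ai", "sound")


def run_coarse(frames=FRAMES, subsystems=SUBSYSTEMS):
    """Run each subsystem on its own thread, synchronizing once per frame."""
    barrier = threading.Barrier(len(subsystems))
    lock = threading.Lock()
    completed = []  # (frame, subsystem) in the order the work finished

    def subsystem_loop(name):
        for frame in range(frames):
            # Stand-in for this subsystem's real work for the frame.
            with lock:
                completed.append((frame, name))
            # Synchronization point: nobody starts the next frame until
            # every subsystem has finished the current one.
            barrier.wait()

    threads = [threading.Thread(target=subsystem_loop, args=(name,))
               for name in subsystems]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed
```

The barrier is where the potential for lag comes from: a subsystem that finishes early simply waits there for the slowest one.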
In their first forays into multi-threaded code, Valve
experimented with the “coarse” approach,
as it lent itself well to the bifurcated structure
of the Source game engine. As you are probably
aware, Source uses a server/client model, in which
the game Client is responsible for user input,
rendering, and graphics simulation, while the
Server side manages AI, physics and game logic.
The first step then, was to give each half of
the game its own core. In “contrived”
maps, this arrangement led to near perfect results,
with the multithreaded code doubling the frame
rate of its serial counterpart. However, in
real world maps, this advantage fell to a 1.2x
increase. While the client core was pegged
at near 100% CPU utilization, the server core
spent too much time at idle, only using 20% of
its potential, reflecting the uneven workload of
the two engine components.
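The 1.2x figure falls out of simple arithmetic if we read the utilization numbers as per-frame workloads. The Python sketch below assumes, purely for illustration, that the server's 20% utilization means it has one fifth as much work per frame as the client, and that synchronization costs nothing; `coarse_speedup` is a hypothetical helper, not something Valve presented.

```python
def coarse_speedup(client_work, server_work):
    """Speedup from giving client and server their own cores.

    Serial: one core does both jobs back to back.
    Parallel: the frame takes as long as the slower of the two.
    """
    serial_time = client_work + server_work
    parallel_time = max(client_work, server_work)
    return serial_time / parallel_time


# "Contrived" map: both halves equally busy.
balanced = coarse_speedup(1.0, 1.0)

# Real-world map: client pegged, server at roughly 20% of that load.
lopsided = coarse_speedup(1.0, 0.2)
```

Under those assumptions the balanced case gives 2x and the lopsided case gives about 1.2x, in line with the numbers reported above: the second core can only save as much time as the lighter job was costing.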
Fine Threading
As opposed to the heavy hand of coarse threading,
fine threading attempts to take a more educated
approach to dividing workload by spreading identical
operations across cores. In the game environment,
this approach is well suited to looping operations
which continually perform the same functions,
such as lookups which update the positions of
objects within a map. In early experimentation,
fine threading showed good scalability (meaning,
two cores were twice as fast, and four twice as
fast as that), and moderate difficulty in coding.
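A position-update loop of the kind described above might be split up as in this Python sketch. Everything here is hypothetical: objects are plain `(x, y, vx, vy)` tuples, `update_all` cuts the list into one chunk per worker, and a thread pool applies the same function to every chunk. Because of CPython's global interpreter lock the sketch demonstrates how the work is divided rather than a real speedup; a native engine would run the chunks on separate cores.

```python
from concurrent.futures import ThreadPoolExecutor


def update_position(obj, dt):
    """Advance one object; obj is (x, y, vx, vy)."""
    x, y, vx, vy = obj
    return (x + vx * dt, y + vy * dt, vx, vy)


def update_chunk(chunk, dt):
    """The identical operation, applied to one slice of the object list."""
    return [update_position(obj, dt) for obj in chunk]


def update_all(objects, dt, workers=4):
    """Split the loop into roughly equal chunks, one per worker."""
    size = max(1, -(-len(objects) // workers))  # ceiling division
    chunks = [objects[i:i + size] for i in range(0, len(objects), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns chunk results in submission order, so the
        # objects come back in the same order they went in.
        results = pool.map(update_chunk, chunks, [dt] * len(chunks))
        return [obj for chunk in results for obj in chunk]
```

This only works cleanly because no object's update depends on another object's new position; loops with that kind of independence are the natural candidates for fine threading.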
Hybrid Threading
After examination of the models described above, it
became clear that different portions of Source
engine code were better suited to different threading
strategies, thus the birth of “hybrid threading.”
Some systems, such as sound, do well simply isolated
on their own core. Others, such as the looping
lookup function discussed above, are more efficient
running parallel across multiple cores. Additionally,
using a combination of coarse and fine threaded
processes allows the developers to get closer
to the holy grail of game coding: 100 percent
efficiency, with no unused CPU cycles.
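Combining the two ideas could look something like the Python sketch below: sound gets a dedicated thread fed by a queue (the coarse part), while a loop over positions is spread across a worker pool (the fine part). The function names, the `None` shutdown signal and the trivial "move by one unit" update are all invented for the example and are not drawn from the Source engine.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def sound_worker(requests, played):
    """Coarse part: sound lives alone on its own thread."""
    while True:
        cue = requests.get()
        if cue is None:  # shutdown signal (an assumption of this sketch)
            break
        played.append(cue)


def move(position):
    """Fine part: the same small operation applied to every object."""
    return position + 1.0


def run_frame(positions, cues, workers=2):
    requests = queue.Queue()
    played = []
    sound = threading.Thread(target=sound_worker, args=(requests, played))
    sound.start()

    # Hand sound cues to the dedicated thread...
    for cue in cues:
        requests.put(cue)

    # ...while the position loop is spread across the worker pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        moved = list(pool.map(move, positions))

    requests.put(None)
    sound.join()
    return moved, played
```

The appeal of the mix is that cores left idle by a lightly loaded subsystem can be soaked up by the pool's chunked work.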
Multi-threading at Work
It took a while, but here is the nugget many of you
were probably seeking: what will these optimizations
do for me, and when? Well, you will be pleased to
know that multi-core optimizations will be delivered
over Steam prior to the release of Episode 2! According
to Valve, these optimizations should result in immediate
performance improvements for dual- and quad-core
rigs. Keep an eye on DriverHeaven, and we will
provide benchmarks for you as soon as they become
available.
While
the present is exciting, the future is more so.
As I alluded to above, Valve engineers are busily
seizing upon the opportunity to improve the calculations
behind the eye candy, in order to make the Source
universe, and Half-Life in particular, more immersive.
Multi-core machines will afford two major areas of improvement
that are already being explored on Valve’s
test beds as you read this. The first is in the
area of AI. With more computing power on tap, AI
programmers like Valve’s Tom Leonard will
improve AI behaviors, making NPCs better tacticians,
and better at adapting to and using their environments.
Consider a Combine soldier “looking for cover”:
the program must query the world, taking
into account line of sight, available structures,
and changes in the environment. The process is computationally
intensive, and must be done for every NPC “trying
to take cover.” With more CPU power to burn,
Tom and his colleagues will make AI smarter, better
at adaptation, and less likely to kill immersion
by doing something, well... stupid!
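To give a feel for why a cover search is expensive, here is a toy Python sketch on a 2D grid. It is emphatically not Valve's AI: the grid, the `walls` set, the sampled line-of-sight test and the "nearest hidden spot" rule are all assumptions chosen to keep the example short. Each NPC must test every candidate spot against the threat, and each test walks the cells between the two points.

```python
def line_of_sight(a, b, walls):
    """True if no wall cell lies on the sampled line between a and b."""
    (x0, y0), (x1, y1) = a, b
    steps = max(abs(x1 - x0), abs(y1 - y0))
    for i in range(1, steps):  # cells strictly between a and b
        x = x0 + round((x1 - x0) * i / steps)
        y = y0 + round((y1 - y0) * i / steps)
        if (x, y) in walls:
            return False
    return True


def find_cover(npc, threat, candidates, walls):
    """Pick the nearest candidate spot the threat cannot see, if any."""
    hidden = [c for c in candidates
              if not line_of_sight(threat, c, walls)]
    if not hidden:
        return None
    # Manhattan distance as a cheap stand-in for real pathfinding.
    return min(hidden, key=lambda c: abs(c[0] - npc[0]) + abs(c[1] - npc[1]))


def find_cover_for_all(npcs, threat, candidates, walls):
    """The query repeats in full for every NPC trying to take cover."""
    return [find_cover(npc, threat, candidates, walls) for npc in npcs]
```

Even in this toy form the cost grows with the number of NPCs times the number of candidate spots times the length of each sight line, and the world can change between frames, so the answers cannot simply be cached.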
The second area being explored is particle effects.
As many of you have likely noticed, particle effects
in games look pretty, but often are completely divorced
from the universe of the game. Smoke and dust hang
in the air, but don’t react to forces acting
upon them, like a hand passing through, or wind
blowing into an open window. With multithreaded
code, we will be seeing this change in the near
future (but no, not in Episode 2).
The Demos
One
of the real treats of Hardware Day 2006 was to see
some of these effects in action, and see them on
a Quad Core Kentsfield, sporting an X1950XTX and
huge flat panel display! We were shown two demos,
one each on AI and particle effects. The AI demo
was really fascinating, and really showed the potential
within Source that is about to be unleashed.
In
the AI demo, the player first walked into a room
where hundreds of orange bugs swarmed, each one
of course running an AI routine which dictated its
behaviors. This was impressive in itself for the
sheer number of creatures, but then in another room
(Collision Detection), the same swarm navigated
over and through a dozen or so static obstacles,
and in the third (Advanced Collision Detection),
over obstacles that were influenced by their movement.
Boxes toppled with typical Source/Havok realism,
and you could clearly see the bugs reacting and
adapting to the changes they had wrought. We were
told that the calculations needed to keep a swarm
of this size moving would bring a Pentium 4 to its
knees, but the Kentsfield pumped out the data as
quickly as it could be drawn – AWESOME.

The above graph shows results from Valve’s Particle
Simulation Benchmark, and shows the raw power and
effective scaling of the Kentsfield setup. The demo
was provided to us for additional testing, so I
have included some short captures of the effects
in action below. They speak for themselves, loud
and clear! (Click below for WMV videos. Problems
streaming? Right-click and save to your desktop.)

Rainy | Fountain | Puffy | Implosion
Final Thoughts
I
was a bit skeptical at first of Valve’s position
that multi-core CPUs would usher in a new era in
gaming, as the first graphics accelerator had, but
after listening to the Valve engineers, seeing the
excitement on their faces, and experiencing the
demos first-hand, I am a believer. The coming years
should see some truly amazing advances in the immersion
of gaming, and it seems that Valve is ready to make
the investments needed to lead the way.
However,
one thing did occur to me as I listened to Tom Leonard
outline his goals for advancement of AI. I thought
back to the release of Half-Life 2, and I remembered
how the Source Engine, unlike that of id’s
Doom 3, was very forgiving of less-than-bleeding-edge
hardware. Sure, someone with a 9800 Pro had to turn
down visual settings to get playable frame rates,
but the game still looked very, very good, and the
gameplay experience was the same across a variety
of hardware.
If Valve leverages the power of multi-core, and
quad core in particular, to improve AI routines,
though, single-core users will surely be left out in the
cold when those advanced features are turned off.
In other words, lack of computing power will substantively
change the game experience for those who don’t
upgrade, because NPCs won’t be doing the same
things.
That
is a change in posture from the initial release
of HL2, and may send a lot of people running to
the computer store.
Thanks
to Gabe Newell, Doug Lombardi and the rest of the
Valve staff for hosting DriverHeaven at this fascinating
event.