""
 

 

ATI introduced Stream Computing on Friday at an event in San Francisco. Stream Computing is not a product per se, but a class of applications that run on a GPU instead of a CPU, and luckily for ATI, they do it many times faster than the latest and greatest CPUs out there.

Dave Orton kicked things off with an overview of the whole concept, and he put forward quite a realistic outlook on what it can and cannot do. Far from pitching it as the best thing for every task out there, Orton started out by saying that it may or may not apply to your workload. If the problem you have will map to a GPU, you can see speedups from 10x to 40x, a massive increase. Doing the same math purely on a CPU would take a very expensive computer.

A modern GPU is built from clusters of shaders, both pixel and vertex for DX9, with geometry shaders added for DX10. Some problems map well to pixel shaders, others to vertex shaders, but since the two do not do exactly the same type of math, you can't use both efficiently. The R600 class of GPUs will bring unified shaders to the mix, which means each shader can do it all.

If you look at it from a strict system architecture point of view, a modern GPU is a cluster of very fast micro cores connected by a frighteningly high speed interconnect and backed by very high speed memory. This is how Stream Computing sees a GPU: many mini math engines. A current R580 class GPU has 375 Gigaflops of theoretical performance available with 64 GB/s of memory bandwidth. The next generation will have over 500 Gigaflops on tap, but for some reason, ATI would not go into detail.

The name Stream refers to the concept of pulling a stream of data in, processing it and streaming it out as one flow. If your problem supports it, and you program things right, you will end up with a nonstop flow of answers to the data you feed in. If you do it wrong, you get data shuttled back and forth, memory thrashing, and all sorts of inefficient use of compute power. A GPU used this way, while many times faster than a CPU, is not in competition with it: you need a CPU as a controller and to do the many things a GPU simply cannot accomplish. It really is a synergistic relationship. It may lessen the need for more CPUs, but won't eliminate it.
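
To make the "stream in, process, stream out" idea concrete, here is a minimal sketch in Python. It is only an illustration running on a CPU, not ATI's API: the names `stream_map` and `scale_and_offset` are made up for this example, and a real GPU would run many copies of the kernel in parallel rather than one element at a time.

```python
from typing import Callable, Iterable, Iterator


def stream_map(kernel: Callable[[float], float],
               source: Iterable[float]) -> Iterator[float]:
    """Apply the same small kernel to each element as it arrives.

    Nothing is buffered: one element comes in, one result goes out.
    """
    for item in source:
        yield kernel(item)


def scale_and_offset(x: float) -> float:
    # Hypothetical per-element kernel, chosen only for illustration.
    return 2.0 * x + 1.0


if __name__ == "__main__":
    readings = iter([0.0, 1.0, 2.0, 3.0])  # stands in for an incoming data stream
    for result in stream_map(scale_and_offset, readings):
        print(result)
```

The point of the shape is that the kernel never reaches back into earlier data, which is what lets the hardware keep the flow going nonstop.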

The concept of Stream Computing is a cute technical paradigm, but without real world applications, what is the point? Well, as it turns out, Stream has several sets of problems that work well on it now, and will show immediate benefits as soon as you put them on the GPU.

Dave Orton listed the main classes of problems as Scientific Computing, Climate Research, Homeland Security, Wall Street Risk Assessment, Seismic Modeling and Search for the enterprise side. This is where Stream is aimed, but it also maps very well to several consumer applications, game physics being the major one.

Scientific Computing is one of the big targets, and it can reach the top end of the 10-40x speed increase range. If coded right, HPC is a huge win for the whole Stream concept. In one of the later demos at the launch, ATI showed a real world 30x speedup on the first release of code.

Then comes climate research. You may have noticed all the talk about hurricanes, tornadoes, wind shear, and the difficulty of forecasting the weather much farther out than 48 hours. The whole climate field is a combination of a lot of science, huge compute resources and a little black magic. We are good at the macro things: we can tell you that there is a hurricane at position XYZ, but what is a little more nebulous is where it will land, how it will turn, and other things that may kill you or level your neighborhood. This calls for an almost bottomless well of CPU power, and luckily it maps well to Stream Computing, with about a 20x speedup.

Homeland security is another obvious one. Things like fingerprint and facial recognition are touted by politicians as the next big thing, even if many of them cannot tell you how it will actually make you more secure. Searching a database of millions or billions of fingerprints takes horsepower, and that translates into banks of computers or long waits. If you are sitting in a security line at an airport, the simple phrase 'please wait sir, our computers are quite slow today' can be akin to torture. Facial recognition takes orders of magnitude more power than fingerprint recognition, but luckily Stream brings massive speedups there too.

Wall Street is another major target, and let me say this bluntly, stock traders are crazy. They have gone from basing value on how a company does, to how it may do, to how people think that others may think it will do. It is a huge series of what if questions based on monumentally large data sets.

If you spot a gap in the price a stock is at against where it will move, you may have seconds to act with millions of dollars at stake. If you are a percentage point faster than the next guy at figuring that trade off, your payouts can be bigger than hitting the lottery. This crowd will throw cubic dollars at a few percent speedup, 10x is unheard of. Can you ask for more than rich clients willing to spend almost anything for an advantage?

Sure you can, and the oil and gas industry is another one. It is another group in cutthroat competition to find the next big oil field. With oil in the $70 a barrel range, finding a large field is worth more than many medium sized countries. If it takes compute power to model the earth, they will buy as much of it as they can get their hands on.

Last come things like search: not pressing F3 in Windows, but Google. Search is a nearly perfect Stream Computing application. You pull lots of data in one side, pattern match it, and send it out the other side. Google is currently compute bound, or so many people say, and is delaying new services until they can build out more rack space. A 2x increase in speed would be invaluable to Google; 20x is manna from heaven.

Dave Orton also had another example of how search could benefit from Stream. Imagine you have all your digital pictures on your PC, maybe 10 years' worth. You know you have this picture of yourself with Bob near a boat, but was it in the summer of 98 or 99, and what was the file name again? With Stream, the compute power needed to answer the query 'find me all pictures of Bob' becomes available. You could do this on a CPU, but if it takes a minute with Stream, it may be half an hour on a CPU, and you are not going to use it much with that kind of lag.

The science of Stream Computing has two major components that change how we look at much of computing. The first is whether the problem fits the GPU, i.e. can you phrase it in such a way that it works with the accelerated math at your fingertips. It may involve looking at the problem from a new angle, or more likely using different algorithms. If there is an algorithm which is 5x slower than the best in class, but you can map that one to a GPU for a 30x speedup, it is still a clear win.
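
The arithmetic behind that trade-off is simple enough to write down. The sketch below is my own illustration, with a hypothetical function name, and it assumes the two factors simply divide: a 5x slower algorithm sped up 30x ends up 6x ahead of the best-in-class CPU version.

```python
def net_speedup(algorithm_slowdown: float, gpu_speedup: float) -> float:
    """Speedup over the best CPU algorithm when a slower algorithm runs on a GPU.

    algorithm_slowdown: how many times slower the GPU-friendly algorithm is
                        than the best-in-class one on a CPU (e.g. 5.0).
    gpu_speedup:        how much faster that algorithm runs on the GPU (e.g. 30.0).
    """
    if algorithm_slowdown <= 0 or gpu_speedup <= 0:
        raise ValueError("both factors must be positive")
    return gpu_speedup / algorithm_slowdown


if __name__ == "__main__":
    print(net_speedup(5.0, 30.0))  # the example from the text: 30 / 5
```

Anything above 1.0 is a win; below 1.0, you are better off staying on the CPU with the better algorithm.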

The other hurdle is how you split the problem up into GPU and non-GPU chunks. As ATI keeps pointing out, if you do it badly, you can end up wasting any advantage you might gain by going to the GPU. Luckily there are tools that are finally hitting the market which will ease a lot of the burden there. Stream is in its infancy, but several companies are putting it to very good use right now.

With that in mind Dave Orton turned things over to four partners who are using or are about to use Stream Computing functionality. They are Folding@Home, Peakstream, Microsoft and Havok. They are all doing radically different things in the same way with the same silicon.

Vijay Pande, Associate Professor of Chemistry at Stanford University, was the first. You may recognize him from the Folding@Home project. You have probably seen F@H in action at one time or another; it models how proteins fold up in 3D space. Proteins are long chains of amino acids, and they interact with each other on an atomic level to fold into various twisted knots. If a protein folds one way, you have one set of functionality; if it folds another way, you get a completely different effect from the same protein. Imagine there are millions of ways to fold a given protein, and orders of magnitude more ways in which each of those folds can happen.

This level of compute power is completely unattainable by modern clusters, but Folding@Home makes a virtual machine far larger than any you could hope to buy now. About 2 million PCs participate in the program, and it is about the equivalent of a 200,000 CPU supercomputer. One slide showed the geographic locations of people running F@H, and it mapped almost perfectly to the areas of the world where there is electricity.

Luckily this problem maps very well to Stream Computing, and Vijay Pande is seeing a 20-40x speedup with a copy of F@H ported to the X1900. Problems that used to take 30 years to solve can now be done in one year, giving hope to sufferers of diseases like Alzheimer's. In a demo, the copy running on the GPU was clicking off many frames a second while the non-GPU version was having problems getting a single FPS.

The beta of Folding@Home for ATI GPUs will be released on October 2, and they are aiming to turn the project into a Petaflop computer by the end of the day. If enough people join in, they think a 10 Petaflop computer may be possible. This is all much more than theoretical, however. Vijay said to expect some results from the program in a few months, so all of you who have been contributing will have something worthwhile to show for your efforts.

The stage was then turned over to Peakstream and its VP of marketing, Michael Mullaney. Peakstream makes tools so you can write Stream Computing code, debug and optimize it. There are several parts to the Peakstream toolset, the two most important are the virtual machine and the profiler.

Code written for the Peakstream VM can run on either the CPU or the GPU. It is meant to be a target you write an application for and then forget about, with the underlying code hopefully doing it all for you. With any luck, the compilers can make all the hard choices and you just fire and forget. If not, there is always the profiler to lean on.

In addition to all the things a normal profiler would do, such as spotting slow and inefficient code points, the Peakstream profiler does one thing of critical importance: it can spot thrashing of data between the host and the GPU. Earlier, I pointed out that this was one of the big “no-no’s” for getting performance out of Stream Computing code. When you are using cycles to shuttle things back and forth, or worse yet sitting around waiting, you are not computing. I can see how anyone writing GPU code would need this tool.
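
A back-of-the-envelope model shows why thrashing hurts so much. The sketch below is my own illustration, not anything taken from the Peakstream profiler; the function name and the sample numbers are hypothetical, and it assumes transfer time simply adds on top of GPU compute time with no overlap.

```python
def effective_speedup(cpu_seconds: float,
                      kernel_speedup: float,
                      transfer_seconds: float) -> float:
    """Overall speedup once host<->GPU transfer time is counted.

    cpu_seconds:      time the job takes on the CPU alone
    kernel_speedup:   how much faster the pure math runs on the GPU
    transfer_seconds: total time spent moving data back and forth
    """
    gpu_seconds = cpu_seconds / kernel_speedup + transfer_seconds
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    # Hypothetical job: 10 s on the CPU, math runs 20x faster on the GPU.
    print(effective_speedup(10.0, 20.0, 0.0))  # no transfer cost at all
    print(effective_speedup(10.0, 20.0, 4.5))  # heavy thrashing
```

With these made-up numbers, 4.5 seconds of shuttling data turns a 20x kernel into a 2x application, which is exactly the kind of thing you would want a profiler to flag.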

One example Michael Mullaney talked about was oil and gas exploration using seismic waves. The concept is simple: you set off an explosive on the surface, and it sends out shock waves. The rate at which they propagate depends on the density of the material they are moving through. With strategically placed listening devices, you can literally map out the subsurface structures in striking detail.

The data isn't all that hard to collect, but turning several pings into an accurate 3D model of the world beneath your feet takes a huge amount of number crunching. The data supplied is from Hess, a large oil and gas exploration outfit. In the results shown, the code running on the GPU with Peakstream tools ran 16x faster than on the CPU. If you have a cluster of 1024 CPUs crunching away day and night on this code, you can knock that down to 64 X1950s and save yourself an immense amount of money, space and electricity.
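
The cluster math here is just a division, sketched below. The helper name is hypothetical, and it assumes the 16x figure holds per GPU and that the work scales evenly across cards, which real clusters rarely manage perfectly.

```python
import math


def gpus_needed(cpu_count: int, speedup_per_gpu: float) -> int:
    """Number of GPUs needed to match a CPU cluster, rounded up.

    Assumes one GPU does the work of `speedup_per_gpu` CPUs and that
    the job scales evenly across however many GPUs you use.
    """
    if cpu_count < 0 or speedup_per_gpu <= 0:
        raise ValueError("cpu_count must be >= 0 and speedup_per_gpu > 0")
    return math.ceil(cpu_count / speedup_per_gpu)


if __name__ == "__main__":
    print(gpus_needed(1024, 16.0))  # the Hess example: 1024 / 16
```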

Many of these apps are ravenous in their appetite for flops. Mullaney said that ExaFLOPS were not nearly enough, and that ZettaFLOPS could be put to good use. Because of this, it probably won't make Hess's data center any smaller; it will just increase the work it can do. I don't think you will hear them complain though.

There were a few other examples mentioned in passing. One is a homeland security application where they listen to conversations in public places and pick out keywords from the stream. Scary big brother stuff, but probably a lot closer than you think.

Another topic was about as far from the data center as you can imagine, an undisclosed mobile military signal processing application. Instead of using a huge bulky laptop and having it crunch the numbers with the speed of a sloth, the military can do it on a much smaller laptop in less time. When you are in a foxhole in a far off land, bullets whizzing over your head, speed is more than a theoretical problem.

HPC is about a $9 billion market, and Peakstream is uniquely positioned to provide a huge increase in performance to the sector. When people are floored by the 30-40% speed increase of Conroe over an A64, imagine if you could show them a 10x boost. The fanboys would not know how to express their joy on the forums.

On that upbeat note, he turned things over to Chas Boyd of Microsoft, who works in the Graphics Platform Unit. There were no big announcements from MS this time around, just a few hints of things to come. MS is actively coding for the GPU, and this will show up more and more in Vista. The UI, Aero Glass, is a good example of how it will work: you don't know that it runs on the GPU, nor should you really care, it just works.

Another example is an upcoming photo editing program from MS. It doesn't do anything that the older versions did not, and certainly does not threaten Photoshop, but it brings ease of use to the genre. If you have ever applied a complex filter to a picture, it happens pretty quickly in the preview window, and takes a little longer to apply to the full picture. With Stream Computing, MS can put slider bars up on the right and those filters happen in real time as you drag the slider. This would be unheard of if you had a five second lag at each step.

The other interesting demo involves the ever popular speeding up of sorting algorithms. They had a demo of a tree on a grassy hill, with each leaf and blade of grass rendered correctly. To do this, and to prevent polygons from passing through each other like when you see the arm of a bad guy poking through the door you are about to open, you need to sort all the polygons in real time.

With a humanoid character in a building, this isn't much of a trick, even if many developers can't seem to get it right. Doing it with all the blades of grass in a field is a trick, or at least puts an unacceptable burden on the CPU if it can be done at all. Stream Computing can harness the GPU to do the repetitive heavy lifting here, and it makes animations of a tree rotating on grass possible.
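
For a rough idea of what "sorting all the polygons" involves, here is a small CPU-side sketch of a back-to-front depth sort. This is an assumption on my part: the demo's actual algorithm was not described, and the names `centroid`, `squared_distance` and `sort_back_to_front` are made up for the example. A GPU version would do the same comparison work across thousands of polygons in parallel.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Polygon = Sequence[Point]


def centroid(polygon: Polygon) -> Point:
    """Average of the polygon's vertices."""
    n = len(polygon)
    return (
        sum(p[0] for p in polygon) / n,
        sum(p[1] for p in polygon) / n,
        sum(p[2] for p in polygon) / n,
    )


def squared_distance(a: Point, b: Point) -> float:
    """Squared distance; enough for ordering, and avoids a square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sort_back_to_front(polygons: List[Polygon], camera: Point) -> List[Polygon]:
    """Order polygons so the farthest from the camera is drawn first."""
    return sorted(
        polygons,
        key=lambda poly: squared_distance(centroid(poly), camera),
        reverse=True,
    )


if __name__ == "__main__":
    near = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
    far = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0)]
    ordered = sort_back_to_front([near, far], camera=(0.0, 0.0, 0.0))
    print(ordered[0] is far)
```

Sorting by centroid is a simplification that can misorder large or intersecting polygons, but it shows why a field of grass blades, each one a polygon to sort every frame, gets expensive fast.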

MS is legitimizing the concept for mainstream use. Don't expect to see miracles or a modern game running on a G965; just look for smoother transitions, perkier effects, and things that used to lag happening right away. Ease of use is the key here.

Last up was the one you have been waiting for: physics, the first killer app of the Stream genre, and who better to show it off than Havok? They had three demos, two repeats of their Computex demos, and a new one based on shooting cannon balls into a Lego fortress.

Jeff Yates of Havok started off with a brief history of physics in computing, starting with Pong and moving on to simple objects that bounced around in a semi-real fashion. From there, it is on to the future immersive world of full physics simulation and Lord of the Rings style group combat. The hope is that Stream Computing can get you there quicker than any other technique.

Without an official announcement, he pointed out that HavokFX would run on ATI cards, and they did quite well at the physics game. Everything in a game is starting to move toward using physics at the core, and Havok is there to help. OK, this may be a little biased as they are a physics middleware company, but I can see their point.

A CPU can handle many objects in a game, but not a flow of boulders bouncing down a hill needing tens of thousands of interactions here and there. Stream on a GPU can simulate from 1000 to 10,000 objects, and Jeff Yates says a single GPU is worth about 1000 CPUs in this regard. I can just see the next Intel physics demo at Spring IDF: the 1001 CPU cluster for gaming, who needs a GPU anymore?
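
To show why this kind of workload suits a GPU, here is a deliberately tiny sketch of one physics time step. It is my own toy model, not HavokFX: the `Body` class, the `step` function and the restitution value are all hypothetical, and it ignores object-to-object collisions entirely. What it does show is that every object gets the same small update, independent of the others, which is the pattern a GPU can run thousands of times in parallel.

```python
from dataclasses import dataclass
from typing import List

GRAVITY = -9.81  # m/s^2, acting along the vertical axis


@dataclass
class Body:
    y: float   # height above the ground in metres
    vy: float  # vertical velocity in m/s


def step(bodies: List[Body], dt: float, restitution: float = 0.5) -> None:
    """Advance every body by one time step using simple Euler integration.

    Each body is updated independently of the others; bodies that fall
    below the ground are clamped to it and bounce with reduced speed.
    """
    for body in bodies:
        body.vy += GRAVITY * dt
        body.y += body.vy * dt
        if body.y < 0.0:
            body.y = 0.0
            body.vy = -body.vy * restitution


if __name__ == "__main__":
    boulders = [Body(y=float(i % 10 + 1), vy=0.0) for i in range(10_000)]
    for _ in range(100):
        step(boulders, dt=0.01)
    print(len(boulders), "bodies simulated")
```

The boulders-down-a-hill case is much harder because the objects hit each other, but the per-object update above is the part that scales almost for free.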

In addition to the brick wars game, in which each castle had thousands of blocks, they showed off cloth simulations in real time. This isn't particularly hard to do, nor does it add much to gameplay, but as far as immersion in a game world goes, it makes a big difference.

Everyone is getting into the physics on a GPU game, from game developers to ATI and Nvidia. It is only a matter of time and a bit of experimentation before it becomes pervasive. The whole question of GPU physics vs. a PPU card is still open, with no clear leader in sight.

From Left to Right: Chas Boyd of Microsoft, Jeff Yates of Havok, Vijay Pande of Stanford University, Michael Mullaney of Peakstream, and Dave Orton of ATI.

With that, all of the players came back on stage for a Q&A session. Most of the questions focused on two topics: convergence of GPUs and IEEE compliant floating point ops, and the making of a GPU with the graphics functions cut out to be a Stream co-processor. The short answer to the IEEE floating point question was no: they are not IEEE compliant, nor will they be soon, but they are definitely moving in that direction. The answer to the non-G GPU question was a more emphatic no wrapped in a no comment. Basically, the whole appeal of Stream Computing is that you take something that is already there and put it to wider use. A specialized co-processor is not generalized, nor can it be assumed to be there, so it probably won't happen.

Overall, the day went well. ATI was not giving out specifics, nor were they saying anything that was not already out there in the press. What they did do, and did it quite effectively, was to point out that this whole Stream Computing concept is out there, has serious momentum, and is only gaining ground. There are multiple companies using it in currently available products, and the list is growing longer every day.

Stream provides direct and measurable benefits to many classes of users as long as your problem fits the paradigm. It can give you an order of magnitude speed boost in an era of incremental advances, and it is hard to overstate the importance of that. If your app doesn't fit, R600 is just around the corner, and unified shaders with DX10 may widen the applicability range more than many people expect.

 

 


Copyright ©2002-2006 DriverHeaven.net, All rights reserved.
