ATI Interview with Maurice Ribble & Roger Descheneaux

 


Driverheaven: What are your job descriptions within ATI?

Roger: I'm a Software Engineer working on OpenGL device drivers. I mostly work on developing drivers for future products.

Maurice: My official title is Software Engineer, but that isn't very descriptive. An actual job description would be that I work on the OpenGL driver team at ATI and solve whatever problems come up. Each person has their specialization and I focus mostly on performance bugs, game optimizations, and other performance related tasks. Being more descriptive than that is difficult because the exact details of what I do change from day to day. That's one of the things I like about this job.

Driverheaven: How long have you worked for ATI?

Roger: I've been at ATI since January of 2001. I worked on the original Radeon 9700 drivers. Before I was at ATI I worked at IBM, also developing OpenGL drivers.

Maurice: I've been working at ATI for about 2.5 years. I came straight out of college so I spent a lot of my first year just coming up to speed.

Driverheaven: Do you enjoy working there?

Roger and Maurice: Sure, and not just because we get to play games at work. We get to work on cutting-edge products with the coolest hardware around, with interesting projects that flex our problem-solving skills, and of course, having ATI's huge pool of smart people whose brains we can pick when we get stuck helps too. What's not to like? :-)

Driverheaven: Tell us about a normal working day in your life.

Roger: Work? It's all like play to me. :-)

When I come in to work in the morning I usually integrate all the changes that people have made to the driver since I went home. The OpenGL driver team is very much a cross-site effort. We have active development in Silicon Valley; Marlboro, MA (where I work); Orlando, FL; and Germany. I rebuild the driver and read my email while I'm doing that. Then I usually start straight to work on writing code or fixing bugs.

Part of what I'm doing depends on the development cycle. Early in the hardware lifecycle I'm learning about future hardware and reading specs, talking to the hardware team, and so on. Later I'm planning the design and writing code, and then I'm fixing bugs and working on performance. There's never a clear line- all the work overlaps to some extent- but I usually concentrate on one thing or another.

The OpenGL team is large, of course, and different people work on different things. This is just the sort of thing I work on.

Maurice: I get in in the morning, check my email, and look at the web. If my email or a graphics enthusiast site brings up any oddities in performance I'll take a quick look at that. Otherwise I'll go and start working on one of the tasks on my list of things to do.

Driverheaven: Is it true you guys did this in your spare time?

Roger: Yes. We were talking one day about neat things to put into the driver, and we had this idea for the SMARTSHADER concept. It seemed like a pretty neat thing to do, so I talked to my team lead about it. His response was, "It seems interesting, but we can't justify working on that when there's more important development going on." We didn't disagree with that. SMARTSHADER is kind of a neat toy, but that's all it is. Still, we kind of got hooked on the idea. So I asked, "But if I work on it in my spare time, can we still put it in the driver?" My team lead agreed to that.

Spare time for us meant weekends. The SMARTSHADER techniques, both the original ones and the programmable one, were done mostly late Friday nights and Saturday afternoons.

Driverheaven: What inspired you to do this?

Roger + Maurice: I think part of it was that this was the first time we could. The Radeon 9700 had power to spare in the pixel shaders. We were getting Quake 3 scores in the hundreds of frames per second. We also had PS2.0 class shaders, and we just started getting ideas, thinking, "Wouldn't it be nice if we could do this? Hey, and that? And that?" We had the power and the functionality, and we wanted to put it to use.

Roger: As for working on it nights and weekends in our spare time- I think every programmer gets an itch he wants to scratch from time to time. Sometimes you just get an idea in your head and you just have to write the code. I'm just glad ATI let us do it. :-)

Driverheaven: How much time in development has this taken?

Roger: I'd say about a month of weekends for each set of work. It wasn't something we did all at once- just a bit here and there. We seldom both worked on it at once. I think I started both projects, worked on it for a while, got a little sick of it, Maurice worked on it for a while, then I went back to it. We talked to each other about different effects that we'd like.

Maurice: I've probably spent 70 hours on it.

Driverheaven: Will programmable smartshader ever be supported officially?

Roger: That's something for marketing to decide. I kind of doubt it, though. It's kind of fun, but I don't think enough people are interested enough in it to justify spending a lot of engineering cycles developing and supporting it.

Maurice: Personally I hope not. It's currently one ugly hack (albeit an isolated hack) in our driver. For something to be supported I think it should cover corner cases and it's not possible for smartshader to work with every app. If it's officially supported we'd get bugs saying app xyz doesn't work with it and what would we say if it's not possible for app xyz to work with smartshader? It's a cool feature, but not something that should be able to create bugs that take time away from more important driver work.

Driverheaven: How easy is it to write a shader?

Roger + Maurice: The original ones were a pain. We designed pseudo-code algorithms, then hard-coded them by writing code to individually write all the bits in all the hardware registers. It was messy.

The new programmable ones were a lot easier. We built it on top of the ARB_fragment_program framework and just added a scripting language. The scripting language was designed to do exactly what we wanted, so it is the right tool for the job. Maurice converted a dozen shaders from the old method to the new method in a few hours.

There are still a lot of things which could be better. Our error reporting currently isn't that great, which makes them tough to debug.

Driverheaven: Do you think this tool can educate people about shaders?

Roger + Maurice: No doubt it will educate some people about shaders. Some people will be interested in how the effects work and will read the shaders, and maybe they'll start tweaking them just for fun. It most likely won't keep hordes of people up late studying shader programming, but some people will find it interesting enough to learn something.

Some of the shaders are complicated, but some of them are remarkably simple. The "Black & White" shader, for example, is pretty straightforward.

Driverheaven: Do you think it's a tool that game developers may find a use for? Possibly to apply Radeon-only effects to games?

Roger: SMARTSHADER is really a way for the end user to hook his shader into a game. For game developers, the easiest thing to do is to just add the shader code as a post-processing step to the game. SMARTSHADER is a way for people who didn't develop the game and who don't have the source to add their own tweaks to it.

Maurice: OpenGL is a pretty standardized API, and I don't think developers should go around the API and rely on nonstandard behavior.

Driverheaven: What can an end user do with programmable smartshader?

Maurice: End users should be able to do most post processing effects. Some complex things might need to get broken up into multiple passes and become too slow to use interactively, but they can still be done. I'm sure there are some sweet effects people can come up with. I think imagination is one of the biggest limiting factors on what can be done with this.

Roger: The end user can do anything we did in the non-programmable SMARTSHADER. We ported all of our internal shaders to the programmable method (and added a few more besides). The end user can allocate and free buffers, read in textures, perform simple logical operations, and apply shaders to buffers.

Driverheaven: Is there any effect that you would like to see attempted by end users that you haven't had time to do?

Roger: We always talked about a Matrix-like effect, with the glowing green symbols scrolling down the screen. That was hard, though, and required some artistic ability. I can write code, but I'm pretty bad in the art department. :-)

It always seemed like you could add other things, like a logo or a clock to the screen by just texturing onto the final image. I don't think we provide the time of day. Maybe we should add that to the language. :-)

Maurice: The old-movie effect could be improved a lot, too, by making the image look dirty and texturing bits of imperfection onto the image. The Matrix has lots of different ASCII effects and some of them would be extremely difficult, and probably not look very good without an artist to tweak each frame like they do in the movie. That said even if it usually doesn't look very good it would still be cool to see someone try. I'm sure there are people out there in the field of image processing that could come up with lots more cool ideas.

Driverheaven: When designing smartshader/the tool were you ever concerned that it may be used to cheat within games? For example if someone found out how to apply a transparency shader to wall textures - they could see where other players were. Do you have "anti-cheats" hard-coded into this?

Roger: We thought about this quite a bit. We don't have anti-cheats as such, but we did avoid putting in features which would aid cheating. For example, programmable SMARTSHADER lets you perform operations on a buffer and use a buffer as a texture. It won't let you use the depth buffer, though. Since the depth buffer contains information not visible to the player, you could use that information to cheat.

Maurice: We don't believe transparent walls are possible since people only have access to the final frame buffer. It would be like someone taking a picture of the outside of a house with a camera, and then trying to see through the walls by using Photoshop.

Driverheaven: Was there anything you would have liked to add or will be adding to programmable smartshader ?

Maurice: It would have been nice to clean things up a little as far as how the smartshader scripting language works, and maybe add some better error logs. We probably won't be adding much if anything in the future unless the response to this is very strong.

Roger: I just thought of the clock thing a few minutes ago. :-) We have a timer, but I don't think we have a time-of-day clock.

I also considered adding simple rendering effects, like letting people draw points, lines, and triangles. That turned out to be more work than we really had time for, though. It's a lot to put into a little scripting language that you did in a few weekends.

Driverheaven: Can you run through one of the Smartshader Effects, and detail to our users what each line in the code means?

Roger + Maurice: Sure. Here's the Green ASCII shader. It uses a lot of the functionality of the scripting language.

This program converts the image on the screen into ASCII characters.

The algorithm generally works like this:

1. Divide the screen into a series of 8x10 blocks, and average the brightnesses of all the pixels in that box.
2. Decide which character should replace the pixels in that box based on the brightness of the box. Dimmer boxes get dimmer characters, like ‘.’, and brighter boxes get brighter characters which have a lot of pixels lit.
3. For each 8x10 pixel region on the screen, replace the pixels in that region with the pixels in the font we looked up.
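
The three steps above can be sketched on the CPU as follows. This is only an illustration, not the driver's code: the function name `ascii_art`, the 10-character `RAMP`, and the use of text characters instead of font pixels are all assumptions made for the example (the real font texture has 64 glyphs).

```python
# Hypothetical CPU-side sketch of the three steps; the real work happens in a shader.
CHAR_W, CHAR_H = 8, 10
# Assumed character ramp, ordered dimmest to brightest (the real font has 64 glyphs).
RAMP = " .:-=+*#%@"

def ascii_art(gray, ramp=RAMP):
    """gray: list of rows of brightness values in [0.0, 1.0]."""
    height, width = len(gray), len(gray[0])
    lines = []
    # Only whole 8x10 blocks are processed; partial blocks at the edges are ignored.
    for by in range(height // CHAR_H):
        row = []
        for bx in range(width // CHAR_W):
            # Step 1: average the brightness of every pixel in the block.
            total = 0.0
            for y in range(by * CHAR_H, (by + 1) * CHAR_H):
                for x in range(bx * CHAR_W, (bx + 1) * CHAR_W):
                    total += gray[y][x]
            mean = total / (CHAR_W * CHAR_H)
            # Step 2: brighter blocks index further into the ramp.
            index = int(mean * (len(ramp) - 1))
            # Step 3: the chosen character stands in for the whole block.
            row.append(ramp[index])
        lines.append("".join(row))
    return lines
```

Feeding it a 16x10 image whose left half is black and right half is white yields a single line of two characters, the dimmest followed by the brightest.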

shader asciiPixelShader =
Make a smartshader fragment program called asciiPixelShader. The value in the string is a valid ARB_fragment_program. It will be compiled when it is defined, and later, when it is applied to a surface, the program will be run for every pixel on that surface, doing the same thing for each pixel.

"!!ARBfp1.0
A fragment program starts with open quotes and ends when you close them. The ARB_fragment_program extension states that the program must start with !!ARBfp1.0.

#grayscale factors
PARAM const0= {0.30, 0.59, 0.11, 1.0};

Common values for converting to grayscale. Later on we are going to use this constant to multiply the red component by 0.3, the green component by 0.59, and the blue component by 0.11; and then add them all together. These numbers add up to 1.0 so that the brightness of the final image is the same as the color version. The actual values for this constant are derived from how easily the cones in our eyes perceive these different colors. The 1.0 is for alpha and it doesn't matter what that value is.
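
As a rough illustration of what this constant is used for, here is the same weighted sum written out in Python. The function name `brightness` is made up for this sketch; the weights are the ones from const0.

```python
# The RGB weights from const0; alpha is ignored, as in the DP3 instruction later on.
LUMA = (0.30, 0.59, 0.11)

def brightness(r, g, b):
    """Weighted sum of the colour channels (a 3-component dot product)."""
    return r * LUMA[0] + g * LUMA[1] + b * LUMA[2]
```

Because the weights sum to 1.0, pure white keeps a brightness of 1.0, and pure green comes out brighter than pure red, which in turn is brighter than pure blue.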

# numChar-1, charWidth
PARAM const1 = {63.0, 8.0, 0.0, 0.0};

The texture map we saved in ascii8x10.raw has 64 ascii characters in it and each character is 8 pixels wide by 10 pixels high. Characters are ordered horizontally from the dimmest character on the left to the brightest character on the right. 63 is the number of characters - 1, and 8 is the width of a character.

# windowWidth/charWidth, windowHeight/charHeight
PARAM const2 = program.local[0];

This takes a modifiable constant and allows it to be used in the program. The values for this are loaded below. The reason this isn't hard coded like the other constants in this program is that these values take into consideration the window width and height, which aren't known at the time you write the shader.

# charWidth, charHeight
PARAM const3 = {8.0, 10.0, 0.0, 0.0};

8.0 is the width of each ascii character and 10.0 is the height of each character.

# 1/asciiTexWidth, 1/asciiTexHeight
PARAM const4 = {0.001953125, 0.1, 0.0, 0.0};

The first constant is one divided by the width of the ascii texture (1/512). 0.1 is one divided by the height of the ascii texture (1/10).

# charWidth/( trunc(windowWidth/charWidth)*charWidth ), charHeight/( trunc(windowHeight/charHeight)*charHeight )
PARAM const5 = program.local[1];

Another set of constants that are defined later because they depend on window width and height which aren't known at the time you write this script.

TEMP temp0;
TEMP temp1;
TEMP temp2;

These are temporary variables that will get used to store intermediate results inside the fragment program.

OUTPUT oColor = result.color;
Tell the fragment program the output color for this shader will be the final value in the oColor variable.

MUL temp2, fragment.texcoord[0], const2;
Later in this script we say the source of texture 0 is a downsampled version of the back buffer. Doing that also sets up the texture coordinates for texture 0 so that they vary from 0.0 to 1.0 depending on which fragment of texture 0 is currently being processed. Some example texture coordinates: the top left pixel would have a texcoord of (0,0), the middle pixel would have a value of (0.5, 0.5), and the top right pixel would have the value (1,0). So what this instruction does is multiply the texture coordinate for our current position in texture 0 by winWidth/charWidth and winHeight/charHeight and then store that in temp2. This is done so we can break up the back buffer into 8x10 chunks that can be replaced by characters later.

These are vector operations. This multiply will multiply the first component of the texture coordinate by the first part of the constant, the second part by the second constant, and so on. The result temporary is a 4-component vector.

FRC temp0, temp2;
This gets the fractional part of the results in temp2 and saves that in temp0.

SUB temp2, temp2, temp0;
This subtracts that fractional value in temp0 from the blockify value in temp2 and stores that in temp2. This gets us pointing to the upper left corner of each ascii character block.

MUL temp2, temp2, const5;
Multiply the previous results by the value in const5. This is necessary because the down sampled back buffer doesn't line up if the width and height of the original window aren't evenly divisible by the ascii character width and height.

MUL temp0, temp0, const3;
Multiply the fractional part stored in temp0 by the character's width and height and save that in temp0.
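
The five instructions from the first MUL through this one can be traced in Python for a single fragment. This is a sketch only: `block_coords` is a hypothetical helper, the constants are passed in as plain tuples, and only the first two vector components are modelled.

```python
import math

CHAR_W, CHAR_H = 8.0, 10.0  # const3

def block_coords(u, v, const2, const5):
    """Trace MUL, FRC, SUB, MUL, MUL for one fragment at texcoord (u, v)."""
    # MUL temp2, fragment.texcoord[0], const2
    scaled = (u * const2[0], v * const2[1])
    # FRC temp0, temp2  (fractional part: x - floor(x))
    frac = (scaled[0] - math.floor(scaled[0]), scaled[1] - math.floor(scaled[1]))
    # SUB temp2, temp2, temp0  (whole block index)
    whole = (scaled[0] - frac[0], scaled[1] - frac[1])
    # MUL temp2, temp2, const5  (back to a 0..1 coordinate for the lookup)
    lookup = (whole[0] * const5[0], whole[1] * const5[1])
    # MUL temp0, temp0, const3  (pixel offset inside the 8x10 character)
    in_char = (frac[0] * CHAR_W, frac[1] * CHAR_H)
    return lookup, in_char
```

For a 640x480 window, const2 is (80, 48) and const5 is (1/80, 1/48), so a fragment in the middle of the screen lands on block (40, 24), which maps back to a lookup coordinate of roughly (0.5, 0.5).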

TEX temp1, temp2, texture[0], 2D;
Do a texture lookup into the down sampled back buffer for the current fragment. This gets us the pixel value at each location from the back buffer.

DP3 temp1, temp1, const0;
Take the dot product of temp1 and const0. This transforms the current fragment's color into grayscale and saves it in temp1. This grayscale value is used to determine which ascii character to use (remember in the ascii texture the left most character is the dimmest one and they gradually get brighter until you are at the rightmost and brightest character).

If you’re not familiar with it, a dot product will multiply the first component of temp 1 by the first component of const 0, the second by the second, the third by the third, and add up the result. So temp1 was the pixel from the frame buffer with red, green, and blue components, and we’ve converted it to the brightness of the pixel in the frame buffer.

The next four instructions use the brightness that we just computed to find the start of the character we’re looking up in the list of characters. We know how bright it is, so we skip over to the right until we find the start of the letter which is the proper brightness.

MUL temp1, temp1, const1.rrrr;
Multiply the grayscale value by the number of characters minus one. We really only care about the red value stored in const1, which contains that count (63). Later it is convenient to have the result in the red component, so here the .rrrr swizzle smears that red value across all four components of the multiplier.

FRC temp2, temp1;
Get the fractional part of the previous multiply.

SUB temp1, temp1, temp2;
Subtract the fractional portion from the temp1.

MUL temp1, temp1, const1.gggg;
Now multiply the whole number by the width of a character.

ADD temp0.r, temp1, temp0;
Add back on that fractional fudge offset to fix problems if the down sampled back buffer isn't already in a size that matches up nicely with my ascii texture map.

MUL temp0, temp0, const4;
Now multiply our texture coordinates by the constants necessary to get things back in the range of 0.0 to 1.0, which is what the next texture lookup instruction expects. This is the final texture coordinate that we want to look up in the ascii texture.
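
To make the character lookup concrete, here is a Python trace of the instructions from the MUL by const1.rrrr through this final MUL. The name `glyph_texcoord` and its arguments are assumptions for the sketch; the numbers come from const1 and const4.

```python
import math

NUM_CHARS = 64            # const1.r holds NUM_CHARS - 1
CHAR_W = 8.0              # const1.g
TEX_W, TEX_H = 512.0, 10.0  # const4 holds 1/TEX_W and 1/TEX_H

def glyph_texcoord(gray, in_char_x, in_char_y):
    """gray: block brightness in [0, 1]; in_char_*: pixel offset inside the character."""
    scaled = gray * (NUM_CHARS - 1)        # MUL temp1, temp1, const1.rrrr
    frac = scaled - math.floor(scaled)     # FRC temp2, temp1
    index = scaled - frac                  # SUB temp1, temp1, temp2
    start_x = index * CHAR_W               # MUL temp1, temp1, const1.gggg
    x = start_x + in_char_x                # ADD temp0.r, temp1, temp0
    return x / TEX_W, in_char_y / TEX_H    # MUL temp0, temp0, const4
```

A brightness of 0.0 selects the first character at the left edge of the texture, and a brightness of 1.0 selects character 63, whose left edge sits at 504/512 of the texture width.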

TEX oColor.g, temp0, texture[1], 2D;
Do the lookup into the ascii texture and output only the green channel of the character we looked up.

END";
End the fragment program (don't forget the closing quote).

shader copyPixelShader =
"!!ARBfp1.0
OUTPUT oColor = result.color;
TEMP pixel;
TEX pixel, fragment.texcoord[0], texture[0], 2D;
MOV oColor, pixel;
END";

This simple fragment program looks up the current fragment and passes the exact same color through. It could be used for making a copy of the surface or, as we use it here, to perform down sampling of a surface.

surface temp1 = allocsurf(width/2, height/2);
surface temp2 = allocsurf(width/4, height/4);
surface temp3 = allocsurf(width/8, height/8);
surface temp4 = allocsurf(width/8, height/10);

This allocates four new surfaces. What we are going to use them for is to down sample the back buffer to a buffer that is 1/8th the width and 1/10th the height of the original buffer. These dimensions were chosen so that each pixel in the downsampled buffer would represent exactly one 8x10 character block in the final image. Remember that the characters in the ascii texture are 8x10 pixels. The reason for multiple passes to down sample the image is that with linear sampling you can only blend neighboring pixels together, so any time you down sample by more than 2x per dimension per pass you will be getting slightly wrong results. Things would probably look fine if you did everything in one pass, but we chose to be slow and correct rather than fast in this case.
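
The point about multiple passes can be seen in a one-dimensional sketch. The helpers below are hypothetical and assume that a linear-filtered tap placed midway between two pixels returns their average, which is how the 2x passes are described here.

```python
def halve(row):
    """One 2x pass: each output pixel is the average of two neighbouring inputs."""
    return [(row[i] + row[i + 1]) / 2.0 for i in range(0, len(row) - 1, 2)]

def three_passes(row):
    """Three 2x passes, giving an 8x reduction that weighs every input pixel."""
    return halve(halve(halve(row)))

def one_pass(row):
    """A single 8x pass: one linear tap at the centre of each 8-pixel span
    only blends the two middle pixels and never sees the other six."""
    return [(row[i + 3] + row[i + 4]) / 2.0 for i in range(0, len(row) - 7, 8)]
```

With a row of seven black pixels followed by one bright pixel of value 8, the three-pass version reports the true average of 1.0, while the single pass misses the bright pixel entirely and reports 0.0.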

surface ascii = allocsurf(8*64, 10);
This surface will be the ascii texture. There are 64 characters stored horizontally. Each character is 8 pixels wide by 10 pixels high.

The “allocsurf” command in the Programmable SMARTSHADER language will create a temporary work area for you to use. The values in these work areas will remain constant unless the window is resized. “Width” and “height” are constants provided by Programmable SMARTSHADER which match the width and height of the window you’re in.

load_texture(1, 8*64, 10, 1, "ubyte", "ascii8x10.raw");
This loads the ascii texture into texture unit 1. We specify its width as 8*64 pixels, its height as 10 pixels, its depth as 1 pixel, and its type is unsigned byte. Then the last parameter is the name of the texture.

texture[0].magfilter = "linear";
Set texture filtering to linear. This averages neighbor pixels together which is what we want for this down sampling pass.

texture[0].source = backbuffer;
destination temp1;
apply copyPixelShader;

This down samples the original by 2x. We bind the back buffer surface to texture 0, set the destination to temp1, which is a surface we allocated at one fourth the size of the original (half the width, and half the height), and apply the “copyPixelShader” shader to it. Programmable SMARTSHADER will provide the texture coordinates of each pixel in the destination surface in texcoord[0] to be used by the shader program. The shader does a lookup in texture[0], which has the back buffer as its source. Since the linear filter is on, it will average this pixel with the ones around it. Then it will store the result to that location in the destination buffer. This has the effect of making a smaller image with each pixel in it the average of the pixels in the original image.

texture[0].source = temp1;
destination temp2;
apply copyPixelShader;

Now we are down sampled by 4x.

texture[0].source = temp2;
destination temp3;
apply copyPixelShader;

Now we are down sampled by 8x.

texture[0].source = temp3;
destination temp4;
apply copyPixelShader;

Finally made it to the target of being down sampled 8x in the horizontal and 10x in the vertical.

asciiPixelShader.constant[0] = {width/8, height/10, 0, 0};
Save the window width divided by character width and the window height divided by character height in constant[0]. Both of these are used above in the asciiPixelShader fragment program.

asciiPixelShader.constant[1] = {8/(trunc(width/8)*8), 10/(trunc(height/10)*10), 0, 0};
Here we are saving (charWidth/(trunc(windowWidth/charWidth)*charWidth)) and (charHeight/(trunc(windowHeight/charHeight)*charHeight)) to constant[1]. The trunc function truncates the result of the floating point math that happens inside the parentheses.
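
For a feel of what these two constants contain, the same arithmetic can be written in Python. `shader_constants` is a made-up name for this sketch, and Python's `math.trunc` stands in for the script's `trunc`.

```python
import math

def shader_constants(width, height, char_w=8, char_h=10):
    """Mirror the two constant assignments for a given window size."""
    constant0 = (width / char_w, height / char_h, 0, 0)
    constant1 = (
        char_w / (math.trunc(width / char_w) * char_w),
        char_h / (math.trunc(height / char_h) * char_h),
        0,
        0,
    )
    return constant0, constant1
```

At 1024x768 the height is not a multiple of 10, so constant[0] holds 76.8 rows of characters while constant[1] is based on 76 whole rows (10/760). That difference is the misalignment the multiply by const5 compensates for.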

texture[0].magfilter = "nearest";
Set the filter back to nearest since we no longer want to average pixels together.

texture[0].source = temp4;
Set the smallest down sampled buffer to be texture 0 on the fragment program used.

destination backbuffer;
Set the destination to be the back buffer so the user can see all this work we have done.

apply asciiPixelShader;
Apply the asciiPixelShader fragment program.

Driverheaven: Thank you very much for taking the time to talk to Driverheaven readers. Any closing remarks?

Roger: We hope people have fun with the new programmable SMARTSHADER toy. If people have ideas, post them in the forums. We don't talk much, but we pay attention.

Maurice: It's been a pleasure. And too bad you didn't ask about any of our unannounced hardware. We engineers just might have slipped... (Just kidding!)

