Friday, May 18, 2007

Star Trader Communication System + Skeletal Animation

In Star Trader the player will be able to communicate with both player and non-player characters (NPCs) indirectly via "sub-space channels", and directly via the com (communication) screen. When the player initiates a com screen dialogue with another player or a trading station, a little screen will pop up with the other party's avatar and a bunch of options underneath the screen for what the player wants to do next. Here's a (very) rough mock-up to illustrate what I mean (source art from Space Quest 6).

This screen represents the space station com screen.

And this screen is a mock-up for the player-to-player/player-to-npc com screen.

Because the player will be looking at an actual person (or, uh, alien... thing), I needed to implement some kind of character rendering to accommodate this. Those characters would also need to animate, so I would need support for skeletal animation, which my engine did not have at the time. Because of this, over the past few weeks I've been slowly building a prototype of the skeletal animation system, which I am almost ready to incorporate into the engine. For this I ended up creating my own custom Collada importer because I was quite unhappy with the DOM and FCollada libraries, and surprisingly it works quite well! The last major thing I need to do is convert from the Z-up axis format to Y-up and optimize the vertex lists.
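
For the axis conversion, the usual fix is a simple component swizzle. Here's a minimal sketch (the function name is mine, and it assumes right-handed coordinates on both sides, which is what Collada's Z_UP/Y_UP conventions use):

```python
def z_up_to_y_up(v):
    """Convert a position from COLLADA's Z-up convention to Y-up.

    The old Z axis becomes the new Y axis; negating the old Y axis
    keeps the coordinate system right-handed.
    """
    x, y, z = v
    return (x, z, -y)
```

The same swizzle has to be applied consistently to positions, normals, and the translation part of every joint transform, or the skeleton and mesh will disagree.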

Now the skeletal animation system is not quite done yet but here's a short video of it in action using the Mona Sax model from Max Payne 2 as a test model:

I had to manually rig and animate the model myself (since only the source model was available in the MP2 SDK), so although it may not be too impressive, it's something at least (until someone makes me a better test model :-). What's in place so far is standard skinning through software or the GPU. I'm storing the transforms as quaternions, and keyframe interpolation works perfectly, but I still need to implement multiple animation channel blending (e.g. for running and waving at the same time) and transition blends (e.g. a smooth transition from run to jump). I'm not quite sure how I want to do that just yet, so it may take me another week or so to completely finish it up.
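
For the curious, quaternion keyframe interpolation boils down to slerp between the two keys that bracket the sample time. Here's a rough sketch of the idea (the names and the clamping behavior are my own choices for illustration, not the engine's actual code):

```python
import math

def slerp(q0, q1, t):
    # Spherical linear interpolation between two unit quaternions (w, x, y, z).
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                     # take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:                  # nearly parallel: fall back to lerp
        return tuple(a + t * (b - a) for a, b in zip(q0, q1))
    theta = math.acos(dot)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))

def sample_rotation(keyframes, time):
    # keyframes: sorted list of (time, quaternion); clamp outside the range.
    if time <= keyframes[0][0]:
        return keyframes[0][1]
    if time >= keyframes[-1][0]:
        return keyframes[-1][1]
    for (t0, q0), (t1, q1) in zip(keyframes, keyframes[1:]):
        if t0 <= time <= t1:
            return slerp(q0, q1, (time - t0) / (t1 - t0))
```

The dot-product sign flip is the important detail: without it, interpolation can take the long way around the rotation sphere and the joint visibly spins the wrong direction.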

The Mona Sax model was originally made for a fixed-function pipeline with no per-pixel lighting, so I had to create some normal and specular maps to go along with the Phong lighting I'm using. One thing I did that I'm very happy with is storing not only the specular color mask in the RGB channels of the specular map but also the specular power in the alpha channel. This means I can render an entire character in one pass with very diverse levels of shininess: the skin can be relatively matte but the eyes nice and shiny. Here's an example:
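
The idea is easy to sketch: at each pixel, the specular map's alpha channel is rescaled into the Phong exponent. Here's a toy CPU version of the specular term (the `max_power` scale of 255 and the function names are my assumptions for illustration, not the shader's actual constants):

```python
def phong_specular(spec_sample, n_dot_h, max_power=255.0):
    """Blinn-Phong specular term with the exponent packed in alpha.

    spec_sample: (r, g, b, a) from the specular map, each in [0, 1].
    The alpha channel is rescaled to a shininess exponent, so a single
    texture drives both the specular color and per-pixel glossiness.
    """
    r, g, b, a = spec_sample
    power = a * max_power                  # e.g. a ~ 0.78 -> exponent ~ 200
    intensity = max(n_dot_h, 0.0) ** power
    return (r * intensity, g * intensity, b * intensity)
```

Since the exponent varies per texel, the eyes, lips, and skin can all get different highlight tightness in the same draw call.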

The eyes and lips have a specular power (shininess) of about 200 or so, and the skin varies around 50-150. I plan to eventually add some pseudo subsurface scattering, but right now I'm keeping it pretty simple.

Now one of the other prerequisites for my character system was some sort of support for facial animation. The most straightforward way to do this would have been to use skeletal animation with a bunch of face joints to animate important muscle groups. There are definitely a number of advantages to this. For starters, the animation pipeline is unified and easily modifiable by an artist. Second, skeletal animation is relatively efficient and quick to do in hardware. Also, it's relatively easy to blend multiple animations together to achieve a nearly infinite number of facial expressions.

Skeletal animation, however, is not necessarily the best way to do facial animation. Another option is morph target animation, which is what I ended up going with. With morph target animation, instead of interpolating between different poses of a skeleton which deforms a mesh, we interpolate, or morph, between multiple geometry shapes. The really nice thing about morph target animation over skeletal animation is that you can directly model the poses you want to deform to, without having to worry about proper joint placement and weighting. For facial animation this is especially nice, since you can do things like creases and subtle muscular skin deformations that would take a ton of joints to pull off.
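
At its core, morphing between two targets is just a per-vertex lerp. A minimal sketch (hypothetical names, with vertex positions as tuples):

```python
def morph(base, target, t):
    # Linearly interpolate every vertex of the base mesh toward the
    # target pose; t=0 gives the base shape, t=1 the full target.
    return [tuple(b + t * (m - b) for b, m in zip(vb, vt))
            for vb, vt in zip(base, target)]
```

The two meshes must share the same vertex count and ordering, which is why morph targets are authored by deforming copies of the base mesh rather than modeling new geometry.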

An additional point to consider (and bear in mind I don't have any evidence to directly back this up): I suspect that on especially complex (high-polygon) models it can be much more efficient to use GPU-accelerated blend pose animation over skeletal animation. While I concede that software-based blend pose animation will never beat skeletal animation performance-wise, doing the calculations on the GPU offers an incredible opportunity to take advantage of the GPU's parallel nature. Some further points:
First, that large a number of joints is potentially more than can be rendered in a single pass, since only a certain number of constant registers are available on older vertex shader profiles (like vs_2_0). This might require additional passes, as the model must be segmented to accommodate multiple hardware-accelerated renders.

Second, most GPU skinning implementations do not go beyond 4 joint influences per vertex. With a large number of joints in close proximity, the influence of the surrounding joints on a particular muscle group would be incredibly weak, making it very difficult to achieve good results without a lot of tweaking.

Third, if you're blending between 4 joint matrices per vertex (the average influence count), you're talking 4 matrix adds (at 4 instructions each), plus 4 scalar/vector multiplies (for the weights), plus a matrix multiply for each vertex attribute that must be transformed; so for the position, normal, and tangent, that's 3 matrix multiplies (at 4 instructions each for an m4x4). Roughly, let's say that's 32 instructions total (16 + 4 + 12).
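
To make that per-vertex cost concrete, here's a minimal software sketch of 4-influence matrix-palette skinning, i.e. the same math the vertex shader would run (the 3x4 row-major matrix layout and the names are my own choices for illustration):

```python
def transform(m, v):
    # Apply a 3x4 joint matrix (row-major, last column = translation)
    # to a position.
    x, y, z = v
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in m)

def skin_vertex(position, influences, palette):
    # influences: up to 4 (joint_index, weight) pairs, weights summing
    # to 1. Each influence transforms the vertex by its joint matrix,
    # and the results are blended by weight.
    out = [0.0, 0.0, 0.0]
    for joint, weight in influences:
        px, py, pz = transform(palette[joint], position)
        out[0] += weight * px
        out[1] += weight * py
        out[2] += weight * pz
    return tuple(out)
```

The normal and tangent go through the same loop (with the rotation part of each matrix only), which is where the extra matrix multiplies in the count above come from.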

For GPU-accelerated morph target blending, you load all the targets into vertex buffers, then bind the active ones to the appropriate vertex streams, with only a maximum number of simultaneous blend targets active at once (and mapped to the proper vertex attributes). So let's say a maximum of five blend targets were possible at one time. Using delta morph targets (with a base target acting as the delta source), accumulating the result of each morph target requires a single multiply-add, which is 1 instruction (madd). So for 5 morph targets, we're using 5 instructions for the position, plus 5 instructions for the normal. The tangent can be derived from the morphed normal and the base tangent via orthonormalization (about 4 instructions), and the binormal is derived via a cross product (as it would be in the skeletal animation case as well, so we won't count that).
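
The delta accumulation can be sketched in a few lines; this is just an illustrative CPU version of the per-target madd loop, not the actual shader:

```python
def accumulate_deltas(base_vertex, deltas, weights):
    """Accumulate weighted delta morph targets onto a base vertex.

    deltas: one (dx, dy, dz) offset per active target, already stored
    relative to the base pose, so each target costs one multiply-add.
    """
    x, y, z = base_vertex
    for (dx, dy, dz), w in zip(deltas, weights):
        x += w * dx
        y += w * dy
        z += w * dz
    return (x, y, z)
```

Storing deltas instead of absolute positions is what makes the cost one madd per target: a target with weight zero contributes nothing, and targets combine additively.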

That gives us a total of 14 instructions for morphing versus skeletal animation's 32 instructions per vertex. Although using multiple vertex streams is not as efficient as interleaved vertex arrays, you're also not sending huge lists of joint matrices to the vertex shader (just a small list of morph target weights), which about evens it out. So in the end I think it's pretty obvious that for complex models that require a lot of animation detail, GPU-accelerated morph target animation can be just as fast, if not much faster, than skeletal animation with a high joint count. There's next to no CPU work involved for blend targets, whereas skeletal animation requires a skeletal hierarchy to be maintained (traversed/accumulated, animated, interpolated, keyframes blended, etc.).

Now I'm not advocating switching your engine to pure blend target animation here; I'm just saying that for certain situations (like facial rendering), blend target animation can be a big win. It should be noted, however, that older games like Quake 3 actually did use blend target animation ("vertex animation") to animate their characters, but this required a LOT of blend shape keyframes, as the linear interpolation results in horrible artifacts if the poses differ too much. For long animations this requires a lot of memory (proportional to the vertex count of the model, so it can be REALLY big for larger models). Also, you can't reuse an animation between multiple models as you can with skeletal animation, so it's a lot of wasted time for artists.

In the end here's the result of a keyframed blend target animation using 3 poses (base, open mouth, closed eyes):

Of course this is nothing compared to the new face demo NVIDIA just released. Here's the vid if you haven't checked it out yet:

I don't plan to go quite that extreme with my implementation, but it's definitely exciting to see what's possible on current-generation hardware. Their lighting implementation appears to be the same one used in the Matrix movies (blurred texture-space lighting), but the brunt of what makes it look so nice probably has a lot to do with the high-resolution textures they used. Such a shame you need Windows Vista and a DX10 part to see it on your own computer.

Alright, I'm out, later.

p.s. As I was writing this I suddenly noticed a little pop-up bubble next to the "save now" button which says "Now Blogger saves your drafts automatically!" - exactly what I was complaining about last time! Damn I love Google.
