Bio

Introduction

I'm a Seattle area game and open source developer. Here's my MobyGames profile, my LinkedIn profile, my blog, and my Data Compression Hall of Fame listing. Over the years, I've worked with a lot of interesting and unique people in the PC demo scene, data compression, and the video game business. I've spent most of my life coding and working on games, so this bio is heavily biased in that direction.

Early Days

I started programming when I was 9 years old (1985), on a small but powerful 6809E-based home computer, initially with 16KB of RAM, and got hooked. Starting in BASIC, I quickly moved on to hand-coded machine code, then assembly under Microware's OS-9 once I realized what a pain it was to create opcodes and track jump offsets by hand. (It didn't help that there was a fire in the apartment I was living in, which destroyed my coveted 6809 opcode tables, so I was forced to find an alternative.) A few years later (but to a kid, this delay seemed like a million years) I upgraded to Microsoft's QuickC 2.5 on a Tandy 1000 RLX 286-12 with 1MB of RAM and a 40MB hard drive, which was nirvana. At this point there was no turning back.

Now, I grew up just outside a poor and fairly dangerous area, actually one of the most dangerous cities in the U.S. My father was still fighting the Vietnam War, and my mother was somewhere in space. So programming for me was a means of escape, though I didn't understand this at the time. I didn't really have a plan; I just kept coding, learning from other people's code, and releasing stuff, and somehow I just knew and had faith there was a future of some sort to it all.

Graphics and Data Compression

My first compression-related project that got any notice from anyone was a modest little GIF decoder/viewer written in Microsoft's QuickBASIC 4.5/PDS, released on FidoNet and on BBS's around 1991-1992. It was slow as molasses (you could watch it write individual pixels across the screen), but it was the first viewer available in QB, and the code was easy to follow and modify. This eventually led to many compression, image file format, image processing, and graphics-related projects and experiments. Here's DEGIF6.BAS running in DOSBox after loading a 320x200 .GIF file:

This triggered an interest in and fascination with the field of data compression that I still have to this day. The best compression program I knew of or had access to at the time was PKZIP, so naturally I had to first match it, then try to beat it somehow. A few years later, this led to my first shipped product, a ZIP-compatible compression library named "Compression Plus" by Elltech Development (now owned by BuCubed Software), I think around 1993. This involved a lot of reverse engineering (basically single-stepping through PKZIP in Borland's amazing Turbo Debugger, and studying its output given known inputs) because at the time the Deflate algorithm was brand new and not fully documented (and things like zlib or Info-ZIP didn't exist yet). I didn't think about publishing any of this work, because for me the end goal was the product. This codec (compressor-decompressor) was written entirely in 16-bit real-mode asm as a coroutine (which at the time I called "state switching").
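
If you've never seen the coroutine/"state switching" style, here's a minimal C++ sketch of the idea (purely illustrative -- the original was 16-bit asm, and all names here are made up): the decoder records where it stopped, so it can return to the caller whenever the input buffer runs dry and resume exactly where it left off on the next call.

```cpp
#include <cstdint>

// Decoder state persists across calls so decoding can resume mid-stream.
struct DecoderState {
    enum Resume { kStart, kNeedByte1, kNeedByte2 };
    Resume resume = kStart;
    uint32_t accum = 0;
    // ... window/dictionary state would live here ...
};

enum Status { kNeedsMoreInput, kDone };

// Consumes input until 'in' reaches 'end', then returns kNeedsMoreInput.
// The next call resumes exactly where this one left off.
static Status decodeSome(DecoderState& s, const uint8_t*& in, const uint8_t* end)
{
    switch (s.resume) {
    case DecoderState::kStart:
        for (;;) {
            if (in == end) { s.resume = DecoderState::kNeedByte1; return kNeedsMoreInput; }
    case DecoderState::kNeedByte1:
            s.accum = *in++;
            if (in == end) { s.resume = DecoderState::kNeedByte2; return kNeedsMoreInput; }
    case DecoderState::kNeedByte2:
            s.accum = (s.accum << 8) | *in++;
            // ... decode a symbol from s.accum here ...
            if (s.accum == 0xFFFFu) return kDone; // illustrative end-of-stream marker
        }
    }
    return kDone;
}
```

The same switch-into-a-loop trick lives on in my miniz inflator (mentioned near the bottom of this page).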

I released the source to some of my early LZ77 compression programs, written in C, on various BBS's. Years later I discovered this work was cited in a paper titled "A New Approach To Dictionary-Based Lossless Compression" by Mesut and Carus.

Skipping ahead a bit, I wrote a new Deflate-compatible codec in 32-bit flat-mode C and licensed it to a game developer in Dallas named Ensemble Studios. They used it for all savegames (and possibly the game's data archives) in Age of Empires 1 and 2. Skipping ahead again, Microsoft acquired Ensemble in 2001, and in 2005 I went to work for Microsoft in Dallas, where I got to work on this code again. I rewrote and optimized the decompressor to be fast as hell on Xbox 360's gimped PPC CPU, getting it up to around 150-200 megabytes/sec. per core. (By comparison, I think zlib at the time only achieved a few dozen MB/sec. on this CPU.) I shared this library (internally nicknamed "eslib") with the guys at Microsoft Game Studios in Redmond, WA. This codec was used on Halo Wars, Halo 3 and its sequels, and on Forza 2. All combined, Age1+Age2+Halo3 alone sold in the tens of millions of copies, with my codec feeding them bits.

Anyhow, back to the early 90's. During this time I wrote a lot of little utilities and example programs, which I released to BBS's and various FidoNet forums. For example, I wrote a Machin series Pi calculator in all-assembly in '92. It implemented multiple precision arithmetic, so it could compute Pi to up to 36,000 digits on a 286. Somehow its output wound up on the web here. Back in those days, this program took something like 45 minutes to an hour to run on my 286-12. The last time I tried it on a modern desktop, I think it took 2-3 seconds.
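
For the curious, here's a minimal C++ sketch of the technique (the original was all assembly, so this is purely illustrative): fixed-point multiple precision arithmetic in base 10000, driving Machin's formula pi = 16*arctan(1/5) - 4*arctan(1/239).

```cpp
#include <cstdio>
#include <vector>

// Fixed-point big number: element 0 is the integer part, the rest are
// base-10000 fractional "digits" (4 decimal digits per element).
using BigFixed = std::vector<long>;

// x /= d (simple long division, most significant word first)
static void divBy(BigFixed& x, long d) {
    long rem = 0;
    for (size_t i = 0; i < x.size(); ++i) {
        long cur = rem * 10000 + x[i];
        x[i] = cur / d;
        rem = cur % d;
    }
}

static void add(BigFixed& x, const BigFixed& y) {
    long carry = 0;
    for (size_t i = x.size(); i-- > 0; ) {
        long cur = x[i] + y[i] + carry;
        x[i] = cur % 10000;
        carry = cur / 10000;
    }
}

static void sub(BigFixed& x, const BigFixed& y) { // assumes x >= y
    long borrow = 0;
    for (size_t i = x.size(); i-- > 0; ) {
        long cur = x[i] - y[i] - borrow;
        borrow = cur < 0;
        if (cur < 0) cur += 10000;
        x[i] = cur;
    }
}

static bool isZero(const BigFixed& x) {
    for (long v : x) if (v) return false;
    return true;
}

// acc += coeff * arctan(1/invX), via arctan(1/x) = 1/x - 1/(3x^3) + ...
static void addArctanInv(BigFixed& acc, long coeff, long invX) {
    BigFixed term(acc.size(), 0);
    term[0] = coeff;
    divBy(term, invX);
    long n = 1;
    bool plus = true;
    while (!isZero(term)) {
        BigFixed t = term;
        divBy(t, n);
        if (plus) add(acc, t); else sub(acc, t);
        divBy(term, invX * invX);
        n += 2;
        plus = !plus;
    }
}

int main() {
    const size_t digits = 1000;          // decimal digits to compute
    const size_t words = digits / 4 + 4; // 4 digits/word, plus guard words
    BigFixed pi(words, 0), t(words, 0);
    addArctanInv(pi, 16, 5);             // pi  = 16*arctan(1/5)
    addArctanInv(t, 4, 239);
    sub(pi, t);                          //     - 4*arctan(1/239)
    printf("%ld.", pi[0]);
    for (size_t i = 1; i <= digits / 4; ++i)
        printf("%04ld", pi[i]);
    printf("\n");
}
```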

During this time I also wrote one of my first rendering programs, in QuickBASIC 4.5, after reading a couple of books on offline 3D rendering. It rendered the Utah Teapot with Gouraud shading and Z-buffering. It used Matt Pritchard's assembly Mode X library to support 320x240 resolution on a VGA card, and I wrote a separate program to convert the hand-typed patch definitions from a book into a triangle mesh. I still have the source; here it is running in DOSBox:

I released the source to a bunch of other small graphics-related programs and experiments on BBS's in those days. I had a particular interest in ray tracing and real-time polygon rasterization. Here's a rough progression:

Looking back, I really wish I had applied this experience to some sort of 3D game. I can imagine all kinds of neat games based on nothing but spinning, shadowed cubes, and in the early 90's this would have been kinda cool. But I guess I was too busy learning and "exploring the space".

Renaissance PC Demo Group

I was involved in a demo group named Renaissance around 1992. I mainly worked with the late Dave Cooper (DJC, aka "White Shadow") on his all-assembly demos. You can see and listen to an early one on YouTube here, with some demo info here. After his death, Dave's work appeared on the MindCandy DVD (also see here). Our demos were heavily inspired by the Amiga demo scene. They really pushed the 286 CPU and did some bizarre things with the low-level VGA video registers. (Dave was so good he could probably have dispensed with niceties like assemblers and linkers and just dumped the CPU opcodes and data tables of his demos directly from his brain into the final .EXE.) I wrote the assembly MOD audio player software used by several of his demos, including this one.

How I met Dave: He found one of my programs on a local BBS, and back in those days I usually put my contact info (city/phone number) at the top of the source file. I still remember meeting him for the first time and him showing me the assembly source to one of the PC demos he was working on. It looked like a digital form of alchemy, a sort of assembly black magic. He introduced me to the Boxer Text Editor (which I still use to this day), and he loaned me his copy of Michael Abrash's Power Graphics Programming.

To drive home how small the game business is: One day in the early 90's, Dave showed me his new ISA sound card, which was sent to him personally by Tim Sweeney (Epic Megagames). He greets Tim in this demo.

Renaissance was headed up by the infamous Tran (aka Thomas Pytel), creator of the pmode/w DOS extender and Zone 66, and later one of the founders and lead programmers of PayPal. Last I heard, he was hanging out in South America. At one time, Tran tried to talk us into creating a full 16/32-bit DOS replacement (he wanted to completely ditch DOS and just do it all, instead of just extending it), but I think I was too involved in my first big product (PowerView) to have any time to work on this.

DOS Image Viewer Product

Around 1991-95 I wrote a DOS image viewer named "PowerView", which was licensed and distributed by a few multimedia CD-ROM vendors like Walnut Creek. It first shipped on the POV-Ray CD-ROM (or here) in 1995. Here's what it looked like:

It was written in C/assembly using Borland C++ 3.1 and TASM, with a custom build system written in QuickBASIC 4.5. Everything was custom coded, including the UI. It was a real-mode app that used Borland's "VROOMM" dynamic code overlay system, written for 286 CPU's or better. It had an optional protected-mode printer module compiled with Watcom C++ 10.5. It could read and write JPEG, PNG, GIF, BMP, LBM, FLI, PIC, and a bunch more obscure formats.

If you're a DOS or DOSBox fanatic and enjoy archaeological digs, here's the executable, and here's the full source code. (To build this you need QuickBASIC 4.5 or maybe PDS 7.1, and Borland C++ 3.1.) I can still build this project from source under DOSBox, which for a 20-year-old project is pretty cool.

This was the modem/BBS era, and lots of people were downloading individual images or buying CD-ROM's chock full of pics, code archives, games, etc. PowerView could very quickly decompress and view JPEG's on ancient PC's, and it had one of the fastest sets of all-assembly decoders around. It could quickly resample, quantize, and dither true color images down to 256 colors, etc. Unfortunately, I didn't realize that Windows 95, not DOS, was where I should have been devoting my product development efforts. Whoops.

As an example of the lengths I went to in making PowerView fast, most of the image loaders, decompression, and image processing code was written in all-assembly. For the hell of it, I put up a page showing my best real-mode LZW decoder, written as a coroutine. This decoder was unique in that it processed the LZW stream in two phases. The first phase unpacked the bitwise-packed LZW symbols in a "bulk" fashion, and the second phase expanded the LZW symbols to bytes. By dividing up the decoder this way, each phase was able to keep its temporaries entirely in registers. Also, this "bulk" symbol unpacker design was able to process hundreds to thousands of codes at a time, so I created a bunch of hand-optimized loops -- one for 9-bit codes starting at all possible bit offsets, then other sets for 10 bits/symbol all the way through 12 bits. The LZW unpacker was also a stackless design: it tracked the length of each chain, so it could write the unpacked bytes starting from the end of each chain's output and working backwards towards its start. Most if not all LZW decoders I've seen unpack each symbol to a stack first. I know this seems simple and silly today, but back then every cycle still counted, and having really fast code like this was a competitive advantage.
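
To make the stackless trick concrete, here's a minimal C++ sketch of the chain expansion (illustrative, not the original asm): because each dictionary entry records its chain length, a code can be written into the output back-to-front, with no intermediate stack.

```cpp
#include <cstdint>
#include <vector>

// Each dictionary entry knows its parent code, its final byte, and the
// total length of the chain it represents.
struct LZWEntry {
    uint16_t prev;   // previous code in the chain, or kNoPrev for roots
    uint8_t  suffix; // last byte of this chain
    uint16_t length; // total bytes in the chain
};

constexpr uint16_t kNoPrev = 0xFFFF;

// Expands one LZW code directly into out[pos...], writing the chain
// back-to-front -- no per-symbol stack needed. Returns bytes written.
static size_t expandCode(const std::vector<LZWEntry>& dict, uint16_t code,
                         uint8_t* out, size_t pos) {
    const size_t len = dict[code].length;
    size_t i = pos + len;              // one past the end of the chain
    while (code != kNoPrev) {
        out[--i] = dict[code].suffix;  // write end-to-start
        code = dict[code].prev;
    }
    return len;
}
```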

Here's another example of some code from PowerView. This code is the core of PV's fast integer IDCT. It's somewhat interesting, even today, because it uses SIMD-like ops on a CPU that didn't support SIMD. It used single 32-bit adds to process two 16-bit values at a time. Carries were carefully prevented from spilling from the low to the high word. This code didn't use multiplies; if I remember correctly, it just added together the various DCT basis functions by reading them from a large number of precomputed lookup tables, using the scaled AC coefficient as an index. It was also ruthlessly optimized at the row and column level to avoid doing any work on zero AC coefficients.
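
Here's a tiny C++ sketch of that "poor man's SIMD" idea (illustrative; the original was x86 asm): two 16-bit accumulators live in one 32-bit register, and a single 32-bit add updates both at once.

```cpp
#include <cstdint>

// Pack two 16-bit accumulators into one 32-bit word.
static inline uint32_t pack2(uint16_t lo, uint16_t hi) {
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

// One 32-bit add updates both halves at once. This is only safe if the
// low halves can never sum past 16 bits -- the precomputed DCT basis
// tables must be scaled/biased so a carry can't cross into the high word.
static inline uint32_t addPacked(uint32_t a, uint32_t b) {
    return a + b;
}

// Unpack the two results.
static inline uint16_t lo16(uint32_t v) { return (uint16_t)(v & 0xFFFF); }
static inline uint16_t hi16(uint32_t v) { return (uint16_t)(v >> 16); }
```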

Unfortunately, all this code living in 16-bit real-mode assembly land made porting the viewer to Windows 95 seem like a humongous task, so I moved on to other things.

A bit of PowerView appeared in my first published article/code in the Dec. 1995 issue of Game Developer Magazine. You can read it in PDF format here, and here's the source/exe archive. It was on high-quality color image quantization, and was coauthored with Matt Pritchard, who wrote the text of the article itself while I concentrated on the code and technical details. In this article I included a new, very efficient C implementation of Wu's color quantization algorithm from Graphics Gems II, which was what PV used to convert truecolor images to 8-bit.

Lunch with Michael Abrash

Sometime during the middle of PowerView development, I visited Matt Pritchard (who was writing PV's help system, and was also one of the first programmers at Ensemble Studios) in Dallas. He somehow talked his way into going out to lunch with Michael Abrash, then at Id Software. I don't think I made a good first impression, having spent the previous 6-7 years locked in a room coding obscure C/assembly routines and trying to ignore my surroundings as best I could (because doing otherwise would probably have led to complete madness). But, on the bright side, there was no way he would forget meeting me. (The video game business is a small world: years later I would work with Michael at Valve.)

After this point, my life changed forever. I somehow got an interview with a small group of guys making a first-person 3D game on the PC. The office was in a single room of a 3-bedroom apartment in Hoboken, NJ, of all places. This was in 1996, I think Quake had just been released, and I had never been to North Jersey. So talk about a reality shift.

First Stab at Game Development - Montezuma's Return

In '96 I started working with the early-80's game designer Rob Jaeger (also on Wikipedia here), along with Atman Binstock (lead programmer) and Gary Corriveau (artist/level designer), at Utopia Technologies on a first-person 3D game named Montezuma's Return. "Monte" was the loopy and trippy sequel to the old 8-bit game Montezuma's Revenge. Here's a YouTube video of Montezuma's Revenge, a video of Montezuma's Return gameplay, and the game's opening video.

Here's a pic showing the character we nicknamed "George", who, for a reason never explained in the game, hated the main character:

I worked on the DOS to Windows 95 port, the DirectDraw/DirectInput code, all of the AI's, a lot of the sound code, the various configuration dialogs, and a few of the texture processing tools. We used PowerView's custom DOS installer in the German version of Monte. The game shipped in the US in '98.

Now, I mostly worked on this game remotely, from home. Rob was a little paranoid about the title, so for me to get code drops he would ZIP up the source, we would directly connect to each other by phone using modems, and we'd fire up ZMODEM to transfer the file data. This was pretty freaking slow, so the data drops were stripped bare, with no textures and sometimes outdated levels. So for a very long time I had no idea what the game actually looked like, and when I showed the game to a friend at Ensemble Studios (Matt Pritchard), he thought I had two heads. I still remember Rob yelling at me over the phone to just "make it fun!" while I was working on the AI code for "George" (go to 2:27 here to see George) and asking him for some direction. That was funny as hell.

Personally, I think Monte was probably at its best and most compelling on maps containing lots of vertical/horizontal ropes. I disliked the fighting mechanics. You can see more video of the game here.

A programmer named John Marco Panettiere wrote some of the original systems for Monte, but by the time the game shipped I'm pretty sure Atman had rewritten all of this stuff.

My one regret about Montezuma's Return: after shipping, we should have repackaged it into a new product containing only the bonus minigame maps, which were the most fun and original things in the whole product. I can't find any videos of the minigames online; they were kind of like some of the crazy bouncy physics puzzles in Portal 2. Monte's music was also pretty cool.

It's not well known, but after Monte shipped, Gary and I modded the game to create three new kids' games in the "Virtual Playground" series. We finished the games and had GT Interactive (I think) signed up to distribute them. Unfortunately, the two owners of the company got into a huge fight, the company collapsed, and so they were never released. The company would be reborn as Sandbox Studios. Here's a pic of one of the games, called "Jungle Jokers":

Sandbox Studios/Digital Illusions (DICE)

Utopia Technologies morphed into Sandbox Studios after Monte shipped. The company opened a new office in London, ON and quickly grew to over 20 people. We initially made a bunch of casual, kids', and handheld (Game Boy Advance) games. I wrote a new sprite and game engine, which was used on probably a dozen games, and helped create tools for a few of the handheld games. Here's an incomplete list.

Probably our most complex (and annoying) game was Matchbox Emergency Patrol PC, a 3rd-person car game which used my software and Direct3D 7 renderers, my custom 3DS Max v3.1 exporters, and my texture/mesh/animation preprocessing tools. The majority of the time I spent on this product went into writing my first (and hopefully last) software renderer, which I honestly didn't really want to write (because I knew GPU's were the "future"). I learned a bunch of things about practical 3D rendering, texture mapping, span buffering, lighting, mesh optimization, etc. while working on this product.

Mattel Interactive absolutely forced us to ship with software rendering support, and we were working under the usual insane deadlines typical of a small game company, so I didn't have much time to make this game look particularly good. It wasn't easy rendering this game on low-end PC's of the era, because the city was pretty massive and substantially larger than the indoor Monte maps I was used to. (Atman Binstock wrote Monte's excellent software renderer, which I didn't have access to because the owners of Utopia Technologies got into a legal battle over who owned what.) Looking back, the object "pop-in" in the distance is embarrassing. Anyhow, I worked nonstop on this product for approx. 1 year. I did get time to support both software and hardware T&L in Direct3D 7 (HW T&L was first supported by the brand new GeForce 256), and I added a projected texture shadow under the player's car and per-vertex spot/omni lights at the last minute. I started to add lightmapping to the game, but ran out of time.

Here's a Youtube video of Matchbox. We developed and tested the game under Windows 2000 (not Windows 95/98), so it still runs fine on modern Windows 7 PC's:

Apparently, children loved this game.

Shrek Xbox and Deferred Shading

Shortly after shipping Matchbox, we somehow got the deal to do Shrek. Shrek was an Xbox 1 launch title made in less than a year that shipped in 2001, and the first commercial game to use deferred shading. I was able to leverage the real-time software and hardware rendering experience I gained during the creation of Matchbox on this game. Because this was an Xbox-only title, I didn't need to spend any more time messing around with software rendering for crappy PC's. Also, Sandbox was able to hire a lot of very skilled artists who were trained at places like Seneca College, so this game looked a lot better than anything we did before. Around half the maps were created by a very talented artist named Alex Muscat (now an art director at EA on Dead Space), who I worked closely with while creating the rendering engine's albedo/normal map blend materials. Here's a kinda crappy YouTube video:

Several amazing programmers were involved in the creation of Shrek, like Atman Binstock (now at Rad Game Tools). The Xbox group was pretty excited about the tech in Shrek. Here's a quote from J. Allard (General Manager, Xbox Platform) talking about Shrek:

"We have seen this game at mid-production, and it is very clear that the fundamentals of a great game are all there. We are very excited about Shrek because it is a great example of what Xbox can do," said J Allard, General Manager, Xbox Platform. "I think that we are all going to be blown-away by the end result of this game."

Unfortunately, Shrek turned out to be a pretty, but really, really hard and repetitive game. Personally, I have no idea how anyone could complete it and not go insane. I wrote the engine's deferred renderer and all of the shaders (NV2A combiner settings, really), with key help from Atman to get efficient deferred omnidirectional lights working on Xbox 1 (see here). We also wrote a pretty cool static shadow volume optimization system used to efficiently cast shadows from directional and omnidirectional light sources. We should have published something about all of this pioneering work, but we all went our separate ways after the game shipped, so we didn't get the chance. Anyhow, here are some pics:

Various pics of my deferred volumetric fog research (using user-defined meshes to control the fogged area), done shortly after Shrek shipped and before I left DICE and started at Blue Shift:

Shrek exclusively used Z-Pass stencil shadow volumes for omni and directional light sources, unlike Doom 3 (which shipped years after Shrek), which used Z-Fail (aka Carmack's Reverse). Now, as pointed out here or here, Z-Pass can be much faster than Z-Fail, but it wasn't robust: occasionally, the entire screen would flicker. To see an example of this problem, watch this YouTube video of an early version of Shrek, starting at the 55 second mark. A week or two before shipping, I changed the game to detect when the camera got too close to any shadow volume (via ray casting), and randomly nudged the camera position a tiny (imperceptible) amount away from the volume faces to avoid the issue enough to ship.
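
Here's a minimal C++ sketch of that shipping hack, purely illustrative (the original Xbox 1 code is long gone, and all names and constants here are hypothetical):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 add(const Vec3& a, const Vec3& b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static Vec3 scale(const Vec3& v, float s)     { return { v.x * s, v.y * s, v.z * s }; }

// Result of ray casting the camera position against the nearest shadow
// volume face (the caller computes this; 'dist' is a signed distance).
struct FaceHit { float dist; Vec3 normal; };

// If the camera is within epsilon of a shadow volume face, nudge it a
// tiny, imperceptible amount along the face normal so the near plane
// can never slice through the volume (which breaks Z-Pass counting).
static Vec3 nudgeCameraAwayFromShadowVolume(const Vec3& cameraPos, const FaceHit& nearest) {
    const float kTooClose = 0.05f; // hypothetical threshold, world units
    const float kNudge    = 0.01f; // hypothetical tiny offset
    if (std::fabs(nearest.dist) < kTooClose)
        return add(cameraPos, scale(nearest.normal, kNudge));
    return cameraPos;
}
```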

Shrek also used stippling on transparent surfaces like water and glass, which was Atman's idea. I also experimented with half-res G-Buffers and light accumulation buffers on Shrek. In 2004, I extended Atman's transparency approach to support any level of transparency by writing alpha values to a G-Buffer channel and carefully filtering the results in screenspace. All of this was done years before Inferred Lighting (or Inferred Rendering).

The coolest thing that happened while developing Shrek: two guys from Microsoft flew from Redmond, WA all the way up to our small offices in snowy London, ON. Seamus Blackley and Dave McCoy (one of the co-founders of Hidden Path) spent a few hours with us to see what we were doing and offer their assistance. I was totally impressed that Microsoft would go to the trouble of doing this. It really drove home that Microsoft was serious about Xbox.

Blue Shift, Inc.

After Shrek shipped, I realized that living and working in London, ON wasn't really my thing. Also, Digital Illusions (DICE - now part of EA) had purchased Sandbox Studios, so everything was in flux.

Luckily, I met a very smart guy named John Brooks, the CTO of Blue Shift, at E3 2001 while showing Shrek and some of our other games at the TDK Mediactive and Mattel Interactive booths on the main floor. John worked on Super Mario's Wacky Worlds, a kinda famous unreleased CD-i game, and some of the early Comanche games. The CEO of Blue Shift was John Salwitz, who helped code and design the original arcade games Cyberball and Paperboy. A few months later, he offered me a job at their offices in Palo Alto, CA. After Shrek shipped, I moved to the heart of Silicon Valley to work on new rendering tech and help ship their current games.

I did a bunch of work optimizing and improving the look of Sega's World Series Baseball 2K3 on Xbox 1/PS2, which was very well received. I added normal mapping, a new imposter-based shadow system, batter self-shadowing, much better character lighting (1 direct + 3 fill lights), and a new mesh optimizer and "virtual bone" character rendering framework for PS2/Xbox. If you ever played the game: I was the lone and very persistent programmer who added 16:9 720p 60Hz support to WSB 2K3, which wasn't easy (remember, this was Xbox 1). It took several extensive CPU/GPU optimization passes to lock the game to 60Hz at this resolution. For what it's worth, I also made the last "Gold" build of WSB 2K3 under a lot of pressure. (I had to quickly fix a bug I introduced by adding 720p!)

While working on WSB, I was involved in the creation and tuning of a sophisticated animation compression system which used high-dimensional vector quantization (VQ). We used this system on WSB 2K3 PS2/Xbox 1 to compress our many thousands of motion-captured and hand-authored animations.
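
For readers unfamiliar with VQ-based animation compression, here's a minimal C++ sketch of the core idea (illustrative only; the shipping system was far more sophisticated): treat each pose as a high-dimensional vector and store only the index of its nearest codebook entry.

```cpp
#include <cfloat>
#include <cstddef>
#include <vector>

// One pose = all joint parameters for one frame, flattened into a vector.
using PoseVec = std::vector<float>;

// Returns the index of the codebook vector closest (in squared Euclidean
// distance) to 'pose'. The compressed animation stores this small index
// instead of the full pose.
static size_t nearestCode(const std::vector<PoseVec>& codebook, const PoseVec& pose) {
    size_t best = 0;
    float bestDist = FLT_MAX;
    for (size_t i = 0; i < codebook.size(); ++i) {
        float d = 0.0f;
        for (size_t j = 0; j < pose.size(); ++j) {
            float e = codebook[i][j] - pose[j];
            d += e * e;
        }
        if (d < bestDist) { bestDist = d; best = i; }
    }
    return best;
}
```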

I was brought in at the last minute to fix up and optimize the rendering, shader, and shadowing code for a (somewhat toxic) Xbox 1 game named Toxic Grind. When I started, it was running at 20-30Hz, the framerate would horribly sputter every time you turned your bike around or hit things, and the lighting/shadowing needed a lot of help. It took a few months, but the artists and I brought the game up to a solid 60Hz in time to ship it. Here are some random development pics:





Before I left Blue Shift, I created a really efficient deferred lighting renderer with attribute compression on Xbox 1. This rendering engine was shown running live on a devkit at GDC 2003 and 2004. Here's a YouTube video (more on this in the next section):

Random pics taken while developing the Gladiator deferred lighting renderer. I first tested the initial attribute pack/unpack pixel shaders on World Series Baseball 2K3 models:





Unfortunately, in order to cut costs, Sega parted ways with Blue Shift after WSB 2K4, which was our primary source of income. Most employees were laid off, and I saw the writing on the wall, so I formed my own company to continue my work on deferred rendering techniques.

Blank Cartridge - Deferred Lighting (aka Light-Prepass) Research

Post-Blue Shift, an ex-Blue Shift coworker (Sean O'Hara, a really good technical artist) and I started Blank Cartridge in 2003. MS was impressed with my work on deferred shading using attribute compression on Xbox 1, which helped us land a deal to make an XNA demo for Laura Fryer's Microsoft ATG (their "Advanced Technology Group" - the software team behind Xbox). I spent a year researching tiled deferred rendering and various irradiance volume lighting and participating media algorithms on prototype Shader Model 2.0 hardware from ATI. I implemented a new deferred rendering engine, large parts of which I would later use on Halo Wars. I had a full implementation of what is now known as "irradiance volumes" working on an ATI R420 prototype that ATG sent me. It used packed SH (spherical harmonic) coefficients in several 3D volume textures. It was pretty cool, and something I guess I should have published. I'll put up some pics of this R&D soon.

Here are some random shots taken on an ATI 9800 or an R420 in mid to late 2003. This first set shows my research into jittered PCF sampling on spotlight and dual paraboloid shadow buffers on SM2 hardware. Years later I would apply this experience while working on Portal 2.

This engine actually used tiled deferred shading, way back in 2003. The tiles affected by each light were computed using a simple conservative rasterizer on the CPU. Sadly, this is something else I should have published but didn't. I was probably the first to implement this approach:
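
Here's a minimal C++ sketch of CPU-side light tile binning, under my own simplifying assumptions (the original used a conservative rasterizer for the light's shape; this version conservatively bins each light's projected screen rectangle). At shading time, each screen tile then only evaluates the lights in its bin.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

constexpr int kTileSize = 32; // pixels per tile side (illustrative)

struct ScreenRect { int x0, y0, x1, y1; }; // projected light bounds, in pixels

// Appends 'lightIndex' to every tile the light's screen rect touches.
static void binLight(std::vector<std::vector<uint16_t>>& tileLights,
                     int tilesX, int tilesY,
                     uint16_t lightIndex, const ScreenRect& r) {
    int tx0 = std::max(r.x0 / kTileSize, 0);
    int ty0 = std::max(r.y0 / kTileSize, 0);
    int tx1 = std::min(r.x1 / kTileSize, tilesX - 1);
    int ty1 = std::min(r.y1 / kTileSize, tilesY - 1);
    for (int ty = ty0; ty <= ty1; ++ty)
        for (int tx = tx0; tx <= tx1; ++tx)
            tileLights[ty * tilesX + tx].push_back(lightIndex);
}
```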

Here are some pics showing my spherical harmonic encoded irradiance volume research in 2003, in the same tiled deferred lighting engine. Indirect lighting was encoded in multiple 3D textures, and the indirect lighting was accumulated using several full-screen quad passes. At the time I thought this method required an impractical amount of video memory per room, so I didn't pursue the idea further until Halo Wars, where I computed a simpler version of these volumes on the GPU every frame for direct lighting. On the left is the scene without irradiance volumes, and on the right is the same scene with precomputed indirect lighting applied using SH irradiance volumes:

The following two shots show SH-encoded irradiance volumes combined with a 64-sample shadowed participating media effect on a deferred spotlight. To do this in SM2, I traced rays through the spotlight's shadow buffer to compute approximate per-pixel ray occlusion factors. Multiple screenspace passes were used to accumulate the results, to work around SM2 shader size limitations:
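
Here's a minimal C++ sketch of the per-pixel occlusion ray march (illustrative; the original was split across SM2 pixel shader passes, and the shadow test helper here is hypothetical):

```cpp
struct Vec3 { float x, y, z; };

// Hypothetical helper: returns 1 if point 'p' is lit according to the
// spotlight's shadow buffer, else 0 (details omitted).
using ShadowTestFn = float (*)(const Vec3& p);

// Estimate the shadowed fraction along a view ray through the spotlight
// volume by marching 'numSamples' points and averaging shadow visibility.
static float rayMarchOcclusion(const Vec3& start, const Vec3& end,
                               int numSamples, ShadowTestFn shadowTest) {
    float sum = 0.0f;
    for (int i = 0; i < numSamples; ++i) {
        float t = (i + 0.5f) / numSamples;
        Vec3 p = { start.x + (end.x - start.x) * t,
                   start.y + (end.y - start.y) * t,
                   start.z + (end.z - start.z) * t };
        sum += shadowTest(p); // 1 = lit sample, 0 = shadowed
    }
    return sum / numSamples;  // fraction of lit samples along the ray
}
```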

Other SH encoded irradiance volume shots:

Visualization of a ray marching pass's output (occlusion visualization):

While working on this demo, I had my first encounter with the legendary (some would say infamous) Paul Steed, who was a creative director at Microsoft ATG at the time. There's a funny interview with him here.

Unfortunately, mostly due to limited funding, our XNA demo didn't see the light of day. This R&D and tech would later be used in a prototype at Ensemble named "Wrench", then in a really cool internal demo named "SevenDemo" that Ensemble Studios got to show to Bill Gates in 2004. SevenDemo was a physics and graphics demo we put together in about 10 days to demonstrate what the still-in-development Xbox 360 console would be capable of doing. I was told he was very impressed. A few months after SevenDemo was shown, the Wrench prototype game was canceled (see below), so I rolled onto Age of Empires 3. Later, I used a lot of this tech in what would eventually morph into Halo Wars.

Pics of my later shadow research while working on the Wrench prototype in early 2004. I tried basically every technique I could think of for shadowing a large static world from a directional light. These few shots show my progress with precomputed light occlusion maps (textured using a really simple automatic UV atlas generator), but I also tried stencil shadowing with screenspace bilateral filtering and cascaded shadow mapping.

Ensemble Studios - Video of the "Wrench" Engine Tech Demo

Originally shown to Bill Gates in early 2004. It was running on a prototype ATI card that barely worked after it heated up. More info/background is on YouTube.

Ensemble Studios - Age of Empires 3, Halo Wars

Note that all of the information here is public knowledge, talked about in magazine interviews, GDC presentations, blogs, marketing stuff, etc. I'm just gathering it all into one place and connecting names in the credits to who actually did the work.

I worked at Ensemble Studios (see here for a great story on how the company started, or here for a history of the company) from 2004-2008, after spending the previous year at my own company (Blank Cartridge) doing next-gen rendering research. Ensemble was one of the best studios I've ever worked at, and it still amazes me that MS closed the studio with seemingly so little thought to all the talent they spent untold millions recruiting, or to the hundreds of millions (more than a billion?) in profits they made off our games. There was an unmatched atmosphere of freedom, teamwork, professionalism, software engineering standards, and confidence at Ensemble that I have yet to see anywhere else. Our builds were generally rock solid, code quality actually mattered, and when something broke we dropped everything to make sure things got fixed so people could be productive. Ensemble's offices were made to look like the deck of the Starship Enterprise, complete with a freaking transporter pad. Ensemble was the real deal, totally first class.

I first worked on the rendering and shader code of a prototype 3rd-person car combat project named "Wrench" (also see here). After Wrench was canceled (this kind of game just didn't match our strengths, honestly), I helped modernize and optimize Age of Empires 3's graphics engine. (Age3 looked really good already, but Wrench had rendering and lighting tech that pushed the game even further.) Scott Winsett (one of the original Age of Empires 1 artists, and later one of the founders of Bonfire Studios) and I created the very first publicly shown screenshots (and here) of Age3. Scott and I spent a ton of time working on the art, shadow system, precomputed ambient occlusion map generator, and shaders for this first screenshot, which was made completely in-game with no Photoshopping (admittedly, it did run pretty slowly on the machines available to us at the time, but it was all in-engine):

Ensemble's Offices

I loved everything about Ensemble's office in Dallas, TX: totally comfortable 1-3 person offices with a tightly spaced room layout to encourage collaboration and quick informal 2-3 person meetings. The theme was modeled after the Starship Enterprise. If things got too loud in the hallway or whatever, you could close your office door and really concentrate. (And nobody thought you were crazy to do this at all. We got a lot of good work done there.) This office was super classy, and no expense was spared to make it a welcoming, cool, and fun place to work. Our studio head (Tony Goodman) had a lot of class, and it showed in how he built this office. I regret not taking better/more pictures. It was a purposely dim office space, and most of my pics were taken on crappy film cameras. I have some old pics from around the time Age3 shipped - I'll try to dig up some better pics from coworkers.

Halo Wars

Around the time Age3 was shipping, my manager, the very capable Angelo Laudon (the first employee of Ensemble Studios), was working on a prototype of Age of Mythology that could be played with an Xbox controller. He proved that AoM could be played using a controller without too much friction. This was the very beginning of the project that would lead to Halo Wars (also known as Phoenix), which is still the most successful console RTS ever released. The last time I checked here, there had been over 30 million total multiplayer games, and there are still tens of thousands of multiplayer games per day, which is pretty damn good for a game released in 2009 by a studio that no longer exists.

There are tons of YouTube videos of Halo Wars. Here's the E3 demo:

We tried to show off the engine's power in our internal/external demos, but the shipping game didn't really utilize all of what it was capable of doing. Here's a shot showing the engine's support for dual paraboloid shadow mapping, which I added in mid 2007 and which we used in our first publicly shown demo:

In Halo Wars, I oriented the hemispheres vertically, so most shadows were being cast from the bottom (down-facing) hemisphere. I actually wrote all this tech originally in 2003, before I started at Ensemble.

We were making an RTS engine, and RTS engines are traditionally pretty slow when confronted with anything but the usual top-down RTS camera angle. We really wanted the new engine to be capable of more than that. This shot clearly shows the new shadow system I implemented for HW, which used cascaded shadow mapping (1-5 cascades depending on the view) and variance shadow mapping. It also shows off the game's new terrain system.
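
For the unfamiliar, here's a minimal C++ sketch of the variance shadow mapping visibility test (the standard Chebyshev formulation, not Halo Wars' actual shader): the shadow map stores depth and depth squared, and their filtered values yield a mean and variance that bound the lit fraction.

```cpp
#include <algorithm>

// meanDepth   = filtered E[depth] from the variance shadow map
// meanDepthSq = filtered E[depth^2]
// Returns an upper bound on the fraction of the filter region that is lit.
static float vsmVisibility(float meanDepth, float meanDepthSq,
                           float receiverDepth, float minVariance) {
    if (receiverDepth <= meanDepth)
        return 1.0f; // receiver is in front of the occluders: fully lit
    float variance = std::max(meanDepthSq - meanDepth * meanDepth, minVariance);
    float d = receiverDepth - meanDepth;
    return variance / (variance + d * d); // Chebyshev's inequality
}
```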

It's not well known, but for better or worse Halo Wars was literally written from the ground up, except for a few things nobody wanted to rewrite or own, like the game's very solid pathfinder, which came from Age3. For the core engine and renderer, we started with the tech I developed in 2003-2004 for Laura Fryer's MS ATG, who graciously allowed us to use this code. For middleware, we licensed Granny, Scaleform, Audiokinetic's Wwise, and Havok, while I ported and tweaked Rockall, one of Microsoft's industrial-grade, server-class heap systems originally from Microsoft Research, to Xbox 360. (I was told by several people at MGS that we were the only ones to successfully use Rockall on 360.) For tools, our C# gurus Andrew Foster (now at Zynga) and Colt McAnlis (now at Google) wrote an entirely new editor in C# using Managed DirectX. Finally, Angelo's team wrote a new RTS simulation engine optimized for Xbox 360, and David Bettner (who later co-created Words with Friends) created the audio-related tools and code.

For the record, Billy Khan (now at Id Software) and I actually did port the Age of Empires 3 engine to Xbox 360 in 2005 before we started the new engine. I think it took 3 months, was the most painful and mind-numbing task I ever did there, and ultimately was a waste of time. We somehow got it barely working, but it ran at 3-8 FPS (in Release!), and took like 5 minutes to load due to the random map generator's script processor, which used a custom, but slow, C-like interpreter. I somehow got multiplayer working, which involved rewriting all the ancient Windows message based netcode, because X360 didn't support the Win32 message API's. We then attempted to play a few multiplayer games with this X360 port of the Age3 engine, running new maps and custom gamecode created on the PC version by Graeme Devine (a designer of Quake 3) and Angelo.

But actually trying to play the game at this framerate solidly drove home to us that a plain PC port to Xbox 360 would never really work, be fun to work on (few of us on the team really understood the Age3 engine), or be competitive with other first-party X360 games. It didn't help that, excluding the multithreaded character skinning system I added to Age3 before it shipped, and the loading screen, Age3 was a completely single-threaded game, so performance was going to be dreadful even after extensive optimization. There was no way we were going to start with the port as a base, so we completely dumped it and started over.

Now, I'm not saying Age3's engine was bad. Far from it. For starters, Age3's engine was amazingly resilient. You could corrupt almost any data file, delete shit, randomly munge stuff, and it would try (and usually succeed) to soldier on. These properties definitely helped during the port. It was just written in a different era, for relatively powerful single-core x86 CPU's, on an OS with tons of RAM and virtual memory. It wasn't written for an in-order CPU with no HD, limited RAM, and no virtual memory. Age3 also had one of the most amazing-looking random map generators, which is tech we lost in the Halo Wars rewrite.

A core team of ~25 engineers, artists, and designers worked seemingly non-stop on this game from late 2005 through 2007, before it ramped up to consume the entire studio of >100 people. For me, 2006-2007 was a blur; it was almost like that part of my life took place in a different universe. We were totally driven and focused to deliver our studio's first console game, and the first really playable console RTS.

In mid to late 2007 we showed a pretty cool playable demo of early Halo Wars gameplay and art, before the gameplay mechanics got narrowed down. Some of the people who worked on this demo slept in their offices to get it out the door in time. Here's a kinda decent YouTube video: part 1 and part 2:

This demo was shown before the artists and I tuned the game's lighting, so it looks kind of dark (not enough fill). The engine was designed to efficiently handle both RTS and 3rd-person views (it had a pretty decent terrain and model LOD system, stuff that's rare in RTS's), which this demo exploits in a few places (like the flythrough sequence at the beginning).

Parts of the Halo Wars rendering, lighting, and shadowing engine were reused in one of Dave Pottinger's prototype projects, which was a beautiful and technologically very modern-looking 3rd-person game. I don't recall its codename anymore (Nova, or Agent?), but it may appear somewhere here.

Looking back, I'm amazed at what we were able to accomplish. The amount of real teamwork that occurred on this project was incredible. Infighting between engineers was minimal, because we had so much to do, and there was zero financial incentive in the organization to constantly dick around with your peers to make them look bad, or to endlessly debate things. (I think our hiring process helped a lot here, because everyone got a chance to talk to and vet candidates before new people were brought on board.) We were optimizing for value delivered to our customers and our content production teams, not our personal yearly reviews. This was a place where engineers took true responsibility for their code. This is a bit of a rant, but to us code quality mattered because we knew that crappy code leads to unproductive/unhappy devs, and ultimately results in delivering buggy, insecure, or slow products to customers. A culture of crappy code leads to dead codebases that nobody understands, and in the limit turns into a culture where nobody cares about anything (because the people who do care quickly become overwhelmed with all the broken cruft and give up). We applied a form of the broken window theory to our codebase/dev process: if a tool broke, or a playtest session crashed, development stopped until the problem was fixed. If some code was obviously inefficient or could be readily improved, even things used everywhere like our base libraries, we fixed it (that's where the "soft" in software comes from). I heard that Halo Wars was one of the most reliable/solid games tested by Microsoft Game Studios up to that time.

Unfortunately, Ensemble's implosion wasn't pretty. After crunching nonstop on a game for >2.5 years, the last thing you want to hear is that the studio you were putting your heart and soul into is being closed. Paul Bettner (a network programmer at Ensemble, later founder of Newtoy and co-creator of Words with Friends) made some good points when he said Crunch Culture Killed Ensemble. (To be fair, not everyone was happy with what or how Paul said this.) I was offered positions at the two companies started by ex-Ensemble folks (Robot Entertainment and Bonfire Studios - I still have Robot's stock grant info and Bonfire's offer letter somewhere), but at this point I was so mentally drained and exhausted from the countless multi-month crunch cycles that I had to take a break.

After Halo Wars shipped, Colt McAnlis gave a GDC presentation on the Halo Wars terrain renderer and editor, located here.

Misc. Halo Wars Rendering Pics

Here are some random pics of the engine's two-layer depth of field system, which I added in 2007 for our first public demo. It reused the blurred version of the HDR framebuffer created by the bloom/glow system, so it was very cheap.

Here's a screenshot showing the forward rendering (light list per object) shadowed lighting system I implemented in early 2006:

Back in 2003, before I went to Ensemble, I wrote a system that would automatically subdivide huge triangle soups into smaller submeshes. This was used to improve culling efficiency and the performance of light list per object forward rendering. This system would output an AABB tree with overlapping nodes, with the submeshes placed in the leaves. It could also create a kd-tree for efficient ray tracing using the SAH (Surface Area Heuristic) as described here, which we used in our custom normal map creation software.
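
As a refresher, here's a tiny C++ sketch of the standard SAH split cost (constants are illustrative): the probability that a ray hitting the parent node also hits a child is proportional to the child's surface area, so the heuristic favors splits that put many triangles in small-area children.

```cpp
// Estimated cost of splitting a node into two children, per the standard
// SAH formulation. Lower is better; compare against the cost of making
// the node a leaf.
static float sahCost(float saParent, float saLeft, float saRight,
                     int numLeft, int numRight) {
    const float kTraversalCost = 1.0f; // illustrative constants
    const float kIntersectCost = 1.5f;
    float pLeft  = saLeft / saParent;  // hit probability, left child
    float pRight = saRight / saParent; // hit probability, right child
    return kTraversalCost +
           kIntersectCost * (pLeft * numLeft + pRight * numRight);
}
```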

I added this system to Halo Wars in 2006, and here are some of my early test screenshots. We used this tech on all "large" objects (such as huge city buildings, walls, and entire levels) in the game. On a few maps (like when you play on top of the ship later in the single-player game), we didn't use any terrain at all, just huge automatically subdivided meshes.

HW's terrain engine supported full 3D displacements, an idea I had in 2005 after seeing our lead artist (Scott Winsett) messing with ZBrush. I didn't want to reimplement Age3's "cliff" system after seeing how complex and rigid it was in Age3. The task of implementing our new terrain system fell to one of our new hires, the very capable Colt McAnlis. It also supported LOD using the Xenon GPU's built-in tessellation features. (In the early days, Colt wanted to use a purely CPU-based tessellation algorithm, so I hacked together an X360 sample that I used to convince him how capable the 360 GPU's tessellation unit actually was, and that was that.)

I really wanted to add fully lit particles to the Halo Wars engine before it shipped. I succeeded by having the 360 GPU rasterize two 3D volume textures (I think they were 128x128x16) every frame, and sampling them in the particle renderer's pixel shader. It was an approximate technique, but it worked really well. One volume texture contained the total incoming light at each voxel point (irradiance), and the other contained the weighted average of all the light vectors from all the lights that influenced each voxel (with the total weight in alpha). From this data, it was possible to compute approximate exit radiances at any point/normal. This "light buffering" system was very efficient and scalable, so it was also used to illuminate terrain chunks and meshes lit by most omni/spot lights in the game. (Traditional forward lights were only used for lights that were shadowed or used cookie masks.)
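
Here's a minimal C++ sketch of how exit radiance can be approximated from those two samples, under my own assumptions (this is an illustration of the idea described above, not the shipping shader):

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// irrad       = RGB irradiance sampled from the first volume texture
// sumDirW     = weighted sum of light vectors (xyz) from the second texture
// totalWeight = sum of the light weights (the second texture's alpha)
static Vec3 approxExitRadiance(const Vec3& irrad, const Vec3& sumDirW,
                               float totalWeight, const Vec3& normal) {
    float len = std::sqrt(dot(sumDirW, sumDirW));
    // Directionality: ~1 if all light came from one direction, ~0 if the
    // per-light vectors cancelled out (light arriving from everywhere).
    float directionality = (totalWeight > 0.0f) ? len / totalWeight : 0.0f;
    Vec3 dir = (len > 0.0f) ? Vec3{ sumDirW.x / len, sumDirW.y / len, sumDirW.z / len }
                            : normal;
    float nDotL = std::max(dot(normal, dir), 0.0f);
    // Blend between a directional diffuse term and an omnidirectional term.
    float k = directionality * nDotL + (1.0f - directionality) * 0.5f;
    return { irrad.x * k, irrad.y * k, irrad.z * k };
}
```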

Anyhow, here are some pics showing this technique in action on smoke particles, with cool nuclear green omni lights spawning at every projectile impact point.

Open Source Compression Software

After Ensemble imploded, I spent a year or so "decompressing". The US economy was also imploding during this time, and there were layoffs everywhere, so it seemed smart to take it easy for a while and see where the industry was going. I got away from computers entirely for a while by building a house with a friend on my property just outside the small town of Quinlan, TX. Inspired by work being done up the street at Id Software, I also spent months researching and testing different texture compression techniques, which ultimately led to this open source project:

Also during this time, I investigated how to mix binary arithmetic and semi-adaptive Huffman codes in the same serial bitstream, really fast Huffman decompression, optimal LZ parsing, and multithreaded LZ compression. This effort ultimately led to this other open source project:
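
To give a flavor of the "really fast Huffman decompression" work, here's a minimal C++ sketch of the classic table-driven approach (illustrative, not LZHAM's actual code): build a table indexed by the next kTableBits bits of the stream, so one lookup yields both the decoded symbol and the number of bits to consume.

```cpp
#include <cstdint>
#include <vector>

constexpr int kTableBits = 10;

struct DecodeEntry { uint16_t symbol; uint8_t length; };

// Builds the fast table from canonical (MSB-first) code lengths.
// Assumes all code lengths are <= kTableBits for simplicity.
static std::vector<DecodeEntry> buildTable(const std::vector<uint8_t>& codeLens) {
    std::vector<DecodeEntry> table(1u << kTableBits);
    uint32_t code = 0;
    for (int len = 1; len <= kTableBits; ++len) {
        for (uint16_t sym = 0; sym < codeLens.size(); ++sym) {
            if (codeLens[sym] != len) continue;
            // Every table slot whose top 'len' bits match this code
            // decodes to this symbol.
            uint32_t first = code << (kTableBits - len);
            uint32_t count = 1u << (kTableBits - len);
            for (uint32_t i = 0; i < count; ++i)
                table[first + i] = { sym, (uint8_t)len };
            ++code;
        }
        code <<= 1;
    }
    return table;
}

// Decoding is then: peek the next kTableBits bits, index the table,
// emit entry.symbol, and advance the bit stream by entry.length bits.
```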

Valve - Portal 2, DoTA 2, Counter-Strike Global Offensive (CS:GO), VOGL

In late 2009 I started at Valve, where I worked on Portal 2. I helped optimize the game to run well on PS3, but I also worked quite a bit on the X360/PC versions, spending a lot of time optimizing and customizing Portal 2's vertex/pixel shaders, particularly on X360 and PS3. The contribution I'm most proud of was making sure the game's new dynamic lighting and shadows looked good and ran well on all platforms. You can see some of the results in this video:

I spent half a year or so optimizing DoTA 2 to run better on low-end PC's. I also wrote DoTA 2's optimized SSAO implementation. More on this later.

In early 2012 I added support for stable Cascaded Shadow Mapping to the Source Engine, which first shipped in Counter-Strike: Global Offensive (CS:GO) PC/X360/PS3. (This wasn't exactly a walk in the park to pull off -- Source is getting really old!) At the highest CSM quality level, CS:GO uses 3 cascades for the world and 1 shadow map exclusively for the view model (hand/gun). The solution falls back to purely lightmapped shadows in the distance. This solution would not have been possible without the assistance of Alex Vlachos, who did an amazing job enhancing vrad so it writes an extra scalar channel to the lightmap textures. This extra channel is used by the CSM pixel shader to control how much light to "subtract" from the precomputed illumination read from the lightmap in dynamically shadowed areas. Here are some pics I took during development or grabbed from the web:
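
Before the pics, here's a minimal C++-style sketch of that subtract blend as I understand it (illustrative only; the names are hypothetical and this isn't Source's actual shader code):

```cpp
struct RGB { float r, g, b; };

// lightmap    = precomputed illumination read from the lightmap texel
// sunFraction = the extra vrad channel: portion of that light due to the sun
// csmShadow   = CSM result at this pixel: 0 = fully shadowed, 1 = fully lit
static RGB applyCsmToLightmap(const RGB& lightmap, float sunFraction,
                              float csmShadow) {
    // Only the sun's share of the baked light is darkened by the dynamic
    // shadow; baked skylight/bounce light in the lightmap is untouched.
    float scale = 1.0f - sunFraction * (1.0f - csmShadow);
    return { lightmap.r * scale, lightmap.g * scale, lightmap.b * scale };
}
```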

Valve - Linux Versions of Left 4 Dead 2 and Team Fortress 2

During most of 2012 I worked on optimizing several Source Engine titles to run as well (or better!) on Linux (Ubuntu 12.04) using OpenGL as on Windows 7/8 using Direct3D. I extensively profiled, optimized, and rewrote substantial portions of the Source engine's D3D->OpenGL translation layer, originally designed for use on OSX, and helped tune the Source engine to run better in both D3D and GL modes. This was the first time my work was Slashdotted: Is It Time for an OpenGL Gaming Revolution? I gave several talks related to this effort at SIGGRAPH 2012, GTC 2013, and GDC 2013.

Steam Dev Days - My presentation on VOGL, the OpenGL Debugger project I started in early 2013

Fast forward to 33:33. The Metro: Last Light GL trace playback/pausing/seeking demo is pretty cool (basically a "DVR" for OpenGL API calls).

Links/Contact Info

  • Luma-Optimized DXT1 and Mipmapped Luma/Chroma DXT Compression

  • My first PVRTC experiments, and a page on my first successfully coded PVRTC .PVR file. I'm currently working on releasing this code.

  • My open source projects on Google Code. I work on this stuff to keep me sane, and to give back in a small way. They use a public domain, MIT, or zlib license:

    • rg_etc1 - Fast, high-quality ETC1 (Ericsson Texture Compression) block packer/unpacker. ETC1 is a standard OpenGL ES compressed texture format used on many Android devices. As of this writing, rg_etc1 is the only available alternative packer to Ericsson's extremely slow example texture packer in the etc-pack tool.

    • crunch - Advanced DXTn texture compression library. Pretty much makes other DXTc libraries (such as squish, ATI_Compress, or NVidia's texture tools) obsolete (if I may say so). Implements rate-distortion optimized DXTn compression, and can also output textures in a very highly compressed file format (.CRN) which is designed to be transcodable to DXTn bits blazingly quickly (in either C++ or JavaScript).

    • DDSExport - A Windows app (and an interactive demo of crunch) for creating much more compressible DXTn-compressed .DDS files. Probably the first app to demonstrate rate-distortion optimized DXTn compression.

    • jpeg-compressor - Small, single source file JPEG compressor class. Also has a small but capable JPEG decompressor in another single source file. The decompressor supports progressive JPEG's, which is rare for alternative (non-libjpeg) implementations.

    • lzham - Lossless, general purpose data compression library with a ratio similar to LZMA but with much faster decompression. LZHAM started with the question "how do I build a really fast LZ decompressor with a compression ratio the same as LZMA's?", and I worked backwards from there. This code is used by several large game titles, such as Titanfall and PlanetSide 2.

    • imageresampler - Neat, general purpose image resampler class. I've been using this resampler on projects for over 20 years now. (A multithreaded version of this lives in crunch.)

    • miniz - Single source file deflate/inflate implementation with a zlib-compatible API. This project contains my single C function inflator coroutine, which is neat, and the deflater's level 0 (real-time mode) is pretty well optimized. Also contains some optional zip archive reading/writing functionality, which I should break out. (I mostly wrote this to learn more about the zlib API. zlib is probably one of the most used compression libs/API's around, so it makes sense to study it.)

    • picojpeg - My kinda crazy (but successful) attempt at writing a JPEG decoder in plain C for a PIC18F microcontroller. It only requires 1-2KB of RAM to decode baseline JPEG's. This software is used onboard the SkyCube nano-satellite to decode images from its built-in hardware JPEG camera. SkyCube will (hopefully) be successfully launched in late 2013. If the launch is successful, this will be my first code used in space. You can see the first (slower) version of picojpeg decoding in real-time on an Arduino microcontroller with an ATmega1280 8-bit CPU here (more background here).

My email address is richgel99 at gmail dot com. Here's my Twitter page, and my Google Plus page.