This talk was given at the GDC 2002, on March 24, 2002, in San Jose. Marty's slides can be found here.
Producing Audio for Halo
Creating audio for an interactive entertainment title today is a huge collaborative effort involving many talented and highly skilled people. I would like to thank the following, who worked most directly on the task of bringing Halo's audio to life: Jay Weinland, my right hand man on sound design who was there from the beginning of last year all the way through the final months of crunch, Mike Salvatori, my former partner who co-wrote some of the music and engineered several sessions in Chicago, and Matt Segur, who did a fantastic job programming an audio engine for a box that didn't come into existence until late in the process.
If you're reading this paper, it's probably because you work on the audio side, the design side, or perhaps you would like to work on some side of the game industry. Maybe you're just a fan of Halo. Whatever your reason is, I'd like there to be something interesting for all concerned, so I'm going to throw in some history as well as technical information.
Halo evolved over a number of years, and the audio design for it evolved as well. Some of this evolution had to do with a desire to reach for higher production values, but much of it had to do with how the game itself changed based on the platform and hardware that became available over the course of Halo's development.
A Bit of History
In 1998 we all still lived in Chicago, and I worked as an independent contractor on Bungie's two Myth titles. As we prepared to finish Myth II, Jason Jones, the lead programmer and creator of Halo, showed me some early work he and a few others at Bungie were doing. I saw some vast outdoor/indoor environments, with cool inverse kinematic animations, sci-fi weapons, a cyborg, and a jeep. There were lots of drawings of aliens, other vehicles and even some storyboards for a movie that might be shown at E3. The ability to seamlessly transition from outdoor to indoor was there, as well as a jeep that responded to physics in a way that was realistic and fun at the same time. I began working on vehicle and weapon sounds that could be triggered in the technical demo. At that time we were supporting the current Creative and Aureal sound cards and so we had some basic 3D-surround effects working with 2 speakers or stereo headphones.
At E3 in 1999, Bungie had a closed-door press event that highlighted the tech demo of Halo. The press people who saw the demo understood that if they wanted to see it, they couldn't talk about it yet. The interesting thing about that strategy is that it probably created more buzz than if they were allowed to talk about it.
In July of 1999, Bungie's cinematics director, Joseph Staten, came to me with a request to make a soundtrack for a live demo of the game during Steve Jobs's keynote address in New York for Macworld. The Mac version of the Halo engine was running pretty well on OpenGL. By this point, the game was supporting a robust scripting language and had more vehicles, structures, and creatures, as well as aliens and humans. The plan was for a two to three minute scripted demo of the game that would run in real-time on the Mac. The only problem was that we had no sound code written for the Mac version, and so it would be played in complete silence. We decided to create a music track that would run for the duration of the demo, loosely synced by simply hitting the "play" button on a CD player at the same time the "enter" key was hit on the Mac. We talked through the story of the script and came up with some general timings (which of course were very loose) and I went back to my studio to write and produce a piece of music that would score the drama of the scene as well as establish a mood and feel for this ancient, mysterious ring artifact found by humans 500 years hence in some unexplored corner of the galaxy.
I felt that I could evoke an ancient and mysterious feeling by starting with some Gregorian monk style chanting over a bed of strange ambient sounds, and then give the action sequences that followed an epic and important feel by getting orchestral strings from the Chicago Symphony to play over a somewhat rock and roll rhythm section. I added an improvised Qawwali chant voice over the top to help reinforce the "alien" nature of the environment. Whether these decisions were the right ones or not doesn't matter. I had two days to write and produce this piece and there simply was no time to ponder or experiment, which is sometimes a good thing. Since this was also a venue that would feature a big screen, a large auditorium, and a gigantic stereo sound system, I wanted to not only capture the mood but also hook the audience. Anything that sounded like "game music" was going to be a disappointment. Plus the track needed to be interesting enough in its own right so that the audience wouldn't notice that they weren't hearing any sound effects. It seemed to work out pretty well.
For E3 2000, Bungie decided to show a hands-on demo of Halo on the PC. At the same time we wanted to recreate the theatrical atmosphere of the Macworld demo. We needed a longer and more involved story that would present the kind of game-play we were hoping for, and a cinematic feel for the audio. There would be a live demonstration of the game on a PC using the still somewhat primitive sound engine triggering one-off sounds of weapons and vehicles. This would be followed by scripted game sequences that had been captured, edited and played back off a DVD. The script was written, the storyboard created and then we recorded voice actors. The game characters were animated to the edited voice files using the engine. We received captured video of segments of the story that was now steadily growing from its original projected length of six or seven minutes to almost ten minutes. We set up a surround sound Pro-Tools session, and kept adding to it as the video got longer. The music and sound design recorded for the DVD was mixed in 5.1, with the goal of giving the E3 audience as close to a movie theater experience as possible. A small self-contained movie theater with surround speakers was built for the show. We knew at the time that the game could play the way the movie looked, but I knew that we had set the bar pretty high in terms of the audio. There was no such thing as real-time 5.1 surround sound for computer games.
We had known since GDC 2000 that the Xbox would be a pretty cool platform for a game like Halo. However, it came as somewhat of a surprise to find out after the first showing of the DVD at E3 that Microsoft was making an offer to buy Bungie Software, move us all to Redmond and have us build Halo as an Xbox release title. Since very little of the actual game had been finalized yet, it seemed like a great opportunity, and after many discussions, the purchase and move became reality. Just ten days prior to the Microsoft offer, I had joined Bungie as a full-time employee. My decision to move to Redmond with the rest of the Halo team was made easier in part because of my personal desire to see Halo's audio meet or exceed the expectations we had set in the marketing demos produced thus far. With my own understanding that the sound capabilities of the Xbox might possibly include more channels of sound than any sound card currently available, and some sort of real-time surround implementation, my choice was clear. In addition, designing game audio for a single platform with a known set of specifications was something that I had never done before, and being right next to the audio hardware and software folks might also be an advantage. Plus my family was glad that some of our best friends had moved out to Seattle the previous year.
All previous work on Halo was basically broomed once we got the specs on the Xbox and we started over with new code and content development in the fall of 2000. Work on the audio didn't start until February of 2001, since we first had to finish audio production for another Bungie title, Oni, and also build new audio and video studios within the confines of Bungie Studios at Microsoft.
Overview for Halo Audio
One of our first tasks was to define the goals as well as terminology that we all could agree upon. This led to a document that we often referred to during the development process. Our main goal for the audio in Halo was that it would set the mood, give the player information about what is happening (especially things that can't be seen) and make the world seem more alive and real. Music should provide a dramatic component to game play, like combat and exploration, in addition to underscoring story and cinematic sequences. Dialog should unfold the story, provide characterization and draw the player deeper into the experience. Sound makes it real, music makes you feel.
The Halo tech demos and marketing movies that we had made up to this point had all been examples of linear audio production; pre-mixed, pre-determined duration, and traditional audio post-production techniques. It was time to figure out how to get the same results using the game engine and the Xbox to make dynamic music, sound effects and dialog.
Our understanding is that any sound that responds to a game-play condition or event is considered to be dynamic. Dynamic sounds can be affected by the structures in the environment (like reverb or occlusion), they are spatially located, and they can respond to physics (such as Doppler shift). For example, the sound of an engine will change with the rpm's or gear shifting, and the volume of the sound of a shell casing bouncing on a surface will change with both its velocity and its change in distance from the player.
Dynamic dialog occurs when different recorded lines are triggered based on changing conditions. Working closely with our AI programmer, Chris Butcher, we came up with a system that would allow our friendly marine AI's to say something in response to what they see, hear, or feel in several differing game states.
Dynamic music can vary in length or change in volume and intensity based on conditions that occur in the game. I never want the player to be aware that they have the ability to change the way music is being played. That would call attention to something that should be more subliminal, and remain on an emotional rather than cerebral level. For example, if music goes up the scale when ascending stairs and down when descending, the player might stop playing Halo and start playing the "making the music go up and down" game.
Every piece of raw audio data is called a "soundfile" and the set of instructions that organizes and determines how the soundfile is to be played is called a "soundtag". The Halo audio engine only recognizes the soundtag. The most important feature of a soundtag is that it contains enough permutations and the proper randomization so that players don't feel like they're hearing the same thing repeated over and over. Even the greatest and most satisfying sound, dialog or music will be diminished with too much repetition. It's also important to have the ability to randomize the interval of any repetition. It might be difficult to get the sound of one crow caw to be vastly different from another, but the biggest tip off to the listener that something is artificial is when the crow always caws just after the leaf rustle and before the frog croak every thirty seconds or so. The exceptions to that rule are specific game play sounds that need to give the player immediate and unequivocal information, such as a low health alarm.
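The permutation and interval randomization described above can be sketched in a few lines of Python. This is only an illustration, not the Halo audio engine: the class name `SoundTag`, its fields, and the rule of never repeating the previous permutation are all assumptions made for the example.

```python
import random


class SoundTag:
    """Hypothetical soundtag: a named set of weighted soundfile permutations."""

    def __init__(self, name, permutations, min_gap, max_gap, rng=None):
        # permutations maps a soundfile name to its relative weight.
        self.name = name
        self.permutations = dict(permutations)
        # Range, in seconds, for the randomized interval between repetitions.
        self.min_gap = min_gap
        self.max_gap = max_gap
        self.rng = rng or random.Random()
        self.last = None

    def pick(self):
        """Choose a permutation by weight, skipping the one played last (assumed rule)."""
        candidates = {f: w for f, w in self.permutations.items() if f != self.last}
        if not candidates:  # the tag has only one permutation
            candidates = self.permutations
        files = list(candidates)
        weights = [candidates[f] for f in files]
        self.last = self.rng.choices(files, weights=weights, k=1)[0]
        return self.last

    def next_interval(self):
        """Random wait, in seconds, before the sound is triggered again."""
        return self.rng.uniform(self.min_gap, self.max_gap)


crow = SoundTag("crow", {"caw1.wav": 1, "caw2.wav": 2, "caw3.wav": 1}, 20.0, 45.0)
print(crow.pick(), "then wait", round(crow.next_interval(), 1), "seconds")
```

A gameplay-critical sound such as a low health alarm would bypass this kind of randomization and use a single permutation with a fixed interval.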
I believe that music is best used in a game to quicken the emotional state of the player and it works best when used least. If music is constantly playing it tends to become sonic wallpaper and loses its impact when it is needed to truly enhance some dramatic component of game play. In Halo, there are more than 80 minutes of original music, but these minutes are spread over the course of a single player's experience, which could extend from 20 to 70 hours of play. Therefore, much of the time no music is present.
I wrote several pieces of music, some linear and some designed to be dynamic, that express many dramatic or emotional states; combative, spooky, tense, sad, calm, defeated, or victorious. Some musical themes underscore linear cut scenes and became associated with a character, mission, or location. I wrote and produced each piece first in a linear fashion, and then re-mixed, edited, or re-arranged it to fit into the music playback engine we had designed, tailoring it to the specific context.
The music engine is relatively simple in construction. It consists of three basic file types within a single soundtag. The "in", which starts the piece, the "loop" which is the middle section and plays for an indeterminate amount of time, and the "out", which is how the piece ends. In addition, there is the "alt_loop", which plays instead of the loop if called by the game, and the "alt_out", which plays if the end is called during the alt_loop. The looping sections are made up of as many looping soundfiles as desired, and each looping soundfile can be of any length and weighted to determine the likelihood of being played. The level designer only needs to insert the commands: start, start_alt, and stop, in order to call any given music soundtag. This made it relatively easy for me to sit with each level designer and "spot" a level in the same way I might sit with a film director and "spot" a movie.
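The in / loop / out structure reads naturally as a small state machine. The Python sketch below is a guess at the shape of such a system, not the shipped engine: the `MusicTag` class and `next_soundfile` method are hypothetical, and it assumes that `start_alt` switches the loop pool to the alt_loop (starting the piece with the "in" if nothing is playing) and that transitions happen at soundfile boundaries.

```python
import random


class MusicTag:
    """Hypothetical music soundtag with in / loop / alt_loop / out / alt_out."""

    def __init__(self, intro, loops, alt_loops, out, alt_out, rng=None):
        # loops and alt_loops are lists of (soundfile, weight) pairs.
        self.intro = intro
        self.loops = loops
        self.alt_loops = alt_loops
        self.out = out
        self.alt_out = alt_out
        self.rng = rng or random.Random()
        self.state = "stopped"
        self.use_alt = False

    # The three commands a level designer would place in a script.
    def start(self):
        self.use_alt = False
        self.state = "in"

    def start_alt(self):
        self.use_alt = True
        if self.state == "stopped":  # assumption: begin with the "in" section
            self.state = "in"

    def stop(self):
        if self.state != "stopped":
            self.state = "out"

    def next_soundfile(self):
        """Return the next soundfile to queue, or None when nothing is playing."""
        if self.state == "in":
            self.state = "loop"
            return self.intro
        if self.state == "loop":
            pool = self.alt_loops if self.use_alt else self.loops
            files, weights = zip(*pool)
            return self.rng.choices(files, weights=weights, k=1)[0]
        if self.state == "out":
            self.state = "stopped"
            return self.alt_out if self.use_alt else self.out
        return None
```

In use, the playback code would call `next_soundfile()` each time the current soundfile finishes, while the level script calls `start()`, `start_alt()`, and `stop()` at the spotted moments.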
Within each music soundtag, I could also set the preferred type of transition (immediate, cross-fade, or wait until the end of the current loop) between the alt and out soundfiles. We can give real-time volume commands to the currently playing soundtag and also give it an overall duration limit. This kind of soundtag was also used for ambient background sounds.
We also took advantage of multiple tracks in the ambient soundtags. An individual soundtag might be made up of more than one track of looping soundfiles. For example, a beach ambient soundtag could contain looping wave soundfiles and looping wind soundfiles, each with as many variable-length loops as desired. In addition, this soundtag could also contain "detail" soundfiles; sounds that add interest and color but don't loop in themselves. In the beach soundtag, detail sounds could be elements like seabirds, splashes, and gravel movement. The detail soundfiles can be weighted and given random 3D positioning and random period, volume, and pitch ranges. The soundfiles for both music and ambient soundtags were 16-bit 44.1 kHz ADPCM compressed stereo, and read into RAM in 128k chunks in order to allow us to use longer pieces.
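As a rough illustration of how weighted detail sounds with random period, position, volume, and pitch might be scheduled on top of the looping tracks, here is a short Python sketch. The `DetailSound` fields, the `schedule_details` function, and the flat square region around the listener are assumptions for the example rather than the actual tag format.

```python
import random
from dataclasses import dataclass


@dataclass
class DetailSound:
    soundfile: str
    weight: float
    period: tuple  # (min, max) seconds to wait before this detail plays
    volume: tuple  # (min, max) linear gain
    pitch: tuple   # (min, max) playback-rate multiplier


def schedule_details(details, duration, radius, rng):
    """Return time-ordered detail events covering `duration` seconds."""
    weights = [d.weight for d in details]
    events = []
    t = 0.0
    while True:
        detail = rng.choices(details, weights=weights, k=1)[0]
        # Clamp the gap so the loop always advances, even for a zero period.
        t += max(rng.uniform(*detail.period), 0.1)
        if t >= duration:
            break
        events.append({
            "time": t,
            "soundfile": detail.soundfile,
            # Random position on a flat square around the listener (assumed).
            "position": (rng.uniform(-radius, radius), rng.uniform(-radius, radius), 0.0),
            "volume": rng.uniform(*detail.volume),
            "pitch": rng.uniform(*detail.pitch),
        })
    return events


beach = [
    DetailSound("seabird.wav", 2, (4.0, 9.0), (0.4, 0.8), (0.95, 1.05)),
    DetailSound("splash.wav", 1, (6.0, 12.0), (0.3, 0.6), (0.90, 1.10)),
]
for event in schedule_details(beach, 30.0, 25.0, random.Random()):
    print(round(event["time"], 1), event["soundfile"])
```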
Most of the rest of the sound effects in Halo were made up of non-looping or "impulse" soundtags. These sounds play one permutation at a time, either in their entirety or interrupted by another sound that takes precedence. For example, the automatic rifle soundtag is a collection of gunfire soundfile permutations that are called in sync with the weapon's rate of fire. Any single permutation is long enough to include the authentic ring out of an individual gunshot. Each call is interrupted by the subsequent call until the trigger is released and then that permutation is allowed to finish playing. Explosion permutations are not interruptible and thus can overlap. Mono looping soundfiles are the permutations for sustained sound effects like engines, fire, and the like. All of these soundtags can be called in scripts, or attached to objects, animations, particle systems, characters, or locations. The pitch and amplitude can be controlled in real-time by different parameters contained in other tags. For example, the pitch of an engine looping soundtag will vary depending on numbers coming from velocity and RPM's from the engine tag. The engine will sound like it's revving high if its forward velocity is at top speed or all four wheels come off the ground and the accelerator is still pushed down. We were also able to cross-fade switch between different samples based on real-time events. The most important thing for these types of sound is perfect synchronization: what the players are hearing "feels" right with what they are seeing. The volume and pitch of impacts and scrapes need to respond to the velocity with which these events occur.
Another technique we developed for implementing soundtags was something I call "cascading". This is used when attaching sounds to huge numbers of like events, such as particle systems, where the number of potential calls could overwhelm the number of voices available. In addition, hundreds of individual glass particle impact sounds occurring at the same time don't sound like a big sheet of glass breaking. The cascading system lets me create, for example, three soundtags that are related to each other and will call a subsequent soundtag when a certain number of calls for the initial soundtag have been reached. In the glass-breaking example I will have three soundtags, small_glass, medium_glass, and large_glass. The game's particle system will be sending out a request to play the glass impact sound for each glass particle that impacts. The engine will play 6 small_glass permutations and then on the 7th request cascade up to 3 medium_glass permutations and on the 10th request (all within a short period of time) play one large_glass permutation, after which the cycle starts over again. This cascading system will give the effect of sounds that are synchronized to specific events and at the same time be able to build a cacophony of sound with minimum voice usage.
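The glass-breaking cascade can be expressed as a counter over requests. The Python sketch below follows the 6 / 3 / 1 pattern given in the text; the `Cascade` class, and the choice to reset the counter when the gap between two requests exceeds a `window` in seconds, are assumptions standing in for "all within a short period of time".

```python
class Cascade:
    """Hypothetical cascade: route repeated play requests to successive soundtags."""

    def __init__(self, stages, window=0.5):
        # stages is a list of (soundtag_name, number_of_plays) in cascade order.
        self.stages = stages
        self.window = window
        self.cycle = sum(plays for _, plays in stages)
        self.count = 0
        self.last_time = None

    def request(self, now):
        """Return the soundtag to play for a request arriving at time `now`."""
        # Assumed rule: a long enough pause restarts the cascade from the bottom.
        if self.last_time is not None and now - self.last_time > self.window:
            self.count = 0
        self.last_time = now
        index = self.count
        self.count = (self.count + 1) % self.cycle  # the cycle starts over after the last stage
        for name, plays in self.stages:
            if index < plays:
                return name
            index -= plays


glass = Cascade([("small_glass", 6), ("medium_glass", 3), ("large_glass", 1)])
for i in range(10):
    print(i + 1, glass.request(i * 0.01))
```

Requests 1 through 6 map to small_glass, 7 through 9 to medium_glass, and the 10th to large_glass, matching the sequence described above.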
The last big area of sound design for Halo is the dialog. We had quite a complex and involving story to tell with some interesting characters, but telling a linear story dramatically was not our only goal. During combat especially, we wanted the characters fighting with and against the player to seem fully alive and truly participating in the chaos of battle. We developed a matrix to help us script out all the possible situations in which the different characters might speak. This matrix contains 12 main categories, each containing multiple (the largest 28) soundtags, with each soundtag containing multiple permutations. For example, in the category "hurting people" one of 4 main soundtags, "damaged enemy" contains between 15 and 20 permutations depending on the character. For the friendly AI marines, we knew it would be impossible to record a unique actor for each marine that might appear in the game, so we used 6 actors to record 4 PFC's and 2 Sergeants and had the resulting soundtags attached to the corresponding marine models. We were also careful to keep track of those who lived and those who died in order to maintain the balance of the voice actors. If you start an encounter with 8 marines, two of the characters would be doubled up, let's say two of Pvt. Bisenti and two of Pvt. Mendoza. If at the end of the encounter only 4 marines remain, even if Bisenti or Mendoza never got killed, we would swap one of the duplicates for a missing Jenkins or Johnson, in order to keep the characters balanced.
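One possible way to express the voice-balancing rule for surviving marines is shown below in Python. It is a simplified guess: the `rebalance` function is hypothetical, it ignores rank, and it simply reassigns duplicated voices to characters from the roster who are no longer present.

```python
from collections import Counter


def rebalance(survivors, roster):
    """Swap duplicated voices for roster characters missing from the survivors."""
    result = list(survivors)
    missing = [name for name in roster if name not in result]
    counts = Counter(result)
    for i, name in enumerate(result):
        if not missing:
            break
        if counts[name] > 1:  # this voice is doubled up, so one copy can change
            counts[name] -= 1
            result[i] = missing.pop(0)
    return result


roster = ["Bisenti", "Mendoza", "Jenkins", "Johnson"]
print(rebalance(["Bisenti", "Bisenti", "Mendoza", "Mendoza"], roster))
# prints ['Jenkins', 'Bisenti', 'Johnson', 'Mendoza']
```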
All the actors were SAG and AFTRA talent, and during the casting auditions, their ability to improvise was tested along with their ability to bring the proper emotion to the script. There are more than six thousand lines of dialog in the game and one of the elements I still find enjoyable is listening to new and unique combinations of real-time interactions between the marine and alien AI's. I literally never know what they're going to say next.
Putting it All Together
The scariest part of producing a game of this size is during the final stage, when everything seems to come together at the same time and each element is screaming for attention. Throughout the process we were dependent upon programmers, hardware people, artists, writers, and designers to get their part done enough so that we could add the sound, dialog and music. For months it seemed as though nothing was quite ready and then all of a sudden everything was ready at once.
While the testers were banging on the game, we were attempting to do what I call the "Final Mix". Those of you who come from the traditional linear mediums understand the importance of the final mix stage, and yet in game production this is something that is rarely budgeted for in terms of time. Part of the reason is that there really is no final mix in a product that must mix itself in real-time every time it's played by a new person. We used this time to re-record, re-mix, change volumes, tweak the DSP, adjust weighting of permutations, dynamically compress soundfiles, and a myriad of other tasks that affect the subtle balance of the overall sound design of a game. This is also the time when the user interface and splash screen finally come on-line. If not handled correctly, this is an area where sound and music can really be butchered. I prefer a consistent audio experience where sound or music isn't abruptly cut off when the user makes a menu choice, where there can be a sense of elegance and continuity between the opening movie, interface, and loading screen. I try to experience the game as though I'm a first time user or a veteran returning for my 200th time, and make sure the audio flows like movie audio.
Many things work well in Halo's audio, but the thing I'm most pleased with is the Dolby 5.1 Surround working in real-time. I wasn't sure until near the end that this was an attainable goal. The music and ambient sound are made up of stereo files and never get positioned within the game geometry. However, I figured that in the same way stereo sound fills up your car's four speakers, we could do the same with our stereo files. We simply took the stereo signal and sent 50% of it to the rears, and also to the LFE. The listener gets the sense of being in the middle of the ambient space and music, and therefore the sound of marines yelling or an explosion happening behind you is not jarring in any way. Sounds that are spatialized will move when the camera moves, but the music and ambient remain constant. The other area I'm most pleased with is the random combat dialog of the marines. This is something that ended up being greater than the sum of its parts. It can be improved upon, but is still one of the elements that exceeded my expectations.
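The stereo-to-surround spread can be written as a simple per-sample send matrix. The Python sketch below is illustrative only: the function name and channel labels are made up, the 50% rear send comes from the text, and feeding the low-frequency channel (LFE) with half of the averaged left and right signal is an assumption, since the text does not give the LFE level.

```python
def stereo_to_surround(left, right, rear_gain=0.5, lfe_gain=0.5):
    """Spread one stereo sample pair across a 5.1 layout (hypothetical send levels)."""
    return {
        "front_left": left,
        "front_right": right,
        "center": 0.0,  # music and ambience are not sent to the center here (assumed)
        "lfe": lfe_gain * 0.5 * (left + right),  # assumed: mono sum at half level
        "rear_left": rear_gain * left,
        "rear_right": rear_gain * right,
    }


print(stereo_to_surround(1.0, 0.5))
```

Because these sends are fixed, the music and ambience stay put when the camera turns, while positioned sounds are panned separately by the engine.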
What Could Be Better
Not everything that could or should make sound does. Some sounds that should respond dynamically don't. Not every character in the game has a unique voice. The overall mix could have been improved. Ambient sounds should attenuate according to distance not time. Fake string samples should have been replaced with live players. Basically, we could have used another 6 months of development time, but I'll probably always feel that way about each project.
In terms of surround sound, the main problem we encountered involved giving up control of speaker sends to the audio engine based on camera position. In order to get a character's lips to sync to the soundfile, and to take advantage of the room DSP, we had to attach the soundtag to the character model that was speaking. This caused the sound to emanate from whatever speaker is appropriate based on the character's position in relation to the camera. Consequently, a speech that is coming from the front speakers can suddenly jump to the rears simply because of a close up shot of a character who is supposedly listening. This happens a few times in Halo and is annoying. We'll fix it in the next project.
Interactive entertainment is in direct competition with all forms of entertainment. As game developers we are vying for the public's time as well as their dollars. Time spent playing games means less time for watching television, listening to music or going to the movies. More and more people are building home theater systems, and consoles like the Xbox and PlayStation 2 also allow for viewing DVD's and listening to CD's. All these entertainment choices are taking place in the same location. This means that audio production on games is not being judged just against other games but against all the audio that can be heard from that system. My goal is to not just meet the standards set by the other mediums, but exceed them - to hear someone say "Wow, that CD sounded great and that DVD soundtrack was amazing, but the audio from that game blew them both away!"