To build truly believable virtual environments, we need to believe our ears as much as our eyes.
It took me a second to pinpoint the strangely familiar noise coming from the corner of the room, but then I saw the red plush toy perched on top of a lamp, and realized the Angry Bird call was coming from it—except that it wasn’t.
I stood giggling in Ivan Tashev’s tiny office at Microsoft’s Redmond HQ. As a cacophony of sounds came at me from various directions, I quickly forgot that I was wearing headphones, or that the room was actually silent apart from my own exclamations. It was a bit like being Alice in Wonderland: the inanimate objects in the room had all started talking to me.
The phone rang, there was a knock on the door, and the radio started playing some tunes. A wartime poster of a smiling soldier that read “How about a nice cup of shut the fuck up?” now offered me that same cup, and the words were so clearly coming from his lips that it felt uncanny that they weren’t actually moving.
The point of all this was to show off the spatial audio technology that Tashev and his team at the Audio and Acoustics Research Group developed for the HoloLens and that is now being built into Microsoft’s new Windows Mixed Reality headsets.
Audio plays a huge part in building truly immersive virtual experiences, so developing a realistic simulation of how sound comes at you from different directions was a priority from the get-go, says HoloLens inventor Alex Kipman.
Kipman had previously worked with Tashev on audio technology for the Kinect (which included acoustic noise suppression, echo cancellation, speech recognition and beamforming to identify sound sources), and he recalls that when the team was celebrating shipping it, Tashev came over so they could high-five.
“I gave him a big hug and said, ‘Good job chap, but now we need to go work on something much harder,’” Kipman said. “Ivan goes, ‘What the….’ But that’s when you start building anechoic chambers and things that allow you to model human heads so that we can have databases of intelligence that really understand how the shape of your earlobes affects sound, and you start asking how can we reverse-engineer HRTF so we can get that effect with two little speakers.”
Earlier I had visited the anechoic chamber Kipman referred to: a concrete cube sitting on a thick layer of rubble, with mesh-covered spikes on the walls and ceiling angled so as to prevent any sound from bouncing back at you. The result is absolute silence, and it’s a heck of an unsettling place to be.
“If you stay here and don’t move for about 40 seconds, you start to hear a rushing sound, that’s the blood in the vessels of your ears,” Tashev said. “In a few minutes your own breathing starts to become deafening, you hear your heart and everything that is going on inside your body. After ten to fifteen minutes lying here, not moving, you may start to have audio hallucinations, because the human brain just isn’t made to cope with complete silence.”
In the middle of the chamber sat a huge acoustic measurement device called an arc rig, fitted with 16 speakers and multiple microphones. Tashev explained that over the course of developing the HoloLens audio technology, they literally strapped more than 400 volunteers (he didn’t deny these might have been Microsoft interns) to a chair in the middle of the anechoic chamber with a couple of tiny microphones in their ears, then spun the rig around their heads for a couple of minutes to measure how they heard the sound. Each subject then had a CT scan of their ears, head and upper torso, and precise measurements were taken of everything from collar size and head circumference to interpupillary distance.
The point of all of this wasn’t just to torture-test interns, however, but to gain an in-depth understanding of the way humans perceive sound, so the team could figure out the best way to simulate it realistically.
“Depending on where a sound source comes from, the waves will reach the entrance of your ear canals slightly differently: different time, different magnitude, depending on frequency, distance and direction,” Tashev said. “So if we want to make a technology that can make humans perceive that a sound comes from any desired direction using headphones or small loudspeakers, we have to know how that sound changes. It’s about filters.”
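To make the timing part of Tashev’s point concrete, here is a minimal Python sketch of just one of those cues, the interaural time difference (ITD), using Woodworth’s classic spherical-head approximation ITD = (a/c)(θ + sin θ). It is not Microsoft’s renderer and not a real HRTF: the 8.75 cm head radius, the single broadband far-ear gain and the function names are illustrative assumptions, and a genuine HRTF renderer applies frequency-dependent filters rather than one delay and one gain.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 °C
HEAD_RADIUS = 0.0875    # metres; an assumed average adult head radius


def woodworth_itd(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference in seconds from Woodworth's spherical-head
    formula, ITD = (a / c) * (theta + sin(theta)), for a distant source in the
    horizontal plane. 0 degrees is straight ahead; the sign of the azimuth
    only decides which ear is nearer, so the magnitude is used here."""
    theta = math.radians(abs(azimuth_deg))
    if theta > math.pi / 2:  # fold rear sources onto the front hemisphere
        theta = math.pi - theta
    return (head_radius / c) * (theta + math.sin(theta))


def render_binaural(mono, sample_rate, azimuth_deg, far_ear_gain=0.7):
    """Crude binaural pan: delay and attenuate the ear facing away from the
    source. far_ear_gain is a hypothetical broadband stand-in for the
    frequency-dependent level difference a real HRTF would model."""
    delay = round(woodworth_itd(azimuth_deg) * sample_rate)
    near = list(mono) + [0.0] * delay
    far = [0.0] * delay + [s * far_ear_gain for s in mono]
    # Positive azimuth means the source is on the right, so the left ear is far.
    if azimuth_deg >= 0:
        return far, near  # (left, right)
    return near, far
```

For a source 90 degrees to one side, the formula gives a delay of roughly 0.66 ms, or about 31 samples at 48 kHz, which is the kind of tiny timing difference the brain uses to place a sound.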
Those natural filters, shaped by our anatomy and learned by our brains from an early age, are called head-related transfer functions (HRTFs), and they are key to making spatial audio sound natural.
“The big problem is that our heads are different shapes, in different positions, almost as different as our fingerprints,” Tashev said. “This means that those HRTFs are individual, and this has been the main obstacle to creating a good binaural rendering system.”
In the end, the team used the dataset from those tests to devise a set of algorithmic filters built into the HoloLens. When you put the device on, its depth camera automatically detects your measurements and adjusts the filters accordingly, so each user gets a much more personalized audio experience.
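As a rough illustration of the idea, and not a description of Microsoft’s method, here is a Python sketch of one simple technique from HRTF research, anthropometric matching: pick the measured subject whose body measurements are closest to the wearer’s and reuse that subject’s HRTFs. The Subject type, the feature names, the per-feature scales and the sample values below are all hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Subject:
    """One person from a hypothetical measurement database."""
    subject_id: str
    measurements: dict[str, float]  # e.g. head circumference, pupil distance
    hrtf_set: str                   # placeholder handle for that person's HRTFs


def select_hrtf(user: dict[str, float], database: list[Subject],
                scales: dict[str, float]) -> str:
    """Nearest-neighbour anthropometric matching: return the HRTF set of the
    subject whose measurements are closest to the user's. Each feature is
    divided by its scale (for example its spread across the database) so
    that centimetres and millimetres contribute comparably."""
    def distance(subject: Subject) -> float:
        return math.sqrt(sum(
            ((user[k] - subject.measurements[k]) / scales[k]) ** 2
            for k in scales))
    return min(database, key=distance).hrtf_set


# Invented example values, for illustration only.
db = [
    Subject("s01", {"head_circumference_cm": 54.0, "ipd_mm": 60.0}, "hrtf_s01"),
    Subject("s02", {"head_circumference_cm": 59.0, "ipd_mm": 66.0}, "hrtf_s02"),
]
scales = {"head_circumference_cm": 2.0, "ipd_mm": 3.0}
wearer = {"head_circumference_cm": 58.0, "ipd_mm": 65.0}
print(select_hrtf(wearer, db, scales))  # hrtf_s02
```

More elaborate approaches blend or adapt several subjects’ filters; nearest-neighbour selection is simply the easiest version of the idea to show.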
But while all the R&D muscle that Microsoft put behind this tech is certainly impressive, the question is whether that personalized spatial audio translates into more immersive user experiences. And for that, you need the developer community to get on board.
Kean Walmsley is one such developer who’s quite enthused about designing apps for Mixed Reality around the idea of spatial audio. His idea was to use sound to guide the user, indicating which direction they should take. It occurred to him that this was the sort of thing that might eventually prove useful for visually impaired people, or in emergencies, for example helping rescue workers find their way out of a burning building.
“Hearing is an incredibly powerful sense that people use instinctively both to orient themselves and to find out about danger,” Walmsley said. He explained that the UI for HoloGuide, which was prototyped during a two-day work hackathon, was inspired by the scene in the Pixar movie Brave where the young heroine is led down a path through the woods by a series of will-o’-the-wisps. His application builds a map of the surrounding environment by combining blueprints and 3-D models made with tools like AutoCAD or Revit with the HoloLens’ spatial mapping capabilities, which automatically scan the device’s surroundings. It then places a waypoint a few metres away along the path the user is meant to follow. When you approach the waypoint, it pops out of existence and reappears further along the path.
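The waypoint behaviour Walmsley describes can be sketched in a few lines of Python. This is a hypothetical reconstruction, not HoloGuide’s code: it assumes a 2-D path in metres, a 3 m waypoint spacing and a 1 m arrival radius, and the names resample, WaypointGuide and relative_azimuth are invented for illustration. The relative azimuth is the angle a spatial audio engine would need to make the waypoint’s sound appear to come from the right direction.

```python
import math


def resample(path, spacing):
    """Place a point every `spacing` metres along a polyline of (x, y)
    vertices, always ending with the final vertex (the destination)."""
    points = []
    walked = 0.0  # distance covered since the last placed point
    for start, end in zip(path, path[1:]):
        seg = math.dist(start, end)
        d = spacing - walked  # offset of the next point along this segment
        while d <= seg:
            t = d / seg
            points.append((start[0] + t * (end[0] - start[0]),
                           start[1] + t * (end[1] - start[1])))
            d += spacing
        walked = spacing - (d - seg)
    if not points or math.dist(points[-1], path[-1]) > 1e-9:
        points.append(path[-1])
    return points


class WaypointGuide:
    """Shows one waypoint at a time; when the user comes within
    `arrival_radius` metres of it, it disappears and the next one appears."""

    def __init__(self, path, spacing=3.0, arrival_radius=1.0):
        self.waypoints = resample(path, spacing)
        self.arrival_radius = arrival_radius
        self.index = 0  # index of the currently visible waypoint

    def update(self, user_xy):
        """Return the active waypoint for this position, or None on arrival."""
        while (self.index < len(self.waypoints) and
               math.dist(user_xy, self.waypoints[self.index]) <= self.arrival_radius):
            self.index += 1
        if self.index < len(self.waypoints):
            return self.waypoints[self.index]
        return None


def relative_azimuth(user_xy, heading_deg, target_xy):
    """Angle of the target relative to where the user is facing, in degrees in
    [-180, 180); positive is to the right. Assumes x points east, y points
    north, and heading is measured clockwise from north."""
    dx = target_xy[0] - user_xy[0]
    dy = target_xy[1] - user_xy[1]
    bearing = math.degrees(math.atan2(dx, dy))  # clockwise from north
    return (bearing - heading_deg + 180.0) % 360.0 - 180.0
```

In this sketch, calling update() with the user’s tracked position on every frame yields the waypoint to play a sound from, and relative_azimuth() turns it into the direction that sound should appear to come from.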
Since we’re all hardwired to respond to sound in a visceral, fundamentally instinctive way, the potential applications are wide-ranging. Just think of the way we naturally turn towards a sound we can’t readily identify, or slow down or stop in response to an alarm. That effect is enhanced when we perceive a sound as coming from a particular direction rather than emanating from inside our heads, as it does with normal headphones.
This is because our ears sit on the sides of our heads rather than on top, and we naturally hear sounds best within a 30-degree cone, explained musician and audio expert Wilfried Van Baelen. “This sensitivity is highest in that area which is probably programmed in our system due to the millions of years of evolution related to survival,” he said.
Van Baelen started playing music at the age of eight, and by the time he was 16 he and his brother had converted the chicken coop at the back of their garden into a recording studio, where they started testing quadraphonic sound. This became a lifelong obsession that led him to found Auro-3D, a company that developed its own proprietary three-dimensional audio format, which has so far been used in more than 200 films, including Spider-Man and Ghostbusters, as well as in games such as Namco Bandai’s Get Even. He believes the technology is still in its infancy and is poised to transform the entertainment industry as immersive technologies become more pervasive.
“In the past investment priority has been geared towards visuals over audio, but now that the visual experience is so good and 4K is becoming mainstream, we’re seeing audio moving up the priority list,” Van Baelen said. “We’ll see an increased focus on the impact of the subconscious experiences in all kinds of content (games, music, movies, live broadcast events, etc.) as content makers realize the much higher emotional impact they can achieve.”
And as more people in the entertainment industry realize how powerful spatial audio can be as a storytelling device, we’re going to see an influx of content designed to take full advantage of the technology that Microsoft is keen to push across all its devices (the HoloLens might be the most advanced hardware example to showcase its capabilities, but spatial audio is a standard part of the Windows 10 platform).
One indication of how mainstream such content could become is that people such as Robert Stromberg (Avatar) are getting involved. The Virtual Reality Company (VRC), a cinematic VR studio co-founded by Stromberg, has just released its first animated virtual reality series, Raising a Rukus. Sound features quite prominently in the production, which VRC collaborated on with Skywalker Sound and Grammy Award-winning composer James Newton Howard.
“We try to engage as many senses as possible to make the users feel like they’re somewhere else, and spatial audio allows us to create immersive VR experiences by helping guide the user on where to look using directional audio cues,” VRC CEO Guy Primus said.
To fully appreciate the impact of spatial sound you do need the right hardware, however, and most people can’t afford to spend $3,000 on a HoloLens. So it will be interesting to see whether the new Mixed Reality headsets Microsoft is bringing to market with HP, Dell, Lenovo, Acer and Samsung fill that gap and prompt more developers to make VR/AR/MR content that not only looks good but actually sounds real.