Virtual Reality (VR) is quickly becoming a go-to tool in psychology for studying how we think and behave. It offers incredibly controlled yet dynamic environments, perfect for understanding human perception and decision-making. But there's always a big question: how well do findings from VR studies translate to the real world?
That's where sound comes in! While visuals dominate most VR experiences, our ears play a massive role in how we perceive space and immerse ourselves. Imagine being able to walk around a virtual world and hear sounds as if they were truly there, originating from specific points in the environment. This isn't just about cool effects; it's about making VR experiments even more realistic and impactful.
NOTE: If what you see here piques your interest, make sure to have a look at the underlying paper, which goes into detail on all aspects of this project! This web page offers just a brief summary.
This project was a unique opportunity to play around with some truly cutting-edge audio technology: Wave Field Synthesis (WFS). Unlike your everyday surround sound, WFS systems use an array of loudspeakers to literally "synthesize" sound waves, making it seem like a sound is coming from a specific spot, even if there's no speaker there!
WFS is extremely rare in commercial or industry applications, but thanks to a collaboration between Fraunhofer IDMT and the Max Planck Institute for Human Development (MPIB), I had the rare opportunity to get hands-on with this tech and build a VR experience offering unparalleled sound rendering!
To bring this project to life, I had the incredible opportunity to work with some serious, high-end equipment. As you might expect, cutting-edge, research-oriented tech comes with a corresponding amount of jank. And yes, indeed, the hardware and software did not always cooperate to the extent I would have liked, but such are the chores of charting the unexplored!
The heart of the project was a custom-built Wave Field Synthesis system at the MPIB, developed by Fraunhofer IDMT. Imagine a sound-isolated room with a square array of 64 speakers (16 on each side!), all working together to create a 2x2 meter "sweet spot" where you can experience perfectly localized sound:
Getting sounds into this system involved a dedicated rendering PC, hooked up via a special USB audio interface. For real-time control, I sent commands from our VR application to the WFS system over a network, letting me pinpoint exactly where a sound should appear in the room.
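To give a flavour of that control path, here's a minimal sketch of the idea, assuming a plain UDP connection from Unity to the rendering PC. The IP address, port, and text-based message format below are purely illustrative assumptions; the actual Fraunhofer control protocol isn't reproduced here.

```csharp
using System.Net.Sockets;
using System.Text;
using UnityEngine;

// Minimal sketch of "tell the WFS renderer where a source should be".
// The IP, port, and plain-text message format are illustrative assumptions,
// not the actual Fraunhofer protocol.
public class WfsSourceController : MonoBehaviour
{
    [SerializeField] private string rendererIp = "192.168.0.10"; // assumed address of the rendering PC
    [SerializeField] private int rendererPort = 9000;            // assumed control port

    private UdpClient udp;

    void Awake()
    {
        udp = new UdpClient();
        udp.Connect(rendererIp, rendererPort);
    }

    // Send a position update for one virtual source.
    public void SetSourcePosition(int sourceId, Vector3 worldPos)
    {
        // Unity is y-up; the WFS renderer only cares about the horizontal plane.
        string msg = $"source {sourceId} pos {worldPos.x:F3} {worldPos.z:F3}";
        byte[] data = Encoding.ASCII.GetBytes(msg);
        udp.Send(data, data.Length);
    }

    void OnDestroy() => udp?.Close();
}
```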
Building all of this required a blend of specialized software. I used Unity 2021 as my primary development environment. Most of the core VR components were built using ARC-VR, an in-house framework developed by yours truly at the MPIB specifically for VR research.
For voice instructions during the study, I leveraged ElevenLabs to generate natural-sounding voiceovers, and even built a custom Unity plugin to streamline that process. Switching between the WFS system and the stereo headphones for different parts of the experiment was handled by SVCL, a handy command-line tool.
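As a rough idea of how that switching can work, here's a sketch that shells out to SVCL from C# to change the default Windows playback device. The path to svcl.exe and the device names are placeholders for whatever the local setup uses, and the /SetDefault usage shown is an assumption based on SVCL's command set, not the project's exact invocation.

```csharp
using System.Diagnostics;

// Minimal sketch of switching the default Windows playback device between the
// WFS audio interface and the stereo headphones by calling SVCL.
// Paths and device names are placeholders for the local setup.
public static class PlaybackSwitcher
{
    private const string SvclPath = @"C:\Tools\svcl.exe";         // assumed install location
    private const string WfsDevice = "Fraunhofer WFS Interface";  // placeholder device name
    private const string HeadphoneDevice = "Headphones";          // placeholder device name

    public static void UseWfs() => SetDefault(WfsDevice);
    public static void UseHeadphones() => SetDefault(HeadphoneDevice);

    private static void SetDefault(string deviceName)
    {
        // Assumed usage: "/SetDefault <device> all" makes the device the default for all roles.
        var psi = new ProcessStartInfo(SvclPath, $"/SetDefault \"{deviceName}\" all")
        {
            CreateNoWindow = true,
            UseShellExecute = false
        };
        Process.Start(psi)?.WaitForExit();
    }
}
```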
My journey into combining WFS and VR kicked off with a simple prototype. The idea was to test the waters: could we actually use WFS in a VR environment for a sound localization task? The initial feedback was great – people found it fun and engaging! However, it quickly became clear that this first version, while cool, wasn't quite robust enough for serious scientific study. For instance, participants were limited to picking from a fixed number of phones in a virtual bar, which meant even if their hearing wasn't perfectly accurate, they could still guess correctly by chance. This "noise" in the data meant I needed a more precise way to measure how well people could pinpoint sounds.
This led me down a path of creating several different prototypes, each designed to tackle the challenge of accurately measuring sound localization. I experimented with various ways for participants to interact with the virtual sound sources:
This is, arguably, too many prototypes. A common swamp I find myself stuck in. I don't like to leave any stone unturned...
However, this extensive prototyping phase gave me some solid insight into the unique challenges of leveraging WFS alongside VR. If you're interested in reading about some of the limitations and challenges I encountered, be sure to refer to the project's underlying paper!
After a lot of prototyping and learning, I finally built the definitive version of the experiment for data collection. This version took the best elements from the earlier trials, especially the intuitive "phone-placement" interaction, and added a whole lot more to make it a truly comprehensive study.
In this final setup, participants were still tasked with pinpointing the origin of sounds within that 2x2 meter WFS sweet spot. The big upgrade? They now used hand-tracking (or controllers as a backup) to make their guess by simply pinching their fingers at the perceived sound location. This felt incredibly natural and immersive.
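For illustration, a pinch interaction along these lines can be sketched with the Meta/Oculus Integration's OVRHand component. The fingertip reference and the guess marker below are placeholder names, and the real experiment logic (trial handling, logging, controller fallback) is left out.

```csharp
using UnityEngine;

// Minimal sketch of "pinch to place your guess", assuming the Oculus
// Integration's OVRHand component for hand tracking. guessMarker is a
// placeholder object that visualises the chosen location.
public class PinchGuessPlacer : MonoBehaviour
{
    [SerializeField] private OVRHand hand;          // tracked hand (left or right)
    [SerializeField] private Transform indexTip;    // fingertip transform (e.g. from OVRSkeleton)
    [SerializeField] private Transform guessMarker; // placeholder marker for the guess

    private bool wasPinching;

    void Update()
    {
        if (hand == null || !hand.IsTracked) return;

        bool isPinching = hand.GetFingerIsPinching(OVRHand.HandFinger.Index);

        // Register the guess on the pinch "down" edge only.
        if (isPinching && !wasPinching)
        {
            // Project onto the floor plane, since sound positions live in 2D
            // within the 2x2 m listening area.
            Vector3 tip = indexTip.position;
            guessMarker.position = new Vector3(tip.x, 0f, tip.z);
        }

        wasPinching = isPinching;
    }
}
```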
To really dig into how different factors influence sound localization, I introduced a variety of conditions:
Each participant went through 54 trials in total, experiencing both WFS and stereo headphone playback across all these conditions. It usually took about 25-35 minutes to complete.
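For a rough idea of how such a trial list can be assembled, here's a sketch that crosses playback system with a set of placeholder conditions and shuffles the result. The factor levels shown are hypothetical stand-ins, not the study's actual conditions; 54 is simply the total mentioned above.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Minimal sketch of building a counterbalanced trial list: cross every
// playback system with every condition, repeat to the desired total, shuffle.
// The "Condition" values are hypothetical placeholders.
public static class TrialListBuilder
{
    public enum Playback { Wfs, StereoHeadphones }
    public enum Condition { A, B, C } // placeholder factor levels

    public struct Trial
    {
        public Playback Playback;
        public Condition Condition;
    }

    public static List<Trial> Build(int totalTrials = 54)
    {
        var trials = new List<Trial>();

        // Repeat the full crossing until the requested number of trials is reached.
        while (trials.Count < totalTrials)
            foreach (Playback p in System.Enum.GetValues(typeof(Playback)))
                foreach (Condition c in System.Enum.GetValues(typeof(Condition)))
                    if (trials.Count < totalTrials)
                        trials.Add(new Trial { Playback = p, Condition = c });

        // Fisher-Yates shuffle so conditions are interleaved rather than blocked.
        for (int i = trials.Count - 1; i > 0; i--)
        {
            int j = Random.Range(0, i + 1);
            (trials[i], trials[j]) = (trials[j], trials[i]);
        }

        return trials;
    }
}
```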
One thing I'm particularly proud of is how user-friendly this project became. I spent a lot of time building an extensive interactive tutorial. Narrated by AI-generated voices, it guided participants step-by-step through how to interact with the system, how to make their guesses, and what to expect during the experiment. It even covered safety aspects and how to pause if they felt uncomfortable. This made it one of the most accessible VR experiences I've ever built!
Another crucial element was spatial calibration. Because the Meta Quest Pro is a mobile headset, its exact position in the real world could vary slightly each session. This meant the virtual environment might not perfectly align with the physical WFS sound field. To fix this, I implemented a clever calibration step: using the Quest Pro's passthrough camera, the experimenter could see the real room overlaid with a virtual grid. They could then use the VR controllers to precisely align the virtual and real WFS spaces, ensuring that what participants saw visually matched what they heard acoustically.
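To illustrate the idea, here's a minimal sketch of a two-point calibration, assuming the experimenter marks two known corners of the physical listening area with a controller; from those two points we derive a yaw rotation and a translation for the virtual environment. The component and field names are placeholders, not the project's actual implementation.

```csharp
using UnityEngine;

// Minimal sketch of a two-point spatial calibration: align the virtual WFS
// space with the physical room using two controller-marked reference points.
// environmentRoot and the corner choice are illustrative assumptions.
public class WfsCalibration : MonoBehaviour
{
    [SerializeField] private Transform environmentRoot; // parent of the virtual WFS grid/room

    // Local-space positions of the reference corners within the virtual environment,
    // e.g. opposite corners of the 2x2 m listening area.
    [SerializeField] private Vector3 virtualCornerA = new Vector3(-1f, 0f, -1f);
    [SerializeField] private Vector3 virtualCornerB = new Vector3(1f, 0f, 1f);

    public void Align(Vector3 measuredCornerA, Vector3 measuredCornerB)
    {
        // Current world-space positions of the virtual corners.
        Vector3 vA = environmentRoot.TransformPoint(virtualCornerA);
        Vector3 vB = environmentRoot.TransformPoint(virtualCornerB);

        // Yaw needed so the virtual corner axis matches the measured one (horizontal plane only).
        float yaw = Vector3.SignedAngle(Flatten(vB - vA), Flatten(measuredCornerB - measuredCornerA), Vector3.up);
        environmentRoot.RotateAround(vA, Vector3.up, yaw);

        // After rotating about vA, corner A is still at vA; slide it onto the measured point.
        environmentRoot.position += Flatten(measuredCornerA - vA);
    }

    private static Vector3 Flatten(Vector3 v) => new Vector3(v.x, 0f, v.z);
}
```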
Finally, I built in experimenter controls that allowed me to monitor the participant's view in real-time, track their progress, and even pause the experiment if needed. This ensured a smooth and controlled data collection process.
So, after all that development and data collection, what did we find out?
Surprisingly, when it came to sheer accuracy, the good old stereo headphones, especially with a really steep volume falloff, actually performed better than the WFS system. Participants tended to take a bit longer with stereo, but their final guesses were often closer to the true sound source. This was particularly noticeable for stationary sounds.
However, many participants felt that the WFS system delivered a far more natural and immersive sound experience. It gave them an immediate, intuitive sense of where the sound was coming from, even if their final pinpoint wasn't as precise as with stereo. Very steep falloffs don't sound or feel natural, after all: under everyday real-life conditions, sounds remain audible over much larger distances.
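For context, the "falloff" here is the curve describing how quickly a sound's volume drops with distance in the stereo rendering. Below is a minimal sketch of how a steep falloff could be configured on a Unity AudioSource; the curve and distance values are purely illustrative, not the study's actual parameters.

```csharp
using UnityEngine;

// Minimal sketch of a "steep" distance falloff on a Unity AudioSource.
// The curve and maxDistance below are illustrative assumptions.
[RequireComponent(typeof(AudioSource))]
public class SteepFalloff : MonoBehaviour
{
    void Awake()
    {
        var source = GetComponent<AudioSource>();
        source.spatialBlend = 1f;                      // fully 3D positioned sound
        source.rolloffMode = AudioRolloffMode.Custom;  // use a hand-made curve
        source.maxDistance = 2f;                       // assumed: effectively silent beyond ~2 m

        // Volume drops from full to silent over the normalized distance range.
        var curve = AnimationCurve.EaseInOut(0f, 1f, 1f, 0f);
        source.SetCustomCurve(AudioSourceCurveType.CustomRolloff, curve);
    }
}
```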
A key takeaway was the impact of "user-dependent optimization" on WFS, or rather, the lack of it. Our system didn't account for the listener's exact position in the room, and this showed in the results: participants' guesses with WFS tended to gravitate towards the edges of the listening area, especially for sounds originating from the center. This isn't a flaw of WFS itself, but rather highlights the importance of incorporating real-time user tracking into these systems for optimal performance. (For a deep dive into this, check out the full paper!)
This project provided invaluable insights into the potential and challenges of using WFS in VR for psychological research. If you're keen on all the nitty-gritty details, including the specific data and statistical analysis, you can dive into the full paper!