Show Tag: ssl

ITD and ILD are most useful for auditory localization in different frequency ranges:

  • In the low frequency ranges, ITD is most informative for auditory localization.
  • In the high frequency ranges, ILD is most informative for auditory localization.

Dávila-Chacón et al. show that the Liu et al. model of natural binaural sound source localization can be transferred to the Nao robot, where it shows significant resilience to noise.

Their system can localize sounds with a spatial resolution of 15 degrees.

The binaural sound source localization system based on the Liu et al. model does not, on its own, perform satisfactorily on the iCub because the robot's ego noise is greater than that of the Nao (~60 dB compared to ~40 dB).

Dávila-Chacón et al. compare different methods for sound source localization on the iCub.

Among the methods compared by Dávila-Chacón et al. for sound source localization on the iCub are

  • the Liu et al. system,
  • the Liu et al. system with additional classifiers, and
  • cross-correlation.

Dávila-Chacón et al. evaluated self-organizing maps (SOMs) as a clustering layer on top of the MSO and LSO modules of the Liu et al. sound source localization system. On top of the clustering layer, they tried a number of neural and statistical classification layers.
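
A minimal sketch of such a two-stage pipeline, assuming each training example is a vector of MSO/LSO activations labelled with the true angle; scikit-learn's KMeans and MLPClassifier stand in for the SOM clustering layer and the neural/statistical classifiers actually evaluated:

```python
# Stand-in sketch: clustering layer + classification layer on top of
# MSO/LSO activation vectors. KMeans replaces the SOM and an MLP replaces
# the various classifiers tried by Dávila-Chacón et al.; both are
# illustrative substitutes, not the authors' implementation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline


def build_localizer(activations: np.ndarray, angles: np.ndarray):
    """Fit the clustering + classification pipeline; predict() returns an angle class."""
    model = make_pipeline(
        KMeans(n_clusters=36, n_init=10),                         # clustering layer (SOM stand-in)
        MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000),   # classification layer
    )
    model.fit(activations, angles)
    return model
```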

The resulting performance fell short of the best methods they found by a clear margin.

The way sound is shaped by the head and body before reaching the ears of a listener is described by a head-related transfer function (HRTF). There is a different HRTF for every angle of incidence.

A head-related transfer function summarizes ITD, ILD, and spectral cues for sound-source localization.

Sound source localization based only on binaural cues (like ITD or ILD) suffers from an ambiguity due to the approximate point symmetry of the head: ITD and ILD identify only a "cone of confusion", i.e. a virtual cone whose tip is at the center of the head and whose axis is the interaural axis, rather than a single angle of incidence.

Spectral cues provide disambiguation: due to the asymmetry of the head, the sound is shaped differently depending on where on a cone of confusion a sound source is.

Talagala et al. measured the head-related transfer function (HRTF) of a dummy head and body in a semi-anechoic chamber and used this HRTF for sound source localization experiments.

Talagala et al.'s system can reliably localize sounds in all directions around the dummy head.

Sound-source localization using head-related impulse response functions is precise, but computationally expensive.

Wan et al. use simple cross-correlation (which is computationally cheap, but not very precise) to localize sounds roughly. They then use the rough estimate to speed up MacDonald's cross-channel algorithm which uses head-related impulse response functions.
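
A minimal sketch of this coarse-to-fine strategy; the rough estimator and the expensive per-angle scorer are passed in as placeholder callables, not Wan et al.'s implementation:

```python
# Coarse-to-fine sketch: run the expensive per-angle scorer (e.g. a
# cross-channel HRIR comparison) only on candidate angles near a cheap
# cross-correlation-based estimate. All names are placeholders.
from typing import Callable, Sequence


def coarse_to_fine(candidates: Sequence[float],
                   rough_estimate: Callable[[], float],
                   fine_score: Callable[[float], float],
                   window_deg: float = 30.0) -> float:
    """Return the best-scoring candidate angle within window_deg of the rough estimate."""
    rough = rough_estimate()
    nearby = [a for a in candidates if abs(a - rough) <= window_deg] or list(candidates)
    return max(nearby, key=fine_score)
```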

MacDonald proposes two methods for sound source localization based on head-related transfer functions (more precisely, on the HRIR, their time-domain representation).

For each microphone $i$ and every candidate angle $\theta$, the first method for SSL proposed by MacDonald applies the inverse of the HRIR $F^{(i,\theta)}$ to the signal recorded by microphone $i$. It then uses the Pearson correlation coefficient to compare the resultant signals. Only for the correct angle $\theta$ should the signals match.

For every candidate angle $\theta$, the second method for (binaural) SSL proposed by MacDonald applies the HRIR $F^{(o,\theta)}$ to the signal recorded by each of the left and right microphones, where $o$ denotes the respective opposite microphone. It then uses the Pearson correlation coefficient to compare the resultant signals. Only for the correct angle $\theta$ should the signals match.
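
A minimal sketch of this cross-channel search, assuming a dictionary `hrirs` mapping each candidate angle to a pair of measured left/right-ear impulse responses; names and data layout are illustrative assumptions, not MacDonald's code:

```python
# Cross-channel method sketch: filter each ear's signal with the HRIR of the
# *opposite* ear for a candidate angle and pick the angle where the two
# filtered signals agree best (Pearson correlation).
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient of two signals (truncated to equal length)."""
    n = min(len(a), len(b))
    return float(np.corrcoef(a[:n], b[:n])[0, 1])


def localize_cross_channel(left, right, hrirs):
    """Return the candidate angle with the highest cross-channel correlation."""
    best_theta, best_score = None, -np.inf
    for theta, (h_left, h_right) in hrirs.items():
        left_via_right = np.convolve(left, h_right)    # left signal through right-ear HRIR
        right_via_left = np.convolve(right, h_left)    # right signal through left-ear HRIR
        score = pearson(left_via_right, right_via_left)
        if score > best_score:
            best_theta, best_score = theta, score
    return best_theta
```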

The binaural sound-source localization methods proposed by MacDonald can be extended to larger arrays of microphones.

Cross-correlation can be used to estimate the ITD of a sound perceived in two ears.
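
A minimal sketch, assuming two time-aligned ear signals sampled at a common rate (names are illustrative):

```python
# ITD estimation sketch: the ITD is taken to be the lag that maximizes the
# cross-correlation of the two ear signals.
import numpy as np


def estimate_itd(left, right, sample_rate):
    """ITD in seconds; with this argument order, a negative value means the
    left signal leads, i.e. the source lies toward the left."""
    corr = np.correlate(left, right, mode="full")
    lags = np.arange(-(len(right) - 1), len(left))
    return float(lags[np.argmax(corr)] / sample_rate)
```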

Rucci et al. claim a mean localization error of 1.54°±1.01° (± presumably meaning standard error) for auditory localization of white-noise stimuli at a direction between $[-60°,60°]$ from their system.

In Casey et al.'s system, ILD alone is used for SSL.

In Casey et al.'s experiments, the two microphones are one meter apart and the stimulus is one meter from the midpoint between them. There is no damping body between the microphones, but at that inter-microphone distance and distance to the stimulus, ILD should still be a good localization cue.
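
A minimal sketch of an ILD estimate from two microphone signals; the RMS/dB convention and names are assumptions, not Casey et al.'s implementation:

```python
# ILD sketch: level difference between the two channels, expressed in dB.
import numpy as np


def estimate_ild_db(left, right):
    """ILD in dB; positive means the left channel is louder."""
    eps = 1e-12                                   # avoid division by zero for silent input
    rms_left = np.sqrt(np.mean(np.square(left)))
    rms_right = np.sqrt(np.mean(np.square(right)))
    return float(20.0 * np.log10((rms_left + eps) / (rms_right + eps)))
```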

Yan et al. perform sound source localization using both ITD and ILD. Some of their auditory processing is bio-inspired.

Voges et al. use ITDs (computed via generalized cross-correlation) for sound-source localization.
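
A minimal sketch of a generalized cross-correlation delay estimate; the PHAT weighting used here is a common choice and an assumption, since the note does not say which weighting Voges et al. use:

```python
# GCC-PHAT sketch: cross-correlate in the frequency domain, normalize away
# the magnitude (phase transform), and pick the lag with the largest peak.
import numpy as np


def gcc_phat(x, y, sample_rate):
    """Estimated delay of x relative to y in seconds (positive: x arrives later)."""
    n = len(x) + len(y) - 1
    cross = np.fft.rfft(x, n=n) * np.conj(np.fft.rfft(y, n=n))
    cross /= np.abs(cross) + 1e-12                # PHAT weighting: keep only phase
    cc = np.fft.irfft(cross, n=n)
    cc = np.roll(cc, len(y) - 1)                  # put negative lags first
    lags = np.arange(-(len(y) - 1), len(x))
    return float(lags[np.argmax(np.abs(cc))] / sample_rate)
```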

Acoustic localization cues change from far-field conditions (distance to stimulus $>1\,\mathrm{m}$) to near-field conditions ($\leq 1\,\mathrm{m}$).

There are fine-structure and envelope ITDs. Humans are sensitive to both, but do not weight envelope ITDs very strongly when localizing sound sources.

Some congenitally unilaterally deaf people develop close-to-normal auditory localization capabilities. These people probably learn to use spectral SSL cues.

Humans use a variety of cues to estimate the distance to a sound source. This estimate is much less precise than estimates of the direction towards the sound source.