New paper: mapping speech comprehension with optical imaging (Hassanpour et al.)

Although fMRI is great for a lot of things, it also presents challenges, especially for auditory neuroscience. Echo-planar imaging is loud, and this acoustic noise can obscure stimuli or change the cognitive demands of a task (Peelle, 2014). In addition, patients with implanted medical devices can't be scanned.

My lab has been working with Joe Culver's optical radiology lab to develop a solution to these problems using high-density diffuse optical tomography (HD-DOT). Like fNIRS, HD-DOT uses light spectroscopy to measure changes in oxygenated and deoxygenated hemoglobin, which are related to the BOLD response in fMRI. HD-DOT also incorporates realistic light models to facilitate source reconstruction, which is hugely important for studies of cognitive function and makes it possible to combine results across subjects. A detailed description of our current large field-of-view HD-DOT system can be found in Eggebrecht et al. (2014).
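For those curious about the spectroscopy step, here is a minimal sketch (not code from our paper) of the modified Beer-Lambert law commonly used in fNIRS and DOT analyses to convert changes in optical density at two wavelengths into changes in oxy- and deoxyhemoglobin concentration. The extinction coefficients, pathlength, and differential pathlength factor below are illustrative placeholders, not values from our system.

```python
import numpy as np

# Placeholder extinction coefficients in 1/(mM*cm) for [HbO, HbR] at two
# wavelengths (roughly the ~750 nm and ~850 nm range); real analyses use
# published tabulated values.
E = np.array([[0.52, 1.40],   # wavelength 1: [eps_HbO, eps_HbR]
              [1.06, 0.69]])  # wavelength 2: [eps_HbO, eps_HbR]

def delta_hb(delta_od, pathlength_cm=3.0, dpf=6.0):
    """Convert changes in optical density at two wavelengths into changes in
    oxy- and deoxyhemoglobin concentration (modified Beer-Lambert law).

    delta_od: length-2 array of measured optical density changes.
    pathlength_cm and dpf (differential pathlength factor) are illustrative.
    Returns (dHbO, dHbR) in mM.
    """
    effective_path = pathlength_cm * dpf
    dhbo, dhbr = np.linalg.solve(E * effective_path, np.asarray(delta_od, float))
    return dhbo, dhbr

# Example: small optical density changes at the two wavelengths
print(delta_hb([0.01, 0.02]))
```

In HD-DOT this spectroscopic conversion is combined with a model of how light propagates through the head, which is what allows reconstruction of responses in (approximate) anatomical space rather than at the level of individual source-detector channels.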

Because HD-DOT is relatively new, an important first step in using it for speech studies was to verify that it can indeed capture responses to spoken sentences, both in terms of effect size and spatial location. Mahlega Hassanpour is a PhD student who enthusiastically took on this challenge. In our paper now out in NeuroImage (Hassanpour et al., 2015), Mahlega used a well-studied syntactic complexity manipulation, comparing sentences containing subject-relative or object-relative center-embedded clauses (taken from our previous fMRI study; Peelle et al., 2010).

Consistent with previous fMRI work, we found a sensible increase in response from a low-level acoustic control condition (1-channel vocoded speech) to subject-relative sentences to object-relative sentences. The results were apparent at both the single-subject level (with some expected noise) and the group level.
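For readers unfamiliar with the control condition: a 1-channel noise vocoder discards spectral detail while preserving the broadband temporal envelope of the speech, which makes it useful as a low-level acoustic control. Below is a generic sketch of how such stimuli are often generated; it is not the stimulus code from the study, and the envelope cutoff and other settings are typical but arbitrary.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt, hilbert

def one_channel_noise_vocode(wav_in, wav_out, env_cutoff_hz=30.0):
    """Create 1-channel noise-vocoded speech: extract the broadband amplitude
    envelope of the input and use it to modulate wideband noise. Spectral
    detail is removed; the overall temporal envelope is retained."""
    fs, x = wavfile.read(wav_in)
    x = x.astype(float)
    if x.ndim > 1:
        x = x.mean(axis=1)            # collapse stereo to mono

    # Amplitude envelope: magnitude of the analytic signal, low-pass filtered
    env = np.abs(hilbert(x))
    b, a = butter(4, env_cutoff_hz / (fs / 2), btype="low")
    env = np.clip(filtfilt(b, a, env), 0, None)

    # Apply the envelope to white noise and match the original RMS level
    y = np.random.randn(len(x)) * env
    y *= np.sqrt(np.mean(x**2) / np.mean(y**2))

    y = y / np.max(np.abs(y)) * 32767  # scale for 16-bit output
    wavfile.write(wav_out, fs, y.astype(np.int16))
```

A multi-channel vocoder applies the same envelope-on-noise operation within several frequency bands, which gradually restores intelligibility as the number of bands increases; with a single band the result is essentially unintelligible.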

We are really glad to see nice responses to spoken sentences with HD-DOT and are already pursuing several other projects. More to come!


References:

Eggebrecht AT, Ferradal SL, Robichaux-Viehoever A, Hassanpour MS, Dehghani H, Snyder AZ, Hershey T, Culver JP (2014) Mapping distributed brain function and networks with diffuse optical tomography. Nature Photonics 8:448-454. doi:10.1038/nphoton.2014.107

Hassanpour MS, Eggebrecht AT, Culver JP, Peelle JE (2015) Mapping cortical responses to speech using high-density diffuse optical tomography. NeuroImage 117:319–326. doi:10.1016/j.neuroimage.2015.05.058 (PDF)

Peelle JE (2014) Methodological challenges and solutions in auditory functional magnetic resonance imaging. Frontiers in Neuroscience 8:253. doi:10.3389/fnins.2014.00253 (PDF)

Peelle JE, Troiani V, Wingfield A, Grossman M (2010) Neural processing during older adults' comprehension of spoken sentences: Age differences in resource allocation and connectivity. Cerebral Cortex 20:773-782. doi:10.1093/cercor/bhp142 (PDF)

New paper: Listening effort and accented speech

Out now in Frontiers: a short opinion piece on listening effort and accented speech, written in collaboration with Wash U colleague Kristin Van Engen. The crux of the article is that there is increasing agreement that listening to degraded speech requires listeners to engage additional cognitive processes, often grouped under the generic label of "listening effort". Listening effort is typically discussed in terms of hearing impairment or background noise, both of which obscure acoustic features in the speech signal and make it more difficult to understand. In this paper Kristin and I argue that accented speech is also difficult to understand, and should be thought of in a similar context.

We have tried to frame these issues in a general way that incorporates multiple kinds of acoustic challenge. That is, the degree to which the incoming speech signal does not match our stored representations determines the amount of cognitive support needed. This mismatch could come from background noise, or from systematic phonemic or suprasegmental deviations associated with accented speech. A related point is that comprehension accuracy depends both on the quality of the incoming acoustic signal and on the amount of additional cognitive support a listener allocates: degraded or accented speech may be perfectly intelligible if sufficient cognitive resources are available (and engaged).

Figure 1. (A) Speech signals that match listeners' perceptual expectations are processed relatively automatically, but when acoustic match is reduced (due to, for example, noise or unfamiliar accents), additional cognitive resources are needed to compensate. (B) Executive resources are recruited in proportion to the degree of acoustic mismatch between incoming speech and listeners' representations. When acoustic match is high, good comprehension is possible without executive support. However, as the acoustic match becomes poorer, successful comprehension cannot be accomplished unless executive resources are engaged. Not shown is the extreme situation in which the acoustic match is so poor that comprehension is impossible.
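To make the shape of this argument concrete, here is a purely illustrative toy sketch (not a model from the paper, and the function and numbers are arbitrary) in which engaged executive resources boost the effective support provided by the signal, so a moderately degraded signal can be understood with effort while a near-zero acoustic match cannot be rescued.

```python
import numpy as np

def predicted_comprehension(acoustic_match, engaged_resources, slope=8.0):
    """Toy illustration of the Figure 1B relationship (not from the paper).
    Both inputs range from 0 to 1. Engaged executive resources amplify what
    signal there is, so effort helps with moderate degradation but cannot
    rescue a near-zero acoustic match."""
    effective_support = acoustic_match * (1.0 + engaged_resources)
    return 1.0 / (1.0 + np.exp(-slope * (effective_support - 0.5)))

print(predicted_comprehension(0.9, 0.0))   # clear speech: good comprehension without effort
print(predicted_comprehension(0.4, 0.0))   # degraded speech, no effort: comprehension suffers
print(predicted_comprehension(0.4, 1.0))   # degraded speech + effort: comprehension recovers
print(predicted_comprehension(0.05, 1.0))  # extreme degradation: effort cannot compensate
```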

I like this article because it raises a number of interesting questions that can be experimentally tested. One of the big ones is the degree to which the type of acoustic mismatch matters: that is, are similar cognitive processes engaged when speech is degraded due to background noise as when an unfamiliar accent reduces intelligibility? My instinct says yes, but I wouldn't bet on it until more data are in.

Reference:

Van Engen KJ, Peelle JE (2014) Listening effort and accented speech. Frontiers in Human Neuroscience 8:577. doi:10.3389/fnhum.2014.00577