New paper: Listening effort and accented speech

Out now in Frontiers: a short opinion piece on listening effort and accented speech, written in collaboration with Wash U colleague Kristin Van Engen. The crux of the article is that there is increasing agreement that listening to degraded speech requires listeners to engage additional cognitive processes, which typically fall under the generic label of "listening effort". Listening effort is usually discussed in the context of hearing impairment or background noise, both of which obscure acoustic features in the speech signal and make it more difficult to understand. In this paper Kristin and I argue that accented speech is also more difficult to understand, and should be thought of within the same framework.

We have tried to frame these issues in a general way that incorporates multiple kinds of acoustic challenge. That is, the degree to which the incoming speech signal does not match our stored representations determines the amount of cognitive support needed. This mismatch could come from background noise, or from the systematic phonemic or suprasegmental deviations associated with accented speech. A related point is that comprehension accuracy depends both on the quality of the incoming acoustic signal and on the amount of additional cognitive support a listener allocates: degraded or accented speech may be perfectly intelligible if sufficient cognitive resources are available (and engaged).

Figure 1. (A) Speech signals that match listeners' perceptual expectations are processed relatively automatically, but when acoustic match is reduced (due to, for example, noise or unfamiliar accents), additional cognitive resources are needed to compensate. (B) Executive resources are recruited in proportion to the degree of acoustic mismatch between incoming speech and listeners' stored representations. When acoustic match is high, good comprehension is possible without executive support. However, as the acoustic match becomes poorer, successful comprehension cannot be accomplished unless executive resources are engaged. Not shown is the extreme situation in which the acoustic match is so poor that comprehension is impossible.

I like this article because it raises a number of interesting questions that can be experimentally tested. One of the big ones is the degree to which the type of acoustic mismatch matters: that is, are similar cognitive processes engaged when speech is degraded due to background noise as when an unfamiliar accent reduces intelligibility? My instinct says yes, but I wouldn't bet on it until more data are in.

Reference:

Van Engen KJ, Peelle JE (2014) Listening effort and accented speech. Front Hum Neurosci 8:577. http://journal.frontiersin.org/Journal/10.3389/fnhum.2014.00577/full


New paper: Relating brain anatomy and behavior (Cook et al.)

I'm happy to report that a collaborative project from Penn, spearheaded by Phil Cook, is now published. In this paper we explored how to combine gray matter and white matter measures in a single framework in order to relate individual differences in brain structure to behavioral performance. Phil implemented two core steps in a group of participants that included frontotemporal dementia (FTD) patients and healthy older adults. For each participant we had T1-weighted and diffusion-weighted structural images, providing cortical thickness and fractional anisotropy (FA) measurements, respectively. We also had a set of behavioral measures that included category fluency ("Name as many animals as you can in 30 seconds") and letter fluency ("Say as many words beginning with the letter F as you can in 30 seconds").

Regions of interest for cortical thickness (top row) and FA (bottom row) defined by eigenanatomy.

Phil first used eigenanatomy to define regions of interest (ROIs) from the cortical thickness and FA images. Eigenanatomy is a dimensionality-reduction scheme that identifies sets of voxels that covary across individuals; ROIs are chosen to maximally explain variance in the dataset.
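To give a flavor of the idea, here is a minimal Python sketch that uses ordinary sparse PCA as a stand-in for eigenanatomy; the actual implementation (in ANTs) adds non-negativity and spatial constraints, and all of the data, dimensions, and parameters below are made up for illustration:

```python
# Toy illustration of the eigenanatomy idea using sparse PCA as a stand-in.
# All data and dimensions are simulated; this is not the ANTs implementation.
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
n_subjects, n_voxels = 60, 600

# Simulate structural data with a few underlying anatomical factors, each
# loading on a subset of voxels (so that groups of voxels covary across people).
latent = rng.normal(size=(n_subjects, 4))
loadings = np.where(rng.random((4, n_voxels)) < 0.1, 1.0, 0.0)
X = latent @ loadings + 0.1 * rng.normal(size=(n_subjects, n_voxels))

# Decompose into a small number of sparse components; the nonzero weights of
# each component pick out a set of covarying voxels (a data-driven ROI).
spca = SparsePCA(n_components=4, alpha=1.0, random_state=0)
roi_values = spca.fit_transform(X)   # subjects x ROIs: one summary value per ROI

for k, weights in enumerate(spca.components_):
    print(f"ROI {k}: {np.count_nonzero(weights)} voxels")
```

Each participant then ends up with one summary value per ROI, which can be carried forward as a predictor in the next step.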

The second step is my favorite aspect of this work, and can be implemented regardless of how the ROIs are defined. Phil used a model selection procedure implemented in R to assess which combination of ROIs best predicted behavior, using a combination of cross-validation and AIC to evaluate which predictors performed best. The elegant thing about this approach is that it incorporates both gray matter and white matter predictors in the same framework; thus, the model selection procedure can tell you whether gray matter alone, white matter alone, or some combination of the two best explains the behavioral data.
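Here is a minimal sketch of that model-comparison logic. The paper's analysis was done in R and considered many candidate ROI combinations; everything below, including the variable names and simulated data, is hypothetical and only illustrates the general idea of comparing gray-matter-only, white-matter-only, and combined models with AIC and cross-validation:

```python
# Compare GM-only, WM-only, and combined linear models of a behavioral score
# using AIC and cross-validated prediction accuracy (simulated data throughout).
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 60
gm = rng.normal(size=(n, 3))   # e.g., cortical thickness summaries for three ROIs
wm = rng.normal(size=(n, 3))   # e.g., FA summaries for three ROIs
fluency = gm[:, 0] + wm[:, 1] + rng.normal(scale=0.5, size=n)  # simulated behavior

candidates = {
    "GM only": gm,
    "WM only": wm,
    "GM + WM": np.hstack([gm, wm]),
}

for name, X in candidates.items():
    aic = sm.OLS(fluency, sm.add_constant(X)).fit().aic                    # lower is better
    cv_r2 = cross_val_score(LinearRegression(), X, fluency, cv=5).mean()   # higher is better
    print(f"{name:8s}  AIC = {aic:7.1f}   mean CV R^2 = {cv_r2:5.2f}")
```

The winning model tells you whether gray matter, white matter, or both together carry the predictive information.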

Cross-validation model selection suggests that including both gray and white matter predictors (black circles) results in significantly better performance than either modality alone, and that four regions provide the best predictions.

ROIs significantly associated with verbal fluency.

Perhaps not surprisingly, combining gray matter and white matter was consistently better than using either modality alone, as one might expect for a cortical system composed of multiple regions connected by white matter tracts. It is also encouraging that the regions identified are sensible in the context of semantic storage and retrieval during category fluency.

More importantly, the approach that Phil put together tackles the larger problem of how to combine data from multiple modalities in a quantitative, model-driven approach. I hope that we see more studies that follow a similar approach.

Reference:

Cook PA, McMillan CT, Avants BB, Peelle JE, Gee JC, Grossman M (2014) Relating brain anatomy and cognitive ability using a multivariate multimodal framework. NeuroImage 99:477-486. doi:10.1016/j.neuroimage.2014.05.008 (PDF)