Neuropsychologia. 2016 Jan 29;81:79-93. doi: 10.1016/j.neuropsychologia.2015.12.008.

Neural responses towards a speaker’s feeling of (un)knowing.

Jiang X1, Pell MD2.
  • 1School of Communication Sciences and Disorders and Center for Research in Brain, Language and Music, McGill University, Montréal, Canada.
  • 2School of Communication Sciences and Disorders and Center for Research in Brain, Language and Music, McGill University, Montréal, Canada.



During interpersonal communication, listeners must rapidly evaluate verbal and vocal cues to arrive at an integrated meaning about the utterance and about the speaker, including a representation of the speaker’s ‘feeling of knowing’ (i.e., how confident they are in relation to the utterance). In this study, we investigated the time course and neural responses underlying a listener’s ability to evaluate speaker confidence from combined verbal and vocal cues. We recorded real-time brain responses as listeners judged statements conveying three levels of confidence with the speaker’s voice (confident, close-to-confident, unconfident), which were preceded by meaning-congruent lexical phrases (e.g. I am positive, Most likely, Perhaps). Event-related potentials to utterances with combined lexical and vocal cues about speaker confidence were compared to responses elicited by utterances without the verbal phrase in a previous study (Jiang and Pell, 2015). Utterances with combined cues about speaker confidence elicited reduced N1, P2 and N400 responses when compared to corresponding utterances without the phrase. When compared to confident statements, close-to-confident and unconfident expressions elicited reduced N1 and P2 responses and a late positivity from 900 to 1250 ms; unconfident and close-to-confident expressions were differentiated later in the 1250-1600 ms time window. The effect of lexical phrases on confidence processing differed for male and female participants, with evidence that female listeners incorporated information from the verbal and vocal channels in a distinct manner. Individual differences in trait empathy and trait anxiety also moderated neural responses during confidence processing. Our findings showcase the cognitive processing mechanisms and individual factors governing how we infer a speaker’s mental (knowledge) state from the speech signal. Copyright © 2015 Elsevier Ltd.

KEYWORDS: ERPs; Expressed confidence; Nonverbal communication; Pragmatic inference; Prosody

PMID: 26700458



In this paper, we recorded scalp EEG to examine the temporal neural dynamics underlying how listeners decode speaker confidence in spoken language. Expressed confidence, like vocal emotion, is conveyed through both linguistic choices and vocal variations. Phrases expressing different levels of probability (I’m sure/Maybe) encode the speaker’s certainty. Vocal features, such as the shape of the intonation contour, speech rate and pauses, also affect perceived confidence.

Behavioral evidence suggests that nonverbal (vocal) cues are more reliable and effective than verbal (linguistic) cues for listeners interpreting speaker meaning, and that they therefore receive more weight when multiple cues are present. Integrating linguistic and vocal cues about speaker confidence could enhance recognition relative to situations in which only a single channel is available.

The neurocognitive mechanisms underlying how listeners process speaker confidence from combined linguistic and vocal cues are unclear. In a recent study, we tested the neural responses to statements spoken in different tones of voice (Jiang & Pell, 2015).

In this study, we aimed to further delineate the neural responses to linguistic cues that serve as context for differentiating the speaker’s confidence. To this end, we compared statements produced in a confident, close-to-confident or unconfident tone of voice which were either preceded by a congruent lexical phrase (I’m certain…/Most likely…/Perhaps…) or presented without the lexical phrase. By examining the ERP effects at the onset of the vocal expression after the phrase, our approach offers new insights into the processing of expressed confidence, as well as into the effects of the preceding linguistic context on how the listener’s brain registers vocal confidence meanings.

We invited 30 native English speakers (half female) with no neurological or psychiatric disorders to judge how confident the speaker was. Trait empathy was measured using the Interpersonal Reactivity Index (IRI) and trait anxiety using the State-Trait Anxiety Inventory (STAI). Female participants showed higher trait anxiety than male participants. Participants listened to ninety-six triplets of vocal confidence recordings (confident, close-to-confident and unconfident) that stated facts, expressed opinions or made judgments. An independent perceptual test confirmed that the three levels were differentiated in perceived confidence. We also analyzed how these levels of vocal confidence differed acoustically in both the preceding linguistic phrase (e.g. I am confident) and the main utterance (e.g. he has access to the building). In general, the intended confidence levels could be differentiated by a minimal set of acoustic parameters in both the main utterance and the initial lexical phrase, which carries much of the same prosodic information.


Figure 1. Bar graphs showing the mean and standard deviation of the mean f0 (Meanf0), the f0 range (Rangef0), the speech rate (number of syllables per second), the mean amplitude (MeanAmp) and the amplitude range (RangeAmp) in the recordings used in the current experiment, for the three intended levels of confidence, in both the lexical cues and the following main utterances.


We recorded EEG at high temporal resolution to capture brain responses while listeners made decisions about speaker confidence. The procedure for each trial is shown below.



Figure 2. Schematic illustration of the procedure and timing of each trial.


We focused on ERP responses in four time windows: 70-160 ms for the N1, 180-250 ms for the P2, 300-900 ms for the N400, and 900-1600 ms for the late positivity. We then built linear mixed effects models (LMEM) to capture how intended confidence and topographical factors modulate each neural response. We analyzed only trials in which the listener correctly identified the intended confidence level (a rating of 4 or 5 for confident, 3 or 4 for close-to-confident, and 1 for unconfident). We built models including both statements with the preceding cue (LEX + VOC) and statements without it (VOC only) to examine the effects of the preceding cue on the neural responses. In these models, we also included listener sex and individual IRI or STAI scores to examine individual differences in the confidence effects on the ERP responses.
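The trial selection and time windows described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the summary does not say which amplitude measure was entered into the models, so mean amplitude per window is assumed here, and all names (WINDOWS_MS, ACCEPTED_RATINGS, is_correct, mean_amplitude, window_means) are hypothetical rather than taken from the study's analysis code. Only the window boundaries and the accepted ratings come from the text.

```python
# Time windows in ms, as listed in the text. The dictionary name is illustrative.
WINDOWS_MS = {
    "N1": (70, 160),
    "P2": (180, 250),
    "N400": (300, 900),
    "late_positivity": (900, 1600),
}

# Ratings counted as a correct identification, as listed in the text.
ACCEPTED_RATINGS = {
    "confident": {4, 5},
    "close-to-confident": {3, 4},
    "unconfident": {1},
}


def is_correct(intended, rating):
    """Return True if the rating counts as correct for the intended level."""
    return rating in ACCEPTED_RATINGS[intended]


def mean_amplitude(samples, sfreq_hz, window_ms, epoch_start_ms=0.0):
    """Average the samples whose time stamps fall in [start, end) of window_ms.

    Assumption: mean amplitude is the dependent measure; the source does not
    specify this. Sample i is taken to occur at epoch_start_ms + i * 1000 / sfreq_hz.
    """
    start_ms, end_ms = window_ms
    step_ms = 1000.0 / sfreq_hz
    selected = [
        value
        for index, value in enumerate(samples)
        if start_ms <= epoch_start_ms + index * step_ms < end_ms
    ]
    if not selected:
        raise ValueError("window lies outside the epoch")
    return sum(selected) / len(selected)


def window_means(samples, sfreq_hz, epoch_start_ms=0.0):
    """Compute the mean amplitude in each of the four windows for one epoch."""
    return {
        name: mean_amplitude(samples, sfreq_hz, window, epoch_start_ms)
        for name, window in WINDOWS_MS.items()
    }
```

Note that, following the text, a rating of 4 is accepted for both confident and close-to-confident statements. In a full analysis, the per-window values from the retained trials would then be passed to a mixed-effects model; that step is omitted here because the summary does not describe the model structure.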

In the behavioral data, speaker confidence ratings increased as a function of the intended confidence level. Compared with male listeners, female listeners gave higher ratings to confident expressions and lower ratings to unconfident expressions. Compared with VOC only statements, LEX + VOC statements received lower confidence ratings in the unconfident and close-to-confident conditions.

For the ERPs, we examined the effects of speaker confidence and the effects of linguistic context. For the early effects of confidence, confident expressions elicited a larger N1 response than the other expression types in female listeners and a larger P2 response in male listeners. Unconfident and close-to-confident expressions did not differ in the early time windows. The P2 difference between confident and unconfident expressions in male listeners was fully mediated by the lower trait anxiety of male relative to female listeners.

For the late effects of confidence, compared with confident expressions, unconfident expressions elicited a larger positivity from 900 to 1600 ms, while close-to-confident expressions elicited a larger positivity from 900 to 1250 ms. The magnitude of the late positivity was larger for listeners with higher trait empathy.

For the effects of linguistic context, the early N1-P2 response was reduced for LEX + VOC statements relative to VOC only statements. The N1 reduction was found in female listeners, and the P2 reduction was larger in male than in female listeners. Female listeners also showed an N400 reduction in the LEX + VOC condition. The magnitude of the N400 effect was larger for listeners with higher trait empathy.




Figure 3. Grand average waveforms for the three intended levels of confidence at Cz (top panel) and grand average waveforms for the VOC only and LEX + VOC conditions at Pz (bottom panel). A sentence exemplar is given at the top of each panel, with the onset of the ERP waveforms marked in the sentence.


These patterns support data showing that channel redundancy facilitates on-line recognition of a speaker’s cognitive or affective state within a very short time (Pell et al., 2015). Female and high-empathy listeners were more apt to generate inferences about speaker meaning in relation to the communicative context. Female listeners integrate the linguistic and vocal channels in speech differently from male listeners at early stages of forming a representation of the speaker’s mental state. Our data also highlight ways in which trait anxiety affects the early attentional shifts that mark the significance of vocal confidence expressions.

Our paper highlights the time course and associated neural dynamics of decoding and inferring how confident a speaker is from combined linguistic and vocal cues in speech. This information is vital to social perception and impression formation, especially for judging whether a speaker is credible or should be believed during routine social interactions. Our study has broad implications for fields where understanding another person’s feeling of knowing is relevant, such as credibility judgments in court, purchasing behavior as influenced by others’ persuasiveness, and perceived competence in public speaking.



Jiang, X., & Pell, M. D. (2015). On how the brain decodes vocal cues about speaker confidence. Cortex, 66, 9-34.

Pell, M. D., Rothermich, K., Liu, P., Paulmann, S., Sethi, S., & Rigoulot, S. (2015). Preferential decoding of emotion from human non-linguistic vocalizations versus speech prosody. Biological Psychology, 111, 14-25.



This research was supported by a Discovery grant (RGPIN/203708-2011) awarded to M. D. Pell by the Natural Sciences and Engineering Research Council of Canada and by a McLaughlin Scholarship awarded to X. Jiang by the Faculty of Medicine at McGill University.



Xiaoming Jiang Ph.D.

Research Associate at School of Communication Sciences and Disorders

McGill University

8th Floor, 2001 McGill College, H3A1G1

Montreal, Quebec, Canada


