AoA 2021

Academy of Aphasia 2021 Poster Presentation

Improving Automatic Semantic Similarity Classification of the PNT


Background: In the Philadelphia Naming Test (PNT; Roach et al., 1996), paraphasic errors are classified into six major categories according to three dimensions: lexicality, phonological similarity to the target, and semantic similarity to the target. Our team has developed software called ParAlg (Paraphasia Algorithms) that automatically classifies paraphasias along these three dimensions given a transcription (Fergadiotis et al., 2016; McKinney-Bock & Bedrick, 2019). The classifier takes the form of a decision tree mirroring the scoring of the PNT, as illustrated in Figure 1. In ParAlg, the semantic similarity of a response to the target is determined by a binary classifier that uses a language model: a machine-learning-based model that produces meaningful representations of words in a vector space. Previously, the language model used in ParAlg was word2vec (Mikolov et al., 2013).
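To make the decision-tree idea concrete, here is a minimal sketch of how three binary judgments could be combined into six categories. Figure 1 is not reproduced here, so the exact branching below is an assumption based on a conventional PNT-style taxonomy; the function name `classify_paraphasia` and its parameters are hypothetical and are not ParAlg's actual interface.

```python
def classify_paraphasia(is_real_word: bool,
                        phon_similar: bool,
                        sem_similar: bool) -> str:
    """Map three binary judgments to one of six error categories.

    Assumed branching (not taken from Figure 1): real words are split by
    semantic and phonological similarity; nonwords are split by
    phonological similarity only.
    """
    if is_real_word:
        if sem_similar and phon_similar:
            return "mixed"
        if sem_similar:
            return "semantic"
        if phon_similar:
            return "formal"
        return "other"
    # Nonword responses: semantic similarity is not consulted in this sketch.
    if phon_similar:
        return "phonologically-related neologism"
    return "abstruse neologism"


if __name__ == "__main__":
    # A real-word response judged semantically but not phonologically similar.
    print(classify_paraphasia(is_real_word=True,
                              phon_similar=False,
                              sem_similar=True))
```

In a tree like this, only the `sem_similar` input depends on the language model, which is why swapping word2vec for BERT can change the final category without touching the other two judgments.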

Objectives: This work focuses on improving the semantic similarity classification in ParAlg. We fine-tune a modern language model called BERT (Bidirectional Encoder Representations from Transformers; Devlin et al., 2019) alongside a binary classifier to categorize each transcribed response to a PNT item as semantically similar to the target or not. BERT produces contextual vectors, meaning the representation of a word changes based upon the context given to the model, in contrast to the static representations in word2vec. Therefore, BERT may allow for more accurate processing of polysemous words. Finally, we compare ParAlg classification results using word2vec and BERT.

Methodology: Our dataset is a subset of the Moss Aphasia Psycholinguistic Database (MAPPD; Mirman et al., 2010) consisting of 11,999 paraphasias from 296 participants, each transcribed by clinicians and assigned to one of six categories (mixed, semantic, abstruse neologism, phonologically related neologism, formal, other). Errors are classified using ParAlg with either word2vec or BERT making the semantic judgments. Performance is evaluated using metrics computed from the corresponding classification matrices under 5-fold cross-validation, in order to guard against over-fitting.
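The evaluation setup can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say how folds were formed (for example, whether they were shuffled or split by participant) or which metrics were reported, so the contiguous folds and the accuracy, precision, recall, and F1 computed below are stand-ins. The helper names `k_fold_indices` and `binary_metrics` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_items: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists for k contiguous, near-equal folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        end = start + size
        test = list(range(start, end))
        train = list(range(0, start)) + list(range(end, n_items))
        folds.append((train, test))
        start = end
    return folds


def binary_metrics(gold: Sequence[bool], pred: Sequence[bool]) -> Dict[str, float]:
    """Compute metrics from the 2x2 table of gold vs. predicted labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    tn = sum(1 for g, p in zip(gold, pred) if not g and not p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    folds = k_fold_indices(11_999, k=5)
    print([len(test) for _, test in folds])
    print(binary_metrics([True, True, False, False],
                         [True, False, True, False]))
```

Each item lands in exactly one test fold, so every paraphasia is scored once by a model that did not see it during training.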

Results: Overall, BERT outperformed word2vec when determining the semantic similarity of each error to the target. Using BERT led to 556 semantic misclassifications, compared to 1,084 with word2vec. These improvements also had a downstream effect on categorization in the PNT.
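The arithmetic behind these counts can be checked with a few lines. The sketch below assumes that both misclassification counts are out of the full set of 11,999 paraphasias; the abstract does not state the denominator explicitly, so the error rates are illustrative. The relative reduction depends only on the two reported counts.

```python
# Counts reported in the abstract.
WORD2VEC_MISSES = 1_084
BERT_MISSES = 556

# Assumption: both counts are out of all 11,999 paraphasias in the dataset.
N_PARAPHASIAS = 11_999


def error_rate(misses: int, total: int) -> float:
    """Fraction of items whose semantic judgment was wrong."""
    return misses / total


def relative_reduction(before: int, after: int) -> float:
    """Fraction of the original misclassifications that were eliminated."""
    return (before - after) / before


if __name__ == "__main__":
    print(f"word2vec error rate: {error_rate(WORD2VEC_MISSES, N_PARAPHASIAS):.1%}")
    print(f"BERT error rate:     {error_rate(BERT_MISSES, N_PARAPHASIAS):.1%}")
    print(f"relative reduction:  {relative_reduction(WORD2VEC_MISSES, BERT_MISSES):.1%}")
```

Under that assumption the error rate falls from about 9.0% to about 4.6%, a relative reduction of roughly 49%.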

Further, a post-hoc qualitative analysis suggests that BERT’s improved performance is associated with its ability to handle polysemy. For example, the most common word2vec error is marking the target/response pair glass/cup as semantically dissimilar. This happens because word2vec has a single vector for each word regardless of polysemy; because the model is trained on news data, the closest word to cup in word2vec space is championship. BERT, however, correctly classifies all 24 instances of this pair as semantically similar, since it produces contextual vectors and can home in on the appropriate meaning of cup.
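The glass/cup case can be illustrated with a toy model. Everything below is invented for illustration: the two-dimensional vectors, the 80/20 sense mix, and the 0.5 threshold are assumptions, not values from word2vec, BERT, or ParAlg, and choosing a sense vector by hand is only a crude stand-in for what a contextual model does. The point is simply that averaging senses into one static vector can push a word away from a neighbor that one of its senses is close to.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def is_similar(u: Vector, v: Vector, threshold: float = 0.5) -> bool:
    """Toy binary judgment: similar if cosine meets a hypothetical threshold."""
    return cosine(u, v) >= threshold


# Invented 2-D space: axis 0 ~ "drinkware", axis 1 ~ "sports".
GLASS = (1.0, 0.0)
CUP_DRINKWARE = (0.9, 0.1)   # sense as in "a cup of tea"
CUP_TROPHY = (0.1, 0.9)      # sense as in "the World Cup"

# Static model: one vector per word, here a frequency-weighted sense mix.
# The 80% trophy share mimics a news-heavy corpus (an assumption).
STATIC_CUP = tuple(0.2 * d + 0.8 * t
                   for d, t in zip(CUP_DRINKWARE, CUP_TROPHY))

if __name__ == "__main__":
    # Static vector: dominated by the trophy sense, so it looks unlike glass.
    print("static:    ", is_similar(STATIC_CUP, GLASS))
    # Context-selected sense: the drinkware reading is close to glass.
    print("contextual:", is_similar(CUP_DRINKWARE, GLASS))
```

In this toy space the static vector for cup falls below the threshold against glass, while the drinkware sense clears it easily.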

Conclusions: Changing from word2vec to the contextual language model BERT substantially improves semantic similarity classification, reducing the number of semantic misclassifications by roughly half. Moreover, BERT corrects a number of particularly “naïve” word2vec mistakes that undermine the face validity of the system and may pose a significant barrier to adoption by the clinical community.

Oct 24, 2021 12:30 PM