Learn here how it was built and how it works!
What does “a personal model of trumpery” mean?
A model is a representation (in this case, a mathematical formula) of someone or something. The phenomenon we study with the model is trumpery, an old English word originating from the French tromperie, meaning deception. We call it a ‘personal’ model because it was developed for one person: the 45th US president. The psychology literature contains many deception models (we studied 24 of them in our paper!), but ours is the first to be tailored to a single person.
A key psychological insight
Deception models are based on a key psychological insight, the deception hypothesis: lying influences the words people use, since lying can be cognitively demanding, elicit emotions and stress, and increase attempted behavioral control. In a nutshell, the types of words used when telling a lie differ from those used when telling the truth.
How was the model built?
Step 1: Get the tweets.
Tweets of the 45th US president have been systematically fact-checked by the Washington Post. We gathered three months of tweets (February 1st to April 30th, 2018) and made two groups: the factually correct ones (truths) and the factually incorrect ones (possible lies).
Step 2: Count the words.
We used LIWC (pronounced “Luke”, https://liwc.wpengine.com/) to count the words in each tweet and to classify them into more than 100 categories. Some categories are linguistic (adverbs, pronouns, punctuation), others are psychological (emotions, cognitive processes). For each tweet, we obtained the proportion of words in each category.
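To give a flavor of what this step does, here is a minimal Python sketch. The categories and word lists below are tiny made-up stand-ins: the real LIWC dictionaries are proprietary and contain thousands of words across more than 100 categories.

```python
# Tiny stand-in for LIWC: a few invented categories with example word lists.
# The real LIWC dictionaries are proprietary and far larger.
CATEGORIES = {
    "pronoun": {"i", "we", "you", "they", "it"},
    "negemo": {"bad", "sad", "wrong", "terrible"},
    "cogproc": {"think", "know", "because", "maybe"},
}

def category_proportions(tweet: str) -> dict:
    """For each category, return the share of the tweet's words it covers."""
    words = tweet.lower().split()
    total = len(words)
    return {
        cat: sum(w in vocab for w in words) / total
        for cat, vocab in CATEGORIES.items()
    }

props = category_proportions("I think they know it was terrible")
```

Here "I", "they", and "it" count toward the pronoun category, so that category's proportion is 3 out of 7 words; the output is one such proportion per category, exactly the kind of table we fed into the model.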
Step 4: Build a model.
Using statistical methods, we selected the word categories that differed most between correct and incorrect tweets, but also differed most from one another. We tried to keep as few categories as possible, retaining only the most meaningful ones.
We obtained the following 13 categories. The graphs below show, for each of the 13 categories, the average proportion of this category in correct tweets, in incorrect tweets, and in the tweet you randomly selected above. The model uses these proportions to determine whether the tweet more closely resembles a true statement or a false one.
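Our paper used proper statistical selection methods, but the core idea of ranking categories by how much they differ between the two groups can be sketched like this (all numbers below are made up):

```python
# Invented per-tweet category proportions, split into the two groups.
correct = [
    {"pronoun": 0.10, "negemo": 0.02, "cogproc": 0.08},
    {"pronoun": 0.12, "negemo": 0.01, "cogproc": 0.09},
]
incorrect = [
    {"pronoun": 0.20, "negemo": 0.06, "cogproc": 0.03},
    {"pronoun": 0.22, "negemo": 0.05, "cogproc": 0.02},
]

def mean(group, cat):
    """Average proportion of one category across a group of tweets."""
    return sum(t[cat] for t in group) / len(group)

# Rank categories by the absolute gap between the two group means.
gaps = {
    cat: abs(mean(correct, cat) - mean(incorrect, cat))
    for cat in correct[0]
}
ranked = sorted(gaps, key=gaps.get, reverse=True)
```

In this toy data the pronoun category shows the biggest gap between groups, so it would be the first one kept; the real selection also penalized categories that were redundant with one another.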
Step 5: Prepare a test set.
We built the model on the data set we gathered in Steps 1 and 2. To test the model, we needed a second data set: if a model is good, it should also be able to tell which tweets are factually correct or incorrect on a second, independent data set. When we worked on this, in spring 2018, there were not enough new tweets to test the model, so we used older tweets, from November 2017 to January 2018, and repeated Steps 1 and 2, making two groups (correct/incorrect) and counting the words in each category. This gave us the test set.
Step 6: Compute probability.
Bear with us, there is some math in this part. To compute the probability that a tweet is factually incorrect, we proceeded in two phases. First, we multiplied the proportion of words from each of the 13 categories selected in Step 4 by a coefficient and summed the results. The coefficients are the key ingredients of the model; they were also determined in Step 4.
For instance, for the tweet you drew, it is:
What we obtained here is not a probability yet (the barbaric term for it is “log odds”). To get a probability, we need to apply (sorry, another barbaric term) the logistic function: $$ f(x)=\frac{1}{1+e^{-x}}$$
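In code, the two phases look like this. The coefficients, intercept, and word proportions below are invented for illustration, and only three of the 13 categories are shown:

```python
import math

# Invented coefficients for three of the 13 categories
# (the model's real coefficients were estimated in Step 4).
coeffs = {"pronoun": 8.0, "negemo": 12.0, "cogproc": -10.0}
intercept = -1.5

# Invented word proportions for one tweet.
tweet = {"pronoun": 0.20, "negemo": 0.05, "cogproc": 0.02}

# Phase 1: multiply proportions by coefficients and sum -> log odds.
log_odds = intercept + sum(coeffs[c] * tweet[c] for c in coeffs)

# Phase 2: the logistic function turns log odds into a probability.
probability = 1 / (1 + math.exp(-log_odds))
```

With these made-up numbers the log odds come to 0.5, which the logistic function maps to a probability of about 0.62.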
Step 7: Predict.
We predicted a tweet to be factually incorrect if the probability computed in Step 6 was higher than a baseline threshold. The Washington Post classified 30.3% of the tweets in our first data set as factually incorrect; we called this the "prior probability". So a tweet with a probability larger than 30.3% was predicted to be factually incorrect, and any tweet with a probability lower than 30.3% was predicted to be factually correct.
With this reasoning, the tweet you drew was
Step 8: Compute accuracy.
For each tweet in the test set, we can compare our prediction (factually correct/incorrect) with the classification of the Washington Post. We were right 74% of the time. Yay! We were very happy with that result, as we were not expecting to do so well!
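Accuracy is simply the share of tweets on which our prediction agrees with the Washington Post's classification; a sketch with made-up labels for four tweets:

```python
# Made-up predictions and Washington Post classifications for four tweets.
predictions = ["incorrect", "correct", "incorrect", "correct"]
fact_checks = ["incorrect", "correct", "correct", "correct"]

# Count agreements and divide by the number of tweets.
matches = sum(p == f for p, f in zip(predictions, fact_checks))
accuracy = matches / len(fact_checks)
```

Here 3 of the 4 predictions match, giving an accuracy of 75%; on our real test set the same calculation gave 74%.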
Steps 9, 10, 11... keep working.
It was a nice first result, but we kept working to improve our research, thanks to suggestions from the editors of Psychological Science and from anonymous reviewers. We gathered 24 deception models from the literature and checked whether they could do as well as ours. Spoiler: they could not. We tested whether it mattered if we removed specific categories, or if we changed which tweets were used to build the model and which were used to test it. We even ran a so-called placebo check: would we have been able to get the same results if the Washington Post had done a completely lousy job, randomly deciding which tweets were factually incorrect? The answer: no, not even close.
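A placebo check of this kind works like a permutation test: shuffle the fact checker's labels many times, as if tweets had been rated at random, and see how often a "random Washington Post" produces group differences as large as the real ones. A toy version for a single word category, with made-up proportions for eight tweets:

```python
import random

random.seed(42)

# Made-up "negemo" proportions: in this toy data, incorrect tweets
# genuinely use more negative-emotion words than correct ones.
props  = [0.06, 0.05, 0.07, 0.06, 0.01, 0.02, 0.01, 0.02]
labels = ["inc", "inc", "inc", "inc", "cor", "cor", "cor", "cor"]

def group_gap(props, labels):
    """Absolute difference between the two groups' mean proportions."""
    inc = [p for p, l in zip(props, labels) if l == "inc"]
    cor = [p for p, l in zip(props, labels) if l == "cor"]
    return abs(sum(inc) / len(inc) - sum(cor) / len(cor))

real_gap = group_gap(props, labels)

# Placebo: reshuffle the labels 1000 times and record each gap.
shuffled_gaps = []
for _ in range(1000):
    shuffled = labels[:]
    random.shuffle(shuffled)
    shuffled_gaps.append(group_gap(props, shuffled))

# Share of random labelings that match or beat the real gap.
exceed = sum(g >= real_gap for g in shuffled_gaps) / 1000
```

In this toy run, randomly shuffled labels almost never reproduce a gap as large as the real one, which is the "no, not even close" pattern: the linguistic differences depend on the fact checks being meaningful, not random.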
Learn how the project started
In spring 2018, Sophie was reading yet another article calling the 45th US president, Donald J. Trump, a liar, when she asked herself: shouldn't we give him the benefit of the doubt? Fact-checking can demonstrate whether information is factually correct or incorrect, but not an intention to deceive. Maybe he is just making honest mistakes, or he doesn't know that what he's saying is incorrect; maybe his beliefs are simply wrong. Lying affects people's behavior, including their speech. If he is not lying, but merely wrong, lie-detection methods should not work. However, if he is lying, then we should be able to predict when he tells a lie from the type of words he uses. This is called "linguistic lie detection".
At that time, Alice was doing a research internship in the group of Aurelien, Sophie’s boss at Erasmus
University Rotterdam. Aurelien, an economist, had received a grant from the European Research Council
to work on truth-telling behaviour and had hired Sophie, a psychologist, to work with him on the
project.
Sophie went to Alice, described her idea to her, and asked her to collaborate on this project. They gathered tweets, contacted the Washington Post fact checkers to learn which tweets were supposed to be true and which ones were supposed to be lies, and connected the two data sets. A preliminary analysis of the tweets proved very promising: there were tremendous linguistic differences between factually correct and incorrect tweets!
Sophie brought two more researchers into the team: Ronald, a computer scientist with a lot of experience in deception detection, and Aurelien, with no experience on this exact topic but with a lot of good will and some data-analysis skills to compensate. And this is how the project started.
Meet the researchers
A datavisualization created by Alice Havrileck in February 2021.