Linguistic studies have shown that language differs between truthful and deceptive statements. A person who is deliberately lying, rather than making an honest mistake or simply holding wrong beliefs, will display behavioral and linguistic changes. This is called the “deception hypothesis”.
Differences between truthful and deceptive language are, however, small and difficult to observe. There is also substantial variation between people, which makes it difficult in practice to predict whether someone is telling the truth. One would need to gather many statements from a single person and check which ones are true before being able to predict when that person is telling the truth. A tough job that no one really bothered to undertake… until a few years ago, when many statements by a single individual were fact-checked.
Often controversial, but considered official White House communication, the tweets of the 45th US president are a gold mine for linguistic analysis. They have the advantage of being mostly written by one person, hence representing his own linguistic patterns. The Washington Post fact-checkers analyzed every single one of them to determine whether they were factually incorrect, allowing us to build a detector of incorrect tweets based on language use. The detector does not say with certainty whether a tweet is a truth or a lie; it relies on linguistic patterns of factually incorrect statements. The existence of these patterns suggests the tweets could be deceitful.

Feel free to try the detector here!

This tweet is predicted to be factually with a % probability.

Learn here how it was built and how it works!

What does “a personal model of trumpery” mean?

A model is a representation (in this case, a mathematical formula) of someone or something. The phenomenon we study with the model is trumpery, an old English word originating from the French tromperie, meaning deception. We call it a ‘personal’ model because it has been developed for one person: the 45th US president. In the psychology literature, there are many deception models (we studied 24 of them in our paper!), but ours is the first tailored to a single person.

A key psychological insight

Deception models are based on a key psychological insight: the deception hypothesis, which states that lying influences the words people use since lying can be cognitively demanding, elicit emotions and stress, and increase attempted behavioral control. In a nutshell, the type of words used when telling a lie differ from those used when telling the truth.

How was the model built?

Step 1: Get the tweets.
Tweets of the 45th US president have been systematically checked by the Washington Post. We gathered three months of tweets (February 1st to April 30th, 2018) and made two groups: the factually correct ones (truths) and the factually incorrect ones (possible lies).

Step 2: Count the words.
We used LIWC (pronounced “Luke”, https://liwc.wpengine.com/) to count the words in each tweet and to classify them into more than 100 categories. Some categories are linguistic (adverbs, pronouns, punctuation), some are psychological (emotions, cognitive processes). For each tweet, we obtained the proportion of words in each category.
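The counting in this step can be sketched in Python with a toy lexicon. The category word lists below are invented for illustration; the real LIWC dictionaries are proprietary and far larger.

```python
import re

# Toy stand-in for LIWC: each category maps to a small word list.
# These lists are illustrative only, not the real LIWC dictionaries.
LEXICON = {
    "negate": {"no", "not", "never"},
    "compare": {"more", "less", "than", "best"},
    "pronoun": {"i", "we", "they", "he"},
}

def category_proportions(tweet):
    """Return the share of the tweet's words falling in each category."""
    words = re.findall(r"[a-z@#']+", tweet.lower())
    total = len(words) or 1  # avoid dividing by zero on empty tweets
    return {cat: sum(w in vocab for w in words) / total
            for cat, vocab in LEXICON.items()}

props = category_proportions("We will never lose, they have no chance!")
# 8 words in total; 2 negations and 2 pronouns give 25% each.
```

A tweet is thus reduced to one proportion per category, which is the representation all later steps work with.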

Step 3: Study differences.
When people make honest mistakes, the words they use do not differ between factually correct and factually incorrect statements. However, if they are deceitful, the deception hypothesis tells us that we should observe differences in language use. The graph on the left shows the LIWC categories for which there were large differences between factually correct and factually incorrect tweets. Red means incorrect tweets had more of this category; green means correct tweets had more. Incorrect tweets had more words overall, a much higher proportion of negative words, but a smaller proportion of emotion-related words. Such substantial linguistic differences seem to suggest that incorrect tweets are not honest mistakes, supporting the deception hypothesis.

Step 4: Build a model.
Using statistical methods, we selected the word categories that were most different between correct and incorrect tweets, but also most different from each other. We tried to keep as few categories as possible, keeping only the most meaningful.
We obtained the following 13 categories. The graphs below show, for each of the 13 categories, the average proportion of this category in correct tweets, in incorrect tweets, and in the tweet you randomly selected above. The model uses these proportions to determine whether the tweet more closely resembles a true statement or a false statement.
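A crude version of that selection step can be sketched as follows. The group means and the keep-the-top-k rule are invented for illustration; the actual selection relied on proper statistical methods.

```python
# Toy data: mean value of each category in the two groups of tweets.
# All numbers here are invented for illustration.
means_incorrect = {"word_count": 40, "negate": 2.4, "emotion": 1.1, "at_sign": 0.1}
means_correct   = {"word_count": 31, "negate": 1.4, "emotion": 2.0, "at_sign": 1.4}

def top_k_categories(group_a, group_b, k):
    """Keep the k categories whose group means differ most."""
    gaps = {cat: abs(group_a[cat] - group_b[cat]) for cat in group_a}
    return sorted(gaps, key=gaps.get, reverse=True)[:k]

selected = top_k_categories(means_incorrect, means_correct, k=2)
# With these toy numbers, word count (gap 9) and @ signs (gap 1.3) win.
```

In the real model, redundancy between categories was also penalized, so two categories that always move together would not both be kept.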

Mean for factually correct tweets
Percentage of word use for the tweet you drew
Mean for factually incorrect tweets
Linguistic cues that are more used in factually incorrect tweets
This tweet has words, and the average number of words in the former US President's factually incorrect tweets is 40, whereas it is 31 in his correct tweets. Longer tweets are more likely to be incorrect.

This tweet is composed of % comparison words. This category of words is used more in the former US President's incorrect tweets, with an average of 2.4%, compared with 1.4% in correct tweets.

This tweet is composed of % of @. This punctuation mark is used more in the former President's correct tweets, with an average of 1.4%, compared with an average of 0.1% in incorrect tweets.

Linguistic cues that are more used in factually correct tweets

Step 5: Prepare a test set.
We built the model on the data set we gathered in Steps 1 and 2. To test the model, we needed a second data set: a good model should also be able to tell which tweets are factually correct or incorrect on a second, independent dataset. When we worked on this, in Spring 2018, there were not enough new tweets to test the model, so we used older tweets, from November 2017 to January 2018, and repeated Steps 1 and 2, making two groups (correct/incorrect) and counting the words in each category. This gave us the test set.

Step 6: Compute probability.
Bear with us, there is some math in this part. To compute the probability that a tweet is factually incorrect, we proceeded in two phases. First, we multiplied the proportions of words from each of the 13 categories selected in Step 4 by coefficients. The coefficients are the key ingredients of the model; they were also determined in Step 4.

For instance, for the tweet you drew, it is:

=

What we obtained here is not a probability yet (the barbaric term for this is “log odds”). To get a probability, we need to apply (sorry, another barbaric term) the logistic function: $$f(x)=\frac{1}{1+e^{-x}}$$
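The two phases can be sketched together in a few lines of Python. The coefficients, intercept, and word proportions below are made-up placeholders, not the fitted values of the actual 13-category model.

```python
import math

def logistic(x):
    """The logistic function: maps log odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical coefficients for three of the categories, plus an intercept.
coefficients = {"negate": 0.8, "compare": 0.5, "at_sign": -1.2}
intercept = -1.0

def tweet_probability(proportions):
    """Phase 1: log odds = intercept + sum of coefficient * proportion.
    Phase 2: pass the log odds through the logistic function."""
    log_odds = intercept + sum(coefficients[cat] * proportions[cat]
                               for cat in coefficients)
    return logistic(log_odds)

p = tweet_probability({"negate": 2.4, "compare": 1.4, "at_sign": 0.1})
# Log odds = -1.0 + 0.8*2.4 + 0.5*1.4 - 1.2*0.1 = 1.5, so p ≈ 0.82.
```

Note that positive log odds always map to probabilities above 0.5, and negative ones below; the logistic function only rescales, it never changes the ordering of tweets.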

f(
)
= %

Step 7: Predict.
We predicted a tweet to be factually incorrect if the probability computed in Step 6 was higher than we would have expected beforehand. The Washington Post classified 30.3% of the tweets in our first dataset as factually incorrect; we called this the "prior probability". So a tweet with a probability larger than 30.3% was predicted to be factually incorrect, and any tweet with a lower probability was predicted factually correct.
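The decision rule is a simple threshold at the prior. The 0.303 cutoff comes from the text above; the example probabilities are invented.

```python
PRIOR = 0.303  # share of tweets the Washington Post flagged as incorrect

def predict(probability, prior=PRIOR):
    """Predict 'factually incorrect' when the model's probability
    exceeds the prior, 'factually correct' otherwise."""
    return "factually incorrect" if probability > prior else "factually correct"

# Three hypothetical tweets with model probabilities 10%, 35%, and 80%.
labels = [predict(p) for p in (0.10, 0.35, 0.80)]
```

Using the prior rather than 50% as the cutoff reflects the base rate: since incorrect tweets were the minority, a tweet only needs to look somewhat more suspicious than average to be flagged.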

With this reasoning, the tweet you drew was

Step 8: Compute accuracy.
For each tweet in the test set, we compared our prediction (factually correct/incorrect) with the classification of the Washington Post. We were right 74% of the time. Yeah! We were very happy with that result, not expecting to do so well!
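The accuracy computation itself can be sketched as follows, with a toy test set of four tweets (the labels are invented; the real test set was much larger).

```python
def accuracy(predictions, truths):
    """Share of tweets where the model agrees with the fact-checkers."""
    hits = sum(pred == truth for pred, truth in zip(predictions, truths))
    return hits / len(truths)

# Toy example: the model agrees with the fact-checkers on 3 of 4 tweets.
acc = accuracy(["incorrect", "correct", "correct", "incorrect"],
               ["incorrect", "correct", "incorrect", "incorrect"])
# acc == 0.75
```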

Steps 9, 10, 11... keep working.
It was a nice first result, but we kept working to improve our research, thanks to suggestions from editors of Psychological Science and from anonymous reviewers. We gathered 24 deception models from the literature and checked whether they could do as well as ours. Spoiler: they could not. We tested whether it mattered if we removed some specific categories, or if we changed which tweets were used to build the model and which were used to test it. We even ran a so-called placebo check: would we have been able to get the same results if the Washington Post had done a completely lousy job, randomly labeling which tweets were factually incorrect? The answer: no, not even close.

Learn how the project started

In Spring 2018, Sophie was reading yet another article calling the 45th US president, Donald J. Trump, a liar, when she asked herself: shouldn’t we give him the benefit of the doubt? With fact-checking you can demonstrate whether information is factually correct or incorrect, but not the intention to deceive. Maybe he is just making honest mistakes, or he doesn’t know that what he’s saying is incorrect; maybe his beliefs are just incorrect. Lying affects people’s behavior, including their speech. If he is not lying, but merely wrong, lie detection methods should not work. However, if he is lying, then we should be able to predict when he tells a lie from the type of words he uses. This is called “linguistic lie detection”. At that time, Alice was doing a research internship in the group of Aurelien, Sophie’s boss at Erasmus University Rotterdam. Aurelien, an economist, had received a grant from the European Research Council to work on truth-telling behaviour and had hired Sophie, a psychologist, to work with him on the project.

Sophie went to Alice, described her idea to her, and asked her to collaborate on this project. They gathered tweets, contacted the Washington Post fact checkers to learn which tweets were true and which ones were supposed to be lies, and connected the two datasets. A preliminary analysis of the tweets proved very promising: there were tremendous linguistic differences between factually correct and incorrect tweets!

Sophie brought in two more researchers to join the team: Ronald, a computer scientist with a lot of experience in deception detection, and Aurelien, with no experience on this exact topic but with a lot of goodwill and some data analysis skills to compensate. And this is how the project started.

Meet the researchers

Sophie Van der Zee is an assistant professor in behavioural economics at the Erasmus University Rotterdam. She studies why people lie and cheat, and how to catch them if they do. The result? She loves playing boardgames, but no one wants to play with her anymore.
Ronald Poppe is an assistant professor in computer science at Utrecht University. He develops artificial intelligence technology to analyze human behavior. So, he’s teaching computers how to study us. Probably, it started when he couldn't beat the computer at chess when he was a kid. If you can't beat them, join them.
Alice Havrileck is a design student at ENSCI who graduated in economics at ENS Paris-Saclay. She likes to study behaviors as well as applied arts, and that motivated her to move from economics to design. She is still interested in obscure analyses, but now she would rather illustrate them herself than with ugly economic graphs.
Aurélien Baillon is a professor of behavioral economics at Erasmus University Rotterdam. He likes to study how people cope with uncertainty, what they believe, what they say they believe, and whether we can trust what people say. He likes cooking and scuba diving too but did not manage to make a living out of it.

A datavisualization created by Alice Havrileck in February 2021.