© FT Montage/Getty

The level of vitriol targeted at MPs in messages posted on social media site Twitter rose sharply during and after a heated parliamentary debate in which Boris Johnson was accused of using “inflammatory” language the day after his historic Supreme Court defeat.

According to an analysis by the Financial Times of more than 2m tweets surrounding the debate, during which the British prime minister described death threats against politicians as “humbug”, there was a direct correlation between the language used in parliament and the volume of “toxic” tweets from both sides of the Brexit divide. 

The most abusive tweets came from users who mentioned the Brexit party or other hard Brexit-related terms in their user descriptions. The FT’s research backs up the warning on Sunday by Justin Welby, the Archbishop of Canterbury, that Mr Johnson’s “inflammatory language” risks “pouring petrol” on Britain’s Brexit divisions.

Twitter, meanwhile, last week announced higher costs and lower revenues amid controversy over its struggle to tackle abuse on its platform and over its use of users’ personal data. The company claimed that 50 per cent of all abusive tweets were removed by automated tools before users reported them.

The FT analysed more than 2m tweets over seven days to quantify the rise in toxic tweets. Among tweets that mentioned MPs by their Twitter usernames, the number classified by an algorithm as toxic (defined as “a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion”) rose 392 per cent on the evening of the debate compared with the average for the preceding six evenings.
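The percentage rise is measured against an average of several earlier evenings rather than against a single one. A minimal Python sketch of that arithmetic follows; the function name and the evening counts are hypothetical illustrations, not FT data.

```python
from __future__ import annotations

from statistics import mean


def percent_increase(current: float, baseline: list[float]) -> float:
    """Percentage change of `current` relative to the mean of `baseline`."""
    base = mean(baseline)
    # Multiply before dividing to limit floating-point rounding noise.
    return (current - base) * 100 / base


# Hypothetical toxic-tweet counts for six preceding evenings (illustrative only).
preceding_evenings = [100, 110, 90, 100, 95, 105]
print(percent_increase(492, preceding_evenings))  # 392.0
```

The same calculation, with a single comparison day as the baseline, applies to the 164 per cent rise reported after the Supreme Court ruling.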


The most toxic tweets came from users who mentioned the Brexit party or other hard Brexit-related terms in their user description: the percentage of abusive tweets directed at MPs by these accounts — 2.7 per cent — was 48 per cent higher than the average across all users.

The next most disruptive group were users mentioning the pro-Remain #FBPE hashtag in their description, with 1.9 per cent of their tweets classified as toxic — just above the overall average of 1.8 per cent. Users with party-political rather than Brexit-related terms in their descriptions — those mentioning Labour, the Conservatives or the Liberal Democrats — were relatively unlikely to use toxic language.
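Grouping users by keywords in their profile descriptions and comparing each group’s toxic share with the overall rate can be sketched in Python as below. The keyword lists and function names are hypothetical; the article does not publish the FT’s actual term lists or code.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Optional

# Hypothetical keyword lists; the FT's real terms are not given in the article.
GROUP_TERMS = {
    "hard_brexit": ["brexit party", "no deal"],
    "fbpe": ["#fbpe"],
}


def group_of(description: str) -> Optional[str]:
    """Return the first keyword group whose term appears in a user description."""
    text = description.lower()
    for group, terms in GROUP_TERMS.items():
        if any(term in text for term in terms):
            return group
    return None


def toxic_share_by_group(tweets: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """Percentage of tweets classified toxic, per group and across all users ("all")."""
    totals: dict[str, int] = defaultdict(int)
    toxic: dict[str, int] = defaultdict(int)
    for description, is_toxic in tweets:
        keys = ["all"]
        group = group_of(description)
        if group is not None:
            keys.append(group)
        for key in keys:
            totals[key] += 1
            toxic[key] += int(is_toxic)
    return {key: toxic[key] * 100 / totals[key] for key in totals}


def relative_difference(rate: float, baseline_rate: float) -> float:
    """How much higher `rate` is than `baseline_rate`, in per cent."""
    return (rate - baseline_rate) * 100 / baseline_rate
```

With the rounded figures quoted in the article, `relative_difference(2.7, 1.8)` gives about 50; the published 48 per cent presumably reflects unrounded rates.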

A chart looking at the numbers of toxic tweets aimed at MPs recently

Much of the vitriol was aimed at the prime minister himself, whose account received 2,097 toxic tweets in six hours — the most received by any MP. Mr Johnson faced swift condemnation from both opposition and Conservative MPs for his own use of language during a debate characterised by MPs and lobby journalists as among the most rancorous in living memory.

Other MPs, however, saw far larger relative increases in the levels of toxic language directed at them. After decrying the use of “offensive, dangerous or inflammatory” language in the House of Commons — to which Mr Johnson responded: “I have never heard such humbug in all my life” — Labour MP Paula Sherriff began receiving toxic tweets at a rate of more than 100 an hour. She had received a total of just 31 over the preceding week.

One such tweet from that evening read: “Tough shit Mrs Shrek. A #SurrenderBill or #SurrenderAct is exactly what Benn’s treacherous act is.” Another read: “Do what the people told you to effing do otherwise yes expect to be strung up metaphorically or physically.” 

Female MPs were disproportionately affected by the wave of toxicity: excluding party leaders, nine (45 per cent) of the 20 MPs with the largest absolute increases in toxic tweets were women, even though women make up only a third of all MPs.
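Nine of 20 is 45 per cent, whereas a share matching women’s presence in the Commons would be about 6.7 of 20. The ranking step (drop party leaders, sort by absolute increase, take the top 20, count women) can be sketched in Python as follows; the record type and its field names are hypothetical, not taken from the FT’s data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MPRecord:
    """Hypothetical per-MP summary; field names are illustrative."""
    handle: str
    is_woman: bool
    is_party_leader: bool
    increase: int  # absolute rise in toxic tweets received


def women_share_of_top(records: list[MPRecord], n: int = 20) -> float:
    """Percentage of women among the n largest absolute increases, excluding party leaders."""
    eligible = [r for r in records if not r.is_party_leader]
    top = sorted(eligible, key=lambda r: r.increase, reverse=True)[:n]
    if not top:
        return 0.0
    return sum(r.is_woman for r in top) * 100 / len(top)
```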

Ms Sherriff said she was “not surprised in the least” by the FT’s findings. She said she received “thousands” of messages via social media and email during the week of the debate, adding: “When I say thousands, I’m talking thousands that went beyond ‘You’re a shit MP’. Some of them, you can’t imagine how they could have been any more offensive.”

The communications include death threats, which she said had escalated in frequency and seriousness since she entered parliament in 2015. “I don’t cry any more”, said Ms Sherriff. “I just despair.” 

She added that advice to simply ignore social media was useless: “I need to read them because, if it’s a serious death threat, I need to alert people.”

Anna McMorrin, another Labour MP, explicitly linked Mr Johnson’s use of language to the levels of abuse received by MPs on social media, saying during the debate: “We know the impact that the prime minister’s language and behaviour are having on people out there in the country . . . Just today, I have seen a huge escalation in the abuse on social media and in the language and the incitement that he is causing.”

The FT’s findings are also consistent with Ms Sherriff’s experience that the abuse she receives comes “almost always” from purported Brexiters. “I just don’t see the same level of animosity from Remainers, looking not just at what I get but at what some of the Leave MPs get,” she said.

The analysis found that a similar, though less pronounced, increase in toxicity followed the Supreme Court’s September 24 ruling that the prorogation of parliament instigated by Mr Johnson’s government was unlawful and consequently void. The number of toxic tweets increased 164 per cent compared with a corresponding weekday in October.

After #SupremeCourt, the two most prevalent hashtags in toxic tweets aimed at MPs on the day of the ruling were #BrexitBetrayal and #FuckOffRemainers.


In the FT analysis, tweets were classified as toxic using Perspective, a tool to aid online comment moderation created by Jigsaw, a security initiative of Google’s parent company Alphabet. According to the Perspective website, it uses machine learning models trained on “millions of examples gathered from several online platforms and reviewed by human annotators” to “score the perceived impact a comment might have on a conversation”.
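Perspective is offered as a REST API. A minimal Python sketch of requesting a toxicity score follows, assuming the public `comments:analyze` endpoint and a placeholder API key. It is not the FT’s pipeline, requests only the `TOXICITY` attribute, and omits quotas, retries and error handling; the helper names are hypothetical.

```python
import json
import urllib.request

# Public Perspective endpoint; the API key passed to score_text is a placeholder.
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text: str) -> dict:
    """Request body asking only for the TOXICITY attribute of an English comment."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }


def parse_toxicity(response: dict) -> float:
    """Extract the summary TOXICITY probability (0.0 to 1.0) from a response body."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def score_text(text: str, api_key: str) -> float:
    """Send one comment to Perspective and return its toxicity score (needs network access)."""
    body = json.dumps(build_request(text)).encode("utf-8")
    req = urllib.request.Request(
        f"{PERSPECTIVE_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return parse_toxicity(json.load(resp))
```

Keeping request building and response parsing separate from the network call means both can be checked offline against a saved response.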

Bertram Vidgen, a postdoctoral researcher in online hate speech at The Alan Turing Institute, makes extensive use of natural language processing (NLP) tools, including Perspective. He confirmed that it was suitable for the FT analysis, despite the limitations of NLP tools in general.

He added that general-purpose tools of the type commonly used for online sentiment analysis would be less effective at detecting abuse. “A lot of people who come to this space hope they can use a sentiment classifier and it’ll give them good results or a good starting point, and it really doesn’t,” he said.

“There are big differences between anger, aggression or other typical sentiment categories and hate speech, which can be very complex, nebulous, indirect and implicit. So off-the-shelf tools are not going to do that well.”

Perspective is not without its limitations, however. Mr Vidgen said: “Every system will have been trained on a data set and that data set will inevitably have some biases.” Perspective had been criticised in particular for its over-classification of African-American English dialect as toxic, he noted.

While Twitter may be “radically unrepresentative” of British society as a whole, it would be a mistake to “judge the value of social media against the value of polls”, said Carl Miller, research director at the Demos think-tank’s Centre for the Analysis of Social Media.

“[In a poll], you ask people a series of questions which likely they don’t actually care that much about, and sometimes they generate an opinion as you’re asking. On social media, you’re sitting back and listening to these things just happen in the wild, as it were, unmediated. That can be massively valuable.”

He added: “It’s also the easiest way people have of directly and immediately reaching the powerful and the famous. Where previously people would shout at their television, now they start tweeting abuse.”

Additional reporting by Federica Cocco and Madison Darbyshire in London


Tweets mentioning at least one MP by their username were collected using the Twitter standard search application programming interface (API) between September 26 and 28 and on October 23. 
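Keeping only tweets that mention an MP can be sketched in Python as below, assuming Twitter API v1.1 tweet JSON, where @-mentions appear under `entities.user_mentions` with a `screen_name` field. The list of MP handles is a hypothetical input, and the article does not describe the FT’s own collection code.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def mentions_mp(tweet: dict, mp_handles: set[str]) -> bool:
    """True if any @-mention in a v1.1-style tweet object matches an MP handle.

    `mp_handles` must be lower-case, because Twitter screen names are case-insensitive.
    """
    mentions = tweet.get("entities", {}).get("user_mentions", [])
    return any(m.get("screen_name", "").lower() in mp_handles for m in mentions)


def filter_mp_mentions(tweets: Iterable[dict], mp_handles: set[str]) -> Iterator[dict]:
    """Yield only the tweets that mention at least one MP."""
    for tweet in tweets:
        if mentions_mp(tweet, mp_handles):
            yield tweet
```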

The FT understands that tweets are not subject to sampling prior to being made available via the API except to the extent that deleted tweets and those failing to meet Twitter quality metrics may have been excluded. Perspective uses convolutional neural network (CNN) models to perform NLP analysis of passages of text, scoring them on their likelihood to be considered toxic by a human moderator. 

The FT used only the “production” toxicity model, eschewing the less robust severe toxicity and “experimental” models. In order to establish a threshold above which a tweet could be classified as toxic, two female and two male FT reporters manually classified a sample of 300 tweets as toxic or non-toxic. 

Comparing the Perspective toxicity scores with the reporters’ classifications indicated that a score of 0.8, or 80 per cent, was in line with reporter expectations of what could be considered a toxic tweet. This is also the default threshold used by the Coral commenting platform, which uses Perspective to aid comment moderation.
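The article does not say exactly how the reporters’ labels were compared with the scores. One plausible approach, offered here as an assumption rather than the FT’s method, is to take the four reporters’ majority view of each tweet and pick the candidate threshold that agrees with it most often. A minimal Python sketch, with hypothetical function names:

```python
from __future__ import annotations


def reporters_say_toxic(votes: list[bool]) -> bool:
    """Assumption: toxic only if a strict majority of reporters said so (ties count as non-toxic)."""
    return sum(votes) * 2 > len(votes)


def is_toxic(score: float, threshold: float = 0.8) -> bool:
    """Classify a Perspective score; counting a score equal to the threshold as toxic is an assumption."""
    return score >= threshold


def agreement(scores: list[float], votes: list[list[bool]], threshold: float) -> float:
    """Share of tweets where the thresholded score matches the reporters' majority label."""
    matches = sum(
        is_toxic(s, threshold) == reporters_say_toxic(v) for s, v in zip(scores, votes)
    )
    return matches / len(scores)


def best_threshold(scores: list[float], votes: list[list[bool]], candidates: list[float]) -> float:
    """Candidate threshold with the highest agreement (the earliest candidate wins ties)."""
    return max(candidates, key=lambda t: agreement(scores, votes, t))
```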

The FT also tested general-purpose NLP tools but found them to be less suitable for abusive content detection.

Copyright The Financial Times Limited 2024. All rights reserved.