This is not the only study to analyse referendum day tweets, but here I present a more in-depth analysis, also based on a sample of tweets selected specifically as advocating #Leave/#Remain respectively.
#Leave / #Remain Trend Based on @Brndstr
On referendum day, @Brndstr ran a campaign which encouraged people to tweet how they voted; in return, their profile picture would change accordingly. This was controversial among some Twitter users, who took issue with the choice of the Union Jack (for Out voters) vs the EU flag (for In voters). Nevertheless, many people declared their votes in this way.
Show your support with a custom Profile Flag Filter for the #EUref - what will you vote for? #iVoted👍 🇬🇧 🇪🇺 https://t.co/qMZda1tKh8 (tweet by Brndstr, @Brndstr, June 23, 2016)
I found over 14,600 tweets mentioning @Brndstr among the 715 thousand original tweets we collected on June 23rd. I limited the analysis to original tweets (i.e. excluded retweets and replies), since I wanted to study distinct, self-declared #Leave / #Remain intentions.
Inspection of a random sample in our Mimir Prospector dashboard showed all tweets had a set pattern, which made it trivial to distinguish #Leave and #Remain votes.
In particular, all #Leave tweets started with: I #VoteOut for the #Brexit #EURef vote with @Brndstr. All #Remain tweets started with: I #VoteIn for the #Brexit #EURef vote with @Brndstr
I used two Mimir queries with those texts, and found 6296 #VoteOut tweets and 8342 #VoteIn tweets. Thus, based on @Brndstr activity, one could hypothesize a #Remain majority.
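The prefix matching described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Mimir queries I actually ran: the function names and the sample texts are hypothetical, and only the two prefixes come from the tweets themselves.

```python
from collections import Counter

# Fixed prefixes quoted in the text above.
OUT_PREFIX = "I #VoteOut for the #Brexit #EURef vote with @Brndstr"
IN_PREFIX = "I #VoteIn for the #Brexit #EURef vote with @Brndstr"


def brndstr_vote(text):
    """Return 'leave', 'remain', or None for a tweet text (hypothetical helper)."""
    stripped = text.strip()
    if stripped.startswith(OUT_PREFIX):
        return "leave"
    if stripped.startswith(IN_PREFIX):
        return "remain"
    return None


def count_votes(texts):
    """Tally declared votes over an iterable of tweet texts."""
    return Counter(v for v in map(brndstr_vote, texts) if v is not None)


# Made-up sample texts, not real tweets from the collection.
sample = [
    "I #VoteOut for the #Brexit #EURef vote with @Brndstr https://example.invalid/a",
    "I #VoteIn for the #Brexit #EURef vote with @Brndstr",
    "Not sure about this @Brndstr flag filter",
]
counts = count_votes(sample)
```

Tweets that mention @Brndstr without either prefix are left out of the tally rather than guessed at.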
#Leave / #Remain Trend Based on Full-Text Search
First, I searched for tweets containing "I", "voted", and "remain" within an 8 word window. This returned 14,665 matching tweets. Upon manual inspection of the top 30 matches, I observed only 2 tweets which did not disclose the actual vote of their poster (i.e. 28 of the 30, or roughly 93%, did). Therefore, I considered this a sufficiently accurate query.
The corresponding "I", "voted" and "leave" query returned 11,046 matching tweets, i.e. #Leave votes were outnumbered by #Remain ones again.
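For readers who want to reproduce this kind of query without Mimir, a proximity match can be approximated as below. This is a sketch under my own assumptions: the tokenisation, the case-insensitive matching, and the reading of "window" as 8 consecutive tokens are all guesses at reasonable behaviour, not a description of how Mimir evaluates the query.

```python
import re


def tokens(text):
    """Lower-case word tokens; '#' and punctuation are dropped, so '#Remain' -> 'remain'."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within_window(text, terms, window=8):
    """True if all terms occur inside some run of `window` consecutive tokens."""
    toks = tokens(text)
    wanted = {t.lower() for t in terms}
    # Slide a fixed-size window over the tokens; short texts get a single window.
    for start in range(max(1, len(toks) - window + 1)):
        if wanted <= set(toks[start:start + window]):
            return True
    return False


# Made-up example texts, for illustration only.
near = within_window("I voted #Remain this morning", ["I", "voted", "remain"])
far = within_window(
    "I think that everyone who voted yesterday should now accept we cannot remain",
    ["I", "voted", "remain"],
)
```

In the second example all three words are present, but never within 8 consecutive tokens of each other, so it does not match.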
These statistics are in line with the findings of other studies of Twitter #EUReferendum posts. For instance, the #EURef Data Hub (by the Press Association, Twitter, and Blurrt) showed Remain tweets dominating over Leave tweets on June 23rd, but not on the 22nd or earlier, nor (unsurprisingly) since.
It must be noted that, similar to the Ontotext study, the #EURef Data Hub statistics are derived from tweets referencing either the Leave or Remain campaigns, but not necessarily showing explicit support or voting intent.
However, as discussed in my earlier post, if we want to draw conclusions on the likely outcome based on tweets alone, then we need a more reliable Leave/Remain sample, indicative of actual support/self-declared voting intentions.
So now let's see if the same trend is present there.
#Leave / #Remain Voting Intentions Based on Our Classification Heuristic
I applied our classification heuristic for reliable identification of #Leave/#Remain posts to all tweets posted on or after 13:00 BST on June 22nd, but before voting closed at 22:00 BST on June 23rd.
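The time window itself is easy to get wrong when tweet timestamps are stored in UTC, so here is a minimal sketch of the filter. It assumes timezone-aware timestamps and uses a fixed UTC+1 offset for BST; the names `WINDOW_START`, `WINDOW_END`, and `in_window` are mine, introduced only for illustration.

```python
from datetime import datetime, timedelta, timezone

# British Summer Time is UTC+1; a fixed offset is enough for these two days.
BST = timezone(timedelta(hours=1), "BST")

WINDOW_START = datetime(2016, 6, 22, 13, 0, tzinfo=BST)  # inclusive
WINDOW_END = datetime(2016, 6, 23, 22, 0, tzinfo=BST)    # exclusive: polls close


def in_window(created_at):
    """True if a timezone-aware timestamp falls in the analysed period."""
    return WINDOW_START <= created_at < WINDOW_END


# 12:00 UTC on June 22nd is 13:00 BST, the first instant that is included.
first_included = in_window(datetime(2016, 6, 22, 12, 0, tzinfo=timezone.utc))
# 21:00 UTC on June 23rd is 22:00 BST, the first instant that is excluded.
first_excluded = in_window(datetime(2016, 6, 23, 21, 0, tzinfo=timezone.utc))
```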
As a result, I found just over 100 thousand tweets from the 22nd: 39 thousand advocating Remain and 61 thousand advocating Leave.
On June 23rd, as Twitter activity picked up significantly (also observed by #EURef Data Hub), I found 291 thousand matching tweets. Unlike other studies, however, our voting intent heuristic identified 164 thousand tweets advocating Leave and only 127 thousand advocating Remain.
Therefore, even though voting tweets from @Brndstr and tweet volume statistics from #EURef Data Hub both indicate that Remain was dominant, this trend wasn't supported in our voting intention sample.
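To make the disagreement concrete, the counts reported in this post can be turned into Leave shares per sampling method. The sketch below only restates numbers already given above; the dictionary keys are my own labels, and the heuristic counts are the rounded thousands quoted in the text, so its shares are approximate.

```python
# (leave, remain) tweet counts as reported in this post.
counts = {
    "brndstr_jun23": (6296, 8342),
    "i_voted_fulltext": (11046, 14665),
    "heuristic_jun22": (61_000, 39_000),    # rounded to thousands in the text
    "heuristic_jun23": (164_000, 127_000),  # rounded to thousands in the text
}


def leave_share(leave, remain):
    """Fraction of the Leave + Remain tweets that advocate Leave."""
    return leave / (leave + remain)


shares = {name: leave_share(*pair) for name, pair in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} Leave")
```

Both the @Brndstr and "I voted" samples put Leave at roughly 43%, whereas the heuristic sample puts it at about 61% on the 22nd and about 56% on the 23rd.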
Now let us examine the trends over time, separately for original tweets, replies, and retweets.
The graph below shows that indeed #Remain tweets were dominant in the early hours of June 23rd, but not before or after. What is particularly interesting is that #Remain tweets start to fall sharply from around 4pm, whereas #Leave ones pick up sharply a little later. By the time polls close at 10pm, tweets advocating #Leave are more than double the ones supporting #Remain.
Reply tweets show a largely different pattern (see graph below), where replies advocating #Leave consistently outnumber those advocating #Remain (at times by up to 2.5 times). This is a trend which we also observed earlier in June. It indicates that #Leave advocates were much more engaged in the Twitter debates than the #Remain ones.
It should be noted also that the trend observed in original tweets in late afternoon and evening of June 23rd is also evident here, i.e. replies advocating #Remain start to fall, while replies advocating #Leave increase.
Lastly, I show below the trends in re-tweets, where again #Leave advocates dominate the debate, by re-tweeting much more than #Remain ones. Again, I already observed this trend earlier in June.
What Have We Learnt?
Looking at tweets from the 23rd, both the @Brndstr sample and the “I voted XX” search gave Remain a majority over Leave, but with our classification heuristic the opposite was true (i.e. Leave was the more likely winner).
Given the conflicting evidence based on the same set of tweets, it is easy to see why others failed to predict the overall majority correctly.
I must also highlight here that my own analysis was never aimed at being predictive. Instead, I am trying to understand how people engaged, debated, and wrote about the referendum on social media.
In particular, as the referendum clearly showed, older voters tend to vote in higher proportions than young ones, and thus it was they who ultimately determined the overall outcome. That older generation, however, is well known for being under-represented on Twitter, and was probably also less aware of @Brndstr and similar services, which explains why these gave the wrong trends.
In future research I would like to explore whether representativeness on Twitter is the full story, and whether this matters for political discussions. Does the younger generation actually talk more or less about politics than the older generation? Also, older people aside, were Brexiters (i.e. people supporting Leave) over- or under-represented on Twitter, as compared to Bremainers (i.e. voters supporting Remain)?
As part of subsequent research, I also plan to collect a gold standard of human-annotated tweets, where people will be asked to mark tweets indicating actual support and voting intent separately from tweets which simply mention the Leave/Remain campaigns. This will enable me to quantify how the different sampling strategies affect the accuracy of voting trends over time.
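Once such a gold standard exists, one straightforward way to score a sampling strategy against it is per-class precision and recall. The sketch below is only an illustration of that evaluation under my own assumptions: the label names, the "mention" class for tweets that merely reference a campaign, and the toy data are all hypothetical, and no such annotations have been collected yet.

```python
def precision_recall(gold, predicted, label):
    """Precision and recall of `label`, given parallel lists of gold and predicted labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == label and g == label)
    fp = sum(1 for g, p in pairs if p == label and g != label)
    fn = sum(1 for g, p in pairs if p != label and g == label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels, made up for illustration; "mention" marks tweets that only
# reference a campaign without declaring support or a vote.
gold = ["leave", "remain", "mention", "leave", "remain", "mention"]
predicted = ["leave", "remain", "leave", "leave", "mention", "mention"]

leave_p, leave_r = precision_recall(gold, predicted, "leave")
remain_p, remain_r = precision_recall(gold, predicted, "remain")
```

Computing these per hour or per day, rather than once over the whole sample, would show whether a strategy's errors shift over time.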
Thanks to:
Dominic Rout, Ian Roberts, Mark Greenwood, Diana Maynard, and the rest of the GATE Team.
Any mistakes are my own.