Over the past two years Twitter's Public Safety team has been releasing caches of accounts it believes to be part of state-backed information operations. Australian think tank ASPI (2020, 2019), Stanford's Internet Observatory (2020) and startup Graphika (2020, 2019) have done admirable jobs analyzing the Chinese government's handiwork. I've spent the past few months in lockdown teaching myself some Python data science techniques and figured I'd try my hand at supplementing their research in the context of renewed fears of Chinese influence on TikTok and WeChat.
My conclusion in brief:
China has no idea how to run a Twitter network and does not do a good job amplifying its message with insincere state-run accounts. The content it puts out is too hidebound by prescribed talking points and suffers from a general lack of understanding about how to operate in foreign cultural environments.
Using purchased accounts with large follower counts whose followers couldn’t care less about politics, much less speak Chinese or English, is the Chinese operation’s most commonly used but least successful tactic.
New strategies like paying YouTubers and technology like GPT-3 could potentially change the game. However, unless the Chinese operators get comfortable with letting these accounts run free, these tacks are unlikely to have much success either.
I know these charts don’t have the best formatting but please bear with me. This is my first ever data science project and I ran out of time before the election to clean stuff up.
China's Hopeless Twitter Accounts
Before I start, one major limitation of this research is that Twitter has not released any information as to how it identified these accounts. We really just have to take Twitter’s word that they didn’t overreach and catch legitimate accounts up in the dragnet. It’s more likely that they underreached, however. Joey Goodman created a nifty ML algorithm that identified thousands of likely IRA accounts which may have escaped Twitter's net, so the same is likely true for Chinese ones.
The first thing that immediately stands out is the lack of 网感, 'feel for the internet', of the Chinese government's covert tweets. To set the standard, let's first take a look at Russia's IRA, the 'industry leader' in the space. As ASPI writes:
The Russian effort displayed well-planned coordination. Analysis of IRA account data has shown that networks of influence activity cluster around identity or issue-based online communities. IRA accounts disseminated messaging that inflamed both sides of the debates around controversial issues in order to further the divide between protagonist communities. High-value and long-running personas cultivated influence within US political discourse. These accounts were retweeted by political figures, and quoted by media outlets.
You can get a sense of their sophistication just by the topics they tweet about. This chart is for their 'right-leaning' Twitter network, demonstrating just how well they know how to push Americans’ buttons.
Graphs from FiveThirtyEight
Compared to what Russia has shown itself capable of, Chinese efforts are amateur hour. The vast majority of tweets generate zero engagement in the form of likes or retweets, much less retweets from the President, his press people or his children. As evidenced by the size of the accounts that were banned in June 2020 compared to those the year prior, it seems that Twitter has gotten savvier at spotting accounts earlier and has effectively neutered the Chinese network's potential influence.
Here's a chart of how the latest 2020 batch of tweets performed. Note that the color represents a log scale for how many tweets each square represents. The x-axis corresponds to likes and the y-axis retweets.
This graph really doesn’t do the ineptitude justice. Says Amal Sinha, “Out of 350k tweets, 98.22% had no retweets and no likes. And less than 100 tweets have >10 retweets.”
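For anyone who wants to reproduce this kind of chart, here's a minimal sketch of the two numbers behind it: the share of tweets with zero engagement, and the log-scaled count for each square of the heatmap. This is not the code behind the chart above. The field names `like_count` and `retweet_count`, the bin size, and the five sample rows are all assumptions made up for illustration.

```python
import math
from collections import Counter

# Hypothetical sample rows; the real input would be the released tweet CSV.
tweets = [
    {"like_count": 0, "retweet_count": 0},
    {"like_count": 0, "retweet_count": 0},
    {"like_count": 0, "retweet_count": 0},
    {"like_count": 3, "retweet_count": 1},
    {"like_count": 4000, "retweet_count": 0},
]

def zero_engagement_share(rows):
    """Fraction of tweets that got no likes and no retweets."""
    dead = sum(1 for r in rows if r["like_count"] == 0 and r["retweet_count"] == 0)
    return dead / len(rows)

def heatmap_cells(rows, bin_size=500):
    """Count tweets per (likes, retweets) square, then take log10 of each count.

    The log keeps the enormous pile of tweets near the origin from washing
    out every other square on the colour scale.
    """
    counts = Counter(
        (r["like_count"] // bin_size, r["retweet_count"] // bin_size) for r in rows
    )
    return {cell: math.log10(n) for cell, n in counts.items()}

share = zero_engagement_share(tweets)
cells = heatmap_cells(tweets)
print(f"{share:.2%} of tweets had no likes and no retweets")
print(cells)
```

A plotting library would then colour each square by its value in `cells`, with likes on the x-axis and retweets on the y-axis.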
The outlier that got four thousand likes (though suspiciously zero retweets) reads:
#香港Fully support the Hong Kong Police Force to enforce laws strictly, stop riots and control chaos, maintain Hong Kong's safety and stability, and revive the glory of the past #香港
The double hashtags are pretty odd. Then again, I’ve never tweeted something that got 4k likes, so maybe they know something I don’t…
The 2019 dataset, filled mainly with purchased accounts, had much higher follower counts but incredibly poor engagement. The top account that had 300k followers only had one tweet with at least 150 likes. This is the best the Chinese could do in 2019 before Twitter busted the original network.
Even with all those followers, no one read their tweets. In the chart below, all the splotches above 2000 likes represent individual tweets, meaning that out of the million-plus tweets there were only 250 that had more than 2000 likes. Every tweet over 1000 likes in 2019 was promoting porn.
Above is a PG-13 sample…it gets much worse. Do recall that porn is banned on the mainland.
Interestingly, some accounts continued to occasionally run spammy content even after they sent their first political tweets. I’d like to think that some folks inside this operation were running a side hustle to compensate for their low salaries (a common practice in Chinese bureaucracies). After all, there’s an active market for Twitter followers and bot accounts on Taobao.
For comparison, the best-performing Russian account in its history generated six million retweets, of which just a thousand came from Russian-run accounts. And this was not an outlier: overall the Russians managed to run seventeen accounts that racked up 30k followers splitting evenly between right-wing and left-wing accounts, and even got people to turn out for in-person rallies across America. As Symantec writes,
It was a highly professional campaign. Aside from the sheer volume of tweets generated over a period of years, its orchestrators developed a streamlined operation that automated the publication of new content and leveraged a network of auxiliary accounts to amplify its impact.
'TEN_GOP,' which pretended it was run by Tennessee Republicans, racked up 150,000 followers and ov…
I tried to make the above heatmap chart with the Russian network data but downloading it from the Twitter repository gives you a mysteriously corrupted file…
I agree with ASPI's and Stanford's assessments that the vast majority of the Chinese accounts in the 2019 batch were purchased. Below are three wonderful representations of this. First is the language the accounts used to tweet. There's a giant spike in Indonesian in 2016 and 2017, but once 2018 and 2019 hit these accounts magically start learning English and Chinese and caring about Hong Kong. You can also see by looking at the tweet content itself that many of these accounts used to run marketing schemes for selling watches, porn, and, most hilariously, ways to get RMB out of China.
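The language shift is easy to check for yourself. Here's a rough sketch, assuming each tweet carries a `tweet_time` string and a `tweet_language` code (both field names are assumptions, as is the timestamp format, and the rows are invented). I'm also assuming `in` is the code used for Indonesian, so double-check that against the real data.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical rows; field names and the "in" code for Indonesian are assumed.
tweets = [
    {"tweet_time": "2016-05-01 10:00", "tweet_language": "in"},
    {"tweet_time": "2016-07-12 03:30", "tweet_language": "in"},
    {"tweet_time": "2016-09-09 21:15", "tweet_language": "en"},
    {"tweet_time": "2019-08-20 02:00", "tweet_language": "zh"},
    {"tweet_time": "2019-08-21 06:45", "tweet_language": "en"},
]

def language_share_by_year(rows):
    """Return {year: {language: share of that year's tweets}}."""
    per_year = defaultdict(Counter)
    for r in rows:
        year = datetime.strptime(r["tweet_time"], "%Y-%m-%d %H:%M").year
        per_year[year][r["tweet_language"]] += 1
    return {
        year: {lang: n / sum(langs.values()) for lang, n in langs.items()}
        for year, langs in sorted(per_year.items())
    }

shares = language_share_by_year(tweets)
for year, langs in shares.items():
    print(year, langs)
```

Purchased accounts should show up as a year where one language's share collapses and another's takes over.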
Secondly, starting in 2020 you can see aggressive dips during Chinese holidays.
These accounts clearly take weekends off, even though during the fall of 2019 most HK protests actually occurred on weekends.
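One simple way to put a number on the weekend effect is to compare the average number of tweets per weekday with the average per weekend day. The sketch below is only an illustration: the dates are invented, and I'm assuming they have already been converted to Beijing time.

```python
from collections import Counter
from datetime import date

# Hypothetical tweet dates, assumed to be in Beijing time already.
tweet_dates = [
    date(2020, 3, 2), date(2020, 3, 2), date(2020, 3, 3),  # Mon, Mon, Tue
    date(2020, 3, 4), date(2020, 3, 5), date(2020, 3, 6),  # Wed, Thu, Fri
    date(2020, 3, 7),                                       # Sat
]

def average(values):
    """Mean of a list, or 0.0 if the list is empty."""
    return sum(values) / len(values) if values else 0.0

def weekday_weekend_averages(dates):
    """Average tweets per active weekday and per active weekend day.

    Days with no tweets at all never appear in the counter, so these are
    averages over days that had at least one tweet.
    """
    per_day = Counter(dates)
    weekday = [n for d, n in per_day.items() if d.weekday() < 5]   # Mon-Fri
    weekend = [n for d, n in per_day.items() if d.weekday() >= 5]  # Sat-Sun
    return average(weekday), average(weekend)

weekday_avg, weekend_avg = weekday_weekend_averages(tweet_dates)
print(f"weekday average: {weekday_avg}, weekend average: {weekend_avg}")
```

The same daily counter could be checked against a list of Chinese public holidays to look for the holiday dips.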
The third dead giveaway is the time of day the accounts are tweeting. The accounts in these datasets generally started tweeting around 2012. But as late as 2017, the time at which tweets are sent bears no relationship to a Beijing workday.
Yet slowly but surely, as the accounts go from tweeting about random stuff to sounding more and more like Chinese propaganda, you start to see the tweet volume line up more and more with China Standard Time.
By 2019, when these accounts are running wild trying to influence English and Chinese-speaking opinion about the Hong Kong protests, the effect is very clear.
By 2020, they're not even trying.
Gotta love that lunch break! What's more, the handful of tweets that were sent out during the night shift were universally spammy.
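If you want to recreate the time-of-day view, the key step is shifting each timestamp into China Standard Time before counting by hour. Here's a minimal sketch under two assumptions: that the raw timestamps are in UTC, and that they look like the invented strings below. China doesn't observe daylight saving time, so a fixed UTC+8 offset is enough.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# China Standard Time is a fixed UTC+8 offset (no daylight saving).
CST = timezone(timedelta(hours=8), "CST")

# Hypothetical timestamps, assumed to be recorded in UTC.
utc_times = [
    "2020-04-01 01:05",
    "2020-04-01 01:40",
    "2020-04-01 04:10",
    "2020-04-01 09:30",
    "2020-04-01 18:00",
]

def hourly_counts_beijing(timestamps):
    """Count tweets by hour of day after converting UTC to Beijing time."""
    hours = Counter()
    for ts in timestamps:
        utc = datetime.strptime(ts, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
        hours[utc.astimezone(CST).hour] += 1
    return hours

counts = hourly_counts_beijing(utc_times)
for hour in range(24):
    print(f"{hour:02d}:00 {'#' * counts[hour]}")
```

Run per year, a purchased network should start out flat or scattered and then bunch up into a nine-to-five shape with a gap at lunch.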
The map below shows the frequency of Chinese-language tweets in the 2020 dataset, with locations plotted by the self-described location in Twitter. Many of these locations were likely chosen by whoever first made the account, who often was likely not someone inside the Chinese operation.
These are the locations for primarily Mandarin-speaking accounts.
And here's the same looking at English-language tweets.
I do find it curious that given all the other influence operations conducted in New Zealand and Australia, there are literally 0 accounts that claim to be from Oceania, particularly given that for a recent public tender aimed at increasing the follower count of state media on Twitter, one of the requirements was that "among the new followers, at least 8% need to come from North America, Australia, and New Zealand." Of course, accounts targeting these geographies may just be in different networks not yet uncovered by Twitter.
The coolest thing I made was an interactive map that allows you to explore the dataset. You can play around with one that features the 2020 accounts here. Do note that the links will bring up blank pages on mobile as they only work on desktop browsers. Among this map’s features, you can mouse over different points to see what the Twitter accounts near your hometown were tweeting, and by changing the length of the bar on the bottom you can get a nice visualization of the tweets over time.
So What Were They Tweeting, Anyways?
ASPI created a nice representation of their favorite topics.
I scrolled through 15,000+ tweets in Chinese and English to get a sense of the vibe and pick out some for your amusement and edification. Most of the COVID-19, Guo Wengui, and Hong Kong tweets can be construed as attempting to shape public opinion on domestic Chinese politics.
Typical propaganda posts:
"Justice may be late, but it will not be absent. The dream of "Hong Kong independence" will eventually be in vain. #HK"
"Street violence will not achieve any purpose. It will only lead to wider and more firm support for anti-violence."
And my personal favorite:
"After watching for over fifteen minutes, I don't think that RuPaul's Drag Race has anything to do with automobiles"
There were no hits on Google for this sentence, so it looks original!
The user descriptions were often hilarious:
"Environmentalist, Achiever, Grammar Nazi, Golf Junky, Pantless Jogger. Can you blow my whistle?"
"Dollar Store Owner, Saviour of Mankind, Citizen"
"Upholsterer, Sawmill or timber yard worker, Heavy and Tractor Trailer"
"Travelaholic. Beer fanatic. Unable to type with boxing gloves on. Food specialist. Bacon buff. Evil introvert."
"Zombie guru. Writer. Communicator. Amateur bacon expert. Typical gamer. Organizer."
At least half of the descriptions were corny sayings about love that mostly seem to be pulled from this website.
"Love - as а war... It is easy to begin... It is difficult to finish... It is impossible to forget!"
"Love is a fruit in season at all times, and within reach of every hand"
"Come live in my heart and pay no rent."
My favorite description had an echo to HK and Taiwan:
"If you love something set it free; if it comes back, it's yours. If it doesn't it never was."
There was even some chloroquine content.
Occasionally they tweet random 19th-century poetry and literature. For instance, this totally out-of-context Edgar Allan Poe line: "A panorama more deplorably desolate no human imagination can conceive. When I recovered myself a little, however, my gaze fell instinctively." I also came across nonsensical Jane Eyre and Bram Stoker's Dracula excerpts.
The focus on 19th-century literature is probably just because this stuff is out of print and easy to find a corpus for, but I'd like to also think that it reflects the broader place that this era of western classics holds in the Chinese literary imagination.
I feel for whoever is the underpaid employee working to write these English language tweets.
They probably did not expect, when they were studying English hard for the gaokao, that they'd one day end up using their hard-earned skills not to travel the world and read Shakespeare but to write tweets for a totally ineffective information operations effort. I hope that they were able to convince their bosses that reading Charlotte Brontë and watching American reality TV while on the clock was work-appropriate.
Unlike the English tweets, which frequently post spammy things that have nothing to do with politics, I've found very little in the Chinese corpus that isn't railing on about Taiwan, Hong Kong, COVID or Guo Wengui. As a non-native speaker, I'll defer to ASPI's and Stanford's analysis of the linguistic subtleties. One of their more interesting findings was that lots of the Cantonese writing seemed to have been produced by translation software, which goes to show just how half-baked this entire effort was.
This operation went out of its way to hire English speakers but couldn’t be bothered to find a handful of the sixty million Cantonese speakers on the mainland.
Performance of Chinese language tweets in the 2019 corpus. X-axis likes, Y-axis retweets
I wanted to make a chart comparing it to Russian accounts, but the file containing the data on Twitter's site is mysteriously corrupted…suffice to say that they've had much, much more success.
China’s performance is more comparable to Iran than Russia. Same axes below.
Many of the higher quality pro-HK police and 'Go Wuhan' images were lifted from random Weibo bloggers and other domestic propaganda organs.
However, the Guo Wengui content, which seems to be produced internally by this operation, is underwhelming. Just take a look at this mess:
走投无路 --He has nowhere to go
自掘坟墓 - The man is digging his own grave
这是我最后的归宿 - This will be my final resting place
I kind of like this one though. The Chinese is just his name.
Perhaps most concerning for American observers are the Taiwan-related tweets. The Chinese government is certainly not above attempting to influence Taiwanese elections, and has proven effective in the past. There were over a thousand tweets that mentioned Tsai Ing-wen by name around the Jan 2020 election. Interestingly, much of this content was written in simplified characters rather than the traditional Chinese script used in HK and Taiwan.
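A count like this comes down to a date window plus a name match. Here's a rough sketch, not the query behind the figure above: the 30-day window, the `date` and `text` field names, and the sample rows are all assumptions picked for illustration. The election itself was held on January 11, 2020.

```python
from datetime import date, timedelta

ELECTION_DAY = date(2020, 1, 11)   # Taiwan's 2020 presidential election
NAMES = ("蔡英文", "Tsai Ing-wen")  # name in Chinese and in English

# Hypothetical rows; field names and contents are invented for illustration.
tweets = [
    {"date": date(2019, 12, 20), "text": "#蔡英文 #台湾选举"},
    {"date": date(2020, 1, 10), "text": "tsai ing-wen again"},
    {"date": date(2020, 1, 10), "text": "#香港"},
    {"date": date(2019, 6, 1), "text": "蔡英文"},
]

def mentions_near_election(rows, window_days=30):
    """Tweets naming Tsai Ing-wen within window_days of election day."""
    start = ELECTION_DAY - timedelta(days=window_days)
    end = ELECTION_DAY + timedelta(days=window_days)
    return [
        r for r in rows
        if start <= r["date"] <= end
        and any(name.lower() in r["text"].lower() for name in NAMES)
    ]

hits = mentions_near_election(tweets)
print(len(hits), "tweets mention Tsai Ing-wen near the election")
```

Widening or narrowing `window_days` changes what "around the election" means, so any total depends on that choice.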
"#王立强 #蔡英文 #台湾选举
Russian, Japanese, German and Dutch
While the vast majority of the propagandistic content is in English and Chinese, the dataset overall contains dozens of languages. I asked a few friends to take a look for their impressions:
The Russian accounts seem to be entirely bought and were never used for propaganda purposes. According to UK-based Russia analyst Charles Lichfield, it's very unlikely that non-native speakers could pull off the type of content seen in the language. Said Charles, the tweets were "fairly genuine-looking harmless bullshit" featuring up to date slang, cultural references, and meme-y content. A typical tweet:
Original Tweet: Боль - это когда после трудного дня приходишь домой и вспоминаешь что есть вкусняшка А потом оказывается что её кто-то съел
Translation: Pain: when you come home remembering that there's something super tasty at home, but it turns out someone's eaten it.
Princeton PhD candidate Ayumi Teraoka observed much of the same in the Japanese language content. She found quite a few recipe tweets and a few sexually vulgar ones, but nothing remotely political. The same went for the Dutch tweets, according to my old classmate Montijn Huisman (he found a lot of soccer scores).
Sebastian Reil, looking into the German tweets, found mostly spam and a ton of tweets claiming to be located in 'Gayborhood.' There were a few political tweets as well, mostly about Eastern Lightning Cult, a persecuted Christian sect founded in China in the 90s, indicating that at least someone on the Chinese side of this operation speaks German, or cares enough to tell their contractor to tweet political stuff.
Original Tweet: Eastern Lightning Cult ist Kult, kein Christentum, sie umbringen die Fußgänger als Teufel,
Translation: Eastern Lightning Cult is a cult, no christianity, they kill pedestrians as devils
Says Sebastian on the rest of the language: "of the actual German tweets many are just a few words - so no whole sentences. Then there are a few longer ones that are actually grammatically correct full sentences. The only tweets with a political valence were on the Eastern Lightning Cult: for them, the grammar was surprisingly complex and correct - but the content and word choice sometimes so strange that it seems fake."
If you speak Arabic, Bahasa, Portuguese, Spanish, or Italian and would like to take a look at the tweets in your language please do get in touch.
Where CCP Propaganda Goes From Here
I have mixed feelings putting this whole operation on blast. If whoever was running this were smart, they’d take to heart the critiques of western observers, shed the current counterproductive KPIs and learn best practices from the Russians.
I wonder whether, particularly for English-language content, the talent is there in mainland China for this sort of thing. To do convincing information operations, you really need to have spent a lot of time ingesting content in the country you’re supposed to be producing for. Most Chinese nationals I met with who had strong English didn’t necessarily spend a ton of time on Reddit and Twitter, but rather just watched a whole lot of 2 Broke Girls. And if Beida kids aren’t spending their free time on Twitter, I can’t imagine whoever’s getting paid 2000 rmb a month for this position is either. More westernized Chinese who studied in the US are way too wealthy and ambitious to be interested in this sort of work. What’s more, as @BadChinaTake speculated, “I think the semblance of an electoral system also means your average Russian has a strong grasp of partisanship/party ID and how that can be exploited, which is key.”
Language ability and cultural acuity aren’t the only things at play here, as the Chinese language content also lacks originality and flair. With 500 million registered Weibo users, there’s certainly a deep pool of talent to draw from that is familiar with a platform that works much like Twitter. My sense is that instead of being able to freely riff on the news of the day, the operators have to stick closely to talking points, much like state media does. The IRA seems to aim mostly at stirring up shit in the US rather than constantly banging on about the illegitimacy of America’s Russia sanctions, content to only occasionally seed content more directly related to Russian interests. By contrast, as @THoCPodcastAlt says, “Chinese ‘infiltrationists’ are still largely stuck in a particular sloganeering-gear which works great at home but rings so utterly false abroad.”
The operations revealed in this Twitter data dump don’t show any direct attempts to influence electoral politics in the US. However, I don’t think this is reason enough to discount the possibility. Most of the aggressive political activity happened starting in 2019. Since then, US-China relations have taken a dramatic turn for the worse. Given the activity surrounding the Taiwan presidential election, it’s clear that this operation is willing to cross this red line. However, I don’t think it is obvious to the Chinese government whether a Trump or Biden administration would better serve their interests (more on this topic in a prior ChinaTalk edition here).
Pro-China YouTubers + GPT3 As Research Extensions
What should western social media networks be on the lookout for as China seeks to improve this operation? Perhaps China might start outsourcing its tweets to third countries, much like Russia’s IRA seems to have done by contracting tweeters in Ghana. However, this seems like a reasonably straightforward type of thing to catch.
The Chinese seem to be using a combination of carrots and sticks to alter the sort of content coming from western YouTubers about China. Compared to the amount of English-language vlog-style content coming out of Japan and Korea, the bench of comparable China-based YouTubers is extraordinarily thin, giving the few western creators based on the mainland outsized impact when it comes to influencing foreign opinion. The two largest YouTube accounts, Laowhy86 (500k subs) and SerpentZA (750k subs), have packed up their bags and left the mainland thanks to harassment and threats from the police. Before the stick came out, however, the two popular channels were contacted and offered cash to tone down their criticism:
“SerpentZA has repeatedly, in several videos over the years, claimed to have been subject to multiple attempts by organizations in China to take over his channel, guaranteeing him huge Chinese audiences, lots of money, and even logistical services. Laowhy86 elaborated in an email, “We were offered compensation to play down some of the western media claims that Tibet and Xinjiang CCP governments are oppressive towards their citizens, and even offered to fly us out to shoot some positive videos promoting tourism in the region.”
Other YouTubers have also reported receiving offers of money to create pro-Chinese government content. In the past few years, a suspiciously coordinated bunch of young, white, male vloggers mostly based in Shenzhen has taken to spouting the party line. To sit down, watch their videos, and do a deep dive on the dreck they produce, I’d need to be actually getting paid to do this research. But I’d love to read the results of such a thing! Knock your heart out.
Another potential gamechanger, at least on Twitter, is GPT-3. A few skilled operators with the freedom to be creative would be far more effective thanks to the amplification of GPT3-proposed content. Of course, today’s AI isn’t able to create compelling posters and video content. To do more research into what this would look like, I’d need a GPT-3 key…
Thanks so much for making it this far! Do consider supporting ChinaTalk financially.