Strange tweets? Strange coincidence? Or strange individual?
Former Philadelphia 76ers General Manager Bryan Colangelo has been fielding a lot of questions about his Twitter behavior, all stemming from the findings of a computer algorithm.
An anonymous source tipped off a sports writer that Colangelo might be using a number of anonymous Twitter accounts to criticize players and other GMs and to defend his own decisions. The tip resulted in a story in the NBA blog, The Ringer, and touched off a controversy that has engulfed the team this week.
The tipster claimed to have used a data analysis tool that linked five accounts to one user, who he claims is Colangelo, by comparing the writing style of their posts and patterns in who they were following.
But are programs like this reliable enough to make such a claim?
Definitely — according to a pair of researchers at Drexel University’s College of Computing & Informatics.
In a recent Philadelphia Inquirer story, College of Computing & Informatics professor Rachel Greenstadt, PhD, helped explain how these programs work. Greenstadt’s Privacy, Security and Automation Lab created one such program, called Doppelgänger Finder, that applies machine learning algorithms for textual analysis to identify anonymous authors. She has been using the tool to analyze the online behavior of cyber criminals. Similar programs were famously used to identify J.K. Rowling as the author of The Cuckoo’s Calling, a book she published under a pen name.
Programs like this use details such as the frequency of certain letter or word combinations, the average length of sentences and paragraphs, and various other attributes to “fingerprint” an anonymous author. They can then discern whether another bit of text was likely created by the same author, depending on how closely its profile matches.
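To make the idea of a stylistic "fingerprint" concrete, here is a minimal Python sketch. It is not the code behind Greenstadt's tool or the tipster's; the function names are hypothetical, and it uses only two of the features mentioned above (letter-combination frequencies and average sentence length), with cosine similarity assumed as the way to compare two profiles.

```python
import math
import re
from collections import Counter


def char_ngram_profile(text, n=3):
    """Relative frequency of each character n-gram (a simple style feature)."""
    cleaned = re.sub(r"\s+", " ", text.lower()).strip()
    grams = Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))
    total = sum(grams.values())
    return {gram: count / total for gram, count in grams.items()} if total else {}


def average_sentence_length(text):
    """Mean number of words per sentence, splitting on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two sparse frequency profiles (0.0 to 1.0)."""
    dot = sum(v * profile_b.get(k, 0.0) for k, v in profile_a.items())
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Two bodies of text whose profiles score close to 1.0 share many of the same letter habits. Real systems combine many more features and feed them to a trained classifier, so a score from a sketch like this is suggestive at best.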
So, in Colangelo’s case, with enough tweets, a program like Doppelgänger Finder could have suggested to the tipster that those five accounts shared a common author.
Greenstadt’s College of Computing & Informatics colleague, Jake Williams, PhD, who has developed software in related work to identify robotic accounts on Twitter, suggests that the tipster probably had access to more than enough text from tweets to get a reliable result from the program they were using.
“When we were working on Twitter bot detections, we were able to see results with as few as 25 tweets,” said Williams. “Twitter’s API makes it possible to download a user’s 3,200 most recent public tweets. This is probably what was available to the tipster and would likely have been more than sufficient for the analysis to be suggestive.”
But, according to Greenstadt, it would also require additional work to prove that those style characteristics were unique to that author.
As for the Twitter accounts said to be the work of Colangelo, Greenstadt said making the case would require more than just finding similarities among the five accounts. To validate the findings, the analyst would then want to show that the writing styles from those accounts were not only similar to one another but also were different from those in a sample of other posts on the same general topic of basketball, she said.
“You might want to compare a group of similarly followed and liked basketball personality-type people on Twitter,” she said in the Inquirer story.
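The check Greenstadt describes can be sketched in a few lines of Python. This is an illustration only, not her lab's method: the helper names are hypothetical, the style profile is reduced to plain word counts, and the sample strings in the usage note are made-up placeholders rather than real tweets.

```python
import math
from collections import Counter
from itertools import combinations, product


def word_profile(text):
    """Word counts as a crude stand-in for a full style profile."""
    return Counter(text.lower().split())


def similarity(a, b):
    """Cosine similarity between two word-count profiles."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def mean_within(profiles):
    """Average similarity over every pair inside one group of accounts."""
    pairs = list(combinations(profiles, 2))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def mean_across(group, baseline):
    """Average similarity between each group account and each baseline account."""
    pairs = list(product(group, baseline))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
```

For example, with placeholder text standing in for each account's posts:

```python
suspects = [word_profile(t) for t in (
    "trust the process the plan is working",
    "the process is working trust the process",
)]
baseline = [word_profile(t) for t in (
    "what a dunk wow",
    "huge game tonight fans",
)]
print(mean_within(suspects), mean_across(suspects, baseline))
```

The case for common authorship is stronger only if the suspect accounts resemble one another clearly more than they resemble the baseline of other basketball accounts.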
To continue the investigation, Williams suggests there are a number of other characteristics that could help validate a common authorship of the accounts. His bot-spotter program uses both text analysis, which looks at how humans naturally put words and phrases together, and the behavior patterns of humans on Twitter.
These behavior patterns (things like the response time between messages, how frequently the accounts follow others, and overlaps in who they are following) could also be used to build the profile of an anonymous author.
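Two of these behavioral signals are easy to express in code. The Python sketch below is an assumption about how one might measure them, not a description of Williams's software: the function names are hypothetical, follow lists are treated as plain sets of account names, and posting times as numbers of seconds.

```python
from statistics import median


def follow_overlap(following_a, following_b):
    """Jaccard overlap of two follow lists: shared accounts / all accounts."""
    a, b = set(following_a), set(following_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def median_gap_seconds(timestamps):
    """Median time between consecutive posts, given times in seconds."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps) if gaps else None
```

An unusually high overlap in follow lists, or very similar posting rhythms, would not prove common authorship on its own, but either could be added to the text-based profile as supporting evidence.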
Thus far, reporters have revealed some similarities in these behaviors across the accounts. Most notably, three of them switched from public to private after the reporter asked the team about them.
But discerning who is actually doing the tweeting will likely require some advanced sleuthing by humans to support the work of the computer programs that put them on the trail.
For media inquiries contact Britt Faulstick, assistant director, media relations, bef29@drexel.edu or 215.895.2617.