Ever wanted to try being an anonymous author? It’s hard work, even for someone without the celebrity of a J.K. Rowling or an Alexander Hamilton — who both tried and failed. Doing it on the internet is helpful, but even if you’re careful to cover your digital trail every step of the way, there’s one small problem — your writing style can actually give you away.
But fear not, aspiring Publius: now there’s a computer program to help hide your textual tells.
A group of Drexel students took it upon themselves to spruce up Anonymouth, a program that has been in the works in Drexel’s Privacy, Security and Automation Lab for quite a few years. I wrote about this anonymity assistance tool a while back, along with its adversarial partner JStylo, which can identify an author by analyzing and comparing their writing style.
So I thought I’d put the new, user-friendly version, called Worden, to the test. Here’s a link if you want to try it out too: http://www.worden.info/
First, it asks whether you want to test your own document or someone else’s. To give it a real challenge, I plugged in an op-ed that I helped a professor write. I wanted to see whether Worden would pick up on my stylistic fingerprints and attribute the piece to me, or whether it was sharp enough to see through them and attribute authorship to an “other.”
Then it prompted me to upload 10 examples of my writing. So I fed it 10 press releases I’d written.
After about three minutes of churning, Worden correctly concluded that I was not the main author. It also gave me about a dozen suggestions for technical and stylistic changes I could make to my writing if I wanted to disguise my documents and become a truly anonymous author.
They ranged from pretty reasonable ideas, like using more prepositions and subordinating conjunctions and fewer proper nouns (which makes sense, since press releases are chock-full of Dr. So-and-sos, and the drier ones can be slim on the trappings of good storytelling), to more bizarre suggestions, like using more 13-letter words and the letter combinations “ti,” “ic” and “ati.”
And periods…apparently, my constant use of periods is a dead giveaway.
Here’s Worden’s prescription for my anonymity (use less of the red stuff and more of the green stuff):
While I don’t know how useful these Scrabble challenges will be in my line of work, I suppose if I were churning out incendiary texts to fuel a revolution (à la Hamilton) rather than news releases, I wouldn’t mind throwing in an extra “automatic,” “enticingly” or “investigatory” (yes, I counted, it’s 13) as a precaution.
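For the curious, here’s roughly what tallying features like these involves. This is a minimal Python sketch, not Worden’s actual code: the function name `style_features` and the handful of features it counts (periods, 13-letter words and the letter combinations “ti,” “ic” and “ati”) are assumptions chosen for illustration. Real stylometry tools track many more features and typically normalize the counts by document length.

```python
import re


def style_features(text: str) -> dict[str, int]:
    """Count a few surface-level style features of the kind Worden flagged.

    Illustrative only: these are raw counts, not length-normalized rates,
    and the feature list is a hypothetical sample, not Worden's own.
    """
    # Split into lowercase alphabetic words (punctuation and digits are dropped)
    words = re.findall(r"[a-z]+", text.lower())
    counts = {
        "periods": text.count("."),
        "13_letter_words": sum(1 for w in words if len(w) == 13),
    }
    # Letter n-grams are counted inside words only. None of these three can
    # overlap with itself, so str.count gives an exact tally.
    for gram in ("ti", "ic", "ati"):
        counts[f"ngram_{gram}"] = sum(w.count(gram) for w in words)
    return counts


print(style_features("Automatic, enticingly investigatory."))
```

On that three-word sample, it counts one period, one 13-letter word (“investigatory”), three “ti”s, two “ic”s and one “ati.”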
This was a fun exercise, but I thought it might be even more interesting to analyze other people’s writing, really put the program to the test and see what else it could do.
What if we weren’t trying to use it to identify an anonymous author, but rather to compare the writing of known authors?
So, after brainstorming with the team’s advisor, Jeffrey Salvage, we asked Worden’s creators (a fantastically obliging group including Travis Dutko, Marc Barrowclift, Corey Everitt, Jiakang Jin, Eric Nordstrom and Ivan Orrego) to take a look at some political speeches from this year’s primary season. And just for fun, we had them compare the style of today’s political speechwriters to that of one of their aspirational figures, the uberpolitician of yore: former President Ronald Reagan.
[Note: this is really not part of Worden’s intended use, but for the sake of our own little experiment, the team tweaked a few things under the hood to let us see who emerged as the most “Reaganish” from a field of would-be presidential nominees. So, unfortunately, you can’t yet try this at home, folks.]
To make our test work they fed Worden some Reagan speeches — seven, to be exact — ranging from his inaugural addresses to the classic “Mr. Gorbachev, tear down this wall” speech at the Brandenburg Gate.
They chased them with seven transcripts each from 10 of the Republican primary candidates and, to spice things up, added some Clinton and Sanders orations as a kicker.
By comparing each of the seven Reagan speeches, one at a time, with the full set of transcripts from the field of candidates and then averaging the results, the team ranked the candidates by “cumulative probability” of authorship (expressed as a percentage). In effect, they were asking: if Reagan wasn’t the “author” of this speech, who is the most likely suspect to have written it? Or, for our purposes: which of today’s candidates delivers the most “Reaganish” speeches?
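The averaging step is simple enough to sketch. Assuming each Reagan-speech comparison yields a probability for every candidate (an assumption on my part, since the team’s exact scoring isn’t described here), ranking by the average looks something like this. The function name `rank_by_average`, the candidate labels and the numbers are hypothetical, made up purely for illustration; they are not our results.

```python
from collections import defaultdict


def rank_by_average(per_speech: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Average each candidate's attribution probability across speeches, highest first.

    per_speech holds one dict per Reagan speech, mapping candidate -> probability.
    A candidate missing from a speech's dict is treated as probability 0.
    """
    totals: dict[str, float] = defaultdict(float)
    for probs in per_speech:
        for candidate, p in probs.items():
            totals[candidate] += p
    n = len(per_speech)
    averages = {c: total / n for c, total in totals.items()}
    # Sort by average probability, largest first
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Toy input: two speeches, three hypothetical candidates (invented numbers)
toy = [
    {"Candidate A": 0.5, "Candidate B": 0.3, "Candidate C": 0.2},
    {"Candidate A": 0.2, "Candidate B": 0.6, "Candidate C": 0.2},
]
for name, avg in rank_by_average(toy):
    print(f"{name}: {avg:.0%}")
```

Averaging over every speech, rather than keeping only each speech’s top match, means a candidate who is a moderately close match across the board can outrank one who nails a single speech.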
And the results of our completely unscientific study, performed by an incredibly scientific tool:
What can we tell from this? Well, it appears none of today’s speechwriters were really channeling Dutch while putting words in their candidates’ mouths, at least not according to Worden.
Of course, we have to take these results with a healthy dose of salt, because Worden compares the fine-grained, technical aspects of the speeches, not the meaning of their content. But, for what it’s worth, Reagan’s and Jeb Bush’s speechwriters both use the word “more” and the letter combination “le” about equally often. And, perhaps not surprisingly, Trumpspeak has the least in common with Reagan’s Brandenburg Gate oration.
While these little tests were driven more by curiosity than by a need for anonymity, you can probably imagine how a program like Worden could be a powerful tool (in the hands of brighter people than me) for analyzing the volumes of text created in everyday online interactions, whether to advise word-wielding dissidents on disguising their messages or, conversely, to help authorities build dossiers on cybercriminals.
In summation, my final word on Worden (after consulting with Worden): gesticulation