How the Irish Lit AutoTweets Twitter Account Is Generated

If you're looking for non-technical information about Irish Lit AutoTweets, you should go here, instead. The document you're reading right now contains technical information about how the tweets are generated.

The Short Version

I exported all of the course-related email that I wrote during the quarter into a folder on my local hard drive, saved the relevant web pages as text in the same folder, copied and pasted the course Twitter stream into a text file in that folder, and saved a text-only version of student paper comments (yes, into the same folder). Then I massaged the data in various ways to make it more amenable to text processing. I compiled probability data from the corpus using DadaDodo and saved it. Then I opened a new Twitter account and wrote a script that generates sentences from that compiled data and posts individual selections to the account. I installed the script as a cron job that runs on my laptop five times a day, and wrote a couple of web pages to describe what happens. You're reading one of those web pages now.

You can download and use the script if you'd like, subject to certain restrictions; read on for details. If you find it useful, I'd be grateful if you'd Flattr me.

The (Much) Longer Version

All of this happens under Bodhi Linux. I tend to use the MATE desktop environment and Tilda terminal emulator, though that's not really relevant. Really, with some minimal modification, this should work on more or less any POSIX operating system.

I save all of my course-related email for a quarter, both sent and received, in a specific folder in Mozilla Thunderbird, my email client. I sorted the folder by sender and used the ImportExportTools extension for Thunderbird to export the 460 messages that I wrote. I saved all of the website documents as plain text with Firefox in the same folder, copied and pasted the Twitter stream into a text file that I saved in that folder, and exported the document in which I wrote summary comments on student papers as plain text (yes, into the same folder). Then I concatenated all of them using the standard POSIX cat command to produce the first draft of the textual corpus.
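The concatenation itself is a single cat command, but when the output file lives in the same folder as the inputs, rerunning something like cat * > 150.txt makes the output file one of its own inputs. A minimal sketch that avoids this (the function name and folder paths are hypothetical, not part of the setup described here), which also adds a newline after each file so that a file without a trailing newline doesn't run into the next one:

```shell
#!/usr/bin/env bash
# build_corpus SRC_DIR OUT_FILE  (hypothetical helper)
# Concatenate every regular file in SRC_DIR into OUT_FILE, skipping OUT_FILE
# itself if it sits in SRC_DIR, and ending each input with a newline.
build_corpus() {
  local src=$1 out=$2 f
  : > "$out" || return 1                 # start from an empty output file
  for f in "$src"/*; do
    [[ -f "$f" ]] || continue            # skip subfolders and an unmatched glob
    [[ "$f" -ef "$out" ]] && continue    # never read the file being written
    cat -- "$f" >> "$out" || return 1
    printf '\n' >> "$out"                # keep files from running together
  done
}
```

For example, build_corpus ~/english150-export /150/150.txt (both paths hypothetical). The extra newlines only add blank lines, which the cleanup pass can collapse if they matter.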

Then there was a long and boring period of searching and replacing to eliminate URLs, student names, smart quotes, carriage returns, double spaces, etc. etc. etc. Really, it's a pretty tedious process. Searching, replacing, and other editing were largely done with Bluefish, which is a pretty good text editor that supports searching for regular expressions and understands POSIX backslash sequences. I installed DadaDodo (sudo apt-get install dadadodo under Bodhi Linux, because it's an Ubuntu derivative).
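The cleanup here was done interactively in Bluefish. As a hypothetical command-line alternative for the purely mechanical replacements only (URLs, smart quotes, carriage returns, runs of spaces), a tr/sed pipeline might look like the sketch below. It assumes a sed that supports -E (GNU and BSD sed both do) and a UTF-8 corpus; student names and anything else that needs judgment still require a manual pass.

```shell
#!/usr/bin/env bash
# clean_corpus: read raw corpus text on stdin, write cleaned text to stdout.
# Hypothetical helper covering only mechanical fixes, not student names.
clean_corpus() {
  tr -d '\r' |                            # drop carriage returns (CRLF -> LF)
    sed -E \
      -e 's#https?://[^[:space:]]+##g' \
      -e 's/“/"/g' -e 's/”/"/g' \
      -e "s/‘/'/g" -e "s/’/'/g" \
      -e 's/ {2,}/ /g'                    # squeeze runs of spaces (incl. those left by URLs)
}
```

For example, clean_corpus < 150-raw.txt > 150.txt (file names hypothetical). URL removal runs before the space-squeezing step, so the gap a deleted URL leaves behind collapses to a single space.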

I moved the corpus (150.txt, because the course for which I did the work was English 150) into a folder, created a symbolic link with a short name (/150) at the root level of my drive to save myself some typing, and changed to /150 in my terminal. Once that was done, I compiled the textual corpus into a statistical data file with:

dadadodo -w 10000 -o chains.dat -c 1 150.txt

After that, it was a matter of writing and testing a script that generates chunks of text within acceptable length parameters and sends them to Twitter. I signed up for a new Twitter account for the purpose and edited the profile to point back here. It used to be easy to post to Twitter from a terminal (emulator) using programs like cURL (here's how you used to be able to do it), but Twitter stopped allowing basic authentication and started requiring OAuth with API version 1.1, so updating Twitter from a script is now a pain in the ass: it's necessary to use a more full-featured terminal-based client. Doing a bit of research led me to choose TTYtter, more because of its ease of use than for its amusing name.

Setting TTYtter up involved saving the script to a folder that's in my $PATH (~/.scripts, in my case), then making it executable:

chmod +x ~/.scripts/ttytter
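Since the folder has to be in $PATH for the ttytter command to be found by name, a quick check helps. As a sketch, here is a hypothetical helper (path_contains is not part of TTYtter or of this setup) that tests whether a directory is exactly one of the $PATH entries:

```shell
#!/usr/bin/env bash
# path_contains DIR: succeed if DIR is exactly one of the entries in $PATH.
# Hypothetical helper; note that "/a/b" and "/a/b/" count as different entries.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;   # wrapping in colons makes first/last entries match too
    *) return 1 ;;
  esac
}
```

For example, path_contains "$HOME/.scripts" || echo "not in PATH" prints a warning if the folder still needs to be added (typically with an export PATH="$HOME/.scripts:$PATH" line in ~/.bashrc).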

Running TTYtter for the first time causes it to go through its authentication procedure and create a token in my home directory called .ttytterkey. I moved this to /150 and call TTYtter from the script with -keyf=/150/.ttytterkey so that I can also use TTYtter without the -keyf switch to manage my personal Twitter account. (Yes, there are other ways to handle this issue, but this method is quick and it works.)

Once all the pieces were in place, all I needed to do was to put the finishing touches on the script, which is just a quick Bash script that accomplishes a few things:

  1. Calls dadadodo, using sed to strip the leading whitespace that dadadodo tends to put at the beginning of the generated text.
  2. Checks to see if the resulting text is within acceptable length parameters. Of course, Twitter's famous 140-character limit is a hard upper limit that winds up being kind of annoying here, but I also enforce a minimum length (currently 46 characters) because the very short tweets that DadaDodo generates tend to be rather dull, in my opinion. (Early tests of the script produced the tweet "History." about one out of every five times.) If the automatically generated tweet isn't an acceptable length, the script just keeps trying until it generates one that is.
  3. Sends the resulting tweet of acceptable length to Twitter using TTYtter.
  4. Outputs the resulting tweet and its length to standard output. I primarily found this helpful while writing the script, but decided to leave it in because I may want to run the script manually from the terminal emulator from time to time.
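The four steps above can be sketched in Bash roughly as follows. This is not the actual generate.sh, just a minimal sketch under stated assumptions: the dadadodo location is a guess, the TTYtter and data paths follow the setup described on this page, and passing the tweet text with TTYtter's -status= option is an assumption about how the text reaches TTYtter (check TTYtter's own documentation). The steps are kept as separate functions so they can be tried one at a time.

```shell
#!/usr/bin/env bash
# Minimal sketch of a generate.sh-style script; NOT the author's actual script.
# Assumption: DADADODO's location is a guess (check with: command -v dadadodo);
# the other paths follow the setup described on this page.
set -o pipefail   # let a dadadodo failure show through the sed pipe

DADADODO=/usr/games/dadadodo
TTYTTER="$HOME/.scripts/ttytter"
CHAINS=/150/chains.dat
KEYFILE=/150/.ttytterkey
MIN_LEN=46    # very short output tends to be dull
MAX_LEN=140   # Twitter's limit when this was written

# Strip the leading whitespace that dadadodo puts before its output.
strip_leading_ws() {
  sed 's/^[[:space:]]*//'
}

# Step 1: one sentence from the precompiled chains, with wrapping disabled.
generate_sentence() {
  "$DADADODO" -c 1 -l "$CHAINS" -w 10000 | strip_leading_ws
}

# Succeed if $1 is MIN_LEN..MAX_LEN long. ${#1} counts characters in a
# UTF-8 locale but bytes in the C locale, which a cron job may run under.
acceptable_length() {
  local len=${#1}
  (( len >= MIN_LEN && len <= MAX_LEN ))
}

# Step 2: rerun the given command until its output is an acceptable length.
# Returns 1 if the command itself fails, instead of retrying forever.
generate_acceptable() {
  local text
  while :; do
    text=$("$@") || return 1
    if acceptable_length "$text"; then
      printf '%s\n' "$text"
      return 0
    fi
  done
}

# Step 3: post with TTYtter. -hold and -keyf are the switches described on
# this page; -status= (post one update and exit) is an assumption to verify.
post_tweet() {
  "$TTYTTER" -hold -keyf="$KEYFILE" -status="$1"
}

# Steps 1-4 together; step 4 prints the tweet and its length.
main() {
  local tweet
  tweet=$(generate_acceptable generate_sentence) || return 1
  post_tweet "$tweet" || return 1
  printf '%s\n%d characters\n' "$tweet" "${#tweet}"
}
```

As written, the file only defines functions, so it can be sourced and poked at without posting anything; to have it post when cron runs it, add a final line that calls main. One deliberate difference from a bare retry loop: generate_acceptable gives up if dadadodo itself fails, so a missing program can't keep the loop spinning forever.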

You're welcome to download, adapt, and use the script for your own purposes if you'd like, subject to certain conditions: it's licensed under a Creative Commons Attribution-ShareAlike 4.0 International license. You'll need to make it executable (chmod +x generate.sh), of course. If you find it useful, I'd be grateful if you'd Flattr me, though this is (of course) not necessary. I'd love to hear about it if you do anything interesting with it, or if you have suggestions for improvements. I'm happy to answer questions, but can't provide extensive support or bring you up to speed if the explanation here is unintelligible to you.

Once the script was working, I installed it as a cron job (crontab -e) so that it runs periodically (I decided on five times a day). Here's the line from my crontab as of this writing:

45 4,9,12,17,22 * * * /150/generate.sh

That is, the script runs on my laptop at 4:45 and 9:45 a.m. and at 12:45, 5:45, and 10:45 p.m., provided, of course, that my laptop is on and connected to the Internet.
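Because the script writes the tweet and its length to standard output, cron will normally try to email that output to the crontab's owner after every run. If you'd rather keep a log, here is a minimal sketch (the helper name and log path are hypothetical) that rewrites the crontab non-interactively instead of through crontab -e, replacing any existing entry for the script:

```shell
#!/usr/bin/env bash
# upsert_cron_line SCHEDULE SCRIPT LOG  (hypothetical helper)
# Read a crontab on stdin, drop every line that mentions SCRIPT, and append
# one entry that runs SCRIPT on SCHEDULE with all output appended to LOG.
upsert_cron_line() {
  local schedule=$1 script=$2 log=$3
  grep -vF -- "$script" || true   # grep exits 1 when nothing is left; that's fine
  printf '%s %s >> %s 2>&1\n' "$schedule" "$script" "$log"
}
```

For example:

crontab -l 2>/dev/null | upsert_cron_line '45 4,9,12,17,22 * * *' /150/generate.sh /150/generate.log | crontab -

Filtering with grep -vF removes every existing line that mentions the script path, comments included; that is what keeps reruns from adding duplicate entries.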

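I also installed it as an anacron job so that it runs at least once a day whenever my laptop is turned on. Here is a suitable line for /etc/anacrontab:

1 20 irishlit-autotweets /150/generate.sh

The four fields are the period in days, the delay in minutes, a job identifier, and the command. Anacron uses the identifier to name the job's timestamp file, so it should be unique; reusing cron.daily, which the stock anacrontab on Debian and Ubuntu derivatives already uses for /etc/cron.daily, can make the two jobs interfere with each other.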

Some Notes on the Script

I often specify full paths in the script because cron jobs run with a minimal environment, so there's no guarantee that environment variables such as $PATH are set the way they are in an interactive shell.
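Two related habits can make a cron job less dependent on its environment: setting PATH explicitly at the top of the script, and checking the full paths of required programs before doing any work, so that a missing tool stops the script with a message naming it. A sketch of both, assuming the directory list below (adjust it to wherever your tools actually live); require_executable is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Assumption: these directories cover the tools the script needs;
# ~/.scripts is where ttytter was saved in this setup.
export PATH="/usr/local/bin:/usr/bin:/bin:/usr/games:$HOME/.scripts"

# require_executable FILE...: succeed only if every FILE is an executable
# regular file; print a message to stderr for each one that is not.
require_executable() {
  local f status=0
  for f in "$@"; do
    if [[ ! -f "$f" || ! -x "$f" ]]; then
      printf '%s: not found or not executable: %s\n' "${0##*/}" "$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Near the top of a script, a line such as require_executable /usr/games/dadadodo "$HOME/.scripts/ttytter" || exit 1 (the dadadodo path is a guess) would then stop the run early, and cron would pass the error message along.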

Switches for the dadadodo invocation:

-c 1
Just generate one sentence.
-l /150/chains.dat
Don't use the original corpus; use the precompiled statistical data instead. This is faster, though the script still runs fairly quickly without it.
-w 10000
Use a very wide wrap width so that DadaDodo doesn't break the generated text across lines.

Switches for the ttytter invocation:

-hold
If the Twitter rate limit is exceeded, keep trying until we can post. This would probably never happen with a five-time-a-day cron job, but is handy if I wind up posting manually from a terminal.
-keyf=/150/.ttytterkey
Use the TTYtter key specified in the absolute path rather than the default TTYtter key in my home directory. This allows me to use TTYtter to update my personal Twitter account when I'm not running this script, as well.

Reservations About the Current Setup

All of which is to say, again, that I'd love feedback if people have thoughts or ideas about how this could be done better.

Change History

Code on this page by Patrick Mooney is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at http://is.gd/IrishLitAutoTweetsTechnical.