r/xpa: X & P's Adventures
An Experiment in Simple Data Aggregation

Overview

My partner and I take epic road trips whenever we get a chance. By epic road trip, I mean we travel hundreds of miles per day with only very general plans and try to get as many things done as possible in comparatively short periods of time. Often, she drives while I read to her or point my camera out the window. During a ten-day trip to Colorado in late August and early September of 2015, I realized that our friends were interacting with our records of vacations in multiple other software ecologies: Facebook, Twitter, Instagram, Yelp, etc. etc. etc. It occurred to me that the data and media we generate about our vacations could be aggregated fairly easily so that it's collected in a central, canonical location for our friends, so that when they want to know how our vacation is going, why, they can just go to that place and find out. For us, too, it's a sort of auto-journal of what happened during our vacations, with optional manual annotations.

After pondering this in the back of my head for a day or so, I also realized I was already using everything I needed to do this.

Working with IFTTT

Everything that gets aggregated to r/xpa arrives there via a service called IFTTT, a site whose name is an acronym for "if this then that." IFTTT is a service that triggers and takes actions under certain flexible, user-defined circumstances; it's useful precisely because of how flexible its triggers and actions are. I was already using IFTTT for a number of things; it works by letting users create so-called recipes that link together social media accounts and other data sources. These recipes have forms like "if I post a photo to Instagram, also post it to Flickr" or "if I post a tweet tagged with #fb, also post it to my Facebook account without the #fb tag" or "if the season changes on Mars, tweet about it for me." You can see the complete list of all the IFTTT recipes I'm using at the moment, if you'd like, or you can read IFTTT's own brief explanation of how they work.

IFTTT recipes link together channels, which roughly means services and data sources. Each service that can be linked together has a channel on IFTTT: there's a Facebook channel, a LinkedIn channel, a Twitter channel, etc.; many social networks have channels on IFTTT. But channels can also be other data sources: there's an Android location channel on IFTTT that can detect, say, when I move out of an area or when I move into an area, so I can use that to trigger any number of different things (I use it, for instance, to remind me to turn off WiFi on my phone when I leave the house so unused WiFi doesn't drain my battery). Channels can have triggers (situations in which the recipe is activated and does something) or actions (things the recipe does) or (as seems to be most common) both; a recipe runs when (or shortly after) the trigger fires, and takes action, possibly based on data from the trigger channel. So, for instance, the Facebook channel has multiple triggers that, when they appear in the first half of a recipe, control when that recipe is activated: New status message, New status message with hashtag, New link post, New link post with hashtag, New photo post, New photo post with hashtag, etc. The Facebook channel also has several possible actions that it can take if it appears in the second half of the recipe: post a status update, post a link, upload a photo. Each of these triggers and each of these actions can be linked to any other channel in whatever way a user finds useful in a recipe.
The data links from the trigger to the action are called ingredients by IFTTT: that is, if you're using Facebook as your trigger for the recipe, and the recipe triggers when, say, you post a new status update, the ingredients IFTTT has for that trigger currently include From (your full name), Message (the full text of the status update), and UpdatedAt (the time at which the status update was made), each of which can become part of whatever happens in the second half of the recipe.
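
To make the trigger/action/ingredient vocabulary concrete, here's a minimal sketch in Python. It is not how IFTTT works internally, and none of it is IFTTT's API: the Recipe class, the action label, and the sample values are all made up for illustration. Only the ingredient names (From, Message, UpdatedAt) come from the description above.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical model of a recipe; these names are illustrative, not IFTTT's own.
Ingredients = Dict[str, str]


@dataclass
class Recipe:
    name: str
    trigger: str          # first half of the recipe: when it fires
    action: str           # second half of the recipe: what it does
    action_template: str  # {Name} placeholders are filled from ingredients

    def run(self, ingredients: Ingredients) -> str:
        """Fill the action template with the ingredients the trigger supplied."""
        return self.action_template.format(**ingredients)


recipe = Recipe(
    name="Capture Facebook status updates",
    trigger="Facebook: New status message",
    action="Reddit: create a text post",  # assumed label, for illustration only
    action_template="[{UpdatedAt}] {From} wrote on Facebook: {Message}",
)

# Made-up sample values for the three Facebook ingredients named above.
ingredients = {
    "From": "Patrick",
    "Message": "Made it to Colorado!",
    "UpdatedAt": "2015-08-28 14:05",
}

print(recipe.run(ingredients))
```

The point of the sketch is just that the trigger hands over a small bundle of named values, and the action decides where each of them goes.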

Once recipes are set up, and assuming that they haven't been disabled, they run in the background, usually about every fifteen minutes, which means there may be a delay of up to about that long before the action is taken. Of course, this would make IFTTT unsuitable for anything that absolutely had to happen with a shorter delay, but it's perfectly fine for what I want to do with r/xpa. If you ever want recipes to be checked immediately, you can log into IFTTT and manually trigger a check.
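
The delay comes from polling: the source is checked periodically rather than pushing changes the moment they happen. A rough sketch of that idea, assuming checks happen on a fixed fifteen-minute schedule (the real schedule is IFTTT's business, and both function names here are hypothetical):

```python
from typing import Callable, List, Set

POLL_INTERVAL_MINUTES = 15  # the approximate interval mentioned above


def poll_once(fetch_items: Callable[[], List[str]], seen: Set[str]) -> List[str]:
    """Return items that were not present on earlier polls, and remember them."""
    new_items = [item for item in fetch_items() if item not in seen]
    seen.update(new_items)
    return new_items


def delay_until_next_poll(posted_minute: int, interval: int = POLL_INTERVAL_MINUTES) -> int:
    """Minutes from posting until the next check, assuming checks at 0, 15, 30, ..."""
    return (-posted_minute) % interval


# Simulated data source: a list that grows between polls.
source = ["photo-1"]
seen: Set[str] = set()
first = poll_once(lambda: list(source), seen)
source.append("photo-2")
second = poll_once(lambda: list(source), seen)
print(first, second, delay_until_next_poll(1))
```

Something posted one minute after a check waits fourteen minutes for the next one, which is where the "up to about fifteen minutes" figure comes from.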

Channels have to be activated before they can be used: you have to give IFTTT permission to read and post to, say, your Twitter account. Once they're activated, they stay activated until you deactivate them: you generally only need to activate a particular channel once. (If something you do somehow deactivates a channel, IFTTT is kind enough to send you an email letting you know that this is the case so that your recipes aren't failing silently. If you want to check how your recipes are doing, you can check their logs in IFTTT, too.)

All in all, I really like IFTTT. It's a clever and useful service that can be flexibly configured to do a huge number of productive things, and it's usable for free. I'm using it already, so this project is a natural extension of things I'm already doing. And my experience with them on Twitter shows that they engage in meaningful discussions with people who engage with them on social media, rather than just engaging in PR-building horseshit, as so many companies do.

Aggregating on Reddit

Since I've already activated many channels for my various social media accounts, all I really need to do is to decide where to host the data I'm aggregating. Whatever source I use, I want it to have several features:

On reflection, then, it seemed to me, taking all of this into account, that the ideal place to aggregate data was Reddit. It's easy to set up a subreddit (which is what Reddit calls individual fora or communities), the TOS is acceptable to me, and it's a service that my partner and I both use already (so do some of our friends, for that matter). It's got an IFTTT channel that gives me plenty of control, so it's easy to link up with other data sources in IFTTT. I've never before been a moderator of a subreddit, so it gives me a chance to play with the Reddit moderation features, and I can make my partner another moderator. The overview for a subreddit lists the most recent articles by default, titles can link to individual articles, there are discussion features, I can limit posting to myself and my partner while allowing for other people to participate in discussions, and there can be thumbnails on posts. And I generally quite like Reddit's privacy policy: signing up is quick and involves little disclosure of personal data if someone wants to comment, and Reddit is explicitly tolerant of users having multiple accounts, users creating throwaway accounts, and anonymity in general; in fact, their TOS specifically prohibits linking a user's Reddit account to a real-life identity or other online identity, a fact which I like a lot. All in all, it's a more or less perfect location for my purposes.

But, then, part of the reason that it's perfect for my purposes is that I wind up on Reddit fairly often anyway, and so, if I aggregate data here, my attention will naturally be drawn periodically to anything in the data-aggregation process that needs me to look at it. So, reader, if you're following along and setting up a similar set of aggregators for your own data, Reddit may not be nearly perfect for you; it might be better to set up a blog on one of the blogging platforms that IFTTT supports. Pick something that will draw your attention toward itself periodically anyway.

And so I created a subreddit, r/xpa (which was, perhaps surprisingly, not already taken), to serve as the location for the aggregated data, filled out the necessary spots in the form, and, in a few minutes, was clicking the create subreddit button. I made it publicly viewable but restricted posting to moderators — which, you know, makes perfect sense in this particular case. Then I activated the Reddit channel on IFTTT and created a series of IFTTT recipes to pipe data to that subreddit.

Recipe Logic and Details

Each data source I'm going to capture for r/xpa needs to have its own recipe. There's a Facebook-to-r/xpa recipe, a Twitter-to-r/xpa recipe, an Instagram-to-r/xpa recipe, and so on.

Some of the recipes are selective in what they aggregate: they only trigger if the entry they're examining in the source data stream meets certain criteria; e.g., if it contains a particular hashtag (always #xpa or some simple variation on it, because I don't want to have to remember a set of mappings from services to necessary hashtags). This is the case primarily for services that I want to continue to use for non-vacation-related things during vacations or for services from which I may want to post (very occasionally) to the subreddit even when I'm not on vacation. Twitter is an example of both categories: On the one hand, so much of my online activity filters down to Twitter that capturing every tweet would result in a flood of irrelevant postings to the subreddit (and duplicates of existing postings that already came from, say, Instagram or Tumblr). Disabling all of this would be far more work than just making the trigger selective by requiring that a tweet include the hashtag #xpa to be piped to the sub. On the other hand, if my partner and I are sitting around at home, talking about where our next road trip will take us, and I want to say something public about this conversation, Twitter is one of the places I'm likely to do so.
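
The selective filter amounts to a hashtag check. IFTTT does this matching itself, so the following is only a sketch of the logic, with a hypothetical function name and an assumed matching rule (case-insensitive, whole tag only):

```python
import re

# Assumption: match "#xpa" in any letter case, but not longer tags like "#xpand"
# and not a "#xpa" glued onto the end of another word.
XPA_TAG = re.compile(r"(?<!\w)#xpa\b", re.IGNORECASE)


def is_for_subreddit(text: str) -> bool:
    """Return True if the text carries the #xpa hashtag."""
    return XPA_TAG.search(text) is not None


print(is_for_subreddit("Planning the next road trip #xpa"))
print(is_for_subreddit("Time to #xpand my horizons"))
```

Using one tag everywhere means the same check works for tweets, email subject lines, Tumblr tags, and Diigo bookmarks alike.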

Here's a list of the selective recipes that I use:

Submit new text post by email with #xpa in the subject line

I can always create a new text post on the subreddit just by emailing it to IFTTT, now that I've activated the email channel and given it the email address from which I'll be posting. (Of course, I could always create a new text post just by logging into Reddit in a browser on my phone, but the GMail client on my slow old phone starts up more quickly than a browser.) I leave this on even when I'm not on vacation: I might use this trigger to write a post about an upcoming vacation, but the chances of triggering this recipe accidentally are practically nil (because I'd need both to send the email to a specific address and to include the necessary hashtag). This is also a general way to manually capture information that doesn't get grabbed automatically: virtually anything can be shared by email from my phone, so if I want to mail a link to, say, a Google Maps route for the day, or a link to a song that I just Shazammed, I can just email it with the tag #xpa.

Note that I could set up the recipe to trigger on any email to the relevant address, but making sure that the recipe fires only when there's a specific hashtag both helps to ensure I don't post accidentally (e.g., by not being careful about selecting an autocomplete entry in my GMail client, say) and lets me use the GMail channel to trigger other recipes in the future if I want to: I would just need to pick a different hashtag that would need to appear in the subject line to trigger these hypothetical future recipes.

Capture Diigo bookmarks tagged #xpa
Sometimes, while traveling, I do research that contributes to the trip itself; I tend to bookmark things rather promiscuously, and I use Diigo to manage my bookmarks. So one way to easily post a link to something relevant to my travels is just to include #xpa as one of the hashtags when I bookmark something.
Warn followers when my cell phone battery is low
Certainly the most pretentious recipe here. Yes, there's an Android battery channel on IFTTT. This recipe definitely gets turned off when I'm not on vacation.
Capture Tumblr posts tagged #xpa
While traveling, I may well write something on my blog that's related to my vacation; this recipe captures it. (I'm hoping this recipe will also encourage me to short-blog about my vacations while on a trip.)
Capture tweets tagged with #xpa
Another quick way to capture something I'm likely already doing that's vacation-related.

Not all of the recipes are selective — many are instead comprehensive; they grab all of the data from a particular source, on the assumption that everything I post to that source while on vacation is vacation-related. (Facebook is an example of a service that I use this way.) I generally manage these recipes by turning them off when I'm not on vacation, then turning them on when I go on vacation again.

Here's a list of the comprehensive recipes that I use:

Capture Foursquare checkins
It stands to reason that any check-ins made in Foursquare (well, Swarm, these days, I guess, though it's still the Foursquare channel on IFTTT) while I'm on a trip are, well, trip-related, so they all get captured. When I get home, of course, the recipe gets turned back off, because I check in at places on Foursquare when I'm not on vacation, too.
Capture Flickr photosets
Flickr photosets are generally submitted after I get back from vacation, so I tend to leave this recipe turned on longer than some of the others. Still, once I start posting Flickr photosets, they're definitely all going to be trip-related until I turn the channel off again.
Capture Instagram photos
Yes, all of them; but I turn this recipe off quickly when I get home from a trip.
Capture Facebook status updates
Again, I'm assuming that anything I say on Facebook during a trip is likely to be trip-related. It's worth noting that this Facebook recipe only captures status updates; if I also wanted to capture, say, photos posted to Facebook, I'd need to create an additional recipe to do so. The triggers for IFTTT's Facebook channel allow a user to be rather selective, in fact, and there's a fair amount you can do with the channel, but there isn't currently an any post trigger; and, anyway, I don't use Facebook much, and (in particular) my photos wind up getting posted on Flickr and Instagram instead of directly to Facebook albums; they're either cross-posted to Facebook, or links to them wind up getting posted later. I may very well never have shared a photo on Facebook that wasn't already shared on another service somewhere, so I don't really miss out by not capturing Facebook photos. Ditto for other Facebook-related triggers: I just don't use that social network all that much.
Capture Yelp reviews

Pretty straightforward: I assume that reviewing businesses while on a trip probably means that I'm talking about somewhere I went while on that trip. Again, I'm hoping that this recipe will help motivate me to review more businesses on Yelp. Again, I turn it off when I'm done writing trip-related reviews.

Perhaps the most interesting thing about this recipe is that there is no Yelp channel on IFTTT: this recipe instead consumes an RSS feed that's generated by Yelp when I write reviews, because IFTTT does, in fact, have a feed channel. (You probably already know that a feed is a way of automatically syndicating content; a whole lot of social networks and other services that you use probably already produce feeds that you can use with IFTTT.)
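
For a sense of what consuming a feed involves, here's a small sketch that pulls the entries out of an RSS 2.0 document using Python's standard library. The sample feed is invented (it is not Yelp's actual output), and feed_items is a hypothetical helper, not anything IFTTT exposes.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Invented sample feed in RSS 2.0 shape; real feeds will carry more fields.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example reviews feed</title>
    <item>
      <title>Example Diner</title>
      <link>https://example.com/reviews/1</link>
      <pubDate>Fri, 28 Aug 2015 18:30:00 GMT</pubDate>
    </item>
    <item>
      <title>Example Coffee House</title>
      <link>https://example.com/reviews/2</link>
      <pubDate>Sat, 29 Aug 2015 09:15:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def feed_items(xml_text: str) -> List[Dict[str, str]]:
    """Extract title, link, and publication date from each <item> in the feed."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        }
        for item in root.iter("item")
    ]


for entry in feed_items(SAMPLE_FEED):
    print(entry["published"], entry["title"], entry["link"])
```

Each entry's title, link, and date are roughly the ingredients a feed-based recipe has to work with.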

Reddit has two kinds of posts: text posts and link posts. Reddit's IFTTT actions can create both. A text post contains text, and many of the posts that get aggregated import text directly from their source and put it in the body of the post. Link posts are, well, links: clicking the title doesn't take you to a chunk of text hosted on Reddit, but rather takes the reader directly to whatever is linked. Some of my recipes (e.g., the one that captures Tumblr blog posts) create text posts, because it makes sense in those cases to import all of the text. Some of the recipes just construct link posts: the recipe that captures Instagram photos just captures a link to the photo on Instagram (Reddit itself doesn't have any photo-hosting features), which has the bonus side effect of creating a thumbnail that gets shown next to the entry title. Both post types have a title, and I do my best to keep this in a more or less consistent format from one recipe to the next: as often as possible, the post title starts with the date and time at which the aggregated-data event actually happened (e.g., when the original photo was posted to Instagram), and I want each post to indicate that it's me posting (my partner will be contributing, too, once she gets IFTTT set up and working for her), and give some indication of what happened: post titles often have the general format of [current date] Patrick [wrote/tweeted/posted an Instagram photo/reviewed on Yelp ...]. If what's being posted is short text, e.g. a tweet, it might get imported directly into the post title itself. Otherwise, the text gets put in the post body and the title is an indicative link that gives some information about what happened and when.
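
The title convention and the text-versus-link choice can be sketched as a single function. This is an illustration of the convention described above, not the actual recipe configuration: build_post, its dictionary layout, the timestamp format, and the 140-character cutoff for "short text" are all assumptions.

```python
from datetime import datetime
from typing import Dict

MAX_TITLE_TEXT = 140  # assumed cutoff for text short enough to live in the title


def build_post(when: datetime, who: str, verb: str, text: str = "", url: str = "") -> Dict[str, str]:
    """Build a post in the '[date] Name verb' title format described above."""
    stamp = when.strftime("%Y-%m-%d %H:%M")
    title = f"[{stamp}] {who} {verb}"
    if url:
        # Link post: the title points straight at the external item.
        return {"kind": "link", "title": title, "url": url}
    if len(text) <= MAX_TITLE_TEXT:
        # Short text (e.g., a tweet) goes directly into the title.
        return {"kind": "text", "title": f"{title}: {text}", "body": ""}
    # Longer text goes in the body; the title only says what happened and when.
    return {"kind": "text", "title": title, "body": text}


when = datetime(2015, 8, 28, 14, 5)
tweet = build_post(when, "Patrick", "tweeted", text="On the road #xpa")
photo = build_post(when, "Patrick", "posted an Instagram photo", url="https://example.com/p/1")
print(tweet["title"])
print(photo["kind"], photo["title"])
```

Keeping the date at the front of every title means the subreddit reads as a timeline even when posts from different recipes arrive out of order.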

How It Works on Vacation

Way before the vacation starts ...

I might post the occasional thing via email or Twitter, thinking about plans or talking about them on Twitter with my partner. Occasionally, I might create a text post directly, but really, this isn't something I want to do often.

Right before the vacation starts ...

I get on IFTTT and make sure that all of my vacation-related recipes are activated. I refresh the page and check again, just to be sure, then shut down my laptop, toss it (and its power adapter!) into my backpack, and leave. If I forget to turn on recipes before I leave, I can do it later with the IFTTT app on my phone.

On vacation ...

I enjoy my vacation, and do things that automatically gather data for the subreddit: take Instagram photos, write Yelp reviews, tweet with the #xpa hashtag, post Facebook status updates, etc. etc. etc. I may blog a bit on Tumblr about incidents and tag the post #xpa (though this is currently more wishful thinking than reality, in fact). If I do research, I save it on Diigo and make sure it's tagged with ... yeah, the #xpa hashtag. If I have another data source that documents the vacation in some meaningful way, I can submit it via email. IFTTT gathers all of this and pipes it to r/XPA for me; all I really need to do is log in from time to time and make sure to clear out anything that gets caught in the moderator queue. (Not much does.) Mostly, I'm just engaging with my social media on vacation in the same way that I normally would; but I try to do so consciously, in ways that support better data aggregation.

I've also got a background process running in my head that looks for new data that should be meaningfully aggregated and tries to think of ways to automate this. Currently, this mostly means that I'm thinking about things I do that wind up generating an RSS or Atom feed.

When I get home ...

I turn off most of the comprehensive recipes right away. (They're labeled with VACATION RECIPE at the beginning, in all caps, to make them easier to identify.) Especially if my partner gets the first shower. I'm probably offloading and backing up my photos more or less immediately, too.
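
I do this toggling by hand in IFTTT, but the labeling scheme is what makes it mechanical. As a sketch of the idea only (the function, the dictionary of recipe states, and the exact label format after the prefix are all hypothetical):

```python
from typing import Dict, FrozenSet

VACATION_PREFIX = "VACATION RECIPE"  # the all-caps label described above


def set_vacation_recipes(
    recipes: Dict[str, bool],
    enabled: bool,
    leave_alone: FrozenSet[str] = frozenset(),
) -> Dict[str, bool]:
    """Switch every recipe labeled as a vacation recipe on or off, except those left alone."""
    updated = {}
    for name, state in recipes.items():
        if name.startswith(VACATION_PREFIX) and name not in leave_alone:
            updated[name] = enabled
        else:
            updated[name] = state
    return updated


recipes = {
    "VACATION RECIPE: Capture Instagram photos": True,
    "VACATION RECIPE: Capture Flickr photosets": True,
    "Capture tweets tagged with #xpa": True,
}

# Back home: turn off the vacation recipes, but keep Flickr on a while longer.
after_trip = set_vacation_recipes(
    recipes,
    enabled=False,
    leave_alone=frozenset({"VACATION RECIPE: Capture Flickr photosets"}),
)
print(after_trip)
```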

Then I start going through photos, being fairly selective about which make it to Flickr. (My photo processing workflow is a whole separate subject that I won't go into here. It's also a process that needs to be redesigned and streamlined, but that really is beside the point of this post.) Individual albums get uploaded to Flickr, then (usually) shared manually to Facebook and Twitter. (This would be easy enough to automate, too, but I currently prefer to do it manually to retain control. That may change in the future, though.) Neither Facebook nor Twitter photo-sharing posts get shared to r/XPA: the Facebook vacation recipe has already been turned off, and the tweet I send out for an album doesn't have the #xpa hashtag. That's fine, because the Flickr recipe doesn't get turned off until I've uploaded all of the photo albums that I want to upload, so the albums reach the subreddit that way. I may be writing some Yelp reviews as I go through the photos and get memories of places re-awakened. A few days later, though, the only vacation-related recipes that are still active are selective ones that I might want to use to talk about upcoming vacations: the post-by-email recipe, the capture-hashtagged-tweets recipe, the capture-hashtagged-Tumblr-posts recipe. Everything else is off.

Wait, But ...

Do you really want to aggregate all of this information here? Aren't you concerned about privacy?
All of the information that's being aggregated here is already public; anyone who wants to do anything nefarious with, say, my location can just get it directly from Foursquare. The other side of the coin, of course, is that I don't post information anywhere in the first place if I want it to remain private: there are places I don't check in on Foursquare, things I don't photograph, thoughts I don't express. That's thoughtful privacy maintenance in the age of social media. r/XPA doesn't make things public, it just aggregates data that's already out there. Anyone who wants to do the research can get all of this information already. In fact, for a technically sophisticated plotter of nefarious plots, aggregation is hardly a benefit at all, because the technically sophisticated plotter could aggregate this data herself. It's not that difficult to do so.
Do you really want to aggregate all of this data here? Aren't you worried there's too much detail? Shouldn't you be more selective?
Well, data aggregation here is the point: it results in a detailed auto-journal for us and our friends. Too, this is another benefit of using Reddit as the site for aggregation: it invokes the Reddit browsing culture. After all, who reads every single post in any subreddit? Virtually no one. Aggregating the data into a variety of links and articles makes it easy for viewers to skim through and look at what they want to see.
[2015-09-10] You keep saying we, but you're the only one posting in r/XPA. Shouldn't we be seeing posts from your partner, too?
This first vacation was a trial run to get things working technically. Wait for our next vacation.
I've got a great idea for a data source you can aggregate!
Great! There's a large number of ways to get in touch with me and let me know.

Trips We've Documented on r/XPA

  1. 2015-08-27 to 2015-09-07: Road trip to Colorado. Some photo albums from this trip, to serve as a rough itinerary:
  2. 2016-03-12 to 2016-03-26: Road trip to Texas, with stops in Colorado and Utah. Some photo albums from this trip, to serve as a rough itinerary:
  3. 2016-06-03 to 2016-06-12: Road trip to Portland, with stops in California and elsewhere in Oregon. Some photo albums from this trip, to serve as a rough itinerary:
  4. 2016-09-23 to 2016-09-26: Weekend in San Francisco. There is a single photo album for this entire trip.
  5. 2016-11-25 to 2016-12-03: Road trip to Seattle, with stops in California and Oregon. There is a single photo album for this entire trip.