DM Blog

Twitter Location Info


Back in , Twitter announced the ability to geo-tag your tweets, but at the time the feature was only available through the API (Application Programming Interface), meaning that only Twitter clients (like Tweetie) could access the geo-location data; it wasn't visible on the Twitter website. Recently, though, Twitter made significant enhancements to the API, and location info, when available, is now displayed on their website too.

I noticed a couple of weeks ago that the Twitter API (which returns data in RSS, XML, JSON and/or ATOM formats) now includes a wealth of location information, and I was excited to start using some of it on my websites. If you check out, or the homepage of my blog, you can see it in action. It's very neat.

For my geo-tagged tweets (yes, you can turn geo-tagging on or off on a per-tweet basis), I'm now displaying the name of the community where the tweet was submitted, which links to a Google map showing the actual location of that tweet. (My location data has been available via for some time now, so I'm not worried about displaying it on my websites. Whether or not I should geo-tag my tweets is a separate issue, and I can always permanently remove all my geo-data from Twitter at the click of a button if I want/need to, so I'm not worried about it.)

Apart from that, I made some other changes: links included within my tweets now actually work as links, @usernames link to Twitter profiles, #hashtags link to Twitter Search, and dates are displayed relatively (e.g. "5 minutes ago", "35 seconds ago").
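A linkifying pass like that can be sketched roughly as follows. This is only an illustration (in Python, which isn't necessarily what my sites run) – the regexes are deliberately simplified, and real tweet text has messier edge cases (trailing punctuation after URLs, #fragments inside URLs, etc.):

```python
import re

def linkify(text):
    """Turn raw tweet text into HTML: URLs, @usernames, and #hashtags
    become links. Simplified sketch, not production-grade parsing."""
    # Bare URLs -> anchors. A naive \S+ pattern will also swallow any
    # trailing punctuation stuck to the URL.
    text = re.sub(r'(https?://\S+)', r'<a href="\1">\1</a>', text)
    # @username -> link to that user's Twitter profile
    text = re.sub(r'@(\w+)', r'<a href="https://twitter.com/\1">@\1</a>', text)
    # #hashtag -> link to a Twitter Search for that tag (%23 is an escaped '#')
    text = re.sub(r'#(\w+)', r'<a href="https://twitter.com/search?q=%23\1">#\1</a>', text)
    return text
```

Note that the order matters: the @ and # passes run after the URL pass, so they don't mangle characters inside hrefs that were just inserted.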

At the same time, while reading up on the Twitter API, I realized that there is a limit of 150 requests per hour, which for unauthenticated requests like the ones I'm making is based on IP address. (In this case, that would be the IP address of my server.) Considering the amount of traffic I get just to, 150 requests per hour is not really a lot. (Granted, I only display Twitter updates on my home page, but still…) So instead of fetching the data from Twitter with every page request (like I was before), I created a new script that runs once every minute on my server as a CRON job. The script fetches the data from Twitter (only if there are changes), strips out the excess API data I'm not using, and saves everything to a very small JSON file. This way, my server makes at most 60 requests per hour to Twitter, and all my sites just read this significantly smaller, cached JSON file.
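The core of that cron job boils down to something like this sketch. Everything specific here is a placeholder, not my actual code: the feed URL, the cache path, and the list of kept fields are made-up examples, and the real script isn't necessarily Python:

```python
#!/usr/bin/env python3
"""Once-a-minute cron job (sketch): fetch the latest tweets, strip the
fields the sites don't use, and cache the result as a small JSON file."""
import json
import urllib.request

FEED_URL = "https://api.twitter.com/1/statuses/user_timeline.json?screen_name=example"  # hypothetical
CACHE_FILE = "/var/cache/tweets.json"  # hypothetical path
KEEP = ("id", "text", "created_at", "geo")  # only the fields the sites display

def trim(tweet):
    """Keep only the whitelisted fields; missing ones come back as None."""
    return {k: tweet.get(k) for k in KEEP}

def refresh():
    """Fetch the feed, trim it, and overwrite the local cache file."""
    with urllib.request.urlopen(FEED_URL) as resp:
        tweets = json.load(resp)
    slim = [trim(t) for t in tweets]
    with open(CACHE_FILE, "w") as f:
        json.dump(slim, f, separators=(",", ":"))  # compact output keeps the file tiny
```

The pages then just read `CACHE_FILE` locally instead of calling Twitter at all.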

Apart from staying under the limit of 150 requests per hour, the advantage is that I'm no longer hitting Twitter's servers (which are very slow, and go down often) with every page request. Now I'm just reading a very small local text file, and all the parsing is done in the background by the CRON job. My page load times are no longer affected by the speed of Twitter's servers at all. (I still have to format the dates with each request so they display the correct information, since I want to show "35 seconds ago", for example, and not a specific time… But the time it takes to format them is insignificant.)
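That per-request date formatting amounts to a small relative-time helper along these lines (again just an illustrative sketch, with a hypothetical function name – not my actual formatting code):

```python
import time

def relative_time(then, now=None):
    """Format a Unix timestamp as '35 seconds ago', '5 minutes ago', etc."""
    now = time.time() if now is None else now
    delta = int(now - then)
    # Walk from the largest unit down; use the first one that fits.
    for unit, seconds in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if delta >= seconds:
            n = delta // seconds
            return f"{n} {unit}{'s' if n != 1 else ''} ago"
    return f"{delta} second{'s' if delta != 1 else ''} ago"
```

Because it only depends on the stored timestamp and the current clock, it stays accurate on every page refresh even though the cached JSON itself only updates once a minute.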

There are only two disadvantages:

  • The first disadvantage is that my server now polls Twitter once every minute, even if nobody is requesting that information. I'd be willing to bet that I was averaging at least 60 requests per hour before, though. So while it might be overkill in the wee hours of the night, during the daytime (when traffic to my sites is highest) capping things at 60 requests per hour is probably an advantage… At the same time, the script I wrote is very efficient – it uses less than 0.5% CPU and takes well under a quarter of a second to execute – the only limiting factor is the speed of Twitter's servers. It checks the response code first: if it gets a 304 (meaning there are no new tweets or information since the last request), it quits immediately, and only on a 200 does it do the full fetch and parse. So this "disadvantage" is not really anything to worry about, and may even be an advantage compared to how I was running things before…
  • The second disadvantage is that my tweets now update only once a minute, not in real time. So the most recent tweet could potentially go undisplayed for up to 59 seconds, but I'm not convinced that's anything to worry too much about, no? In practice it would rarely even approach 59 seconds – that's just the theoretical maximum. For a tweet to be missing from my site for a full 59 seconds, I would have to submit it to Twitter just moments after the CRON job finished checking for new tweets… Otherwise, the lag would be shorter, possibly only a couple of seconds. Again, I'm not convinced it's a big enough issue to worry about, but it's still a "disadvantage" compared to loading tweets in real time.
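The 304 check from the first point relies on HTTP conditional requests: you remember the ETag from the last response and send it back, and the server answers 304 Not Modified (with no body) when nothing changed. A minimal sketch, assuming an ETag-honouring endpoint (function names and URL are my own illustration):

```python
import urllib.error
import urllib.request

def build_request(url, etag=None):
    """Build a conditional GET: if we remember an ETag from the last poll,
    send it so the server can answer 304 instead of a full body."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    return req

def fetch_if_changed(url, etag=None):
    """Return (body, new_etag), or (None, old_etag) when nothing changed."""
    try:
        with urllib.request.urlopen(build_request(url, etag)) as resp:
            return resp.read(), resp.headers.get("ETag")
    except urllib.error.HTTPError as e:
        if e.code == 304:  # no new tweets since the last request -- quit early
            return None, etag
        raise
```

A 304 response is tiny, so even the "wasted" overnight polls cost almost nothing on either end.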

Anyways, if you're interested in seeing some of my code, or want to know more about this, let me know in the comments below. If not, that's cool too – just check out the Twitter updates on my sites, and follow me on Twitter!

Update –

I just got an idea after posting this…

I’ve decided that rather than formatting the dates with each page load, I’m going to do it in the CRON script instead. It makes the code easier to maintain if I can just have it all in one place.

I remember I initially did it this way (since I want the dates displayed the same way across all my sites), but the issue was that if I had just posted to Twitter within the last minute and you visited my sites, the "35 seconds ago" didn't update when you refreshed the page – it stayed static according to when the CRON job last checked for new tweets.

So to get past that, I'm going to display "less than a minute ago" for those tweets. It's not as specific as showing the number of seconds, but having the minute as the smallest unit is fine, no? Especially since I'm only checking for new tweets once a minute anyways, I think it makes more sense that way.
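In other words, the formatter just gets a floor at one minute – anything fresher than 60 seconds collapses into a single label that stays true however stale the cached copy is within the polling window. A sketch of that variant (hypothetical function name, illustration only):

```python
def coarse_relative_time(delta_seconds):
    """Relative time that is never more precise than a minute: anything
    under 60 seconds is 'less than a minute ago', so the label remains
    correct no matter when the reader refreshes within the cron window."""
    if delta_seconds < 60:
        return "less than a minute ago"
    for unit, seconds in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if delta_seconds >= seconds:
            n = delta_seconds // seconds
            return f"{n} {unit}{'s' if n != 1 else ''} ago"
```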