DM Blog

PHP Code: Twitter CRON Job

by #code

A while back, I wrote about how I was using the Twitter APIs to display the location info for my tweets. Well, I’ve had a lot of emails about that article and requests to share some code, so now here I am, writing my first code-oriented article… Hm.

If you haven’t read my previous article on this topic, I highly suggest starting there for some background on what this script aims to do, and why…

Anyways, without further delay, here’s a breakdown of the code:

#!/usr/local/bin/php -q

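First, we put this line at the top of the PHP file so that the system knows to run it with PHP when the file is executed directly. Strictly speaking, the CRON command at the end of this article calls php5 explicitly, so the line isn’t essential there, but it doesn’t hurt: PHP’s command-line interpreter skips a first line that starts with #!.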

Next, we create a class, and initialize some variables.

<?php
class dm_twitter
{
	private $tweeter;
	private $statuses;
	private $json_url;
	private $filename;

	function __construct()
	{
		// initialize the variables – username, number of statuses to retrieve and the url to the twitter json feed
		$this->tweeter	= 'DanielMenjivar';
		$this->statuses	= 6;
		$this->json_url	= 'https://api.twitter.com/1/statuses/user_timeline/'.$this->tweeter.'.json?count='.$this->statuses;
		$this->filename	= 'public_html/twitter.json';
	}

I put this stuff in the class constructor so that it sits at the top of the file and is easy to find and change if I ever need to. $tweeter is the Twitter username, $json_url shouldn’t need to change (but it’s there just in case), and $filename is the path to the file we’ll be saving to. That path is relative, so it’s resolved from the current working directory, which for a CRON job is normally your home directory. In my case, the script sits in a folder that isn’t publicly accessible, which is why the path starts with the public_html folder. If possible, keep this script out of publicly accessible folders, but the file we’re writing to should be publicly accessible, of course…
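If you’d rather not depend on the working directory at all, one option is to build the path from the script’s own location using __DIR__ (the directory of the current file). Here’s a minimal sketch, assuming a hypothetical layout where the private script folder and public_html sit side by side in your home directory; output_path() is an illustrative helper, not part of the original script:

```php
<?php
// Hypothetical layout, adjust to your server:
//   /home/you/scripts/dm-twitter.php      <- this script
//   /home/you/public_html/twitter.json    <- the file we write
function output_path($script_dir)
{
	// go up one level from the script's folder, then into public_html
	return dirname($script_dir) . '/public_html/twitter.json';
}

// in the constructor you might then write:
// $this->filename = output_path(__DIR__);
echo output_path('/home/you/scripts'), "\n";
```

Because __DIR__ doesn’t change with the working directory, the same path works whether the script is run by CRON or by hand from another folder.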

In terms of the number of statuses, keep in mind that, according to the Twitter API documentation, this method can return fewer statuses than requested because it leaves out retweets. Since I don’t retweet all that often (but sometimes I do a couple in a row), I’ve set this to 6, since I only really want about 3 tweets to display at a time. Depending on your retweeting habits, or how many tweets you want to display, you can adjust this number accordingly. The higher you set it, the more tweets you have to work with, but the larger the JSON file will be, which could mean slower loading times if you pull too many.

Here’s the function that gets the JSON feed from Twitter:

private function get_json ()
	{
		// use cURL to get the json feed
		$ch = curl_init($this->json_url);
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);	// return the response as a string instead of printing it
		curl_setopt($ch, CURLOPT_HEADER, false);	// leave the response headers out of $data
		curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);	// follow any redirects

		// get the data from twitter and return it if successful, otherwise, just exit the script
		// (cURL functions don't throw exceptions, so check curl_exec()'s return value and the HTTP status instead)
		$data		= curl_exec($ch);
		$http_code	= curl_getinfo($ch, CURLINFO_HTTP_CODE);
		curl_close($ch);

		if ($data === false || intval($http_code) != 200)
		{
			exit();
		}

		return $data;
	}
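One thing worth knowing about this request: cURL’s CURLOPT_TIMEOUT defaults to 0, meaning no time limit, so if Twitter accepts the connection but never finishes responding, the script can simply sit there. A hedged way to bound that is to set CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT. The sketch below is a standalone helper; fetch_with_timeout() and its default values are illustrative choices of mine, not part of the original script, and it returns false instead of calling exit() so the caller decides what to do:

```php
<?php
// Sketch: fetch a URL with cURL, but give up instead of waiting forever.
// The timeout values are assumptions; tune them for your server.
function fetch_with_timeout($url, $connect_timeout = 5, $total_timeout = 15)
{
	$ch = curl_init($url);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);             // return the body as a string
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);             // follow redirects
	curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connect_timeout); // seconds allowed to connect
	curl_setopt($ch, CURLOPT_TIMEOUT, $total_timeout);          // seconds allowed for the whole request

	$data = curl_exec($ch);                        // false on failure or timeout
	$code = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no response was received
	curl_close($ch);

	// only hand back the body for a successful HTTP response
	return ($data !== false && $code == 200) ? $data : false;
}

// usage: bail out quietly if Twitter is slow or unreachable
// $json = fetch_with_timeout($this->json_url);
// if ($json === false) { exit(); }
```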

And a function to reformat the JSON data into the format we want:

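private function reformat_json()
	{
		// get the json feed and decode it (into an array of stdClass objects)
		$data = json_decode($this->get_json());

		// bail out if the feed couldn't be decoded
		if (!is_array($data))
		{
			exit();
		}

		// create a formatted array with all the tweets
		$tweets = array();
		foreach ($data as $tweet)
		{
			$item		= new stdClass();
			$item->date	= $this->TimeAgo($tweet->created_at);
			// use id_str: tweet IDs are 64-bit and can lose precision as PHP numbers
			$item->link	= 'https://twitter.com/'.$this->tweeter.'/statuses/'.$tweet->id_str;
			$item->text	= $this->linkify_text($tweet->text);
			// only include location info when Twitter returned a place name
			if (!empty($tweet->place->full_name))
			{
				$item->place	= $tweet->place->full_name;
				// Twitter sends [longitude, latitude]; Google Maps expects latitude,longitude
				if (!empty($tweet->coordinates->coordinates))
				{
					$item->maplink	= 'https://maps.google.com/maps?&q='.implode(",",array_reverse($tweet->coordinates->coordinates));
				}
			}
			$tweets[] = $item;
		}

		// we need to send some user info too!
		$twitter_data		= new stdClass();
		$twitter_data->userlink	= 'https://twitter.com/'.$this->tweeter;
		$twitter_data->tweets	= $tweets;
		return json_encode($twitter_data);
	}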

I’m using json_decode() to convert the JSON feed into PHP objects, and then building a new object with only the fields that I want. Have a look at the Twitter API documentation to see if there are any additional fields you want (there are tons), but for my purposes, I didn’t need/want too much.

(The TimeAgo() function and the linkify_text() function will be explained further below.)

It’s important to note that I’m only including location information for tweets that come back with a community name ("place"). That keeps the JSON file smaller than assigning NULL values to these properties would, but it means you have to test whether the property exists on the other end… I’m also linking the actual geolocation of the tweet to Google Maps, but you can use whatever you want. Note that the "coordinates" array returned from Twitter has to be reversed, because Twitter lists the longitude first, while Google expects the latitude first.
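To make the coordinate flip concrete, here’s a tiny standalone helper that does the same array_reverse()/implode() step as reformat_json(). The name build_maplink() is hypothetical (it isn’t part of the class), and the coordinates are approximate values for downtown Toronto, used only as an example:

```php
<?php
// Twitter's "coordinates" field is GeoJSON-style: [longitude, latitude].
// Google Maps' q= parameter expects "latitude,longitude", so we reverse it.
function build_maplink(array $coordinates)
{
	return 'https://maps.google.com/maps?&q=' . implode(',', array_reverse($coordinates));
}

// approximate coordinates for downtown Toronto, longitude first as Twitter sends them
echo build_maplink(array(-79.38, 43.65)), "\n"; // https://maps.google.com/maps?&q=43.65,-79.38
```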

Lastly, I add the Twitter profile link for the specified user and return the whole thing as a JSON string using json_encode().

Next, a function to save our JSON file:

public function save_json_file()
	{
		$content = $this->reformat_json();
		$filename = $this->filename;

		// Let’s make sure the file exists and is writable first.
		if (is_writable($filename))
		{
			// open the file for writing
			if (!$handle = fopen($filename, 'w'))
			{
				echo "Cannot open file ($filename)";
				exit();
			}

			// Write $content to our opened file.
			if (fwrite($handle, $content) === FALSE)
			{
				echo "Cannot write to file ($filename)";
				exit();
			}

			// close the file
			fclose($handle);

			return true;
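		}

		// the file doesn't exist yet or isn't writable
		echo "Cannot write to file ($filename)";
		return false;
	}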

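You’ll note that this is the first (and only) public function (method) in this class, so this is the one that calls everything else. In case it isn’t obvious, it gets our reformatted JSON data and saves it to the file we specified in __construct(). Keep in mind that is_writable() returns false for a file that doesn’t exist yet, so create an empty twitter.json first and make sure the user your CRON job runs as can write to it.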
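Opening the file with 'w' empties it before the new content is written, so a page that reads twitter.json at that exact moment could get an empty or partial file. A common way around this (not what the script above does) is to write to a temporary file in the same directory and then rename() it over the old one; on Linux, a rename within one filesystem replaces the target in a single step. A minimal sketch, where write_atomically() is a hypothetical helper name:

```php
<?php
// Sketch: replace $filename's contents without readers ever seeing a half-written file.
function write_atomically($filename, $content)
{
	// create the temp file next to the target so rename() stays on one filesystem
	$tmp = tempnam(dirname($filename), 'tw_');
	if ($tmp === false)
	{
		return false;
	}

	// write everything to the temp file first
	if (file_put_contents($tmp, $content) === false)
	{
		unlink($tmp);
		return false;
	}

	// tempnam() creates the file with 0600 permissions; make it readable by the web server
	chmod($tmp, 0644);

	// swap the new file into place
	if (!rename($tmp, $filename))
	{
		unlink($tmp);
		return false;
	}
	return true;
}

$target = sys_get_temp_dir() . '/twitter-demo.json';
echo write_atomically($target, '{"tweets":[]}') ? "saved\n" : "failed\n";
```

In save_json_file(), the fopen()/fwrite()/fclose() sequence could then be swapped for a single call like this.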

Our next function, linkify_text(), takes the tweet’s text (or any other text you feed it) and makes it XHTML-safe, turns embedded URLs into hyperlinks, converts embedded usernames (@DanielMenjivar) into Twitter profile links, and converts hashtags (#toronto) into Twitter Search links.

private function linkify_text($raw_text)
	{
		// create xhtml safe text (mostly to be safe of ampersands);
		// double_encode = false leaves existing entities such as &amp; alone,
		// since the text from Twitter may already contain them
		$output = htmlentities($raw_text, ENT_NOQUOTES, 'UTF-8', false);

		// parse urls
		$pattern = '/([A-Za-z]+:\/\/[A-Za-z0-9_-]+\.[A-Za-z0-9_:%&\?\/.=-]+)/i';
		$replacement = '<a href="${1}" rel="external">${1}</a>';
		$output = preg_replace($pattern, $replacement, $output);

		// parse usernames
		$pattern = '/[@]+([A-Za-z0-9_-]+)/';
		$replacement = '<a href="https://twitter.com/${1}" rel="external">@${1}</a>';
		$output = preg_replace($pattern, $replacement, $output);

		// parse hashtags
		$pattern = '/[#]+([A-Za-z0-9_-]+)/';
		$replacement = '<a href="https://search.twitter.com/search?q=%23${1}" rel="external">#${1}</a>';
		$output = preg_replace($pattern, $replacement, $output);

		return $output;
	}
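A detail worth checking in your own feed: as far as I know, the text field in Twitter’s API responses already arrives with &, < and > escaped (as &amp;, &lt; and &gt;). If so, htmlentities() with its default settings escapes them a second time, and readers see a literal "&amp;" in the tweet. Passing false as the fourth argument (double_encode) leaves existing entities alone while still escaping bare ampersands. A quick comparison, using a made-up tweet:

```php
<?php
// a tweet as it might arrive from the API, with one ampersand already escaped
$raw = 'Fish &amp; chips & a pint';

// default behaviour: existing entities get encoded again
$double = htmlentities($raw, ENT_NOQUOTES, 'UTF-8');        // 'Fish &amp;amp; chips &amp; a pint'

// double_encode = false: existing entities are kept, the bare "&" is still escaped
$single = htmlentities($raw, ENT_NOQUOTES, 'UTF-8', false); // 'Fish &amp; chips &amp; a pint'

echo $double, "\n", $single, "\n";
```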

The last part of this class is a function that calculates how long ago a tweet was submitted, and returns it to us in relative terms, similar to how Twitter displays it.

// $datefrom is the tweet's date string; leave out $dateto to measure the delay up to "now"
	private function TimeAgo($datefrom,$dateto=-1)
	{
		// convert the date string from Twitter into a Unix timestamp first
		$datefrom	= strtotime($datefrom);

		// Defaults: if the date couldn't be parsed (false) or is 0, treat it as an error rather than the epoch
		if ($datefrom <= 0) { return "a long time ago"; }
		if ($dateto == -1) { $dateto = time() + 1; }

		// Calculate the difference in seconds between the two timestamps
		$difference = $dateto - $datefrom;

		// switch (true) runs the first case whose condition evaluates to true
		switch (true)
		{
			case ($difference < 60): // less than 60 seconds
				return "less than a minute ago";
			case ($difference >= 60 && $difference < 60*60): // between 60 seconds and 60 minutes
				$datediff = floor($difference / 60);
				return ($datediff == 1) ? "about $datediff minute ago" : "about $datediff minutes ago";
			case ($difference >= 60*60 && $difference < 60*60*24): // between 1 hour and 24 hours
				$datediff = floor($difference / 60 / 60);
				return ($datediff == 1) ? "about $datediff hour ago" : "about $datediff hours ago";
			case ($difference >= 60*60*24): // 1 day or more
				$datediff = floor($difference / 60 / 60 / 24);
				return ($datediff == 1) ? "about $datediff day ago" : "about $datediff days ago";
		}
	}
}

You could also use this function elsewhere: if you pass a different $dateto value, you can calculate the delay from another point in time instead of "now"… I’m also adding one second to "now" as a safety margin, so the difference is never zero.
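A side note on the switch statement in TimeAgo(): PHP compares the switch subject to each case value with loose ==, so cases written as boolean conditions only behave reliably when the subject is true. With a numeric subject, 0 == true is false while 0 == false is true, so a difference of 0 skips the first case and lands in the first case whose condition is false. In TimeAgo() written as switch ($difference), that would produce "about 0 minutes ago", which is one reason to keep the difference from ever being zero. The two hypothetical functions below isolate the behaviour:

```php
<?php
// PHP's switch compares its subject to each case value with loose ==.

function bucket_numeric_subject($difference)
{
	switch ($difference) // subject is the number itself
	{
		case ($difference < 60):
			return 'seconds';
		case ($difference >= 60):
			return 'minutes or more';
	}
	return 'no match';
}

function bucket_true_subject($difference)
{
	switch (true) // subject is true, so the first case that evaluates to true wins
	{
		case ($difference < 60):
			return 'seconds';
		case ($difference >= 60):
			return 'minutes or more';
	}
	return 'no match';
}

// 0 == true is false, but 0 == false is true, so 0 lands in the second case
echo bucket_numeric_subject(0), "\n"; // minutes or more
echo bucket_true_subject(0), "\n";    // seconds
echo bucket_numeric_subject(5), "\n"; // seconds
```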

That’s it for the class, but obviously, you have to instantiate it and then call save_json_file() to actually do anything. Since I’m running this as a CRON job, I just appended that to the bottom of the file and it was all good. But before I get to that: after a couple of days of research and troubleshooting, I also added another function to this script (outside the class).

It turned out that sometimes Twitter’s servers didn’t respond, or were too slow, or who knows what was going on, and the script never finished running. The next time the CRON job ran the script (a minute later), it started as a new, independent process, which didn’t interfere with anything. Except that after a week or so, I had tons of unfinished processes sitting on the server (taking up memory and resources), so I needed a way to make sure only one copy of this script ran at a time.

With the help of some of the guys at Liquid Web’s support, the solution seemed to be to check whether the script was already running, and if so, terminate it before starting all over again. They gave me the idea (the "what to do"), but it was up to me to figure out how to code it all up, which, surprisingly, wasn’t that difficult. Here it is:

// this function looks for all previous processes running this script, and if they exist, it kills the process :-)
function kill_previous()
{
	// set the path to this file to the $file variable
	$file = __FILE__;

	// run ps to list every php5 process as "PID full-command-line" and store the lines in $processes
	$processes = array();
	exec("ps -C php5 -o pid= -o %a", $processes);

	// foreach PHP5 process, if its command line contains the name of this file, then let's kill that process
	foreach ($processes as $process)
	{
		$pid	=	intval($process);	// each line starts with the PID (leading spaces are ignored)
		// never kill the current process (or PID 0, which would signal the whole process group)
		if ($pid > 0 && $pid != getmypid() && strpos($process, $file) !== false)
		{
			exec("kill $pid");
		}
	}
}
// end of kill_previous function

This function first finds out the name of the current file (so you can name/rename this file anything, and it will still work), and then gets a list of all the php5 processes that are currently running, along with the arguments each one was started with. Because of the way I am setting up the CRON job to run my script (more on that later), the full path to the file is included in those arguments, and since __FILE__ is also a full path, the two match. That lets me search for the file and kill any process that is running it. Note that ps -C normally lists every user’s php5 processes, but you’ll only be able to kill the ones that belong to you; and if your PHP binary is called php rather than php5, change the -C argument to match. It’s important to get the ID of the current process and not kill that one too, otherwise your script will end just as soon as it started. (I made that mistake and didn’t realize for a couple of days that my tweets weren’t updating – duh!) Simple, no?
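A different approach to the same problem (not what this script does) is to have each run take an exclusive, non-blocking lock on a file: if an earlier run still holds the lock, the new run simply exits instead of killing anything. The lock is released automatically when the process exits, so a crash doesn’t leave it stuck. A minimal sketch using PHP’s flock(); acquire_single_instance_lock() and the lock-file path are hypothetical choices:

```php
<?php
// Sketch: allow only one copy of the script to run at a time using flock().
// The lock file path is hypothetical; any file your user can write will do.
function acquire_single_instance_lock($lock_path)
{
	$handle = fopen($lock_path, 'c'); // create if missing, don't truncate
	if ($handle === false)
	{
		return false;
	}

	// LOCK_NB: fail immediately instead of waiting if another process holds the lock
	if (!flock($handle, LOCK_EX | LOCK_NB))
	{
		fclose($handle);
		return false;
	}

	// keep the handle open; the lock lasts until it is closed or the process exits
	return $handle;
}

$lock = acquire_single_instance_lock(sys_get_temp_dir() . '/dm-twitter.lock');
if ($lock === false)
{
	exit(); // a previous run is still going
}
// ... fetch and save the tweets here ...
```

One trade-off: a run that truly hangs keeps holding the lock, so later runs keep exiting until it finishes. This pairs best with a timeout on the request, whereas kill_previous() avoids that problem by removing the stuck process.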

The last part of this file calls kill_previous(), then instantiates the class and runs it:

// let’s kill all previous processes running this script, and run it again
	kill_previous();
	$twitter = new dm_twitter;
	$twitter->save_json_file();
?>

That’s it.

Here’s the whole file as a download so that you don’t have to copy and paste everything above: dm-twitter.php

The last thing you need to do is set up a CRON job to run the script for you. I set mine to run once every minute, but you can do whatever you want. I did this using the Cron Jobs option in cPanel, with this command:
php5 -q /full/path/to/your/script/dm-twitter.php >/dev/null 2>&1

And that’s it! Now you have a small JSON file that updates with new tweets from Twitter every minute. The only thing left to do is to grab this JSON file and process it to display your tweets on your site (or wherever you want to put it). On some of my sites, I use JavaScript to display the tweets, and on other sites, I’m just accessing the JSON file with PHP, but I’ll leave that up to you how you use it.
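For the PHP route, here’s a minimal sketch of reading the file and printing a short list. It assumes the structure written by the class above (a userlink plus a tweets array whose items may or may not have place and maplink), shows only the first three tweets, and uses a hypothetical path. Since linkify_text() already escaped and linkified the text, it’s printed as-is; render_tweets() is an illustrative name:

```php
<?php
// Render up to $limit tweets from the JSON written by the cron script.
function render_tweets($json, $limit = 3)
{
	$data = json_decode($json);
	if (!is_object($data) || !isset($data->tweets) || !is_array($data->tweets))
	{
		return '';
	}

	$html = '<ul class="tweets">' . "\n";
	foreach (array_slice($data->tweets, 0, $limit) as $tweet)
	{
		// text is already escaped and linkified by linkify_text(), so print it as-is
		$html .= '<li>' . $tweet->text . ' <a href="' . $tweet->link . '">' . $tweet->date . '</a>';
		// place/maplink are only present for some tweets, so test before using them
		if (isset($tweet->place))
		{
			$place = htmlspecialchars($tweet->place, ENT_QUOTES, 'UTF-8');
			$html .= isset($tweet->maplink)
				? ' from <a href="' . htmlspecialchars($tweet->maplink, ENT_QUOTES, 'UTF-8') . '">' . $place . '</a>'
				: ' from ' . $place;
		}
		$html .= "</li>\n";
	}
	$html .= "</ul>\n";
	return $html;
}

// usage (hypothetical path):
// echo render_tweets(file_get_contents('/home/you/public_html/twitter.json'));
```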

Now my server only accesses Twitter once a minute (not on every page request), and it does so in the background. All my sites just need to access this one JSON file, which is formatted just how I want it and is much leaner than the JSON you get from Twitter.

Let me know if you have any thoughts, questions, comments, or anything you have to say by writing a comment below. Thanks for reading – I hope this helps!