Pulling an iTunes Feed with PHP (CF to PHP Conversion Example)

Way back in 2013, I posted about using ColdFusion to pull data from an iTunes-capable RSS and CFFEED’s funkiness with it.  For that post, I just put the bit of code that took the RSS contents and put it into a DB.  It was a script for a client, that eventually was updated to pull content from the same source’s API, put it in a DB, and generate an iTunes feed that was hosted on the client’s site, so it stayed in their control so if they changed hosts or podcast providers, they wouldn’t have to do much with the feed.

Said client is also my only “hosted” client at this point, so I’ve moved her over to the new host, which is purely PHP.  It was an easy move as the script doing this API to RSS flip was the only ColdFusion on her site, but obviously I needed to redo the script for her before she had a new episode to release J

It seemed like a good “moving from CF to PHP” example, so decided to do a post on it.  As mentioned, it actually had two parts.  First, pulling the episodes from the API (in this case, a Spreaker show), then going through the results, cleaning them up, and storing them in the database table.  (Note, for my client’s privacy, the examples below are using a public Spreaker feed and I’ve generic-sized the content)

<cfprocessingdirective pageencoding="utf-8" suppresswhitespace="yes">
<cfset variables.pageTitle = "Update Episode DB/RSS" />

<!--- Get episodes from API: by default only gets the last handful, not the whole list of available ones --->
<cfhttp result="aEpisodes" url="http://api.spreaker.com/show/452240/episodes" method="get" />
<cfset myFeedContent = DeserializeJSON(aEpisodes.fileContent) />
<cfset myEpisodes = myFeedContent.response.pager.results />

<cfset newEpisodes = 0 />

<!--- loop through episodes, check if in DB, and if not, added --->
<cfloop index="thisEpisode" array="#myEpisodes#">
	<cfset episodeid = ReReplace(thisEpisode.episode_id, "[^0-9]", "", "ALL") />
	<cfquery name="qCheck" datasource="my_dsn">
		SELECT episodeid
		FROM podcast_episodes
		WHERE episodeid = <cfqueryparam value="#episodeid#" cfsqltype="cf_sql_integer" />
	<cfif qCheck.RecordCount EQ 0>
		<cfset episodetitle = Replace(thisEpisode.title, " - ", ":") />
		<cfset seasonepisodetag = TRIM(ListGetAt(episodetitle, 1, ":")) />
		<cfset seasonnumber = ReReplace(ListGetAt(seasonepisodetag, 1), "[^0-9]", "", "ALL") />
		<cfset episodenumber = ReReplace(TRIM(ListGetAt(seasonepisodetag, 2)), "[^0-9]", "", "ALL") />
		<cfset episodetitle = TRIM(ListGetAt(episodetitle, 2, ":")) />
		<cfset durationInSeconds = thisEpisode.length/1000 />
		<cfset duration = "#INT(durationInSeconds/60)#:#numberformat(durationInSeconds MOD 60,'00')#" />
		<cfset bytelength = (durationInSeconds * 192000) / 8 />
		<cfset dateposted = DateFormat(thisEpisode.published_at, "mmmm d, yyyy") & " " & TimeFormat(thisEpisode.published_at, "hh:mm:dd tt") />
		<cfset spreakerurl = thisEpisode.site_url />
		<!--- do what we can to fix up our description format --->
		<cfset description = "<p>" & thisEpisode.description />
		<cfset description = Replace(description, "Progress Report:", "</p><p><strong>Progress Report:</strong>") />
		<cfset description = Replace(description, "Monthly Goals:", "</p><p><strong>Monthly Goals:</strong>") />
		<cfset description = Replace(description, "Random Recommendation:", "</p><p><strong>Random Recommendation:</strong>") />
		<cfset description = Replace(description, "Music Clip:", "</p><p><strong>Music Clip:</strong>") />
		<cfset description = Replace(description, "Links:", "</p><p><strong>Links:</strong>") />
		<cfset description = Replace(description, " )", ")", "ALL") />
		<cfset description = REReplaceNoCase(description, "(\bhttps://[a-z0-9\.\-_:~@##%&/?+=]+)", "<a href=""\1"" target=_blank>\1</a>", "ALL") />
		<cfset description = REReplaceNoCase(description, "(\bhttp://[a-z0-9\.\-_:~@##%&/?+=]+)", "<a href=""\1"" target=_blank>\1</a>", "ALL") />
		<cfquery datasource="my_dsn">
			INSERT INTO podcast_episodes (episodeid, seasonnumber, episodenumber, episodetitle, bytelength, duration, spreakerurl, description, dateposted)
				<cfqueryparam value="#episodeid#" cfsqltype="cf_sql_integer" />, 
				<cfqueryparam value="#seasonnumber#" cfsqltype="cf_sql_smallint" />, 
				<cfqueryparam value="#episodenumber#" cfsqltype="cf_sql_smallint" />, 
				<cfqueryparam value="#episodetitle#" cfsqltype="cf_sql_varchar" maxlength="100" />,
				<cfqueryparam value="#bytelength#" cfsqltype="cf_sql_integer" />, 
				<cfqueryparam value="#duration#" cfsqltype="cf_sql_varchar" maxlength="6" />, 
				<cfqueryparam value="#spreakerurl#" cfsqltype="cf_sql_varchar" maxlength="255" />, 
				<cfqueryparam value="#description#" cfsqltype="cf_sql_text" />, 
				<cfqueryparam value="#dateposted#" cfsqltype="cf_sql_timestamp" />
		<cfset newEpisodes = newEpisodes + 1 />

<cfquery name="qPodcastPosts" datasource="my_dsn">
	SELECT episodeid, seasonnumber, episodenumber, episodetitle, bytelength, duration, spreakerurl, description, dateposted
	FROM podcast_episodes
	ORDER BY dateposted DESC

<p><cfoutput>#newEpisodes#</cfoutput> episodes added</p>

Without the loveliness that is ColdFusion datasources, I obviously had to set up the DB connection and what not for the PHP version. I also added a bit more error checking on the PHP one (or at least nicer errors).

	$DB_NAME = 'databasename';
	$DB_USER = 'username';
	$DB_PASSWORD = 'password';
	$DB_HOST = 'localhost';
	$dbc = new mysqli($DB_HOST, $DB_USER, $DB_PASSWORD, $DB_NAME);
	// get the episodes from the API
	$ch = curl_init('http://api.spreaker.com/show/452240/episodes');
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
	curl_setopt($ch, CURLOPT_TIMEOUT, 7);

	$curl_contents = curl_exec($ch);
	$curl_info = curl_getinfo($ch);

		$aResults = json_decode($curl_contents);
		$aEpisodes = $aResults->response->pager->results;

		echo '<pre>';
			print_r('<p class="error">Curl Info</p>');
			print_r('<p class="error">Curl Contents</p>');
		echo '</pre>';  exit;
	$numNewEpisodes = 0;
	// loop feed and add any new episodes to the database - we don't just truncate and rerun because it only returns the newest episodes by default
	foreach($aEpisodes as $aThisEpisode):
		$sqlCheckEpisode = "SELECT episodeid
			FROM podcast_episodes
			WHERE episodeid = " . $aThisEpisode->episode_id;
		$qCheckEpisode = $dbc->query($sqlCheckEpisode);
		if ($dbc->error):
			echo("<p class='error'>Unable to check for episode existing<blockquote>");
			print_r($sqlCheckEpisode . "<br />");
			echo("</blockquote>"); exit; // stop to avoid flood of errors if it's a global issue
		if($qCheckEpisode->num_rows == 0):
			$episodeid = $aThisEpisode->episode_id;
			$aTitleBreakdown = explode(":", str_replace(" - ", ":", $aThisEpisode->title));
			$aTagBreakdown = explode(",", $aTitleBreakdown[0]);
			$seasonnumber = trim(preg_replace("/[^0-9]/", "", $aTagBreakdown[0]));
			$episodenumber = trim(preg_replace("/[^0-9]/", "", $aTagBreakdown[1]));
			$episodetitle = mysqli_real_escape_string($dbc, trim($aTitleBreakdown[1]));
			$durationInSeconds = $aThisEpisode->length/1000;
			$duration = round($durationInSeconds/60, 0, PHP_ROUND_HALF_DOWN) . ":" . sprintf("%02d", ($durationInSeconds % 60));
			$bytelength = ($durationInSeconds * 192000) / 8;
			$dateposted = $aThisEpisode->published_at;
			$spreakerurl = trim($aThisEpisode->site_url);
			$description = '<p>' . $aThisEpisode->description;
			$description = str_replace("Progress Report:", "</p><p><strong>Progress Report:</strong>", $description);
			$description = str_replace("Monthly Goals:", "</p><p><strong>Monthly Goals:</strong>", $description);
			$description = str_replace("Random Recommendation:", "</p><p><strong>Random Recommendation:</strong>", $description);
			$description = str_replace("Music Clip:", "</p><p><strong>Music Clip:</strong>", $description);
			$description = str_replace("Links:", "</p><p><strong>Links:</strong>", $description);
			$description = str_replace(" )", ")", $description);
			$description = preg_replace('/https?:\/\/[\w\-\.!~#?&=+\*\'"(),\/]+/', '<a href="$0">$0</a>', $description);
			$description .= '</p>';
			$description = mysqli_real_escape_string($dbc, trim($description));
			$sqlInsertEpisode = "INSERT INTO podcast_episodes (episodeid, seasonnumber, episodenumber, episodetitle, bytelength, duration, spreakerurl, description, dateposted)
			if ($dbc->error):
				echo("<p class='error'>Unable to insert episode<blockquote>");
				print_r($aThisEpisode . "<br />");
				echo("</blockquote>"); exit; // stop to avoid flood of errors if it's a global issue
	echo '<p>' . $numNewEpisodes . ' new episodes added to database.';

I debated removing a lot of the string replaces, as they were originally for formatting the descriptions from the API for display on an auto-generated webpage, which has long been replaced with just posting using WordPress and the Spreaker WordPress plug-in (much more user-friendly and neater in appearance).  Meaning the DB now purely feeds the iTunes feeds.  She also no longer uses those tag bits, but since older entries still have those bits it seemed better to keep them nicely formatted for the feed.

Regardless of whether any new episodes were added or not, the script always recreates the entire podcast RSS feed.  This is because it’s just easier than trying to find the spot in the text file and shoving in the new episode.  I also chose to always write it to a file, versus pointing to a dynamic link, because it isn’t updated often enough to warrant needing it to be “live” (she generally does just 1-2 episodes a month).  And at least with the ColdFusion version, my host’s CF stuff crashed enough that having it written to an XML reduced downtime for the feed itself.

<!--- Generate RSS feed --->
<cfset myRSSFeed = GetDirectoryFromPath( GetCurrentTemplatePath() ) & "my_podcast.xml" />

<cfsavecontent variable="myRSS"><?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
		<title>My Podcast</title>
		<copyright>&#xA9; 2013-<cfoutput>#Year(NOW())#</cfoutput> Miss Podcaster, My Company</copyright>
		<itunes:author>Miss Podcaster</itunes:author>
		<itunes:summary>***AWESOME PODCAST DESCRIPTION***
			<itunes:name>Miss Podcaster</itunes:name>
		<itunes:image href="http://miss-podcaster.com/podcast_cover_art.png" />
		<itunes:category text="Arts">
			<itunes:category text="Literature" />
		</itunes:category><cfoutput query="qPodcastPosts"><cfset endOfFirst = FindNoCase("</p>", description) - 1 /><cfif endOfFirst LTE 0><cfset endOfFirst = LEN(description) /></cfif><cfset episodeSummary = TRIM(ReplaceNoCase(LEFT(description, endOfFirst), "<p>", "", "ALL")) />
			<title>S#seasonnumber#, E#episodenumber#: #XMLFormat(episodetitle)#</title>
			<itunes:author>Miss Podcaster</itunes:author>
			<itunes:image href="http://miss-podcaster.com/podcast_cover_art.png" />
			<enclosure url="http://api.spreaker.com/download/episode/#episodeid#.mp3" length="#bytelength#" type="audio/x-m4a" />
			<pubDate>#DateFormat(dateposted, "ddd, dd mmm yyyy") & " " & TimeFormat(dateposted, "HH:mm:ss") & " GMT"#</pubDate>

<cffile action="write" file="#myRSSFeed#" output="#myRSS#" nameconflict="overwrite" />

<p>RSS Feed File updated</p>

Now I will admit, when it came time to do the PHP version, I seriously missed cfsavecontent.  Yes, PHP has heredoc, but I ran into a few issues there, as noted in my comments in the code.

	// now get all the episodes from the DB to write the RSS feed
	$sqlGetEpisodes = "
		SELECT episodeid, seasonnumber, episodenumber, episodetitle, bytelength, duration, spreakerurl, description, dateposted
		FROM podcast_episodes
		ORDER BY dateposted DESC
	$qEpisodes = $dbc->query($sqlGetEpisodes);
	if ($dbc->error):
		echo("<p class='error'>Unable to get the episodes! <blockquote>");
		print_r($sqlGetEpisodes . "<br />");
		echo("</blockquote>"); exit; // stop to avoid flood of errors if it's a global issue
	$currentYear = date("Y");
	$podcast_description = "***AWESOME PODCAST DESCRIPTION***";
	$myRSS = '<?xml version="1.0" encoding="UTF-8"?>'; // this had to come out of the heredoc below because it was causing all sorts of unhappiness due to the ?
	$myRSS .= <<<EOD

<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
		<title>My Podcast</title>
		<copyright>&#xA9; 2013-{$currentYear} Miss Podcaster, My Company</copyright>
		<itunes:author>Miss Podcaster</itunes:author>
			<itunes:name>Miss Podcaster</itunes:name>
		<itunes:image href="http://miss-podcaster.com/podcast_cover_art.png" />
		<itunes:category text="Arts">
		<itunes:category text="Literature" />
	while($aThisRow = $qEpisodes->fetch_assoc()):
		$pubDate = date(DateTime::RFC2822, strtotime($aThisRow['dateposted']));
		$description = htmlspecialchars($aThisRow['description'], ENT_XML1, 'UTF-8');
		$myRSS .= "
				<title>S{$aThisRow['seasonnumber']}, E{$aThisRow['episodenumber']}: " . htmlspecialchars($aThisRow['episodetitle'], ENT_XML1, 'UTF-8') . "</title>
				<itunes:author>Miss Podcaster</itunes:author>
				<itunes:image href="http://miss-podcaster.com/podcast_cover_art.png" />
				<enclosure url='http://api.spreaker.com/download/episode/{$aThisRow['episodeid']}.mp3' length='{$aThisRow['bytelength']}' type='audio/x-m4a' />
	$myRSS .= <<<EOD
	$myRSS = trim($myRSS);
	$myRSSFeed = "my_podcast.xml";
	file_put_contents($myRSSFeed, $myRSS);

<p>RSS Feed File updated</p>

So that’s how I did it.  Could it be tighter?  Sure.  I could have used PHP’s XML stuff, but I found it excessively cumbersome (and rather convoluted) for this purpose.  I did streamline a few things, though, like moving the description up to a variable for one-stop updating, but in the end the ColdFusion version was 118 lines of code, while the PHP one needed 196. But it works, runs fast, and the feed was a spot on match after generating, keeping iTunes happy.