An Eclectic World Recode: DB Conversion Scripts


I initially planned my first actual post to be on setting up Zend in a hosted environment.  I’ve done what I think is needed, but I haven’t actually tested it due to the lack of a true hosted environment to push to just yet (I’ll have it before the end of the series, when I finish switching hosts).  So for now, we’ll just say I have the basic Zend skeleton working in my mostly-kind-of-sort-of-I-think emulated hosting setup and continue on to step two. 😉

Before I get into the actual coding of the new app, I need to get the database going.  As part of this upgrade, I’ll be moving from PostgreSQL to MySQL, implementing my current naming conventions, fixing a few deficiencies in the old design, and adding a few new minor features.

After writing the new MySQL schema and creating the DB, I needed some upgrade scripts to move the data from the old DB to the new one.  For my purposes, I ran it all on my local development environment, allowing me to simply use “localhost” for both DBs; however, the same script mechanisms would work remotely too by changing the configurations.

Note: In case it’s not obvious, the table names and passwords may have been changed a bit for these examples, but the actual code works as written 😉

I added a folder under public to store the scripts, then created an index file that has a list of all the scripts being made, with links to each, and a base_stuff.php file that will have some useful tidbits so we can include it in every script for easy reference.

[Screenshot: UpgradeIndex, the upgrade scripts index page]

While in this case, I’d be the only one doing the upgrade, I still formatted the index as I would for one of our work projects, with reminder notes in case someone else ends up having to run them.  You’ll also note the “attempt to fill in missing covers from Amazon” is not linked in any section, as I’m still debating doing that as part of the upgrade or just doing it later.
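The real index page isn’t reproduced here, but it boils down to something like this little sketch; the script file names below are hypothetical stand-ins for the real ones:

```php
<?php
	// a minimal sketch of the upgrade index page -- the script names here
	// are hypothetical stand-ins for the real files
	$aScripts = array(
		'Load Pick Lists' => 'load_picklists.php',
		'Load Books' => 'load_books.php',
		'Fix Series Order and Volume Names' => 'fix_series_order.php',
		'Move Covers' => 'move_covers.php',
	);

	$html = '<h2>Upgrade Scripts</h2><ol>';
	foreach($aScripts as $label => $file)
		$html .= '<li><a href="' . $file . '">' . $label . '</a> (run once, in order!)</li>';
	$html .= '</ol>';

	echo $html;
```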

Now let’s look at the base_stuff.php.  First, I set up some basic variables that work for the different environments I’m doing the development in, plus the eventual production environment.

	if ($_SERVER['APPLICATION_ENV'] == 'ssw_development'):
		$DBAddress = '172.24.1.111';
		$old_file_location = "/var/www/html/my_site/dropinn/coverImages/";
		$new_file_location = "/var/www/html/my_site/public/covers/";
	
	elseif ($_SERVER['APPLICATION_ENV'] == 'localdev'):
		$DBAddress = 'localhost';
		$old_file_location = "C:\www\html\my_site\dropinn\coverImages\\";
		$new_file_location = "C:\www\html\my_site\public\covers\\";
	
	else:
		$DBAddress = 'localhost';
		$old_file_location = "/var/www/html/my_site/dropinn/coverImages/";
		$new_file_location = "/var/www/html/my_site/public/covers/";
		
	endif;

Then I set up some variables for all of the pages to use, in this case just my Amazon IDs.

	$AWSAccessKeyID = "what_ever_the_access_key_id_is";
	$AmazonAccessKey = "what_ever_the_secret_key_is";
	$AmazonAssociateTag = "your_associate_tag";

I also set up my initial DB connections.  As far as I could tell, while MySQL got the nice MySQLi update in PHP, PostgreSQL did not, which meant having to go look up how to do a PostgreSQL connection.

	// PostgreSQL connection string: host, dbname, password, user
	$old_dbc = pg_connect("host=" . $DBAddress . " dbname=MySiteDB password=somesuperawesomepassword user=MyUserName");
	if (!$old_dbc)
		exit('Unable to connect to the old PostgreSQL DB!');
	pg_set_client_encoding($old_dbc, "utf8");
	
	// MySQLi constructor: host, username, password, db
	$new_dbc = new mysqli($DBAddress, "MyUserName", 'anothersomesuperawesomepassword', "My_Site_DB");
	if ($new_dbc->connect_error)
		exit('Unable to connect to the new MySQL DB: ' . $new_dbc->connect_error);
	$new_dbc->set_charset("utf8");

I also throw in a little CSS for a quick error class we can use to highlight script problems, if needed.

<style type="text/css">
	.error { font-weight: bold; color: red; }
</style>

Finally, I end with two functions, the first reads a CSV file, and is pretty basic PHP:

	// reading the CSVs
	function readCSV($csvFile){
		$aCSV = array();
		$file_handle = fopen($csvFile, 'r');
		// check fgetcsv()'s return value rather than feof(),
		// which avoids picking up a bogus trailing row at EOF
		while (($aRow = fgetcsv($file_handle, 1024)) !== false) {
			$aCSV[] = $aRow;
		}
		fclose($file_handle);
		return $aCSV;
	}
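If you want to sanity-check the CSV reader, here’s a standalone copy with a quick throwaway-file run wrapped around it (the sample CSV content is made up); this version checks fgetcsv()’s return value rather than feof(), so an empty or newline-terminated file doesn’t produce a bogus trailing row:

```php
<?php
	// standalone, slightly hardened copy of the readCSV helper
	function readCSV($csvFile){
		$aCSV = array();
		$file_handle = fopen($csvFile, 'r');
		// fgetcsv() returns false at EOF, so loop on that
		while (($aRow = fgetcsv($file_handle, 1024)) !== false) {
			$aCSV[] = $aRow;
		}
		fclose($file_handle);
		return $aCSV;
	}

	// quick usage check against a throwaway file with made-up content
	$tmpFile = tempnam(sys_get_temp_dir(), 'csv');
	file_put_contents($tmpFile, "authorid,firstname,lastname\n42,Jane,Doe\n");
	$aRows = readCSV($tmpFile);
	unlink($tmpFile);

	echo $aRows[1][2]; // prints "Doe"
```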

The second handles generating the appropriate URL for accessing Amazon’s API (thanks to Kenny Lucius for the basics of this code!):

	function generateSignedAmazonURL($aParameters, $AmazonAccessKey) {
		// pull the IDs set above into the function's scope
		global $AmazonAssociateTag, $AWSAccessKeyID;
		
		//sanity check
		if(!$aParameters) 
			exit('Missing parameters');
		
		/* create an array that contains url encoded values
		like "parameter=encoded%20value" 
		USE rawurlencode !!! */
		$encoded_values = array();

		foreach($aParameters as $key=>$val)
			$encoded_values[$key] = rawurlencode($key) . '=' . rawurlencode($val);
		
		/* add the parameters that are needed for every query
		if they do not already exist */
		if(!isset($encoded_values['AssociateTag']))
			$encoded_values['AssociateTag'] = 'AssociateTag=' . rawurlencode($AmazonAssociateTag);

		if(!isset($encoded_values['AWSAccessKeyId']))
			$encoded_values['AWSAccessKeyId'] = 'AWSAccessKeyId=' . rawurlencode($AWSAccessKeyID);
	
		if(!isset($encoded_values['Service']))
			$encoded_values['Service'] = 'Service=AWSECommerceService';
	
		if(!isset($encoded_values['Timestamp'])):
			// Amazon expects the timestamp in UTC, so don't rely on the server's local time zone
			$objDateTime = new DateTime('now', new DateTimeZone('UTC'));
			$timeStamp = $objDateTime->format('Y-m-d\TH:i:s\Z');
			$encoded_values['Timestamp'] = 'Timestamp=' . rawurlencode($timeStamp);
		endif;
	
		if(!isset($encoded_values['Version']))
			$encoded_values['Version'] = 'Version=2013-08-01';
		
		/* sort the array by key before generating the signature */
		ksort($encoded_values);
		
		/* set the server, uri, and method in variables to ensure that the 
		same strings are used to create the URL and to generate the signature */
		$server = 'webservices.amazon.com';
		$uri = '/onca/xml'; //used in $sig and $url
		$method = 'GET'; //used in $sig
		
		/* implode the encoded values and generate signature
		depending on PHP version, tildes need to be decoded
		note the method, server, uri, and query string are separated by a newline */
		$query_string = str_replace("%7E", "~", implode('&',$encoded_values));   
		$sig = base64_encode(hash_hmac('sha256', "{$method}\n{$server}\n{$uri}\n{$query_string}", $AmazonAccessKey, true));
		
		/* build the URL string with the pieces defined above
		and add the signature as the last parameter */
		$url = "http://{$server}{$uri}?{$query_string}&Signature=" . str_replace("%7E", "~", rawurlencode($sig));
		return $url;
	}
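At its core, all of that parameter bookkeeping feeds a single HMAC step. Here’s that step on its own, with an obviously fake key and query string, just to show the shape of the signature:

```php
<?php
	// the string-to-sign is the method, host, path, and sorted query string,
	// joined by newlines (the key and query here are fake placeholder values)
	$AmazonAccessKey = 'not_a_real_secret_key';
	$method = 'GET';
	$server = 'webservices.amazon.com';
	$uri = '/onca/xml';
	$query_string = 'AWSAccessKeyId=fake&Operation=ItemLookup&Service=AWSECommerceService';

	$string_to_sign = "{$method}\n{$server}\n{$uri}\n{$query_string}";
	$sig = base64_encode(hash_hmac('sha256', $string_to_sign, $AmazonAccessKey, true));

	echo $sig; // 44 base64 characters, since raw SHA-256 output is 32 bytes
```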

With that done, we can create the first script, which goes and gets what might be called the “pick list” tables for our first section and copies them to the new DB.  For the most part, this is really simple code, as they’re all fairly straightforward jobs: grab the existing rows, loop, and insert.

<?php
	include_once('base_stuff.php');
	
	// Empty all of the tables we're going to work with; note, this will NOT reset their ids,
	// but for these it doesn't matter since they will be importing with IDs anyway
	$aTablesToEmpty = array(
		'my_library_of_books_yeah_authors',
		'my_library_of_books_yeah_formats',
		'my_library_of_books_yeah_genres',
		'my_library_of_books_yeah_publishers',
		'my_library_of_books_yeah_series',
	);
	
	foreach($aTablesToEmpty as $key => $tableName):
		$sEmpty = "DELETE FROM " . $tableName . ";";
		$aResults = $new_dbc->query($sEmpty);
		
		if ($new_dbc->error):
			echo("<p class='error'>Unable to delete " . $tableName . "<blockquote>");
			echo($new_dbc->error);
			echo("</blockquote>"); exit;
		endif;
	endforeach;
	
	// NOW DO ALL THE INDIVIDUAL IMPORTS
	// import my_library_of_books_yeah_authors
	$qGetAuthors = "SELECT DISTINCT authorid, CASE WHEN firstname = '' THEN NULL ELSE firstname END AS firstname, lastname
		FROM mylibrary_authors INNER JOIN mylibrary_my_library_of_books_yeah_to_authors ON authors_authorid = authorid
		ORDER BY authorid
		";
	
	$qRunSelect = pg_query($old_dbc, $qGetAuthors);
	
	if(!$qRunSelect):
		echo("<p class='error'>Unable to get authors<blockquote>");
		echo(pg_last_error($old_dbc));
		echo("</blockquote>");
		echo '<pre>';
			print_r($qGetAuthors);
		echo '</pre>';exit;
	endif;
	
	while($aThisRow = pg_fetch_assoc($qRunSelect)):
		$qInsert = "INSERT INTO my_library_of_books_yeah_authors(author_id, first_name, last_name)
			VALUES(
				'" . $aThisRow['authorid'] . "',"
				. (empty($aThisRow['firstname']) ? 'NULL,' : "'" . $new_dbc->real_escape_string($aThisRow['firstname']) . "',") . 
				"'" . $new_dbc->real_escape_string($aThisRow['lastname']) . "'
			);";
		$new_dbc->query($qInsert);
		
		if ($new_dbc->error):
			echo("<p class='error'>Unable to insert author! <blockquote>");
			echo($new_dbc->error);
			echo("</blockquote>"); exit; // stop to avoid flood of errors if it's a global issue
		endif;
	endwhile;
	
	echo '<p>Authors inserted</p>';
?>

Again I had to diddle around a bit to figure out the changes between PostgreSQL queries and MySQL ones, but on the whole it works in a similar fashion, other than not being able to use the lovely object-oriented style of writing them.

Now, for a lot of our internal apps, this stuff is even easier, as we can do straight INSERT INTO…SELECT FROM queries since we’d just be going from one MySQL DB to another.  But as we’re crossing between the two, it’s the old-fashioned grab, loop, and insert method.
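For that pure MySQL-to-MySQL case, the single-statement approach might look something like this; the database, table, and column names below are hypothetical, just to show the shape:

```php
<?php
	// hypothetical databases/tables: when both sides are MySQL, one statement
	// replaces the whole grab, loop, and insert dance
	$qCopy = "INSERT INTO new_db.authors (author_id, first_name, last_name)
		SELECT authorid, firstname, lastname
		FROM old_db.mylibrary_authors
		ORDER BY authorid";

	// $new_dbc->query($qCopy); // and run it like any other query
	echo $qCopy;
```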

Anyway, within that script, I repeated the same basic method to load in the formats, genres, and series.  Then in the next script, “Load Books”, I applied the same principle, just with more fields, for the main books table, followed by the book-to-author connections.

The next script you’ll see in my index above is “Fix Series Order and Volume Names”.  I added a field for series position in my old DB a while back, but it was only populated for new works after that.  So we’re going to use a script to go through and mass fix most of them, namely anything that has “volume” in its name.

<?php
	include_once('base_stuff.php');
	
	// Get a list of all of the books that are in a series and have "volume" in the name
	$qGetBooks = "SELECT book_id, book_title, series_name, series_length
		FROM my_library_of_books_yeah_books INNER JOIN my_library_of_books_yeah_series ON my_library_of_books_yeah_books.fk_series_id = my_library_of_books_yeah_series.series_id
		WHERE fk_series_id IS NOT NULL
			AND series_order_by IS NULL
			AND LCASE(book_title) LIKE '%volume%'
		ORDER BY book_title
		";
	
	$qRunSelect = $new_dbc->query($qGetBooks);
		
	if($new_dbc->error):
		echo("<p class='error'>Unable to get series my_library_of_books_yeah_books<blockquote>");
		echo($new_dbc->error);
		echo("</blockquote>");
		echo '<pre>';
			print_r($qGetBooks);
		echo '</pre>';exit;
	endif;
	
	// loop through our results
	while($aThisRow = $qRunSelect->fetch_assoc()):
		// manually fix any that were found on earlier runs that the script failed to figure out
		// the number for (mostly a few works that also had numbers in the title)
		switch($aThisRow['book_id']):
			case 1411:
				$possibleVolumeNumber = 5;
				break;

			case 1487:
				$possibleVolumeNumber = 6;
				break;

			case 1384:
				$possibleVolumeNumber = 33;
				break;

			case 705:
				$possibleVolumeNumber = 7;
				break;

			default:
				$possibleVolumeNumber = trim(preg_replace("/[^0-9]/", "", $aThisRow['book_title']));
				break;
		endswitch;
		
		// if we have a valid volume number, update the book with its correct position
		if($possibleVolumeNumber <= ($aThisRow['series_length'] + 1) && $possibleVolumeNumber > 0):
			$qUpdate = "UPDATE my_library_of_books_yeah_books
				SET series_order_by = $possibleVolumeNumber
				WHERE book_id = " . $aThisRow['book_id'];
			$new_dbc->query($qUpdate);
			
			if ($new_dbc->error):
				echo("<p class='error'>Unable to update my_library_of_books_yeah_books! <blockquote>");
				print_r($qUpdate . "<br />");
				echo($new_dbc->error);
				echo("</blockquote>"); exit; // stop to avoid flood of errors if it's a global issue
			endif;
		
		// if we couldn't figure it out, spit out the details so it can be manually fixed above
		else:
			echo "<p>I'm not sure if book #" . $aThisRow['book_id'] . " - <em>" . $aThisRow['book_title'] . "</em> is volume $possibleVolumeNumber in " . $aThisRow['series_name'] . " which is supposed to have " . $aThisRow['series_length'] . " titles!</p>";
		endif;
	endwhile;
	
	echo '<p>Serialized works with volumes now have positions set...right?</p>';
?>
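Incidentally, that default case is why the manual overrides at the top of the switch exist: stripping non-digits works fine for a plain “Volume N” title, but mangles anything with other numbers in it. A quick standalone illustration, with made-up titles:

```php
<?php
	// the default case strips everything that isn't a digit from the title
	$cleanTitle = trim(preg_replace("/[^0-9]/", "", "My Great Series Volume 5"));
	echo $cleanTitle; // prints "5"

	// ...but a title with extra numbers in it gets mangled,
	// hence the hard-coded fixes by book_id
	$trickyTitle = trim(preg_replace("/[^0-9]/", "", "1984: The Graphic Novel, Volume 2"));
	echo $trickyTitle; // prints "19842", not "2"
```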

Next up, we’ll move the covers from the old folder to the new one.  Now, if you aren’t renaming your files and reorganizing them, you could just manually copy them over and be done.  But I’m doing both, so script it is!  This also lets me clean out any cover images whose books were deleted but whose files were left behind.

<?php
	// I have a ton of books, so I need to up that timeout a bit...
	ini_set('max_execution_time', 120);
	
	include_once('base_stuff.php');
	
	// get the IDs of all our books
	$qGetBooks = "SELECT book_id
		FROM my_library_of_books_yeah_books
		ORDER BY book_id
		";
	
	$qRunSelect = $new_dbc->query($qGetBooks);
		
	if($new_dbc->error):
		echo("<p class='error'>Unable to get my_library_of_books_yeah_books<blockquote>");
		echo($new_dbc->error);
		echo("</blockquote>");
		echo '<pre>';
			print_r($qGetBooks);
		echo '</pre>';exit;
	endif;
	
	// loop the books
	while($aThisRow = $qRunSelect->fetch_assoc()):
		// set our file name to a variable - since they are all JPGs this is easy enough to do
		$fileName = 'my_library_of_books_yeah_cover_' . $aThisRow['book_id'] . '.jpg';
		
		// set the old and new file locations to a variable 
		$oldFile = $old_file_location . 'book_' . $aThisRow['book_id'] . '_large.jpg';
		$newFile = $new_file_location . 'covers_books/' . $fileName;
		
		// if the book has a cover image, copy it over to the new name and location and update our DB accordingly
		if(file_exists($oldFile)):
			if (!copy($oldFile, $newFile)):
				echo "Unable to copy " . $oldFile . ' to ' . $newFile . "!  Stopping!";
				exit;
			endif;
		
			$qUpdate = "UPDATE my_library_of_books_yeah_books
				SET i_haz_cover = true
				WHERE book_id = " . $aThisRow['book_id'];
			$new_dbc->query($qUpdate);
			
			if ($new_dbc->error):
				echo("<p class='error'>Unable to update book cover status! <blockquote>");
				print_r($qUpdate . "<br />");
				echo($new_dbc->error);
				echo("</blockquote>"); exit; // stop to avoid flood of errors if it's a global issue
			endif;
		endif;
	endwhile;
	
	echo '<p>Book covers moved and DB updated</p>';
?>

The last of the book scripts is the most complicated of the upgrade scripts.  In my existing DB, I didn’t have the book publisher data, so the easiest way to preload it is pulling it from Amazon.  As this is a longer bit of code and I’m still struggling with a few hiccups, I’ll cover it in a later post.

For now, though, this is the basics of it all.  I did a similar thing for the movies and video games, just changing table names and fields accordingly.  With all of them run, the DB is now in a usable form that we can get started with the actual application development!  Yay!