Recursive Queries With SQL…WHAT!?!


In our application, AgriLife People (ALP), one thing we needed was to be able to build a hierarchical chain of a unit and all its child units.  This is primarily used for cascading down management rights and for building contact lists.  In previous iterations of the application, we cheated by basically writing a deep self-referencing query that basically dug down about two levels deep.  This worked because that’s all the levels the units went at the time, and most were either a standalone unit or just had a single child.

But then one of our agencies did a big clean up of their unit structure last year, and in doing so, created a significantly deeper set of unit levels – four levels deep at some spots.  So yeah, our “hack” stopped working.  We’re in the process of recoding that application for PHP7 and ZendFramework 3 anyway, as well as moving to MariaDB, and so we decided we needed to find a better solution to the issue than our hacked up query.

We discussed using a script to build a nightly table of the units or something like that or using recursive queries in the code to just keep going for each unit, but of course, those methods also had flaws, in either timeliness of the data when units changed (though this is admittedly not often) and in performance.  We also found that each solution we came up with fixed the problem for one usage case, but not another.

My partner off-handled noted how nice it would be if we could just do the recursive query in SQL, and we both laughed, then decided to take a mental break for a few minutes to come back and reapproach with fresh minds.  While I was catching up on email, on a whim I Googled recursive queries in SQL…and holy shit, it exists!  Specifically, through the use of what are known as “common table expressions”, or CTEs, it was quite possible to build a recursive query to do what we needed.

Suffice to say, I called over my partner, showed him, and we sat down and started trying it.  It took us a few starts and stops, as the MariaDB documentation gives an example, but not necessarily with the clearest breakdown for someone new to the technology.  So we had a few moments of frustration before we finally figured out which parts were which and how to then build the query itself.

First, the relevant parts of the table design:

CREATE TABLE units (
	unit_id int(10) unsigned NOT NULL AUTO_INCREMENT,
	unit_name varchar(180) NOT NULL,
	parent_unit_id int(10) unsigned DEFAULT NULL COMMENT 'NULL for has no parent; else unit id',
	unit_is_active tinyint(1) NOT NULL DEFAULT 1,
	PRIMARY KEY (unit_id),
	CONSTRAINT fkc_units_unit_id FOREIGN KEY (parent_unit_id) REFERENCES units (unit_id) ON UPDATE CASCADE
) ENGINE=InnoDB CHARSET=utf8;

And now, the query that will get all of our units with their parents, as well as what their level is.

WITH RECURSIVE child_units AS 
( 
	SELECT unit_id, unit_name, parent_unit_id, 1 as depth_level
	FROM units
	WHERE unit_is_active = 1
	AND parent_unit_id IS NULL
	  
	UNION ALL
		  
	SELECT gcu.unit_id, gcu.unit_name, gcu.parent_unit_id, depth_level+1
	FROM child_units INNER JOIN units gcu ON gcu.parent_unit_id = child_units.unit_id 
		AND gcu.unit_is_active = 1
)
SELECT unit_id, unit_name, parent_unit_id, depth_level
FROM child_units

Now, that was great for getting us all of the units with all ancestors, but it really isn’t very usable itself, since we typically need just the ancestors of a specified unit. So then we tweaked the query that allowed us to get just a specific unit and it’s ancestors by changing our where statement in the first query (known as the anchor query).

WITH RECURSIVE child_units AS 
( 
	SELECT unit_id, unit_name, parent_unit_id, 1 as depth_level
	FROM units
	WHERE unit_is_active = 1
		AND unit_id = 387
	  
	UNION ALL
		  
	SELECT gcu.unit_id, gcu.unit_name, gcu.parent_unit_id, depth_level+1
	FROM child_units INNER JOIN units gcu ON gcu.parent_unit_id = child_units.unit_id 
		AND gcu.unit_is_active = 1
)
SELECT unit_id, unit_name, parent_unit_id, depth_level
FROM child_units

So that gets us one unit and all its ancestors:

But, that still isn’t very usable in our ZendFramework application, since we need to pass in the unit ID rather than it being hardcoded, and Zend’s query builders really didn’t like CTEs LOL We also needed to be able to call it a few ways, so we ended up settling on setting up a stored procedure to accept the passed in input and return the results.

DROP PROCEDURE IF EXISTS getUnitAncestors;

DELIMITER //
CREATE PROCEDURE getUnitAncestors (IN target_unit_id INT)
BEGIN
	WITH RECURSIVE child_units AS 
	( 
		SELECT unit_id, unit_name, parent_unit_id, 1 as depth_level
		FROM units
		WHERE unit_id = target_unit_id
		   AND unit_is_active = 1
		  
		UNION ALL
		      
		SELECT gcu.unit_id, gcu.unit_name, gcu.parent_unit_id, depth_level+1
		FROM child_units cu INNER JOIN units gcu ON gcu.parent_unit_id = cu.unit_id 
		AND gcu.unit_is_active = 1
	)
	SELECT unit_id, unit_name, parent_unit_id, depth_level
	FROM child_units
	WHERE unit_id != target_unit_id
	ORDER BY parent_unit_id, unit_name
;
END;
//
DELIMITER ;

And with that, we call the procedure from within Zend and get the results. Now, one curious bit on calling a stored procedure with Zend is you have to remember to deal with the cursor after getting the results stored, otherwise your next query will fail in spectacular fashion.

$select_query = "CALL Units_GetAncestors({$unit_id})";
$prepared_db_statement = $this->db_adapter->createStatement($select_query);
$query_results = $prepared_db_statement->execute();

$child_units = new ResultSet\ResultSet();
$child_units->initialize($query_results)->buffer();                                    
$child_units = $child_units->toArray();                                                                        

$prepared_db_statement->getResource()->closeCursor();

Now, the one drawback here is that because we aren’t using Zend’s SQL functions, we aren’t getting the usual input validation that you’d get in your where statement or the like. Fortunately, we can do a prepared statement, so minor change…

$select_query = "CALL Units_GetAncestors(:unit_id)";
$prepared_db_statement = $this->db_adapter->createStatement($select_query);
$query_results = $prepared_db_statement->execute(['unit_id' => $unit_id]);

$child_units = new ResultSet\ResultSet();
$child_units->initialize($query_results)->buffer();                                    
$child_units = $child_units->toArray();                                                                        

$prepared_db_statement->getResource()->closeCursor();

And now it’s totally usable and we can now properly cascade down stuff regardless of how many levels the units go! Woo hoo! Already looking forward to using this recursive functionality to implement another long-requested feature: being able to quickly pull up the hierarchy of an employer and their subordinates!