Generating 1000+ Row Spreadsheets; Hint: Don't Use AddRows!


This was apparently our month for Excel creation fun.  Last week, on top of the whole maximum width issue, we've been struggling with some performance issues.  In particular, we have a set of eight spreadsheets that we have to run every 24 hours.  They all have the same 24 columns on a single sheet, but the number of rows of data varied. In the smallest sheet, there are only about 70 rows, while the largest had over 5,300.  We made use of threading and schedule tasks so that they would overnight when we typically have minimum application usage.  For the most part they all ran, though slow once it passed the 1000 record mark or so.

The three biggest of these spreadsheets have approximately 1700, 1900, and 5300 records, respectively.  The first two took about 5 minutes to run or so, but they finished.  That last one, though, Mr. 5300+ rows, was killing our servers. And I mean really killing it – it was consuming every ounce of CPU and RAM on the system to the point we'd have to restart the ColdFusion instance.  It was also not quite finishing.  It would make the file (about 1.2-1.6 MB XLSX), but it was always a bit corrupted per Office 2010 and if we didn't grab it before too long, ColdFusion would blank it.   Yes, even though it was at this point dying and sucking the server dry, it would still manage to blank the populated but slightly corrupted file it made even though it wouldn't free up its' resources.

Suffice to say, we could not have that happening every night. But if we didn't run it, then the users themselves would kill the server asking for that file because the system would automatically generate a new one if the existing one was more than 24 hours old.  So we had to manually touch the last good one we got to keep it "fresh" while we struggled with a solution.

I worked on this mess for almost 2 days – trying all sorts of stuff.  The ColdFusion logs had heap errors and out of memory errors, so of course I looked to the JVM settings to try tweaks that might help.  Our server admin upped our development box to 2 CPUs to see if throwing more power would help (it has 8 GB of RAM, for the curious).  Nothing I tried worked and a lot of the stuff I found searching around might as well have been written in Japanese.  I mean Oracle seems to have a great doc on JVM tuning, but only if you actually understand it all.  Really, I need a "JVM tuning for idiots guide."

But muddling through what we could find, none of the setting tweaks we searched for helped.  I also found multiple forum posts about cfspreadsheet and ColdFusion having issues with large data sets, which this really isn't but it is for the purposes of making a spreadsheet.  On day two, I was getting to the point of more desperate measures: like switching to a CSV for this one file.  It certainly had benefits – significantly smaller file, ran in less than a minute versus 10 even with it writing line by line, and they didn't have any fancy formatting needs so nothing broke.  CSV of course then made it different from all the others and we did lose the "pretty" formatting on the header.

Still, with all other options gone, we started to think that might be the only solution. Then I finally stumbled on blog post by someone who also had performance issues with records like this and his amazingly simple fix that illustrated something I previously was unaware you could do:  rather than going through the process of using AddRows like we were, dump the query straight to the write function!

In other words, change this:

<cfset oSpreadsheet = SpreadsheetNew(sheetName, "true") />
<!--- format the column headers --->
<cfset sCellFormatting = StructNew() />
<cfset sCellFormatting.bold = "true" />
<cfset sCellFormatting.bottomborder = "thin" />
<cfset sCellFormatting.bottombordercolor = "black" />
<cfset sCellFormatting.fgcolor = "light_yellow" />
<cfset SpreadsheetAddRow(oSpreadsheet, headers) />
<cfset SpreadsheetFormatCellRange(oSpreadsheet, sCellFormatting, 1, 1, 1, ListLen(headers)) />
<cfset SpreadsheetAddRows(oSpreadsheet, qListContents) />

<!--- left align all columns codes --->
<cfset sCellFormatting = StructNew() />
<cfset sCellFormatting.alignment = "left" />
<cfset SpreadsheetFormatColumns(oSpreadsheet, sCellFormatting, "1-#ListLen(headers)#") />
<cfspreadsheet action="write" filename="#fullFilePath#" name="oSpreadsheet" sheetname="#sheetName#" overwrite="true" />

To this:

<cfspreadsheet action="write" filename="#fullFilePath#" query="qListContents" sheetname="#sheetName#" overwrite="true"  />

Yeah, one line…one line to do it!  And holy cow did that thing run so much faster!! It was only slightly slower than generating a CSV. And we still had an XLSX sheet so we didn't have to deal with explaining to people why one was "different". The only draw back was the header row sucked because it just used whatever the field names were from the CFQUERY tag. So I decided to try a little experiment to see if I could fix the header without going back to the issue we had before.

<cfspreadsheet action="write" filename="#fullFilePath#" query="qListContents" sheetname="#sheetName#" overwrite="true"  />
<cfspreadsheet action="read" src="#fullFilePath#" name="oSpreadSheet"  />

<!--- format the column headers --->
<cfset SpreadsheetDeleteRow(oSpreadsheet, 1) />
<cfset sCellFormatting = StructNew() />
<cfset sCellFormatting.bold = "true" />
<cfset sCellFormatting.bottomborder = "thin" />
<cfset sCellFormatting.bottombordercolor = "black" />
<cfset sCellFormatting.fgcolor = "light_yellow" />
<cfset SpreadsheetAddRow(oSpreadsheet, headers, 1, 1, false) />

<cfset SpreadsheetFormatCellRange(oSpreadsheet, sCellFormatting, 1, 1, 1, ListLen(headers)) />
<cfspreadsheet action="write" filename="#fullFilePath#" name="oSpreadsheet" sheetname="#sheetName#" overwrite="true" />

It adds another maybe 30 seconds of processing time, but it works! Our users get the consistent results they expect, our server isn't being vampired dry, and we learned two new tricks this week that may help us greatly improve other spreadsheet generating areas of our applications.