Progress of the migration of news site data

Posted by Lacy Paschal on Friday, April 30, 2010 in Info.

We’ve been chugging along on the Vanderbilt News site migration over the past several weeks. Once we settled on WordPress as the backend, we could start working out the process for migrating approximately 4,000 pages from the old CMS.

It’s a four-step process, which we run on sets of 100 pages at a time (that is the maximum number of XML items we can pull from our source data at once).

First, we take the existing XML from the current news site, pulling 100 pages at a time.
Second, we pass that XML through an XSL stylesheet we created, which parses the original XML format and translates it into the format of a WordPress XML import file.
Third, we run a series of “find and replace” cleanups, removing Microsoft Word code (oh MSO how we loathe thee), font tags, empty divs, etc.
Fourth, we import the newly created XML file into WordPress and confirm that all the tags and metadata transferred over.

Rinse and repeat. 40+ times.
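
For anyone curious what the cleanup step might look like in code, here is a minimal Python sketch. It is not the exact set of replacements we ran: the patterns, the `CLEANUP_PATTERNS` list, and the `clean_html` function name are all assumptions made up for illustration, and regex-based cleanup like this is only a rough pass, not a full HTML parser.

```python
import re

# Hypothetical patterns for illustration only; the actual find-and-replace
# rules used in the migration are not listed in this post.
CLEANUP_PATTERNS = [
    # Word conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
    (re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE), ""),
    # Office-namespaced tags, e.g. <o:p> and </o:p>
    (re.compile(r"</?o:[^>]*>", re.IGNORECASE), ""),
    # class attributes whose value starts with Mso, e.g. class="MsoNormal"
    (re.compile(r"\s+class=\"?Mso[^\s\">]*\"?", re.IGNORECASE), ""),
    # style attributes that contain mso- properties
    (re.compile(r"\s+style=\"[^\"]*mso-[^\"]*\"", re.IGNORECASE), ""),
    # opening and closing font tags (the text between them is kept)
    (re.compile(r"</?font\b[^>]*>", re.IGNORECASE), ""),
]

# A div that contains nothing but whitespace
EMPTY_DIV = re.compile(r"<div\b[^>]*>\s*</div>", re.IGNORECASE)


def clean_html(html: str) -> str:
    """Apply each find-and-replace in order, then strip empty divs."""
    for pattern, replacement in CLEANUP_PATTERNS:
        html = pattern.sub(replacement, html)
    # Repeat until nothing changes, so a div that only held an empty div
    # is removed as well.
    previous = None
    while previous != html:
        previous = html
        html = EMPTY_DIV.sub("", html)
    return html


sample = (
    '<div><p class="MsoNormal"><font face="Arial">Hello</font>'
    "<o:p></o:p></p><div> </div></div>"
)
print(clean_html(sample))
```

Running each batch of 100 pages through one function like this keeps the replacements in a single place, so a new bit of Word debris only needs one more pattern added to the list.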

Tags: brand bar, news, wordpress, xslt