Taxonomy Term Shuffles - Hook Updates with Batch API in Drupal 7

Recently, I was tasked with changing the taxonomy terms applied to a large number of nodes and writing the updated nodes out to a file. To speed things up for anyone facing a similar exercise, here are some code snippets that may help.

Before getting into the code, one thing our update hook needs is a map from each old taxonomy term ID to the new one that replaces it. Because deciding how to reshuffle the terms requires an editor’s discretion, we opted to read those mappings from a CSV file with two columns, the old taxonomy ID and the new taxonomy ID, and to store them in a variable.

Here is an outline of the steps taken to accomplish the task at hand:

Upload a formatted CSV file and copy the rows into an array in the database

Upload a formatted CSV file. We used a file upload field to upload the CSV file and then read each row into an array (the full code block is available in a gist). For this specific task, the CSV needs to contain one column that holds the old term ID and an adjacent column that maps to the new term ID.
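As a rough illustration of the reading step (not the code from the gist), here is a minimal sketch in plain PHP that turns CSV text into an array of old tid / new tid rows. The function name _example_parse_term_map() and the rule of skipping non-numeric rows, such as a header line, are assumptions made for this example.

```php
<?php
/**
 * Hypothetical helper: parse two-column CSV text into old/new tid pairs.
 */
function _example_parse_term_map($csv_text) {
  $rows = array();
  // Split on any newline style so Windows and Unix files both work.
  foreach (preg_split('/\r\n|\r|\n/', $csv_text) as $line) {
    if (trim($line) === '') {
      continue;
    }
    // Pass delimiter, enclosure, and escape character explicitly.
    $columns = str_getcsv($line, ',', '"', '\\');
    // Assumption: skip header rows and rows without two numeric IDs.
    if (count($columns) < 2 || !is_numeric(trim($columns[0])) || !is_numeric(trim($columns[1]))) {
      continue;
    }
    $rows[] = array((int) trim($columns[0]), (int) trim($columns[1]));
  }
  return $rows;
}

$rows = _example_parse_term_map("old_tid,new_tid\n12,40\n15,40\n\n18,41\n");
```

The resulting array has the shape the later snippets expect: $row[0] is the old tid and $row[1] is the new one.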

Batch process the form submit to read each line of the CSV and save the row into an array in the database

We had many terms to delete and remap, so we batch processed the CSV form submit to read in the rows and save the data into an array using variable_set(). The batch callback starts here in the gist.

Use a hook update to ingest the variable created by the CSV

Once we have that mapping in place and saved to an array, we can use hook_update_N() to process that value. We create an array of old tids (taxonomy IDs) and an array of new tids from the CSV variable in order to know how to reassign nodes from the old terms to the new. The two arrays are built in the same order, so the index of an old tid is also the index of its replacement; the remapping of each node's tids relies on this.

Get the nodes attached to old terms and re-assign new terms to them

Implement entity_metadata_wrapper to get the nodes attached to old terms and re-assign new terms to them. Our hook_update_N() implementation runs a query that pulls the nodes we need to update based on the old tids array and uses entity_metadata_wrapper to save the new term tids to each node. The following code snippets live in the .install file of our custom module.
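For readers unfamiliar with how hook_update_N() handles long-running work, the sketch below shows the general $sandbox pattern: Drupal calls the update function repeatedly until $sandbox['#finished'] reaches 1. The update number 7001, the two _my_module_* helper functions, and the batch size of 2 are placeholders invented for this example, not the code from our module.

```php
<?php
/**
 * Hypothetical helper: returns the nids to process. In the real update this
 * list would come from the taxonomy_select_nodes() loop.
 */
function _my_module_nids_to_process() {
  return array(11, 12, 13, 14, 15);
}

/**
 * Hypothetical helper: the per-node work (load, remap tids, save) goes here.
 */
function _my_module_process_node($nid, array &$sandbox) {
  $sandbox['processed'][] = $nid;
}

/**
 * Sketch of a multi-pass hook_update_N() implementation.
 */
function my_module_update_7001(&$sandbox) {
  // Assumption: a small number of nodes per pass, chosen for illustration.
  $per_pass = 2;

  // First pass only: build the queue and remember its size.
  if (!isset($sandbox['nids'])) {
    $sandbox['nids'] = _my_module_nids_to_process();
    $sandbox['total'] = count($sandbox['nids']);
    $sandbox['processed'] = array();
  }

  // Work through one slice of the queue.
  for ($i = 0; $i < $per_pass && !empty($sandbox['nids']); $i++) {
    _my_module_process_node(array_shift($sandbox['nids']), $sandbox);
  }

  // A value below 1 tells Drupal to call this function again.
  $sandbox['#finished'] = empty($sandbox['total']) ? 1 : count($sandbox['processed']) / $sandbox['total'];
}
```

Keeping the queue in $sandbox is what lets each pass pick up where the previous one stopped, since $sandbox persists between calls while local variables do not.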

Here’s how we grab the variable and save the row columns into arrays for the old and new tids:

$csv_rows = variable_get('{csv_variable_name}', array());
// ...
$tids_old = $tids_new = array();
foreach ($csv_rows as $row) {
  $tids_old[] = $row[0];
  $tids_new[] = $row[1];
}

Then we loop through the old tids to find all the nodes that have these tids attached to them and save the nids (node IDs) to an array:

// Master array of nids to update.
$nids_all = array();
foreach ($tids_old as $tid_old) {
  // Get all the nids that have this term attached (no pager, no limit).
  $these_nids = taxonomy_select_nodes($tid_old, FALSE, FALSE, array('t.nid' => 'ASC'));
  // Merge array of attached nids to master array of nids.
  if (!empty($these_nids)) {
    $nids_all = array_merge($these_nids, $nids_all);
  }
}
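One thing to watch for with this loop: a node tagged with two of the old terms ends up in the master array twice. Processing it twice is harmless but wasteful, so deduplicating is an option. The sketch below is a standalone illustration; _example_collect_nids() and the $lookup callback are hypothetical, with the callback standing in for taxonomy_select_nodes() so the snippet runs outside Drupal.

```php
<?php
/**
 * Hypothetical helper: gather the nids for a list of tids, without repeats.
 *
 * $lookup is any callable that takes a tid and returns an array of nids.
 */
function _example_collect_nids(array $tids_old, $lookup) {
  $nids_all = array();
  foreach ($tids_old as $tid_old) {
    $these_nids = call_user_func($lookup, $tid_old);
    if (!empty($these_nids)) {
      $nids_all = array_merge($nids_all, $these_nids);
    }
  }
  // array_unique() keeps the first occurrence; array_values() reindexes.
  return array_values(array_unique($nids_all));
}

// Fake term-to-node index used only for this example.
$fake_index = array(
  5 => array(101, 102),
  8 => array(102, 103),
  9 => array(),
);
$nids = _example_collect_nids(array(5, 8, 9), function ($tid) use ($fake_index) {
  return isset($fake_index[$tid]) ? $fake_index[$tid] : array();
});
```

Node 102 carries both term 5 and term 8 in the fake index, but it appears only once in $nids.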

In another batch process that removes the old tids of a node and saves the node with the new tids, we use entity_metadata_wrapper to find the node’s current tids:

// Take the next nid off the queue and load its node.
$this_nid = array_shift($sandbox['nids']);
$this_node = node_load($this_nid);

// Default to empty arrays so the later loop is safe for nodes without terms.
$node_tids = $node_tids_update = array();

// Wrap the node if it exists.
if (!empty($this_node)) {
  $node_wrapper = entity_metadata_wrapper('node', $this_node);
  // Get the term object values of this node's field_categories.
  $node_terms = $node_wrapper->field_categories->value();
  // Get the raw tids of this node's field_categories.
  $node_tids_raw = $node_wrapper->field_categories->raw();

  // Create new array to store updated tids per node.
  if (!empty($node_tids_raw)) {
    // Make sure $node_tids is an array.
    $node_tids = is_array($node_tids_raw) ? $node_tids_raw : array($node_tids_raw);
    $node_tids_update = $node_tids;
  }
}

We then loop through the node’s tids and compare them with the old tids to see which matching tids should be removed and then save the node:

// Loop through the node's field_categories tids.
foreach ($node_tids as $delta => $tid) {
  if (in_array($tid, $sandbox['tids_old'])) {
    // Update the node's field_categories array of tids by removing the to-be-deleted tid.
    $node_tids_update = array_diff($node_tids_update, array($tid));

    // Grab the corresponding key from the to-be-deleted terms array.
    $key = array_search($tid, $sandbox['tids_old']);
    // If the new tid is not in the node's categories, add it.
    if (!in_array($sandbox['tids_new'][$key], $node_tids_update)) {
      $node_tids_update[] = $sandbox['tids_new'][$key];
    }
  }
}

// Return all the values from the updated field_categories array indexed numerically.
$node_tids_update = array_values($node_tids_update);

// Save the updated field_categories array to the node.
$node_wrapper->field_categories->set($node_tids_update);
$node_wrapper->save();
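To make the remapping easier to follow and test on its own, here is the same logic as a pure PHP function with a worked example. The function name _example_remap_tids() is hypothetical, and as a small variation it folds the in_array() and array_search() checks into a single array_search() call.

```php
<?php
/**
 * Hypothetical pure-PHP version of the remapping loop.
 *
 * $tids_old and $tids_new are parallel arrays: the tid at index N of
 * $tids_old is replaced by the tid at index N of $tids_new.
 */
function _example_remap_tids(array $node_tids, array $tids_old, array $tids_new) {
  $node_tids_update = $node_tids;
  foreach ($node_tids as $tid) {
    // Find the position of this tid among the to-be-deleted tids.
    $key = array_search($tid, $tids_old);
    if ($key === FALSE) {
      // Not an old tid, so leave it on the node.
      continue;
    }
    // Remove the old tid.
    $node_tids_update = array_diff($node_tids_update, array($tid));
    // Add its replacement unless the node already has it.
    if (!in_array($tids_new[$key], $node_tids_update)) {
      $node_tids_update[] = $tids_new[$key];
    }
  }
  // Reindex numerically, since array_diff() preserves the original keys.
  return array_values($node_tids_update);
}

// Term 5 maps to 50 and term 8 maps to 3, which the node already has.
$result = _example_remap_tids(array(3, 5, 8), array(5, 8), array(50, 3));
```

Here $result is array(3, 50): both old terms are removed, 50 is added once, and 3 is not duplicated.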

Save data about updated nodes for reporting

With each iteration of the loop in our batch process, we want to track which nodes were updated with the new tids, so we save each node's original and updated tids into a sandbox variable:

$sandbox['csv_nids'][] = array(
  $this_nid,
  drupal_get_path_alias('node/' . $this_nid),
  '"' . implode(" / ", $node_tids) . '"',
  '"' . implode(" / ", $node_tids_update) . '"',
);

Set up redirects from the old taxonomy terms to the new taxonomy terms

Once the batches are complete, we set up 301 redirects from the old tids to the new tids. Here is the code block in the gist handling the redirects.
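The gist holds the actual redirect code. As a standalone illustration of the pairing logic only, the sketch below builds one source/target pair per row from the parallel tid arrays. The function name, the array keys, and the use of the default taxonomy/term/[tid] system paths are assumptions; saving the pairs (for example with the Redirect module) is left out.

```php
<?php
/**
 * Hypothetical helper: pair each old term path with its new term path.
 */
function _example_build_redirects(array $tids_old, array $tids_new) {
  $redirects = array();
  foreach ($tids_old as $key => $tid_old) {
    // Skip rows with no replacement, or where old and new are the same term.
    if (!isset($tids_new[$key]) || $tids_new[$key] == $tid_old) {
      continue;
    }
    $redirects[] = array(
      'source' => 'taxonomy/term/' . $tid_old,
      'redirect' => 'taxonomy/term/' . $tids_new[$key],
      'status_code' => 301,
    );
  }
  return $redirects;
}

$redirects = _example_build_redirects(array(5, 8, 9), array(50, 51, 9));
```

Skipping rows where the old and new tid match avoids creating a redirect that points at itself.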

Add reporting on the redirects

We also want to report on these new redirects, so we save the data to another sandbox array for output to CSV. Here is the code snippet that saves the redirects to a sandbox variable.

Once all the variables for this new CSV are ready, we can use a custom function to take those sandbox variables as parameters and generate the new CSV. Here’s how it’s called:

// Output updated tids/nids to csv file.
_my_module_create_csv($sandbox['csv_tids'], $csv_tid_headers, 'tids-to-delete');
_my_module_create_csv($sandbox['csv_nids'], $csv_nid_headers, 'nids-updated');

And here is the custom function in the gist for reference.
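For readers without access to the gist, a CSV writer along these lines could be sketched with PHP's built-in fputcsv(). This is not the gist's _my_module_create_csv(); the name _example_write_csv(), its third parameter being a full file path, and the sample row are all assumptions for illustration.

```php
<?php
/**
 * Hypothetical helper: write a header row and data rows to a CSV file.
 */
function _example_write_csv(array $rows, array $headers, $path) {
  $handle = fopen($path, 'w');
  if ($handle === FALSE) {
    return FALSE;
  }
  // Pass delimiter, enclosure, and escape character explicitly.
  fputcsv($handle, $headers, ',', '"', '\\');
  foreach ($rows as $row) {
    fputcsv($handle, $row, ',', '"', '\\');
  }
  fclose($handle);
  return TRUE;
}

// Write one sample row to a temporary file and read it back.
$path = tempnam(sys_get_temp_dir(), 'nids');
_example_write_csv(
  array(array(101, 'blog/first-post', '5 / 8', '50 / 51')),
  array('nid', 'alias', 'old tids', 'new tids'),
  $path
);
$written = file_get_contents($path);
unlink($path);
```

Note that fputcsv() adds enclosing quotes itself where needed, so with this approach wrapping values in '"' by hand, as in the reporting snippet earlier, would be unnecessary and would end up as doubled quotes in the file.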


To wrap up, we ingested a CSV file by importing each row as an array of its columns into a variable in the database. Then we ran an update hook that uses the Batch API to split that variable into the arrays needed to remap nodes from the old taxonomy IDs, which were marked for deletion, to the new ones. Next, we created redirects from the old taxonomy terms to the new taxonomy terms. Finally, we wrote the updated nodes and the taxonomy term redirects to CSV for reporting. The net result is that all the nodes tagged with the old taxonomy terms were successfully reassigned to the new taxonomy terms.
