LET’S FIX STUFF AND BUILD THINGS. MISTAKES ARE OK.
At DrupalCon Austin, Kristen Pol presented a practical and inspiring session called "Drupal Site Tuneup -
Redundant/Duplicate Fields: Inefficiency and Sorrow
One of our early Drupal sites was created with two date fields for different content types, and we carried the same hierarchy of types and fields over into a later project with similar content. By the time we had 90,000 nodes, we realized that our various date fields, separated by content type, served the same purpose and were in fact redundant. Not just awkward, this structure was beginning to affect content listings -
We needed to combine these fields into a single canonical date field. It was an intimidating prospect: we'd have to get this conversion done on a live site seeing a fair amount of traffic, so we had to avoid significant slowdowns and protect existing data from loss.
VBO and Rules to the Rescue
We settled on Views Bulk Operations + Rules as the safest approach. VBO provides the option of processing nodes on-
Rules can perform all kinds of actions on nodes (optionally under specific conditions), and VBO can execute Rules components directly on the nodes returned by a views query. Together, they are extremely useful for safely testing and carrying out field changes and updates.
Of course, you can simply write your own batch operation in a custom module, but the Rules + VBO method provides easy GUI-
If you're comfortable with writing your own batch operations, go for it. This post will discuss the kinder, gentler GUI option for cases where a safety net is more important than speed (as in our example case), or where working via the admin interface is more convenient.
We encountered no difficulties using this method -
Here are the general steps to combining several existing fields. In this case we're assuming redundant essentially-
It may be easiest to select the field on your largest node set as the canonical field. If field "date_1" belongs to 10,000 nodes on content type A, and content type B and C have only a few hundred nodes each, "date_1" is probably the best candidate for your consolidated data. You could also create a new field for all three types, if you wanted to change the original field settings.
For each content type that needs its data moved to the new field:
Add the canonical field to the content type
If editors will be working on content during this time, you may want to relabel or "hide" the new field on node edit pages (e.g. in its own fieldgroup).
Create a new "copy field" rules component taking a node as a parameter.
Add condition: node has [canonical field] (e.g. "date_1")
Add condition: node has [redundant field] (e.g. "date_2")
Add action: set data value: [redundant field] => [canonical field]
If more than a few nodes of this type will be created during the consolidation, you may want to create a rule which runs this "copy field" component as an action whenever a node of this type is saved. This saves you from having to go back and update new nodes after the initial conversion is complete.
Set up a view which loads all nodes of this content type. Display results as a table, adding both the new and legacy date field (so that you can observe which rows have been completed -
Add a VBO operation to this view, executing the "copy field" component -
Run the consolidation for this content type from the view page.
If you're enqueuing the operation, you can use the "select all XXXX rows" button and allow the operation to run in small batches each cron run until complete. Otherwise, you can set the views pager to the number of items you're willing to process at once (large batch operations can really slow down a site), and select and process all rows on each page, one page (set) at a time.
Once all your content types have the new field filled with legacy field data, you can give the new field prominence on edit pages (if you relabeled or moved it before), and you are free to change node display fields and views fields/sorts/filters to reflect the new, consolidated field.
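For readers who think better in code than in GUI steps, the logic of the "copy field" component and the page-at-a-time run can be sketched roughly as follows. This is only an illustration in plain Python, not Drupal or Rules code: nodes are modeled as dictionaries, `copy_field` and `process_page` are hypothetical names, and the check that skips empty legacy values is our own assumption rather than one of the conditions listed above.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a Drupal node: a plain dict of field values.
Node = Dict[str, Any]


def copy_field(node: Node, redundant: str = "date_2", canonical: str = "date_1") -> bool:
    """Mimic the "copy field" component. Returns True if a value was copied."""
    # Condition: node has [canonical field] (the field is attached to its type).
    if canonical not in node:
        return False
    # Condition: node has [redundant field].
    if redundant not in node:
        return False
    # Assumption (not one of the steps above): never overwrite the canonical
    # field with an empty legacy value.
    if node[redundant] is None:
        return False
    # Action: set data value, [redundant field] => [canonical field].
    node[canonical] = node[redundant]
    return True


def process_page(nodes: List[Node], page: int, page_size: int) -> int:
    """Run copy_field on one pager page, like selecting all rows on one view page."""
    start = page * page_size
    return sum(copy_field(node) for node in nodes[start:start + page_size])


nodes = [
    {"nid": 1, "date_1": None, "date_2": "2014-06-03"},
    {"nid": 2, "date_1": None, "date_2": "2014-06-04"},
    {"nid": 3, "date_1": "2014-06-05"},  # no legacy field, so it is left alone
]

# Process the first page of two rows, then the second page.
copied_first_page = process_page(nodes, page=0, page_size=2)
copied_second_page = process_page(nodes, page=1, page_size=2)
print(copied_first_page, copied_second_page)
```

Because the component only ever copies from the legacy field to the canonical one, running it twice on the same node is harmless, which is part of what makes the save-triggered rule and the bulk run safe to use side by side.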
Notes and Considerations
Because our case involved a very large number of nodes, we did not delete the original "legacy" date fields. You may be able to delete yours safely, but make backups first. We wanted to observe the data for some time afterwards before doing anything irreversible.
If you're running a VBO operation as a background batch process, you can use the Queue UI module to observe progress -
We noticed increasingly sluggish server response times a few days after we began a background batch operation on several tens of thousands of nodes. We weren't able to prove it was the operation, but halted it via Queue UI and switched to processing batches "live", 500 at a time.
Occasionally, I have found that a VBO view doesn't complete execution of a rules component when the operation is run, but does correctly run components triggered by a "Save" operation. I'm not sure why this is -
When fields change, you may need to adjust which fields are indexed or displayed in search results; you may also need to rebuild the search index.
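The switch to "live" batches of 500, mentioned in the notes above, boils down to processing a bounded slice of the remaining work and being able to stop and resume at any point. Here is a rough sketch of that idea in plain Python. It is an illustration only, not VBO or Queue UI code: `pending`, `batches`, `run_live`, and the dictionary-based nodes are all hypothetical, and the "pending" test (legacy value present, canonical value empty) is our assumption about how unfinished rows would be identified.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical stand-in for a Drupal node: a plain dict of field values.
Node = Dict[str, Any]


def pending(nodes: List[Node]) -> List[Node]:
    """Rows still to convert: legacy value present, canonical value empty (assumed)."""
    return [n for n in nodes if n.get("date_2") is not None and n.get("date_1") is None]


def batches(items: List[Node], size: int = 500) -> Iterator[List[Node]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_live(
    items: List[Node],
    operation: Callable[[Node], None],
    size: int = 500,
    max_batches: Optional[int] = None,
) -> int:
    """Apply `operation` batch by batch, stopping early after `max_batches` if set."""
    processed = 0
    for index, batch in enumerate(batches(items, size)):
        if max_batches is not None and index >= max_batches:
            break
        for node in batch:
            operation(node)
            processed += 1
    return processed


def copy_date(node: Node) -> None:
    """Copy the legacy date into the canonical field."""
    node["date_1"] = node["date_2"]


nodes = [{"nid": i, "date_1": None, "date_2": "2014-06-03"} for i in range(1, 1201)]

# Process a single batch of 500, check on the site, then resume with the rest.
first_run = run_live(pending(nodes), copy_date, size=500, max_batches=1)
left_after_first = len(pending(nodes))
second_run = run_live(pending(nodes), copy_date, size=500)
left_after_second = len(pending(nodes))
print(first_run, left_after_first, second_run, left_after_second)
```

Recomputing the pending set before each run is what makes it safe to halt midway: nothing is tracked except the data itself, so a resumed run simply picks up whatever has not been converted yet.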