DRUPAL TOOLBOX

LET’S FIX STUFF AND BUILD THINGS. MISTAKES ARE OK.

We recently needed to pull a small amount of content from Tumblr (about 500 posts) into Drupal. As of this writing there doesn't appear to be an existing module for such an import, so we investigated Feeds and Migrate. Migrate proved more flexible and produced reusable code; a simple module is attached to this post.


Method


I tested with Feeds at first, but even with Feeds Tamper, special field handling (e.g. URLs embedded inside HTML) required extra tweaking and code. Since the Migrate module works with the data more directly (and leaves you with a reusable migration module), I switched to Migrate.

 


Tumblr Feeds


I used two feeds to pull the data, because neither covered everything: Tumblr's API exposed some fields (audio) more completely, but provided others (titles) inconsistently.


Tumblr2Wordpress

The online tumblr2wordpress application generates a WordPress-friendly XML feed of all posts, with the basic information easily accessible: title, date, body, and tags. This alone would be suitable for blogs that are light on audio and video. (Thanks go to Hao Chen and Ben Ward for building and maintaining this application.) The generator is at: http://tumblr2wordpress.benapps.net/


Tumblr API

I found this feed less user-friendly: it provides elements such as titles and slugs less consistently, and by default it returns only a subset of posts at a time. However, it provides distinct subfields for audio and video (captions, id3 tags).


You can grab all of the posts programmatically with curl, 50 posts per request. This handy command does the trick:

curl "http://[blog-shortname].tumblr.com/api/read?start=[0-700:50]&num=50" -o "tumblr-api-[blog-shortname]-#1.xml"


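Replace [blog-shortname] (brackets included) with your blog's shortname in both places, and make the range's upper bound (700 in the example) high enough to cover your total post count. curl expands [0-700:50] into the start offsets 0, 50, 100, and so on up to 700, and substitutes the current offset for #1 in the output name, so the pages are saved as tumblr-api-[blog-shortname]-0.xml, tumblr-api-[blog-shortname]-50.xml, etc. From the second file onward, you can cut and paste everything except the header and the closing tags into the first file to get a single XML file.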
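If cutting and pasting a dozen files gets tedious, the merge can be scripted. The sketch below is not part of the tumblrmigrate module; it assumes the Tumblr API v1 read layout `<tumblr><posts><post .../>...</posts></tumblr>` (check it against your own downloads) and the file naming produced by the curl command, where #1 becomes the start offset. The blog name and output filename in the usage line are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

# curl replaces "#1" with the current value of the [0-700:50] range,
# so page files end in -0.xml, -50.xml, -100.xml, ...
OFFSET_RE = re.compile(r"-(\d+)\.xml$")


def start_offset(path: Path) -> int:
    """Numeric start offset from a page filename, used for sorting."""
    match = OFFSET_RE.search(path.name)
    if match is None:
        raise ValueError(f"not a page file: {path.name}")
    return int(match.group(1))


def merge_pages(pages: List[bytes]) -> ET.Element:
    """Append every <post> from pages[1:] to the <posts> element of pages[0].

    Assumes the <tumblr><posts><post/>...</posts></tumblr> layout. The <posts>
    attributes (start, total) are left as they were on the first page.
    """
    root = ET.fromstring(pages[0])
    posts = root.find("posts")
    if posts is None:
        raise ValueError("first page has no <posts> element")
    for page in pages[1:]:
        other = ET.fromstring(page).find("posts")
        if other is not None:
            posts.extend(other.findall("post"))
    return root


def merge_files(directory: Path, blog: str, out: Path) -> int:
    """Merge the downloaded pages for `blog` into `out`; returns the post count."""
    # Sort numerically: a plain string sort would put -100.xml before -50.xml.
    files = sorted(
        (p for p in directory.glob(f"tumblr-api-{blog}-*.xml") if OFFSET_RE.search(p.name)),
        key=start_offset,
    )
    if not files:
        raise FileNotFoundError(f"no page files for {blog} in {directory}")
    root = merge_pages([p.read_bytes() for p in files])
    ET.ElementTree(root).write(out, encoding="utf-8", xml_declaration=True)
    return len(root.findall("posts/post"))
```

Usage would look like `merge_files(Path("."), "myblog", Path("tumblr-api-myblog-all.xml"))`; the output name deliberately has no numeric suffix so a second run does not pick it up as a page.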


Now you have two convenient XML files: the WP-friendly feed with basic post info, and the Tumblr version with detailed media fields.


Content types:

Only one content type was needed (and it’s a Drupal default): Article.


Fields:

Title (default)
Body (default)
Tags (default)
Images (default)
Audio (file field)
Video (file field)


The audio and video fields are file fields, which take a filename or URL. The files can carry extra subfields, such as “caption.”

This allows files to be attached either by upload or by remote URL, and to be reused rather than permanently tied to a single post. Subfields such as “caption” are therefore associated with the file, not the post, so they stay consistent if the audio is reused.
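The reuse idea can be illustrated in a few lines: keep one file record per URL, and hand back the existing record whenever the same URL turns up again, so every post that embeds that audio points at the same file and the same caption. This is a hypothetical Python model of the behavior, not Drupal's file API; `MediaFile` and `FileRegistry` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MediaFile:
    """Hypothetical stand-in for a Drupal file entity."""
    fid: int
    uri: str
    caption: str = ""


class FileRegistry:
    """Reuse one file record per URL, so its caption is shared by every post."""

    def __init__(self) -> None:
        self._by_uri: Dict[str, MediaFile] = {}
        self._next_fid = 1

    def get_or_create(self, uri: str, caption: str = "") -> MediaFile:
        existing = self._by_uri.get(uri)
        if existing is not None:
            # Already known: return the same record; its caption is not overwritten.
            return existing
        media = MediaFile(self._next_fid, uri, caption)
        self._by_uri[uri] = media
        self._next_fid += 1
        return media
```

Two posts that embed the same audio URL would both receive the record created first, caption included.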


Migration

If you haven't used it before, the Migrate module takes some getting used to: it requires building a class for each migration, mapping source fields to destination fields, and processing rows or fields as needed. In exchange it is extremely flexible, and because the approach requires writing a migration module, at the end you have your very own reusable Tumblr migration module.


I created two migration classes, for:

* Tumblr2Wordpress

* Tumblr API


The first class is very straightforward. It reads the XML files in the module's xml folder and uses any that fit the T2WP format to generate the title, a Tumblr post ID, the post body, and tags. The T2WP feed seemed to provide title and body fields more consistently. No special processing is needed: run the migration with basic field mappings and you have posts. A few will not have titles, but in our case most did. It would probably also be possible to keep the Tumblr slugs as path aliases (e.g. /this-is-the-post-slug) for URL consistency if desired.
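To make the "basic field mappings" concrete, here is a Python sketch of what the first migration does with each item; the real module is a PHP Migrate class. It assumes the T2WP output follows WordPress's WXR layout (`<item>` with `<title>`, `<content:encoded>`, `<wp:post_id>`, and `<category domain="tag">` or `domain="post_tag"`), which you should confirm against your own export. Because the `wp:` namespace URI differs between WXR versions, the sketch matches on local tag names. `FIELD_MAP` is hypothetical apart from Drupal 7's default `field_tags`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Union

# Standard RSS content-module namespace used by <content:encoded>.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# Hypothetical source -> destination map, in the spirit of Migrate's field mappings.
FIELD_MAP = {"title": "title", "body": "body", "tags": "field_tags"}

Row = Dict[str, Union[str, List[str]]]


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix, since the wp: namespace version can vary."""
    return tag.rsplit("}", 1)[-1]


def read_items(xml_bytes: bytes) -> Iterator[Row]:
    """Yield one source row per <item>; post_id is the source key, not a field."""
    root = ET.fromstring(xml_bytes)
    for item in root.iter("item"):
        row: Row = {
            # Drupal requires a title; empty ones need a fallback (not handled here).
            "title": (item.findtext("title") or "").strip(),
            "body": item.findtext(f"{CONTENT_NS}encoded") or "",
            "post_id": "",
            "tags": [],
        }
        for child in item:
            name = local_name(child.tag)
            if name == "post_id":
                row["post_id"] = (child.text or "").strip()
            elif name == "category" and child.get("domain") in ("tag", "post_tag"):
                row["tags"].append((child.text or "").strip())
        yield row


def to_article(row: Row) -> Row:
    """Rename source keys to destination field names."""
    return {dest: row[src] for src, dest in FIELD_MAP.items()}
```

The post ID stays out of the field map on purpose: it is the key the second migration needs to find the node again.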


The second class uses the Tumblr API XML feed to pull in audio and video information and attach it to the posts pulled in via the T2WP feed.  It uses the post ID from the first class to check which nodes already exist and to update them.
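The update step boils down to a lookup keyed on the Tumblr post ID. Migrate tracks source-to-destination IDs in its own map tables; the Python sketch below just models the logic with a plain dict, and `Node`, `attach_media`, and the `audio_url`/`video_url` keys are hypothetical names. Assigning fields rather than appending to them keeps the import idempotent, so a re-run does not duplicate media.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for an Article node created by the first migration."""
    nid: int
    title: str
    audio_url: Optional[str] = None
    video_url: Optional[str] = None


def attach_media(
    nodes_by_post_id: Dict[str, Node],
    api_posts: Iterable[Dict[str, str]],
) -> Tuple[List[int], List[str]]:
    """Update existing nodes with media from Tumblr API posts, keyed by post ID.

    Returns (nids updated, post IDs with no matching node).
    """
    updated: List[int] = []
    missing: List[str] = []
    for post in api_posts:
        node = nodes_by_post_id.get(post["id"])
        if node is None:
            missing.append(post["id"])  # not created by the first migration
            continue
        changed = False
        # Assign (not append) so running the import twice gives the same result.
        if post.get("audio_url"):
            node.audio_url = post["audio_url"]
            changed = True
        if post.get("video_url"):
            node.video_url = post["video_url"]
            changed = True
        if changed:
            updated.append(node.nid)
    return updated, missing
```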


It pulls the video and audio embed code and extracts the audio/video URL, which is used to create the file entity and attach it to the node. This is where Feeds became more difficult than useful. Feeds isn't good at conditional processing (if field X exists, do this), so with Feeds you may need a separate import for each post type, and it doesn't handle arbitrary data manipulation very well (Feeds Tamper is awesome but still limited). The audio id3-title is used to give the audio file a descriptive name in the system.
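The extraction step might look something like this Python sketch (the module itself does it in PHP). It takes the first `src` attribute from the embed HTML and, if the player URL carries the real file in a query parameter, returns that instead; the parameter name `audio_file` is an assumption, so check it against your own feed. `media_filename` turns the id3 title into a simple filename slug.

```python
import os
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# First src="..." (or src='...') attribute, e.g. in an <embed> or <iframe> tag.
SRC_RE = re.compile(r"""\bsrc\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_media_url(embed_html: str) -> Optional[str]:
    """Return the audio/video URL found in embed code, or None."""
    match = SRC_RE.search(embed_html)
    if match is None:
        return None
    src = match.group(1).replace("&amp;", "&")
    # Flash-style players may pass the real file as a query parameter;
    # "audio_file" is an assumed parameter name.
    params = parse_qs(urlparse(src).query)
    if "audio_file" in params:
        return params["audio_file"][0]
    return src


def media_filename(id3_title: str, url: str) -> str:
    """Descriptive filename from the id3 title, keeping the URL's extension."""
    stem, ext = os.path.splitext(os.path.basename(urlparse(url).path))
    # Lower-case ASCII slug; non-ASCII letters are dropped by this simple pattern.
    slug = re.sub(r"[^a-z0-9]+", "-", id3_title.lower()).strip("-")
    return (slug or stem) + ext
```

For example, an id3 title of "My Song (Live)!" with a URL ending in `tumblr_abc.mp3` yields `my-song-live.mp3`; with no id3 title, the original file stem is kept.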


Procedure

Both migrations are in the same “group” in the module, called Tumblr Migrations.  From the Content -> Migrate page with this module enabled, you can check the box next to Tumblr Migrations, click on Import (optionally setting a max number to import for testing), and posts will be built from both XML files.


Notes


    I need to change how photos are processed. I've learned that some photo posts contain photosets in which each photo has its own caption, while others have no photo-specific captions at all. The first set I tried had a single caption for the whole set and several uncaptioned photos, but others need their captions handled individually.

    I have not added audio or video captions to their files, since in our case the "captions" were essentially the same as the body of the post (i.e. contained both text and a/v embed codes).  If there is concise, media-specific data to grab and store, we can do that.  I did grab audio id3-title for the filenames.
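One way to handle the photo captions described in the first note is to keep the set caption and the per-photo captions apart, so both kinds of post survive. The Python sketch below is hypothetical and assumes Tumblr API v1 photo markup (`<photo-caption>` on the post, `<photoset><photo caption="...">` for sets, and `<photo-url max-width="...">` for each size); verify those names against your downloaded XML.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def _url_for_width(elem: ET.Element, max_width: str) -> str:
    """Pick the <photo-url> with the wanted max-width, else the first one."""
    urls = elem.findall("photo-url")
    for url in urls:
        if url.get("max-width") == max_width:
            return (url.text or "").strip()
    return (urls[0].text or "").strip() if urls else ""


def photo_entries(post: ET.Element, max_width: str = "1280") -> Tuple[str, List[Tuple[str, str]]]:
    """Return (set caption, [(photo URL, photo caption), ...]) for a photo post."""
    set_caption = (post.findtext("photo-caption") or "").strip()
    photos = post.findall("photoset/photo")
    if not photos:
        # Single-photo post: only the post-level caption exists.
        url = _url_for_width(post, max_width)
        return set_caption, ([(url, "")] if url else [])
    entries = []
    for photo in photos:
        url = _url_for_width(photo, max_width)
        if url:
            # Per-photo caption, empty when the set shares one caption.
            entries.append((url, (photo.get("caption") or "").strip()))
    return set_caption, entries
```

A set with one shared caption comes back as a filled set caption plus empty photo captions; an individually captioned set keeps each caption with its photo.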



Results

At the end we have our Tumblr posts and a module and documentation we can use again.


The tumblrmigrate module attached to this post was my first foray into writing Migrate classes, so it isn't very polished. Still, it works, and it's simple enough that it may serve as a gentle introduction to Migrate for someone else, too.


Migrating from Tumblr to Drupal