LET’S FIX STUFF AND BUILD THINGS. MISTAKES ARE OK.
We recently needed to pull a small amount of content from Tumblr into a Drupal site. I tested with Feeds at first, but even with Feeds Tamper, special field handling (e.g. URLs embedded inside HTML) required extra tweaking and code. Since the Migrate module accesses data more directly to begin with (and automatically produces a reusable migration module), I switched to Migrate.
I used two feeds to pull data, since Tumblr’s API made some fields more accessible than others (Audio) or provided them inconsistently (Title).
The tumblr2wordpress application online provides a WordPress-style XML export of a Tumblr blog.
I found this feed to be less user-friendly for some fields.
It is possible to grab all post feeds (50 posts each) programmatically, via curl. This handy command does the trick:
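A sketch of such a command, assuming the old Tumblr v1 read API, a hypothetical shortname "myblog," and a 700-post cap. The curl calls are echoed as a dry run; drop the echo to actually fetch the pages:

```shell
# Hypothetical blog shortname and post cap; the v1 API pages 50 posts at a time.
BLOG="myblog"; MAX=700; PAGE=50
i=1
for start in $(seq 0 "$PAGE" $((MAX - PAGE))); do
  # Dry run: echo the command instead of running it.
  echo curl -s "http://${BLOG}.tumblr.com/api/read?num=${PAGE}&start=${start}" \
    -o "tumblr_api_${i}.xml"
  i=$((i + 1))
done
```

Echoing first makes it easy to eyeball the fourteen generated requests before hammering the API.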
This will pull posts for the blog shortname given in the URL, up to the maximum number specified (700 in the example above), and will save them as the given filename plus "1," "2," etc. From the second file onward, you can easily cut and paste everything except the header and closing RSS tags into the first file to get a single XML file.
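That splice can also be scripted; here is a minimal sketch using stand-in filenames and a deliberately simplified page structure (real Tumblr pages carry more wrapper markup):

```shell
# Create two simplified stand-in page files.
cat > page1.xml <<'EOF'
<rss><channel>
<item><title>one</title></item>
</channel></rss>
EOF
cat > page2.xml <<'EOF'
<rss><channel>
<item><title>two</title></item>
</channel></rss>
EOF
# Keep page 1 minus its closing line, append only page 2's items, then re-close.
sed '$d' page1.xml > merged.xml
grep '<item>' page2.xml >> merged.xml
echo '</channel></rss>' >> merged.xml
cat merged.xml
```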
Now you have two convenient XML files: the WP-style export and the merged Tumblr API feed.
Only one content type was needed (and it's a Drupal default): Article, with two added fields:
* Audio (file field)
* Video (file field)
The audio and video fields are file fields, which take a filename/URL. They can have extra subfields, such as “caption.”
This allows files to be attached either by upload or remote URL, and to be reused rather than permanently associated with a post. The subfields such as “caption” are thus also associated with the file, not the post, so can remain consistent if audio is reused.
If you haven't used it before, the Migrate module takes some getting used to, as it requires building classes for each migration, mapping source fields to destination fields in code.
I created two migration classes, for:
* the T2WP export
* the Tumblr API feed
The first class is very straightforward. It pulls in XML files from the xml folder in the module, and uses any that fit the T2WP format to generate a title, a Tumblr post ID, the post body, and tags. The T2WP export seemed to provide title and body fields more consistently. No special processing is needed: run the migration with basic field mappings and you have posts. A few will not have titles, but in our case most did. It would probably also be possible to keep the Tumblr slugs as path aliases (e.g. /this-post-title).
The second class uses the Tumblr API XML feed to pull in audio and video information and attach it to the posts pulled in via the T2WP feed. It uses the post ID from the first class to check which nodes already exist and to update them.
It pulls the video and audio embed code and extracts the a/v URL, which is used to create the file entity and attach it to the node. This is where Feeds became more difficult than useful: it isn't good for conditional processing (if field X exists, do this), so you may need a different feed import for each post type, and Feeds doesn't handle arbitrary data manipulation very well (Feeds Tamper is awesome but still limited). The audio ID3 data, for example, would need that kind of custom handling.
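The URL extraction itself can be as simple as a pattern match on the embed markup. A sketch with a made-up embed snippet (not Tumblr's exact output):

```shell
# Hypothetical embed code; real Tumblr markup differs in detail.
EMBED='<embed type="application/x-shockwave-flash" src="http://example.tumblr.com/audio_file/12345/tumblr_abc.mp3"></embed>'
# Grab the src attribute, then strip the attribute wrapper.
URL=$(printf '%s' "$EMBED" | grep -o 'src="[^"]*"' | sed 's/^src="//; s/"$//')
echo "$URL"
```

In the Migrate class the same idea lives in a prepareRow()-style hook, where you can branch on which fields a given source row actually has.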
Both migrations are in the same "group" in the module, called Tumblr Migrations. From the Content > Migrate screen, both can be run, rolled back, or re-imported.
I need to alter the way photos are processed, because I've learned that some posts have multiple photo sets with their own captions, while others have no photo-specific captions at all.
I have not added audio or video captions to their files, since in our case the "captions" were essentially the same as the body of the post (i.e. they contained both text and a/v embed codes). If there is concise, media-specific caption text, it could be attached to the files instead.
In the end, we have our Tumblr posts, plus a module and documentation we can use again.
The tumblrmigrate module attached to this post was my first foray into writing Migrate classes, so suggestions and improvements are welcome.