Import raw content (nodes, users) into Drupal

Published in: 

(This article is made specifically for showing how to get raw content into Drupal, but the methods below can likely be adapted for other use cases [we also imported a list of users using these techniques for instance].)

Importing raw content into Drupal, can be, well let's just say it's not always easy. In two years of experience with Drupal we've imported a MoveableType site, a Blogspot site, content from one Drupal site to the other, and now raw data from a csv file (raw meaning that it did not come from some other CMS/website), and we've used everything from custom scripts, several different Drupal modules along the way including import-exportapi.module, userimport.module, nodeimport.module, WordPresstoDrupal.module.With the exception of the WordPresstoDrupal.module we've been mostly disappointed with the results of our experiments with the Drupal based import options. That said, this may not be the best method to start off with if you can find a ready-made solution that works for you, since it is somewhat time consuming.

Besides being a somewhat complex thing to begin with, the issues around importing content into a Drupal installation are exacerbated by the wide array of variables any point and click solution has to contend with. It's impossible for any module developer to be able to keep up with the number of different/custom Drupal content types for even one version of Drupal, let alone for all new versions of Drupal.

In short, for the serious site-operator/Drupal developer getting raw content imported to Drupal is always going to be a problem because one size is never going to fit all (the 'bad' side of such a customizable platform). This is important to understand in order to fully appreciate the workaround below, which is not particularly easy or fast, but is 99.9% reliable from a data integriy standpoint.

"How's that," you say? Well, because the following method does not actually "import" anything at all, at least not in the traditional sense that most people think of when adding new content into a database. With this method you'll create complete Drupal nodes with SQL, which you will then simply place it in your existing database. Nothing is automatically calculated, scripted, or extrapolated - which is exactly why this method is not easy or particularly fast, but also why it is precise:

In brief:
Create two separate sql import files. One for the "node" table of your database, and another for the "node_revisions" table. The last step will be to update the "sequences" table to let the system 'know' about the new content.

(these "node" and "node_revisions" tables are used only for certain module based content types not for cck nodes - though you could adapt these instructions to create new cck nodes by figuring out what tables you need to populate and using the techniques below)

What you need:

  • A spreadsheet program. This is likely where you content originated from since it's raw. Used to manipulate/arrange the data and create numbering sequences for the sql file.
  • A good non rich-text editor. We used a combination of BBEdit (to prefix/suffix columms and search and replace) as well as Textmate (to syntax check the sql before importing). We don't recommend trying this without an syntax checker of some sort (try downloading "Komodo Edit" which is free and made for a variety of platforms if you don't have TextMate).
  • Access to your database and a SQL import tool (we used phpMyAdmin)
  • (*note: backup your database BEFORE trying anything which will alter your existing data)

Steps:

1. Get sample sql to model your file after
Download sample sql from your database's "node" table and "node_revisions" table to base the formatting of our own file off of. This can be done by going to the "export" tab in phpMyAdmin after selecting each table individually (you don't want the sql for your entire site).

The idea is to copy out one row from each of the files to use as a template for your own custom sql file. In this article we will only use the "node_revisions" table in our examples, but the concepts are exactly the same for the "node" table data. (the "node" table is actually much simpler to make, which is why we are using the harder example, but you will need make files for them both to successfully add your new content)

Example SQL from phpMyAdmin export:
INSERT INTO `node_revisions` VALUES (116, 116, 1, 'Title of post', 'Body', 'Teaser', '', 1162300341, 3);

With this information your job is now to create a spread sheet file which mirrors the order of this format. Your spreadsheet should have 11 rows. Here is a  screenshot of what this should look like.

2. Prepare/populate your template
Next once you have you have basic spreadsheet template you will need to populate it with your own data. Most of this is extremely easy at this stage (basic spreadsheet skills are assumed):

  • Columns 1, 8, 10, and 11 can likely be just duplicated down all rows unless you have very unique needs (if so you probably realize this already).
  • Columns 2, 3, 4, 9 can be automatically made by simply creating a calculation in the first field that looks like somenumber+1 and dragging/pasting it down all rows (where "somenumber" is the appropriate number to begin on based on looking at the sequences table in your database - your beginning number needs to be higher than whatever number is contained in the corresponding field inside of the "sequences" table) [be sure that the nid, vid, and uid/author between the "node" table file you make and the "node revision" file match!]
  • ...the rest of the columns (5, 6, 7) should be simple cut and paste maneuvers from your existing spreadsheet. Be sure to wrap columns 6 and 7 in whatever html/css markup you would like (if any).

Once you have everything looking right and ordered properly export your file in csv format, and change the extension to from csv to sql, and open the new file in a proper text editor - which really, really, should have sql syntax checking.

3. Finalize formatting
Depending on your spreadsheet program there will be a small variety of punctuation missing and/or added to your file, which will keep it from importing properly. This is where it is up to you to figure out how to effectively do this since each case is slightly different.

Continually comparing/referring to your model (the exported SQL from step one - you know it's correct since it came directly from your database), smart use of search and replace, some forethought, and color coded SQL syntax checking should get you through to the end without too many gray hairs, hopefully. Tinker with your files until you have them formatted EXACTLY like the sample sql you downloaded from your database AND until your SQL syntax checker shows that your are clean. (on sql syntax tip worth noting is that any place an apostrophe appears you will need to have another apostrophe next to it - or else it will cause a fatal error upon import)

4. After you believe your new SQL is totally clean it is time to try and import it into the database using phpMyAdmin's SQL query function. (you need to have the main database page selected and then click the SQL tab):

Simply copy out the code from your file, paste it into the query window and hit go/submit. With a lot of luck it phpMyAdmin will give you a message that the query successfully ran and your perfectly formatted new content will now be in your database. If you are importing a lot of content, it is likely that you'll get an error message remarking something along the lines of

#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'blah-blah-blah' at line 1

Don't fret, just simply go to your file and find the line the error message referred to and figure out what the problem is. It's likely to be either a missing/extra apostrophe or perhaps you accidently left out an entire column of information. Once you isolate and fix the problem, simply copy everything from that entry down in your file and paste in the phpMyAdmin SQL query window again and repeat this process until you have successfully entered everything - there's no need to worry about any of the rows which occurred before the error since they were successfully added already. (*tip: bookmark the page for the phpMyAdmin SQL query window - it will save lot waiting an click back and forth)

You can console yourself that though this process can be a bit tedious that all the imposed perfect/error/syntax checking all but ensure the integrity of your newly added data.

5. Update your sequences table and test to be sure everything is working properly
Once you have successfully imported your SQL for both the "node" and "node_revisions" the last step is to update the "sequences" table of your database appropriately so that Drupal 'knows' the new place to start counting from when a used creates new content. Edit appropriately/correspondingly to the nid and vid values you used.

Last of of all test drive your newly infused site and be sure that you can view, edit, and create new node successfully and without errors.

29 January, 2007

Comments

Thank you so much for aid, I need to do a fast import and thought I screwed up my db because I only update node and node_revisiosn (forgot the sequences). Kudos :)

It looks like this becomes prohibitively complex if your content needs to have custom fields and categories.

Thank you! I have successfully imported SQL for both the "node" and "node_revisions" and then updated the the database. I just a little confused with editing to the nid and vid values.
John Edwards, For Sale By Owner Realty Inc. Web-programmer

I don't know what it is about Drupal, but when it comes down to moving info from anything else to it, it just is a pain in the neck... Thanks for the post.

Jenn :)

How much thank's... But i want to say "Thank" to you to!
Drupal is very usefull think and this topic is helpfull for beginers.

I used Drupal for a while, but didn't really like it. It was a pain changing hosts and tweaking it, so I moved to custom build CMS for that site.
Anyway, I may give it a second try after reading this post.

You're in for PAIN! Much pain and suffering await he that dares create a node outside the Drupal framework. What is that you ask? What is the sql statement for... shush your mouth! Drupal cares not what data you wish to bring to the table. This is Drupals table. Over there is a chair for Dries. This chair is for me. There are no chairs left for you and that hilarious data under your arm. Goodbye.

Have you had any experience doing this with Drupal 6? I notice that the sequences table no longer exists, which seems to cause some problems.

...but I haven't done any mass imports into it. Interesting about the sequences table.

I am working on a migration for one of my clients. I looked at blogapi_blogger_new_post and am in the process of modifying it to work with my content types that were created with CCK.

I am not ready to post code yet but if anyone is interested feel free to reach out to me.

Rich

I?ll try this week to import users into drupal 6.4 , they are all companies to generate usernamen and passowrd into drupal 6.4

First, great post. I didn't find it until after I transferred my data, but thought I'd share my $0.02 worth. I'm in the process of migrating a university website (1000+ pages) to Drupal 6. Some of the site is just HTML pages, and we had a Drupal 5 installation used for publishing/archiving press releases.

Our Drupal 5 install was simple: pages, stories, weblinks (http://drupal.org/project/webl...)

Our Drupal 6 install is more complex, but I decided to try the import early in the development process (so if I screwed it all up it wouldn't be a great loss).

In phpMyAdmin, I ran queries to the one below to update my `nid` and `vid` fields in the `node` and `node_revisions` tables.

UPDATE `node`, `node_revisions`
SET `nid` = `nid` + 100
WHERE `node`.`type` = "story"

This ensured that there were no conflicts with my few existing nodes in the new install. I then exported the `node` and `node_revisions` tables and saved them as .txt documents. My Drupal 6 table names are different from Drupal 5. Using notepad, I replaced (Ctrl+H) "`node`" with "`drup_node`" and "`node_revisions`" with "`drup_node_revisions`"

I imported both tables (again using phpMyAdmin), checked my tables and logged into the Drupal 6 site to find my content residing happily in Drupal 6. (sequences table apparently was not needed)

I wouldn't recommend trying this on a production site, or in a database that contains lots of important data. Back up your database first. Or create a copy of your database to try the import.

Is there a module that synchronizes two Drupal sites, migrating nodes created one in the other? My sites are D5 -> D6. I'd like to preserve URL aliases if possible. Mainly I'm synchronizing blogs, with all the related comment/user/node issues. I've seen some ideas out there (Node Export with Node Import), but I'd like to know if anyone has done this simply or has some clever idea

I was pulling my hair out wondering why my database import of rows for the node table was not producing editable nodes. Thank you for sharing this. You saved my Saturday.

Hi,

you can use dbTube.org for a bulk import of excel sheets
into your database. It'S very simple to handle.

greetings

Frank