Import raw content (nodes, users) into Drupal
(This article is made specifically for showing how to get raw content into Drupal, but the methods below can likely be adapted for other use cases [we also imported a list of users using these techniques for instance].)
Importing raw content into Drupal, can be, well let's just say it's not always easy. In two years of experience with Drupal we've imported a MoveableType site, a Blogspot site, content from one Drupal site to the other, and now raw data from a csv file (raw meaning that it did not come from some other CMS/website), and we've used everything from custom scripts, several different Drupal modules along the way including import-exportapi.module, userimport.module, nodeimport.module, WordPresstoDrupal.module.With the exception of the WordPresstoDrupal.module we've been mostly disappointed with the results of our experiments with the Drupal based import options. That said, this may not be the best method to start off with if you can find a ready-made solution that works for you, since it is somewhat time consuming.
Besides being a somewhat complex thing to begin with, the issues around importing content into a Drupal installation are exacerbated by the wide array of variables any point and click solution has to contend with. It's impossible for any module developer to be able to keep up with the number of different/custom Drupal content types for even one version of Drupal, let alone for all new versions of Drupal.
In short, for the serious site-operator/Drupal developer getting raw content imported to Drupal is always going to be a problem because one size is never going to fit all (the 'bad' side of such a customizable platform). This is important to understand in order to fully appreciate the workaround below, which is not particularly easy or fast, but is 99.9% reliable from a data integriy standpoint.
"How's that," you say? Well, because the following method does not actually "import" anything at all, at least not in the traditional sense that most people think of when adding new content into a database. With this method you'll create complete Drupal nodes with SQL, which you will then simply place it in your existing database. Nothing is automatically calculated, scripted, or extrapolated - which is exactly why this method is not easy or particularly fast, but also why it is precise:
Create two separate sql import files. One for the "node" table of your database, and another for the "node_revisions" table. The last step will be to update the "sequences" table to let the system 'know' about the new content.
(these "node" and "node_revisions" tables are used only for certain module based content types not for cck nodes - though you could adapt these instructions to create new cck nodes by figuring out what tables you need to populate and using the techniques below)
What you need:
- A spreadsheet program. This is likely where you content originated from since it's raw. Used to manipulate/arrange the data and create numbering sequences for the sql file.
- A good non rich-text editor. We used a combination of BBEdit (to prefix/suffix columms and search and replace) as well as Textmate (to syntax check the sql before importing). We don't recommend trying this without an syntax checker of some sort (try downloading "Komodo Edit" which is free and made for a variety of platforms if you don't have TextMate).
- Access to your database and a SQL import tool (we used phpMyAdmin)
- (*note: backup your database BEFORE trying anything which will alter your existing data)
1. Get sample sql to model your file after
Download sample sql from your database's "node" table and "node_revisions" table to base the formatting of our own file off of. This can be done by going to the "export" tab in phpMyAdmin after selecting each table individually (you don't want the sql for your entire site).
The idea is to copy out one row from each of the files to use as a template for your own custom sql file. In this article we will only use the "node_revisions" table in our examples, but the concepts are exactly the same for the "node" table data. (the "node" table is actually much simpler to make, which is why we are using the harder example, but you will need make files for them both to successfully add your new content)
Example SQL from phpMyAdmin export:
INSERT INTO `node_revisions` VALUES (116, 116, 1, 'Title of post', 'Body', 'Teaser', '', 1162300341, 3);
With this information your job is now to create a spread sheet file which mirrors the order of this format. Your spreadsheet should have 11 rows. Here is a screenshot of what this should look like.
2. Prepare/populate your template
Next once you have you have basic spreadsheet template you will need to populate it with your own data. Most of this is extremely easy at this stage (basic spreadsheet skills are assumed):
- Columns 1, 8, 10, and 11 can likely be just duplicated down all rows unless you have very unique needs (if so you probably realize this already).
- Columns 2, 3, 4, 9 can be automatically made by simply creating a calculation in the first field that looks like somenumber+1 and dragging/pasting it down all rows (where "somenumber" is the appropriate number to begin on based on looking at the sequences table in your database - your beginning number needs to be higher than whatever number is contained in the corresponding field inside of the "sequences" table) [be sure that the nid, vid, and uid/author between the "node" table file you make and the "node revision" file match!]
- ...the rest of the columns (5, 6, 7) should be simple cut and paste maneuvers from your existing spreadsheet. Be sure to wrap columns 6 and 7 in whatever html/css markup you would like (if any).
Once you have everything looking right and ordered properly export your file in csv format, and change the extension to from csv to sql, and open the new file in a proper text editor - which really, really, should have sql syntax checking.
3. Finalize formatting
Depending on your spreadsheet program there will be a small variety of punctuation missing and/or added to your file, which will keep it from importing properly. This is where it is up to you to figure out how to effectively do this since each case is slightly different.
Continually comparing/referring to your model (the exported SQL from step one - you know it's correct since it came directly from your database), smart use of search and replace, some forethought, and color coded SQL syntax checking should get you through to the end without too many gray hairs, hopefully. Tinker with your files until you have them formatted EXACTLY like the sample sql you downloaded from your database AND until your SQL syntax checker shows that your are clean. (on sql syntax tip worth noting is that any place an apostrophe appears you will need to have another apostrophe next to it - or else it will cause a fatal error upon import)
4. After you believe your new SQL is totally clean it is time to try and import it into the database using phpMyAdmin's SQL query function. (you need to have the main database page selected and then click the SQL tab):
Simply copy out the code from your file, paste it into the query window and hit go/submit. With a lot of luck it phpMyAdmin will give you a message that the query successfully ran and your perfectly formatted new content will now be in your database. If you are importing a lot of content, it is likely that you'll get an error message remarking something along the lines of
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'blah-blah-blah' at line 1
Don't fret, just simply go to your file and find the line the error message referred to and figure out what the problem is. It's likely to be either a missing/extra apostrophe or perhaps you accidently left out an entire column of information. Once you isolate and fix the problem, simply copy everything from that entry down in your file and paste in the phpMyAdmin SQL query window again and repeat this process until you have successfully entered everything - there's no need to worry about any of the rows which occurred before the error since they were successfully added already. (*tip: bookmark the page for the phpMyAdmin SQL query window - it will save lot waiting an click back and forth)
You can console yourself that though this process can be a bit tedious that all the imposed perfect/error/syntax checking all but ensure the integrity of your newly added data.
5. Update your sequences table and test to be sure everything is working properly
Once you have successfully imported your SQL for both the "node" and "node_revisions" the last step is to update the "sequences" table of your database appropriately so that Drupal 'knows' the new place to start counting from when a used creates new content. Edit appropriately/correspondingly to the nid and vid values you used.
Last of of all test drive your newly infused site and be sure that you can view, edit, and create new node successfully and without errors.