ExpressionEngine CMS
Open, Free, Amazing

Thread

This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

The active forums are here.

Need to import a very large CSV file

October 24, 2011 2:58pm

  • #1 / Oct 24, 2011 2:58pm

    Envision

    18 posts

    We need to create channel entries from a CSV file that has about half a million rows. We’ve tried using the DataGrab add-on module (http://brandnewbox.co.uk/products/details/datagrab) to do it, but our server keeps timing out after only a few hundred entries are imported.

    We’ve increased our PHP memory limit to 256MB and are still having the time-out problem. Are there any other add-ons, besides DataGrab, that people have had success with for importing LARGE CSV files?

  • #2 / Oct 25, 2011 8:15am

    Andrew Weaver

    206 posts

    Hi,

    I am the author of DataGrab. DataGrab is not going to be the most efficient way of importing large datasets: it effectively does a publish for each entry, so if your channel contains a number of third-party fieldtypes it will have to load those for every entry (due to the way EE’s channel entries API works).

    That said, I know DataGrab has been used for imports of that size and I’m sure we can make some improvements for you. If you’d like to start a thread on the brandnewbox forums or email me, I’ll see what can be done.

    Andrew

  • #3 / Oct 26, 2011 1:21pm

    Envision

    18 posts

    Thanks Andrew! I’ll start a thread there. Much appreciated.

  • #4 / Oct 26, 2011 2:03pm

    Envision

    18 posts

    Does anyone have any other recommendations for importing very large CSV files with something other than DataGrab since this really isn’t the tool for the job at the moment?

  • #5 / Oct 26, 2011 5:41pm

    Mark Bowen

    12637 posts

    Would it be possible to split the file into smaller chunks and import each one that way instead?

    Mark

  • #6 / Oct 27, 2011 8:27am

    Rob Allen

    3114 posts

    I’d usually do what Mark suggested and split your data into smaller chunks. Whenever I import large datasets into systems such as shopping carts, this is the approach I take to prevent things like timeouts.
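    The chunking approach Mark and Rob describe can be sketched in Python; this is only illustrative (the function name and chunk size are made up for the example), and it assumes the first row of the file is a header that each chunk should repeat:

    ```python
    import csv
    import io

    def split_csv(csv_text, chunk_size):
        """Split a CSV (first row = header) into smaller CSV strings of
        at most chunk_size data rows each, repeating the header in every
        chunk so each piece imports on its own."""
        rows = list(csv.reader(io.StringIO(csv_text)))
        header, data = rows[0], rows[1:]
        chunks = []
        for i in range(0, len(data), chunk_size):
            buf = io.StringIO()
            writer = csv.writer(buf)
            writer.writerow(header)            # repeat the header row
            writer.writerows(data[i:i + chunk_size])
            chunks.append(buf.getvalue())
        return chunks
    ```

    For a real 500,000-row file you would stream the input line by line and write each chunk to disk rather than hold everything in memory.
    
    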

  • #7 / Oct 27, 2011 8:40am

    Envision

    18 posts

    Yes. I did try splitting them up into smaller files. I can successfully import files with just over 300 entries. The problem is that I need to import almost 500,000 entries. That would mean that I would have to import more than 1500 CSVs. We don’t have that kind of time unfortunately.

  • #8 / Nov 04, 2011 12:41pm

    Darren Miller

    103 posts

    I think you might need a whole new approach for that amount of data.

    If you’re doing this all via EE then you’ve got overhead from Apache, PHP, EE and any add-ons.

    You might have to consider using a script run directly on the server to parse the file and import the data straight into the MySQL tables. That’s going to run much faster, can be scheduled to run at night, and won’t have timeout problems. You also get a bigger choice of coding languages that way. You might find something like Perl is better at coping with a huge text file *accidentally starts endless debate about coding languages*
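    As a rough sketch of what such a direct-to-database script might look like (in Python rather than Perl), assuming EE2’s exp_channel_titles table and CSV columns named title and url_title — adjust both to your actual schema. Batching many rows into each INSERT statement is what buys most of the speed:

    ```python
    import csv
    import io

    def build_batched_inserts(csv_text, channel_id, batch_size=500):
        """Build multi-row INSERT statements for exp_channel_titles.

        Column names follow the EE2 schema and the 'title'/'url_title'
        CSV headers are assumptions -- adapt to your data. The string
        escaping here is naive; a real script should use a database
        driver with parameterized queries instead.
        """
        rows = list(csv.DictReader(io.StringIO(csv_text)))
        statements = []
        for i in range(0, len(rows), batch_size):
            values = ", ".join(
                "({0}, '{1}', '{2}', 'open')".format(
                    channel_id,
                    r["title"].replace("'", "''"),       # naive SQL escaping
                    r["url_title"].replace("'", "''"),
                )
                for r in rows[i:i + batch_size]
            )
            statements.append(
                "INSERT INTO exp_channel_titles "
                "(channel_id, title, url_title, status) VALUES " + values + ";"
            )
        return statements
    ```

    A real import would also have to populate exp_channel_data with matching entry_ids, which is where most of the complexity lives.
    
    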

  • #9 / Dec 13, 2011 9:20am

    BPeddle

    31 posts

    On this topic: I have run into the same issue and may just go with a direct data load.

    Do you just need to import into the two tables, channel_data and channel_titles?

    Do the url_titles truly need to be unique if you are going to use the entry_id to pull the entry up instead?

    Thanks
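    If the url_titles do need to stay unique, a naive de-duplication pass over the titles could look like this — an illustrative Python sketch, where the lowercase-and-hyphenate slug rule and the numeric-suffix scheme are assumptions, not EE’s own algorithm:

    ```python
    import re

    def make_url_titles(titles):
        """Generate slug-style url_titles, appending a numeric suffix to
        duplicates so a uniqueness constraint is satisfied even when
        entries are actually looked up by entry_id."""
        seen = {}
        out = []
        for title in titles:
            base = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
            count = seen.get(base, 0)
            seen[base] = count + 1
            # first occurrence keeps the bare slug; later ones get -2, -3, ...
            out.append(base if count == 0 else "{0}-{1}".format(base, count + 1))
        return out
    ```

    Note this is single-pass and in-memory; for half a million rows it still runs quickly, but it does not guard against a suffixed slug colliding with a pre-existing title.
    
    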

  • #10 / Feb 03, 2012 9:35am

    Envision

    18 posts

    This actually turned out to be fairly complex due to mapping the existing data to EE fields. In the end it was easiest just to write a SQL query to suck in the data.

