Importing weblog entries to SQL: anything to beware of?
Posted: 16 April 2008 06:12 AM   [ Ignore ]  
Summer Student
Total Posts:  5
Joined  10-05-2007

Hi,
I have a site with multiple entries in one weblog (actually a list of churches). I want to add in a whole load more by importing into the database directly (table exp_weblog_data). Before I go delving around, I was just wondering if anyone had experience of mass uploads like this and is there anything I should watch out for? Particularly thinking of any ID fields and the like. Are there other tables I should be careful to keep in line with?

thanks, Jim

Profile
 
 
Posted: 16 April 2008 07:47 AM   [ Ignore ]   [ # 1 ]  
Lab Technician
Avatar
RankRankRankRank
Total Posts:  2419
Joined  12-06-2002

I’d recommend investigating CSVGrab before attempting the SQL level work.  If CSVGrab doesn’t work for you let me know..I’ve done the SQL level stuff in the past and have some notes saved.

 Signature 

Sign up for Southern Fried ExpressionEngine - a 4-day EE class in San Antonio, TX starting on January 20th.

Profile
 
 
Posted: 11 May 2008 11:25 PM   [ Ignore ]   [ # 2 ]  
Research Assistant
Avatar
RankRankRank
Total Posts:  423
Joined  12-15-2005

Michael, I’d be very interested to hear how you did this. I have far too many entries for CSVGrab to efficiently handle. I have been trying of course, but it just keeps bogging down. I have approaching 1,000,000 entries, so the only good way to do this is SQL level querying…

Thanks for any help you can offer!

 Signature 

System Info: Production

EE Version:  1.66
Build
:       20081114
PHP Version
: 5.2.5

Profile
 
 
Posted: 12 May 2008 05:35 AM   [ Ignore ]   [ # 3 ]  
Lab Technician
Avatar
RankRankRankRank
Total Posts:  2419
Joined  12-06-2002

OK - I can’t promise this process will work for you.  I did years of MSAccess development so learned SQL in that environment—so was most comfortable figuring out the EE database things out there. 

What I did was install a WAMP setup locally, and then ODBC drivers and a Machine Data Source to connect to be able to connect to MySQL from MSAccess. 

It’s not an incredibly efficient method - but does work.  You will need to be comfortable mucking around at the database level, and figuring out what fields you need to target in EE.

Here are the steps I have noted:

1. backup database from EE Control Panel
2. Look at edit tab to determine next entry_id number
3. Check that category names in existing and new data match exactly (no misspellings on one side or the other)
4. Check for empty fields in new data.
5.  log into phpMyAdmin
6.  export exp_category_posts, exp_weblog_titles, exp_weblog_data.
7.  use BigDump.php to load downloaded tables into local MySQL database http://www.ozerov.de/bigdump.php
8.  Open MSAccess “ConnectToMySQL.mdb”.
9.  Adjust 3 queries to calculate entry_id.
10. Check that fields that are required on EE side are not null in new data.
11. Check that all three queries have same number of records.
12. Run three append queries - weblog data, weblog entries, category posts.
13. Go into local phpMyAdmin and export 3 tables (Use Complete Inserts).
14. Open up .sql files generated and change following code on each table:

ENGINE=MyISAM DEFAULT CHARSET=latin1;

to

TYPE=MyISAM;

15. Upload .sql files to site
16. upload bigdump.php to site - make sure its configured with target database info
17. Take the site offline
18. Clear all the caches.
19. Run bigdump.php on site.
20. Pray.


The heart of the process are the three queries to update exp_category_posts, exp_weblog_titles and exp_weblog_data.

In each I have a calculated field that creates the entry_id’s that EE needs to tie everything together.  What I do is have Access create a key field on the new imported data.  That field is an integer and starts at 1, so I just add the last entry_id (noted in step 2) from EE to it.

If you have categories to deal with you’ll probably need to pull down exp_categories from EE as well, and do some query work to figure out what category_id to pull.

For exp_weblog_titles I have a query that assigns all the proper info - the weblog ID, the author ID, ip address, ect.  You’ll need to look through the fields there and get the proper values in. The trickiest for me was the url_title.  I had to write a function in MSAccess to create that from the data, and call the function from the query.

For exp_weblog_data the main thing is to look at an existing post on the EE side and see what fields you need to get your data into.  All others need to be assigned an empty string (vs. a NULL).

The other key is the bigdump.php file - it chunks up large .sql files so the process doesn’t time out or run into memory issues on the import.

It’s a pretty one-off process, but if it’s helpful at all I could zip up the MSAccess stuff and send it - if you can use it directly maybe some of the logic would be helpful.

 Signature 

Sign up for Southern Fried ExpressionEngine - a 4-day EE class in San Antonio, TX starting on January 20th.

Profile
 
 
Posted: 25 November 2008 02:35 PM   [ Ignore ]   [ # 4 ]  
Grad Student
Avatar
Rank
Total Posts:  64
Joined  03-24-2007

Hi there,

I just imported a number of records batched into a temp table in the following steps. ALL DATA is good when I edit entries in CP. When I save, though, nothing takes. Anything I might’ve missed? Version 1.6.4. My previous, manually-entered entries are all saving OK.

Insert into exp_weblog_titles, grab insert ID
Insert into exp_weblog_data, grab insert ID
Create category if doesn’t exist, else grab matching ID (only 1 parent cat)
Create subcategory if doesn’t exist, else grab matching ID (only 1 subcat)
Insert cat/post with entry ID
Create tag(s) if doesn’t exist, else grab matching ID
Insert tag/post with tag/entry ID

Thanks!
Mark

 Signature 

.  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
Mark J. Reeves
Principal, Technology

Profile
 
 
Posted: 25 November 2008 02:36 PM   [ Ignore ]   [ # 5 ]  
Grad Student
Avatar
Rank
Total Posts:  64
Joined  03-24-2007

Also - Versioning is enabled for this weblog, but I turned it off (‘n’) for the imported entries.

Mark

 Signature 

.  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
Mark J. Reeves
Principal, Technology

Profile
 
 
Posted: 25 November 2008 02:53 PM   [ Ignore ]   [ # 6 ]  
Grad Student
Avatar
Rank
Total Posts:  64
Joined  03-24-2007

Scratch that request - I hadn’t imported weblog_id in exp_weblog_data. That was causing the failure.

Mark

 Signature 

.  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
Mark J. Reeves
Principal, Technology

Profile
 
 
   
 
 
Post Marker Legend
New Topic New posts Hot Topic Hot Topic with new posts New Poll New Poll Moved Topic Moved Topic Sticky Topic Sticky topic
Old Topic No new posts Hot Old Topic Hot Topic with no new posts Old Poll Old Poll Closed Topic Closed Topic Announcement Announcements
Theme
Change Theme
Visitor Statistics
The most visitors ever was 1149, on July 16, 2007 09:33 AM
Total Registered Members: 65087 Total Logged-in Users: 37
Total Topics: 82229 Total Anonymous Users: 21
Total Replies: 441930 Total Guests: 211
Total Posts: 524159    
Members ( View Memberlist )