ExpressionEngine CMS
Open, Free, Amazing

Thread

This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

The active forums are here.

Any possible issues with a member database (import) of over 450,000 people?

October 14, 2007 8:30pm

Subscribe [3]
  • #1 / Oct 14, 2007 8:30pm

    Ryan M.

    1511 posts

    I’ll be importing a member list of over 450,000 people (thank goodness for the built-in EE member import utilities…). I’m not expecting there to be many (if any issues) with that (as far as performance, etc) - except for the fact that first and last names have to be custom fields…arrrgh. I’ve done an import before where I brought in the first/last names into the AIM/ICQ fields, then moved them to the correct custom fields with a SQL statement.  A little scary, but doable - and that was with a mere 300 people.

    Anyone have experience with this many members in an EE install? Anything I should keep in mind?

    A second point is that the goal is to have the person’s email address be their login. Seems to me I read where a new registration won’t allow the username to be the same as the email. Is that correct? I might have been reading a post referring to an older build or version. I just tested this on an older install that I happened to have open and it worked fine. Any feedback appreciated!

  • #2 / Oct 23, 2007 10:47pm

    Ryan M.

    1511 posts

    I am having trouble converting a CSV file of only 14,000 people to the XML format. I’ve tried on two different servers. Is there a certain line ending I might need or something? I have over 400,000 entries to import, and 14,000 user chunks will be slow going.

    Anyone done this?

  • #3 / Oct 24, 2007 9:00pm

    Ryan M.

    1511 posts

    Can someone chime in on what possible limit is imposed on importing member records at one time? After downloading OpenOffice and using that to generate my CSV files, I was able to make the XML files rather quickly (Nice app…Excel what?). I’m doing my CSV to XML conversions in 50,000 user chunks.

    So I had 50,000 users (6 total fields) each. I installed a copy of EE locally simply for testing the upload of a file this size before doing the import on the dev server. I uploaded the XML, clicked import, and after 10 minutes I got a blank white screen with no members imported.

    I need to get these people in the DB sooner than later - and really do not want to have to resort to 5,000 user chunks, but you gotta do what you gotta do.

    What the biggest member DB you’ve monkeyed with? How did you deal with importing?

  • #4 / Oct 27, 2007 10:45pm

    Ryan M.

    1511 posts

    Just because I like talking to myself, I thought I’d share observations in this thread, wherein I seem to be so utterly and totally alone.

    Importing members is tedious and slow - at least when you have to do up to 450,000. Especially if you have at least 1 bad record. I don’t think the attempted import stops at the first bad record, so it goes through the whole thing (each 5,000 member XML file taking upwards of 200-300 seconds according to FasterFox) before I know what the errors are. Either that, or the screen whites out. The conversion from CSV to XML goes quick - but the actual XML import is very slow. It would be nice if the CSV to XML import would validate email addresses there (anything between the <email> tags), rather than passing the buck to the XML file.

    I don’t know, I’ve been sitting here for quite some time converting 5,000 member chunks from CSV to XML, and then trying XML imports locally to find errors in them before attempting an import on the real dev server, and it occurs to me to ask: is there an easier way to do this with a SQL statement? There are no custom fields. How would you then generate the unique_id?

    Anyone who could help me do that is a hero. Back at it…

  • #5 / Oct 28, 2007 12:00am

    JT Thompson

    745 posts

    Just a thought, but being familiar with importing to MySQL using the command line it would be MUCH easier to do that way.

    If the tables are the same could you possibly export the existing DB using mysqldump then import them into the existing database? i know for a fact it would be much less cumbersome that way

  • #6 / Nov 16, 2007 1:05pm

    Ryan M.

    1511 posts

    OK, I’m referencing the tail end of this thread called “Member Import doesn’t respect email uniqueness setting?” in which the discussion turned to importing members. I would have posted these questions there, but they make more sense in this thread, because the title is more accurate. Onward…

    We will be attempting to import (read: insert) these members right into mySQL. We pared it down to 380,000 records, but that is still way too many for the built-in XML import utility and PHP to handle. I have a developer who can write a script and/or use a program to generate what needs generating for the fields, but I want to ensure we’re doing this in a way EE can use. I guess I have a few questions relating to this:

    1. The unique_id field and generating that ID
    How do we generate this? Can we use the same “random” function that is in cp.member_import.php around line 714?

    /** -------------------------------------
    /**  Add a unique_id for each member
    /** -------------------------------------*/
      $data['unique_id'] = $FNS->random('encrypt');

    which is referencing “function random” in core.functions.php, line 367:

    function random($type = 'encrypt', $len = 8)
       {...
       }

    I just want to make sure the unique_id we generate can be ‘used’ by EE. Does it need to be a certain amount of characters, etc.

    2. member_id and Duplicates
    The site is live now (yay!), and people are registering as members as I type this. If we were to go and insert these 380,000+ records, I’m assuming that for our automated insert process we can just start the member ID at something like 10000 and have it increment from there. The question I have on that is: everyone registering up to member #9999 should go in without incident, but will the next member registration then automatically jump to member_id #390000? I’m just making sure that the member ID will look for the next available open number and use that.

    3. Password Encryption
    This is similar to question 1. Do we HAVE to hash this? Or can we keep it simple and just import the password as typed? cp.member_import.php around line 586:

    case 'password':
      // encode password if it is type="text"
      $this->members[$i][$tag->tag] = ($tag->attributes['type'] == 'text') 
      ? $FNS->hash($tag->value) : $tag->value;
    break;

    which is referencing “function hash” in core.functions.php, line 327:

    function hash($str)
      {...
      }

    I don’t recall what hash setting they used by default when they installed. What’s the easiest way to figure that out from the CP?

    4. The Database Tables to Insert Into
    Need to make sure we hit and insert into every table that a new member goes into. I think this list is complete:
    exp_members
    exp_member_data
    exp_member_homepage (even though these people will never see this, I’m assuming a record with their member_id needs to be inserted here as well)

    Any advice or experience with importing members through directly inserting them would be much appreciated. Thanks!

  • #7 / Nov 16, 2007 1:37pm

    Ryan M.

    1511 posts

    I guess the only other thing to add is the proper way to generate the join_date. I see that it is generated with “$LOC->now;” I assume that if we could just figure out what the 10 digits are (this is a UNIX timestamp?) for the current day (because join date isn’t important for these members) we could just hard-code that as the value for the join_date.

  • #8 / Dec 06, 2007 8:15pm

    Hop Studios

    500 posts

    mdesign: It’s probably too late, but I can offer a few observations.

    1) You can hardcode the join_date

    2) if you don’t create an exp_member_homepage record (or even worse, if you create 2), you get all sorts of weirdnesses that will take ages to track down and will make you lose hair.  Certainly, I don’t blame EE for expecting that this table will have a matching entry, but it is a little difficult to diagnose.

    3) yes, you have to hash the password if you expect it to work; on the other hand, you can insert anything random valke in that field and then ask your users to reset their password upon first use. That’s what I do sometimes.

    4) Once you import your members, the very next member to register will get an id number larger than the largest one—so you’ll have a gap in your numbering scheme.  EE doesn’t “fill in” or reuse missing or deleted member IDs, either.

    Lastly, even without using the XML import, importing 400,000 records all in one swipe might be difficult, depending on your host’s memory restrictions for PHP and CPU restrictions.  If it were me, I’d probably do batches of 50,000.

    TTFN
    Travis

  • #9 / Dec 09, 2007 11:46am

    Ryan M.

    1511 posts

    Thanks for your feedback, however delayed, Travis! I ended up cobbling together a page that I could submit a CSV file to that would do all the hashing, join date, etc. I found that, at least on the box I was on, I could only do 4000 entries at a time, because the script would time out around 30 seconds in. So, I had about 100 CSV files, all sequentially numbered, and imported them one after the other, and it went pretty quickly.

    I took care to make sure all three tables were written to for the new members - including the homepage table.

    All in all I think it went pretty well.

.(JavaScript must be enabled to view this email address)

ExpressionEngine News!

#eecms, #events, #releases