This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

Attempts to clone a large site via MSM keep failing.

September 16, 2009 10:11am

  • #1 / Sep 16, 2009 10:11am

    ian Pitts

    175 posts

    We have a fairly large site that we’re trying to clone as a baseline for a localized version. Some weblogs are getting copied in bulk while others are being created and left empty.

    The problem I’m having is that we are running out of PHP memory before the cloning process is completed.

    We initially had 128MB allocated. The cloning process failed, stating it needed more memory. I contacted EngineHosting and had them raise the PHP memory limit to 256MB. After tediously cleaning out the partially created site, I tried the clone again after hours. This time it failed again:

    Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 66388 bytes) in /www/account/public_html/pbmsys/db/db.mysql.php on line 389

    Is there something that can be done? The whole reason we purchased MSM was so that we could more easily build out localized versions of our main site and if I can’t get a clone to complete without running out of PHP memory, I don’t know what to do.

    I’m also up against a tight deadline… I’m on the hook to get this site cloned this week and have lost two days due to these cloning issues. Any help you can provide would be greatly appreciated.
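
    For reference, the numbers in the fatal error quoted above can be decoded with a small sketch. It is written in Python purely for illustration and is not part of the original thread; `parse_memory_error` is a hypothetical helper name. The limit matches the 256MB that was configured, and the allocation that failed was small, which is consistent with memory filling up gradually during the clone rather than in one large request.

```python
import re

# Error text copied from the post above (file path omitted).
message = (
    "Fatal error: Allowed memory size of 268435456 bytes exhausted "
    "(tried to allocate 66388 bytes)"
)


def parse_memory_error(text):
    """Return (limit_bytes, requested_bytes) from a PHP memory-exhausted message."""
    match = re.search(
        r"Allowed memory size of (\d+) bytes exhausted "
        r"\(tried to allocate (\d+) bytes\)",
        text,
    )
    if match is None:
        raise ValueError("not a PHP memory-exhausted message")
    return int(match.group(1)), int(match.group(2))


limit_bytes, requested_bytes = parse_memory_error(message)
limit_mib = limit_bytes / (1024 * 1024)   # bytes -> MiB
requested_kib = requested_bytes / 1024    # bytes -> KiB

print(f"limit: {limit_mib:.0f} MiB, failed allocation: {requested_kib:.1f} KiB")
# prints "limit: 256 MiB, failed allocation: 64.8 KiB"
```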

  • #2 / Sep 16, 2009 10:16am

    Sue Crocker

    26054 posts

    Hmmm... Solspace has a module that I’ve used to import .xls files into the database. It might be something to look at.

    Solspace:// Importer

  • #3 / Sep 16, 2009 10:29am

    ian Pitts

    175 posts

    Thanks Sue, that may work… but they state that it’s limited to about 10MB at a time…

    Any insight about why so much memory is required to copy data into the new site? Massive arrays?

  • #4 / Sep 16, 2009 10:33am

    Sue Crocker

    26054 posts

    I have no idea—but I’ll see if I can get any more info for you.

    Just for my info - how many weblog entries are we talking about needing to clone, and how many custom fields in each?

    Also how many weblogs are you attempting to clone?

  • #5 / Sep 16, 2009 10:50am

    ian Pitts

    175 posts

    We have about 10 different weblogs where 7 or so are getting all the data copied.

    The largest weblog has 308 entries and 47 custom fields. Another has 377 entries but only 4 custom fields.

    I estimate about 1800 entries would be copied, each with a varying number of custom fields.

  • #6 / Sep 16, 2009 11:51am

    Derek Jones

    7561 posts

    iso100, what version and build are you on at present?  And what are the exact options you are selecting in the create new site form?  I’m helping Nevin on this one, and one concern is the 1.1GB relationships table (egad!), but I need to know exactly what’s in play, so the answers to those two questions will be informative.

  • #7 / Sep 16, 2009 12:04pm

    ian Pitts

    175 posts

    We are on version 1.6.7 Build:  20090211.

    In the create new site form I’m actually picking and choosing weblogs to either ignore, create with no data or create with all data.

    Do you need to know my specific weblog names and settings for each? I can’t imagine that information would be very useful unless you knew my weblog/custom field setup.

    I was just thinking that maybe clearing relationship caches (GASP) would reduce the size of exp_relationships to something that can be copied.

  • #8 / Sep 16, 2009 12:16pm

    Derek Jones

    7561 posts

    I’m more interested in knowing whether you are moving data or copying it, and in what combination; the names are immaterial.  Relationships are only handled if some content is moved.  What is the build number for MSM?

    That size of data in the relationships table is out of proportion with the size of your entry data.  How are relationships being used?  You have two relationship extensions in your extensions table, so I’m curious if either of them are ballooning that data.

  • #9 / Sep 16, 2009 1:35pm

    ian Pitts

    175 posts

    All content that goes into the new site is only being copied. Nothing is getting moved.

    MSM is version: 1.1 Build: 20081024.

    Relationships are used across the site to relate resources (demos, whitepapers, documentation) as well as products and solutions with other items.

    On our home page (http://www.pbinsight.com/), relationships are used to add in the content for featured products, some items on the product support tab, etc.

    In our Products section, all content down the right side is via relationships, as well as most of the content on the Resources tab:
    http://www.pbinsight.com/products/location-intelligence/applications/mapping-analytical/mapinfo-professional/

    Curious, are you working with EngineHosting and have a view to my database or something? What two relationship extensions are you seeing?

    I’m almost 100% certain that the relationship table is getting filled with relationship cache data… and the size is the side effect of: lots of records, lots of relationships and then lots of custom fields.

    When I mistakenly cleared the relationship caches just prior to launch, the size of that table dropped to ~500MB or less. Of course, the site was super slow then as well, because the relationship caches were rebuilt on page load… causing 30-120 second page load times.

  • #10 / Sep 16, 2009 1:45pm

    Derek Jones

    7561 posts

    Yes, Nevin and I have been working together on this this morning, and when he relayed the sizes of each table to me, relationships caught my eye.  There are 24 different extensions in the extensions table, two of which are multi_relationship and Playa.  I haven’t looked at your data so I don’t know how you’re utilizing them or what version you’re on.  But unless you are constantly relating entries over and over to other entries, I don’t see how that size of data can be normal.  The number of permutations required to turn 10MB of data into over 1GB of cached relationships is staggering.

    It may be the case that everything is working “correctly” here, but it gives me pause to wonder if there isn’t a bug or a serious optimization problem with one or more of the extensions you are utilizing.  Are you using both of those, and are they up to date?

    If you aren’t moving any data, though, the relationships table shouldn’t be coming into play.

  • #11 / Sep 16, 2009 2:01pm

    ian Pitts

    175 posts

    I’m actually not using multi_relationships right now… I will disable it as I didn’t consider a “dormant” unused extension could affect things.

    I agree, under normal conditions, if I was hand coding some relationships via an intersection table in SQL, I would never expect it to get that large. Of course, the fact that relationship data is cached for quick access combined with many custom fields could be doing something I didn’t expect.
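
    The hand-coded approach mentioned above can be sketched as follows. This is a generic illustration, not ExpressionEngine's schema: the `entries` and `entry_links` tables and their columns are hypothetical, and SQLite is used only so the example is self-contained. Each link costs one small row, and related entries are looked up with a JOIN at read time rather than stored as a cached copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entries (
        entry_id INTEGER PRIMARY KEY,
        title    TEXT NOT NULL
    );
    -- Intersection table: one small row per parent/child link.
    CREATE TABLE entry_links (
        parent_id INTEGER NOT NULL REFERENCES entries(entry_id),
        child_id  INTEGER NOT NULL REFERENCES entries(entry_id),
        PRIMARY KEY (parent_id, child_id)
    );
    """
)
conn.executemany(
    "INSERT INTO entries (entry_id, title) VALUES (?, ?)",
    [(1, "Product page"), (2, "Whitepaper"), (3, "Demo")],
)
conn.executemany(
    "INSERT INTO entry_links (parent_id, child_id) VALUES (?, ?)",
    [(1, 2), (1, 3)],
)

# Resolve the entries related to entry 1 with a JOIN at read time.
related = conn.execute(
    """
    SELECT e.title
    FROM entry_links AS l
    JOIN entries AS e ON e.entry_id = l.child_id
    WHERE l.parent_id = ?
    ORDER BY e.entry_id
    """,
    (1,),
).fetchall()

related_titles = [title for (title,) in related]
print(related_titles)  # prints ['Whitepaper', 'Demo']
```

    The trade-off is the one raised in this thread: the table stays small, but every read pays for the JOIN.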

    Playa is an older version (1.3.3) if I recall correctly because it has some interface modifications we required and applied ourselves. Recent updates don’t have that feature which is something we can’t live without right now in terms of content entry from content owners. (our modifications allow the locking of specific weblogs and categories to display for a relation down at the time of field creation, minimizing end user error)

    We can’t upgrade yet because we’d also have to change template markup on all other sites on this MSM installation to move to the new Playa 2 syntax.

  • #12 / Sep 16, 2009 2:13pm

    Derek Jones

    7561 posts

    If it’s not active, it wouldn’t be affecting anything; again, I didn’t look at the data, just asked Nevin for the names of classes sitting in the table.

    I would wager that there’s something not being stored correctly in that table.  That’s an average of half a megabyte of cached relationship data per weblog entry; I can’t imagine that there are that many legitimate relationships.  And if there are, then a better custom solution might be required for the sake of sanity.
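
    The estimate above can be checked against the figures quoted earlier in this thread. This is a rough sketch: treating the sizes as decimal megabytes is an assumption, and the 1,800 figure is the original poster's own estimate of the entries to be copied.

```python
# Figures quoted in the thread; decimal units (1 GB = 1000 MB) are an assumption.
relationships_table_mb = 1.1 * 1000   # size of the relationships table (1.1GB)
entry_count = 1800                    # estimated number of entries to copy
entry_data_mb = 10                    # approximate size of the entry data itself

cache_per_entry_mb = relationships_table_mb / entry_count
amplification = relationships_table_mb / entry_data_mb

print(f"cached relationship data per entry: {cache_per_entry_mb:.2f} MB")
print(f"cache size relative to entry data: {amplification:.0f}x")
```

    That works out to roughly 0.6 MB of cached data per entry, in line with the half-megabyte estimate, and a cache on the order of 110 times larger than the entry data it is derived from.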

    But as you are on a deadline to get through this, solving that problem will need to occur later.  Either I or Nevin will update you with information when we’re ready for you to try to duplicate your site again.

  • #13 / Sep 16, 2009 2:50pm

    ian Pitts

    175 posts

    If there are many fields in a weblog and that weblog is related to other weblogs, isn’t that data cached in the rel_cache? I can see how many entries each with relationships to other entries can result in quite a bit of stored content in that cache.

    I agree though that a custom solution might be the way to go. I believe Brandon might be working on that though… so I think we might want to leave that to the expert.

    I appreciate your and Nevin’s work on this issue for me.

  • #14 / Sep 16, 2009 2:54pm

    Derek Jones

    7561 posts

    Yes, all of the relevant data is cached to avoid dozens of queries and JOINs to large tables when that data is needed, but again, just doing the straight math, it doesn’t quite add up that 10MB of data spread over 1800 or so entries through relationships would balloon to over 1GB.  We’ll have you going soon if things work out as we hope.

  • #15 / Sep 21, 2009 4:59pm

    ian Pitts

    175 posts

    Thanks everyone! I successfully created a new site and copied content, templates, etc. from the existing site last night. Apparently 2048MB of PHP memory did the trick.

    Curious, it appears none of our content’s page_uris were copied over. The other data is there (not relationship information, but Brandon warned me about that anyway) but all the page URIs are empty. It is going to take quite some time to rebuild them.
