This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

Serving off of Network Filesystems

July 03, 2014 12:51pm

  • #1 / Jul 03, 2014 12:51pm

    dcmbrown

    1 posts

    We are near completion of a new site in EE which runs on AWS, using an ELB in front of a pair of m3.medium web servers whose content is being served from a pair of m1.medium database server instances.  The web servers are running Apache and serving off of a replicated GlusterFS filesystem which is shared across both web servers.  This is so that at some point, should the traffic on the site warrant it, we can split the web servers off from the file servers and scale them independently.  GlusterFS is a network filesystem which can be set up much like a hard drive RAID array, depending on the number of servers being used.

    The issue we have come across, however, is that EE’s performance on a networked filesystem is absolutely horrendous, even on one mounted locally via NFS instead of via the GlusterFS protocol.
    This appears to be primarily because of the function directory_map() located in:

    system/codeigniter/system/helpers/directory_helper.php

    From the debugging output I’ve dumped into this section of code, on our client’s front page alone directory_map() is called 2,227 times and ends up calling is_dir() 5,570 times.  The best response time I have seen for the front page is 10.74s, but it more often averages roughly 15.4s.
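
    For anyone wanting to gather similar numbers, here is a minimal sketch of a call counter. It is not the debugging code actually used above; CallStats and its methods are hypothetical names, and the idea is only to wrap a filesystem call so that the call count and elapsed time accumulate under a label.

```php
<?php
// Hypothetical instrumentation helper; not part of ExpressionEngine or CodeIgniter.
final class CallStats
{
    // label => array('calls' => int, 'seconds' => float)
    private static $stats = array();

    // Run $fn, recording one call and its wall-clock time under $label.
    public static function measure($label, $fn)
    {
        $start   = microtime(true);
        $result  = call_user_func($fn);
        $elapsed = microtime(true) - $start;

        if (!isset(self::$stats[$label])) {
            self::$stats[$label] = array('calls' => 0, 'seconds' => 0.0);
        }
        self::$stats[$label]['calls']++;
        self::$stats[$label]['seconds'] += $elapsed;

        return $result;
    }

    // Return everything recorded so far, e.g. to dump at the end of the request.
    public static function report()
    {
        return self::$stats;
    }
}

// Usage: wrap the call being investigated (the temp dir is just a stand-in path).
$isDir = CallStats::measure('is_dir', function () {
    return is_dir(sys_get_temp_dir());
});
```

    Dumping CallStats::report() at the end of a request would then show how many times each labelled call ran and roughly how long it took in total.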

    For comparison, the current site for this company runs off of an ancient custom CMS we are trying to get rid of, but comparatively it runs rings around EE in the same setup using a slightly less powerful set of m1.medium web servers.  Response times in this custom CMS are typically 150ms-600ms depending on server load.

    We have several other sites which also use EE and, through creative use of the various caching systems available, have brought this time down to as little as 2s, but that is still far higher than desired for a server with little to no traffic.
    I’ve already rewritten the function to use scandir() instead of readdir(), which over the course of a page provides up to 0.5s of improvement, but clearly going from 12.5s to 12s still isn’t going to make the page load acceptable.  Given how many times different parts of EE redundantly call directory_map() while loading a page, I am considering running a simple cron job to store all of this information in a file once per minute (or more), and rewriting directory_map() to read and parse that file and pass back the same information.
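
    A rough sketch of what that cron idea might look like follows. Every function name here is hypothetical, and the snapshot layout (separate 'dirs' and 'files' keys) is an assumption that differs from what directory_map() returns, so it would need adapting before it could stand in for the real function. It also assumes the cache file lives on local disk rather than on the network filesystem.

```php
<?php
// Hypothetical helpers; the names and cache format are assumptions, not EE APIs.

// Walk $dir with scandir() and return array('dirs' => ..., 'files' => ...),
// where 'dirs' maps each subdirectory name to its own nested snapshot.
function build_directory_snapshot($dir)
{
    $map = array('dirs' => array(), 'files' => array());
    $entries = scandir($dir);
    if ($entries === false) {
        return $map;
    }
    foreach ($entries as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $path = $dir . DIRECTORY_SEPARATOR . $entry;
        if (is_dir($path)) {
            $map['dirs'][$entry] = build_directory_snapshot($path);
        } else {
            $map['files'][] = $entry;
        }
    }
    return $map;
}

// Run from cron: write to a temp file, then rename it into place so that
// web requests never read a half-written snapshot.
function write_directory_snapshot($dir, $cacheFile)
{
    $tmp = $cacheFile . '.' . getmypid() . '.tmp';
    if (file_put_contents($tmp, serialize(build_directory_snapshot($dir))) === false) {
        return false;
    }
    return rename($tmp, $cacheFile);
}

// Run from the web request: return the snapshot, or null if the file is
// missing, older than $maxAgeSeconds, or unreadable, so the caller can
// fall back to scanning the filesystem directly.
function read_directory_snapshot($cacheFile, $maxAgeSeconds = 120)
{
    if (!is_file($cacheFile)) {
        return null;
    }
    $mtime = filemtime($cacheFile);
    if ($mtime === false || time() - $mtime > $maxAgeSeconds) {
        return null;
    }
    $data = @unserialize(file_get_contents($cacheFile));
    return is_array($data) ? $data : null;
}
```

    The staleness check is there so that a stopped cron job degrades to slow page loads instead of serving an outdated directory listing indefinitely.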

    Are there better options than this?

    [UPDATE: 20140704 17:17:34pm] - Forgot to attach screenshot

     

  • #2 / Jul 07, 2014 5:36pm

    dcmbrown

    1 posts

    [EDIT] Removed most of the directory_map() code, even though it isn’t remotely original, to avoid any possible licensing issues.
    An addendum to this.  A coworker added a few lines of code to directory_map() which improve performance on a network filesystem by introducing a static variable that persists across calls (including the recursive ones) and acts as a lookup cache:

    It adds the following code.

    Immediately after the opening function brace:

    static $directory_map = array();
    if (array_key_exists($source_dir . $directory_depth, $directory_map)) {
        return $directory_map[$source_dir . $directory_depth];
    }

    Also, right after closedir($fp) and before return $filedata, add:

    $directory_map[$source_dir . $directory_depth] = $filedata;

    The introduction of the static variable to cache the data being sought halves the amount of time spent in is_dir() and opendir() calls.  It brings the number of calls which proceed through the entire function down from 2,227 to 438.  I have not yet timed the array_key_exists() lookup, but it is obviously significantly cheaper than the filesystem calls it replaces.
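
    For readers who would rather not patch the helper itself, the same memoization idea can be sketched as a standalone wrapper. This is not the coworker's patch: memoized_directory_map() and the $walker parameter are hypothetical, and the sketch assumes directory_map() takes ($source_dir, $directory_depth, $hidden). Two deliberate differences are that the key includes $hidden, and that it uses a separator so that a path ending in a digit cannot collide with a different depth (for example 'dir1' with depth 0 versus 'dir' with depth 10).

```php
<?php
// Hypothetical wrapper; $walker stands in for the real directory_map().
// Assumes the same $walker is passed on every call, since it is not part of the key.
function memoized_directory_map($source_dir, $directory_depth = 0, $hidden = false, $walker = 'directory_map')
{
    // Lives for the duration of the PHP request and is shared by all callers.
    static $cache = array();

    // Depth and hidden flag are appended as integers, so the key stays
    // unambiguous even if the path itself contains the separator.
    $key = $source_dir . '|' . (int) $directory_depth . '|' . (int) $hidden;

    if (array_key_exists($key, $cache)) {
        return $cache[$key];
    }

    $cache[$key] = call_user_func($walker, $source_dir, $directory_depth, $hidden);
    return $cache[$key];
}
```

    Unlike the in-function change described above, a wrapper like this only caches top-level calls; recursive calls made inside directory_map() itself would still hit the filesystem.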
