ExpressionEngine CMS
Open, Free, Amazing

Thread

This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

The active forums are here.

Programming Challenge - gmail address variations

September 08, 2009 4:28pm

Subscribe [2]
  • #1 / Sep 08, 2009 4:28pm

    sophistry

    906 posts

    A while back there was a flurry of interesting programming challenges. I’ve got my own now. i would be pleased if you took a look and gave feedback or sent up some code. i’ll post my answer in a few days.

    This is something I have solved, but it took me a while to get to something reasonably fast and flexible.

    Here’s the challenge:

    gmail.com allows you to make many variations of your email address. you may already know that you can use plus-addressing (adding a plus sign and then another string) and gmail will deliver it. for example, a gmail address like this: .(JavaScript must be enabled to view this email address) will also get mail for [email protected] and [email protected], etc…

    what you may not know is that gmail allows you to use many dot-variations of the mailbox. for example all of these addresses are delivered by gmail to .(JavaScript must be enabled to view this email address): .(JavaScript must be enabled to view this email address), .(JavaScript must be enabled to view this email address), and .(JavaScript must be enabled to view this email address)

    challenge: list ALL of the dot-variations for an arbitrary mailbox. (edited to clarify dot-variations, not plus-addressing)

    starting with:
    .(JavaScript must be enabled to view this email address)

    and ending with:
    .(JavaScript must be enabled to view this email address)

    and all the variations in between.

  • #2 / Sep 08, 2009 5:02pm

    garymardell

    315 posts

    You mean with just the .‘s in various places? Or with some + aswell. Surely if we use + then it could be infinite.

  • #3 / Sep 08, 2009 6:27pm

    sophistry

    906 posts

    right, plus addressing is not the challenge. dot-addressing is.

  • #4 / Sep 09, 2009 8:35am

    Michael Wales

    2070 posts

    I started working on my PHP solution to this challenge (and will probably do one in Python as well). I also presented this to a few of my coworkers, I’ll post their responses if/when I get them. I presented the challenge as follows:

    As you may know, periods within the Inbox alias of a GMail account don’t matter. For instance, an address of .(JavaScript must be enabled to view this email address) would receive messages sent to .(JavaScript must be enabled to view this email address), .(JavaScript must be enabled to view this email address), even .(JavaScript must be enabled to view this email address).

    Your challenge is to write a script that outputs every possible combination of a dot-notated Gmail address (recurring periods are not allowed, only a single period between letters). The input address is .(JavaScript must be enabled to view this email address).

  • #5 / Sep 09, 2009 9:03am

    Michael Wales

    2070 posts

    <?php
        $email = '[email protected]';
        
        // An array to store all possible solutions
        $solutions = array();
        
        // Get the flat inbox alias - remove any preexisting dots
        $inbox = str_replace('.', '', current(explode('@', $email)));
        $domain = end(explode('@', $email));
        
        // No dots is our first solution
        $solutions[] = $inbox;
        
        // Dump each character into an array, just for easier looping
        $letters = array();
        $inbox_len = strlen($inbox);
        for ($i = 0; $i < $inbox_len; $i++) {
            $letters[] = substr($inbox, $i, 1);
        }
        
        // Maximum number of dots is $inbox_len - 1, loop through all totals to of dots
        $current_dots = 1;
        $max_dots = $inbox_len - 1;
        while ($current_dots <= $max_dots) {
            $current_position = 1;
            // Loop through all possible positions of dots
            // Take into account the starting position and how many dots will be positioned
            // in relation to the number of positions remaining
            // Prevents (.(JavaScript must be enabled to view this email address)) on a 2-dot run
            while ($current_position + $current_dots <= $inbox_len) {
                $str = '';
                $dot_count = 0;
                // Loop through all of the letters, placing dots as appropriate
                foreach ($letters as $pos => $letter) {
                    if (($pos >= $current_position) && ($dot_count < $current_dots)) {
                        $str .= '.' . $letter;
                        $dot_count++;
                    } else {
                        $str .= $letter;
                    }
                }
                // Place our string into our solution array and move to the next viable position in string
                $solutions[] = $str;
                $current_position++;
            }
            // Place more dots on the next pass
            $current_dots++;
        }
        
        foreach ($solutions as $s) {
            echo $s . '@' . $domain . '
    ';
        }
  • #6 / Sep 09, 2009 9:30am

    sophistry

    906 posts

    hi michael,

    thanks for making the wording for the challenge clearer! i also realized (though it is implied) that the email address has to be valid - so no multi-dots and no dots starting or ending the mailbox.

    the script above generates quite a few valid email addresses, but not the full complement. it leaves out .(JavaScript must be enabled to view this email address) for instance.

    the mailbox “thisemail” has 9 characters so there are 2^8 solutions.

  • #7 / Sep 09, 2009 9:34am

    wiredesignz

    2882 posts

    Not actually having the patience to code this, But… I would convert the character count of the username to a binary number (as sophistry indicates 2^8), and then count up from 0 to that number placing a ‘.’ between characters for each count in the sequence.

  • #8 / Sep 09, 2009 9:46am

    sophistry

    906 posts

    hmmmm…. wired… i’m not sure how to do a speed test on pseudocode. 😊

    also, i just noticed that the EE forum script did not convert the plus-addressing email addresses on the original post. that has been fixed in the CI codebase - the plus character is now in the auto_link() regex. time to backport auto_link() to the EE forum, guys. 😉

    [email protected]
    .(JavaScript must be enabled to view this email address)

  • #9 / Sep 09, 2009 10:02am

    Michael Wales

    2070 posts

    [email protected]

    Dang… have to rethink it :(

  • #10 / Sep 09, 2009 10:03am

    wiredesignz

    2882 posts

    ...I’m not sure how to do a speed test on pseudocode.

    Me neither. But where is xwero when you need him?, he loved this stuff! 😊

  • #11 / Sep 09, 2009 10:14am

    sl3dg3hamm3r

    223 posts

    So, the amount of end-result is simple to calculate, in this example 2^8.

    I would insert the periods randomly, and add it to an array (of course only, if it doesn’t exist yet in the array). When the array fullfills the size of 2^8, I have all permutations.

    ... ah, it should be fast and optimized? 😉))

  • #12 / Sep 09, 2009 12:00pm

    n0xie

    1381 posts

    We came up with this solution, although we just read wiredesignz had something similar.

    function calculate_email($email = '[email protected]')
        {
            $inbox = str_replace('.', '', current(explode('@', $email)));
            $domain = end(explode('@', $email));
            $length = strlen($inbox) -1;
            $posibilities = pow(2,$length)-1;
    
            // dump the inbox in an array
            for ($k = 0; $k <= $length; $k++)
            {
                $letterarray[] = substr($inbox, $k, 1);
            }
    
            // start counting in binary for all the possible solutions
            for ($i = 0; $i <= $posibilities; $i++)
            {
                echo $i .' : ';
                $bin = decbin($i);
                $binlength = $length - strlen($bin);
    
                $j = 0;
                
                // zerofill empty space
                while ($j <= $binlength)
                {
                    $bin = '0'.$bin;
                    $j++;
                }
    
                // replace all binary 1's for dots
                foreach($letterarray as $key=>$value)
                {
                  if (substr($bin, $key, 1) == '1')
                  {
                      echo '.';
                      echo $value;
                  }
                  else
                  {
                    echo $value;
                  }
                 
               }
               echo '@'.$domain.'
    ';
            }
        }

    Also note the use of 3 different forms of loops 😉

    Total Execution Time      0.0321
  • #13 / Sep 10, 2009 11:42am

    sophistry

    906 posts

    thanks for playing guys! that was fun.

    i started attacking the problem the same way MW did - loops.

    then i thought along the lines of sledgehammer - fill randomly and cull.

    then, only after looking at some of the output, i noticed that there was a binary pattern going on - and then it hit me: bits!

    that was the solution proposed by wiredesignz and implemented by noxie. nicely done.

    here’s the solution i had come up with after quite a bit of tinkering and tidying… there are some nice tricks in there if i do say so myself. 😉 and it is a teeny-tiny bit faster than noxie’s excellent submission.

    // build the variations for a gmail address
        function gb($em='[email protected]')
        {
            $mb = str_replace('.','',strtok($em,'@'));
            
            $letters_array = preg_split('//', $mb, -1, PREG_SPLIT_NO_EMPTY);
            $last_letter = array_pop($letters_array);
            $bits = strlen($mb)-1;
            $mask_array = range(0,pow(2,$bits)-1);
            foreach($mask_array as $decimal)
            {
                // zero pad to make the foreach easier to read
                $bin = substr(str_repeat('0',$bits) . decbin($decimal), -$bits);
                $out = '';
                foreach($letters_array as $key => $letter)
                {
                    $out .= $letter . (($bin[$key]) ? '.' : '');
                }
                $arr[] = $out . $last_letter;
            }
            print_r($arr);
        }
  • #14 / Sep 10, 2009 7:09pm

    wiredesignz

    2882 posts

    That’s a sexy piece of code sophistry. I think even Derek Jones would get orgasmic looking at that. 😊

    Good work.

.(JavaScript must be enabled to view this email address)

ExpressionEngine News!

#eecms, #events, #releases