This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

yayparser - Yet Another Yaml parser - updated

May 31, 2008 7:02pm

  • #1 / May 31, 2008 7:02pm

    m4rw3r

    647 posts

    NOTE: a newer version is available in the Wiki.

    I had some time to kill, and I thought it would be fun to make a YAML parser (OK, I’ll get back to MPTtree soon :red: ).
    It relies heavily on regex and does not strictly follow the YAML specification, but it supports many parts of the standard.

    It is built as a helper, so create a file in your helpers directory and paste in the following:

    <?php 
    /*
     * Created on 2008 May 31
     * by Martin Wernstahl <m4rw3r@gmail.com>
     */
    /*
     * Copyright (c) 2008, Martin Wernstahl
     * All rights reserved.
     *
     * Redistribution and use in source and binary forms, with or without
     * modification, are permitted provided that the following conditions are met:
     *     * Redistributions of source code must retain the above copyright
     *       notice, this list of conditions and the following disclaimer.
     *     * Redistributions in binary form must reproduce the above copyright
     *       notice, this list of conditions and the following disclaimer in the
     *       documentation and/or other materials provided with the distribution.
     *     * The name of Martin Wernstahl may not be used to endorse or promote products
     *       derived from this software without specific prior written permission.
     *
     * THIS SOFTWARE IS PROVIDED BY Martin Wernstahl ``AS IS'' AND ANY
     * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
     * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
     * DISCLAIMED. IN NO EVENT SHALL Martin Wernstahl BE LIABLE FOR ANY
     * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
     * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
     * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
     * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
     * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
     * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
     */
    if(!defined('YAYPARSER_key'))
        define('YAYPARSER_key','([^\n\S]*)(?:\?[^\S\n]*)?(\S*?)(?:[^\S]*)?:(?:[^\S\n]*)?');
    
    if(!defined('YAYPARSER_list'))
        define('YAYPARSER_list','([^\n]*?)?-(?:[\s]*?)');
    
    if(!defined('YAYPARSER_multiline'))
        define('YAYPARSER_multiline','(?:(?:\||>)(?:[^\n]*)?\n((?:(?:\1)([^\n\S]+)(?:[^#\n\s][^#\n]*)(?:#.*?)?)(?:\n(?:\1)(?:[^\n\S]+?)(?:[^#\n\s][^#\n]*)(?:#.*?)?)*))');
    
    if(!defined('YAYPARSER_val'))
        define('YAYPARSER_val','([^#\n\s][^#\n]*)');
    
    if(!defined('YAYPARSER_comment'))
        define('YAYPARSER_comment','(?:#.*?)?');
    
    if(!function_exists('yayparser')){
        /**
         * YAYParser - Yet Another Yaml Parser.
         * 
         * @param $string The string to be processed
         * 
         * @return An array which contains data built from the string
         */
        function yayparser($string){
            $ret = array();
            if(!strlen($string))
                return array();
            // try to match single "key: value" first
            $num = preg_match('@^'.YAYPARSER_key.'(?:'.YAYPARSER_multiline.'|'.YAYPARSER_val.')'.YAYPARSER_comment.'(?:(?:\n([\w\W]*))|$)@',$string,$result);
            if($num < 1){
                /* if not found, try to match "key:
                 *                                   key: value"
                 */
                $num = preg_match('@'.YAYPARSER_key.'(#[^\n]*?)?\n((?:(?:\1)([^\n\S]+?)(?:(?:(?:\?[^\S\n]*)?(?:\S*?)(?:[^\S]*)?:(?:[^\S\n]*)?(?:(?:\||>)(?:[^\S]*)?((?:\1\4(?:[^\n\S]+?)(?:[^#\n\s][^#\n]*)(?:#.*?)?\n)+))|(?:(?:[^#]*?)?))|(?:#[^\n]*?)|(?:-(?:\s*)?(?:[^#\n\s]*)?))(?:#[^\n]*?)?\n)*)([\w\W]*)@',$string,$result);
                if($num){
                    $ret[$result[2]] = yayparser($result[4]);
                    $ret = $ret + yayparser($result[7]);
                    return $ret;
                }
                // try to match all "-listvalue"
                $result = array();
                preg_match_all('@'.YAYPARSER_list.'(?:'.YAYPARSER_multiline.'|'.YAYPARSER_val.')'.YAYPARSER_comment.'\n@',$string,$result);
                foreach($result[4] as $key => $val){
                    // do we have a multiline?
                    if(!empty($result[2][$key])){
                        // yes
                        $str = explode("\n",$result[2][$key]);
                        $indent_len = strlen($result[1][$key]) + strlen($result[3][$key]);
                        array_walk($str,'YAYPARSER_substr',$indent_len);
                        $ret[$key] = implode("\n",$str);
                    }
                    else
                        $ret[$key] = $val;
                    // if the line is surrounded with [], remove them and split the line into an array
                    if($ret[$key][0] == '[' && substr($ret[$key],-1) == ']'){
                        $ret[$key] = explode(',',trim($ret[$key],'[]'));
                    }
                }
                return $ret;
            }
            if(strlen($result[3])){
                // we have multiline
                $str = explode("\n",$result[3]);
                $indent_len = strlen($result[1]) + strlen($result[4]);
                array_walk($str,'YAYPARSER_substr',$indent_len);
                $ret[$result[2]] = implode("\n",$str);
            }
            else // normal
                $ret[$result[2]] = trim($result[5]);
            // if the line is surrounded with [], remove them and split the line into an array
            if($ret[$result[2]][0] == '[' && substr($ret[$result[2]],-1) == ']'){
                $ret[$result[2]] = explode(',',trim($ret[$result[2]],'[]'));
            }
            if(isset($result[6]) && strlen($result[6]))
                $ret = $ret + yayparser($result[6]);
            return $ret;
        }
    }
    if(!function_exists('YAYPARSER_substr')){
        /**
         * A substr wrapper for use with array_walk().
         */
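        // array_walk() passes the key and the extra argument by value,
        // so only $value is taken by reference
        function YAYPARSER_substr(&$value,$key,$len){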
            $value = substr($value,$len);
        }
    }
    ?>

    Whew, only 121 chars left…

  • #2 / May 31, 2008 7:04pm

    m4rw3r

    647 posts

    Example YAML:

    class:
        blah: |  # multiline <- comment
          function($lol){
              return;
          }
        alist:
         -foo
         -bar
        anotherlist: [foo,bar]
    name: foo

    Resulting PHP array:

    Array
    (
        [class] => Array
            (
                [blah] => function($lol){
        return;
    }
                [alist] => Array
                    (
                        [0] => foo
                        [1] => bar
                    )
    
                [anotherlist] => Array
                    (
                        [0] => foo
                        [1] => bar
                    )
    
            )
    
        [name] => foo
    )
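
    The `blah` value above loses its YAML indentation because the helper works out how far the first line of the block is indented and cuts that many characters off every line. A minimal standalone sketch of that step, assuming the block body has already been extracted (the name `strip_block_indent` is hypothetical and not part of the helper):

```php
<?php
// Hypothetical helper, not part of yayparser: removes the indentation of the
// first line from every line of a block scalar body.
function strip_block_indent($block)
{
    $lines = explode("\n", $block);
    // Number of leading spaces/tabs on the first line.
    $indent_len = strlen($lines[0]) - strlen(ltrim($lines[0], " \t"));
    foreach ($lines as $i => $line) {
        // Cast to string because substr() can return false on old PHP
        // versions when the line is shorter than the indentation.
        $lines[$i] = (string) substr($line, $indent_len);
    }
    return implode("\n", $lines);
}

$block = "      function(\$lol){\n          return;\n      }";
echo strip_block_indent($block), "\n";
```

    Indentation deeper than the first line (the `return;` line here) is kept relative to it, which matches the array output shown above.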
  • #3 / May 31, 2008 7:34pm

    Jamie Rumbelow

    546 posts

    Great!

    I was actually thinking about making a YAML parser, and this one is nicely done! I think there might be a better way, however: this approach would struggle with larger files.

    And what about an “Array to YAML” function (writer)?

    Jamie
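
    A very rough sketch of what such a writer could look like, purely as an illustration: `array_to_yaml` is a made-up name, it only handles nested arrays with scalar leaves, and it does no quoting or escaping.

```php
<?php
// Hypothetical sketch of an "array to YAML" writer. Handles nested arrays
// with scalar leaves only; no quoting, escaping or multiline support.
function array_to_yaml(array $data, $depth = 0)
{
    $out = '';
    $pad = str_repeat('  ', $depth);
    // Arrays with keys 0..n-1 are written as lists, everything else as maps.
    $is_list = array_keys($data) === range(0, count($data) - 1);
    foreach ($data as $key => $value) {
        $prefix = $is_list ? '-' : $key . ':';
        if (is_array($value)) {
            // Nested structure: put it on the following lines, one level deeper.
            $out .= $pad . $prefix . "\n" . array_to_yaml($value, $depth + 1);
        } else {
            $out .= $pad . $prefix . ' ' . $value . "\n";
        }
    }
    return $out;
}

echo array_to_yaml(array(
    'name'  => 'foo',
    'alist' => array('foo', 'bar'),
));
```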

  • #4 / May 31, 2008 7:54pm

    m4rw3r

    647 posts

    OK, sorry, the uploaded code has a lot of errors.
    This is because I merged a lot of multiline regexes (and converted the code from a class to a helper).

    Going to fix it.

  • #5 / May 31, 2008 7:59pm

    Jamie Rumbelow

    546 posts

    Cool 😊

  • #6 / May 31, 2008 8:15pm

    m4rw3r

    647 posts

    Ok, I fixed it.

    Now everything should work (except for the parts I haven’t implemented yet).

    I’m going to look at creating YAYWriter in a while. I made YAYParser for IgnitedRecord, so it will have at least three ways of being configured: PHP class, factory, and now YAML.

  • #7 / Jun 05, 2008 10:43am

    m4rw3r

    647 posts

    This parser is actually very bad, I’m currently making a newer and better one.

  • #8 / Jun 05, 2008 10:51am

    Majd Taby

    637 posts

    make me a YAML parser that doesn’t break on tabs 😛

  • #9 / Jun 05, 2008 11:04am

    m4rw3r

    647 posts

    Well, this one doesn’t (but it has a lot of other flaws).

    But if you combine tabs and spaces in the indentation it may break.
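
    To illustrate why mixed indentation is risky: the helper compares and cuts indentation as raw characters (`strlen()` / `substr()`), so a tab counts as one character no matter how wide an editor draws it. A small sketch; `indent_len` is a made-up name used only for this illustration:

```php
<?php
// Hypothetical illustration: count leading whitespace characters the way a
// strlen()-based parser sees them.
function indent_len($line)
{
    return strlen($line) - strlen(ltrim($line, " \t"));
}

$with_spaces = "    key: value"; // four spaces
$with_tab    = "\tkey: value";   // one tab, often displayed as four columns

var_dump(indent_len($with_spaces)); // int(4)
var_dump(indent_len($with_tab));    // int(1)
```

    Two lines that look aligned in an editor can therefore end up at different nesting levels as far as the parser is concerned.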

  • #10 / Jun 08, 2008 2:48am

    evanwalsh

    22 posts

    I use spyc for my YAML parsing needs.

  • #11 / Jun 08, 2008 12:24pm

    m4rw3r

    647 posts

    I have now finished all my schoolwork for this term 😊
    And I’ve also made a lot of improvements to YAYparser:

    - It now supports a lot more of the YAML standard
    - It supports multiple documents in a single string
    - It does not mess up on empty lines
    - Fewer parsing errors
    - Better performance (a bit better than Spyc, and YAYparser supports the YAML standard to a greater extent)

    Drawbacks:
    - Multiline data may be trimmed too much
    - No Unicode support
    - No escaping of characters in strings (scalars), yet
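
    As an illustration of the multiple-documents point, here is a simplified sketch of splitting a YAML stream on `---` lines. It is not the regex the helper uses, it ignores the `...` end-of-document marker, and `split_yaml_documents` is a hypothetical name:

```php
<?php
// Simplified sketch (not the helper's actual regex): split a YAML stream on
// lines that consist only of "---" and drop empty chunks.
function split_yaml_documents($stream)
{
    // The /m modifier makes ^ and $ match at line boundaries.
    $docs = preg_split('/^---[ \t]*$/m', $stream);
    $docs = array_map('trim', $docs);
    // array_filter() with 'strlen' drops empty strings; array_values() reindexes.
    return array_values(array_filter($docs, 'strlen'));
}

$stream = "---\nname: foo\n---\nname: bar\n";
print_r(split_yaml_documents($stream));
```

    Each returned chunk could then be handed to the parser on its own.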

    Please give me some feedback.

    Here is the code (copy and paste to app/helpers/yayparser_helper.php):

  • #12 / Jun 08, 2008 12:24pm

    m4rw3r

    647 posts

    <?php
    /*
    * Created on 2008 May 31
    * by Martin Wernstahl <m4rw3r@gmail.com>
    */
    /*
    * Copyright (c) 2008, Martin Wernstahl
    * All rights reserved.
    *
    * Redistribution and use in source and binary forms, with or without
    * modification, are permitted provided that the following conditions are met:
    *     * Redistributions of source code must retain the above copyright
    *       notice, this list of conditions and the following disclaimer.
    *     * Redistributions in binary form must reproduce the above copyright
    *       notice, this list of conditions and the following disclaimer in the
    *       documentation and/or other materials provided with the distribution.
    *     * The name of Martin Wernstahl may not be used to endorse or promote products
    *       derived from this software without specific prior written permission.
    *
    * THIS SOFTWARE IS PROVIDED BY Martin Wernstahl ``AS IS'' AND ANY
    * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
    * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
    * DISCLAIMED. IN NO EVENT SHALL Martin Wernstahl BE LIABLE FOR ANY
    * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
    * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
    * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
    * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
    * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
    */
    if(!function_exists('YAYPARSER_parse')){
        /**
         * YAYParser - Parser function.
         *
         * @param $str The string to be processed
         *
         * @return An array which contains data built from the string
         */
        function YAYPARSER_parse($str){
            $ret = array();
            if(!strlen($str))
                return $ret;
            $num = preg_match_all(
                // ONE REGEX TO RULE THEM ALL
                '@(?:^|\n)([^\S\n]*)(?:((?:\?\s*)?(\S*)\s*(?:#.*)?(?:\n[^\S\n]*#[^\n]*)*(?:\n[^\S\n]*)?:(?:[^\S\n]*)?)|(-(?:[^\S\n]*)?))(?:(?:\&(\S*))(?:[^\S\n]*)?|(?:\*(\S*))(?:[^\S\n]*)?)?(?:((?:[^\|>\'"\n]*)?(?:\n(?:(?:(?:\1)(?:[^\n\S]+)[^\n]*)|(?:[^\n\S]*#[^\n]*)|(?:[^\n\S]*(?=\n))))+)|(?:(\||>).*\n((?:[^\n\S]*\n)*(?:(?:\1)([^\n\S]+)[^\n]*)(?:(?:\n(?:\1)(?:[^\n\S]+?)[^\n]*)|(?:\n[^\n\S]*(?=\n)))*))|(?:("|\')([\w\W]*?)\11)|(?:\{([\w\W]*?)\})|(?:\[([\w\W]*?)\])|(?:([^#\n\s][^#\n]*))?)(?:#[^\n]*)?@'
                ,$str,$matches,PREG_SET_ORDER);
            foreach($matches as $match){
                // value:
                $data = array();
                if(isset($match[9]) && strlen($match[9])){
                    // multiline
                    $s = explode("\n",$match[9]);
                    $indent_len = strlen($match[1]) + strlen($match[10]);
                    array_walk($s,'YAYPARSER_substr',$indent_len);
                    array_walk($s,'YAYPARSER_trim');
                    if($match[8] == '|'){    // "|" preserves newlines
                        $data = implode("\n",$s);
                    }
                    else{
                        $data = implode(' ',$s);
                    }
                }
                elseif(isset($match[7]) && strlen($match[7])){
                    // block
                    if(isset($match[4]) && strlen($match[4]))
                        $match[7] = str_repeat(' ',strlen($match[4]) + strlen($match[1])) . $match[7];
                    elseif(isset($match[2]) && strlen($match[2]))
                        $match[7] = str_repeat(' ',strlen($match[2]) + strlen($match[1])) . $match[7];
                    $data = YAYPARSER_parse($match[7]);
                }
                elseif(isset($match[12]) && strlen($match[12])){
                    // flow scalar
                    $arr = explode("\n",$match[12]);
                    array_walk($arr,'YAYPARSER_trim');
                    $data = trim(implode("\n",$arr));
                }
                elseif(isset($match[13]) && strlen($match[13])){
                    // flow node with key/value
                    $list = explode(',',$match[13]);
                    array_walk($list,'YAYPARSER_trim');
                    foreach($list as $l){
                        $data = array_merge($data,(Array)YAYPARSER_parse($l));
                    }
                }
                elseif(isset($match[14]) && strlen($match[14])){
                    $data = explode(',',$match[14]);
                    array_walk($data,'YAYPARSER_trim');
                }
                elseif(isset($match[15])){
                    // value
                    $data = $match[15];
                }
                // create anchor
                if(isset($match[5]) && strlen($match[5]))
                    $GLOBALS['YAYPARSER_anchors'][$match[5]] = $data;
                // load anchor
                if(isset($match[6]) && strlen($match[6])){
                    if(isset($GLOBALS['YAYPARSER_anchors'][$match[6]]))
                        $data = $GLOBALS['YAYPARSER_anchors'][$match[6]];
                    else
                        // CodeIgniter's log levels are 'error', 'debug' and 'info'; other levels are not written
                        log_message('error','yayparser: The anchor '.$match[6].' was not found.');
                }
                // key / list saving
                if(isset($match[3]) && strlen($match[3])){
                    $ret[$match[3]] = $data;
                }
                elseif(isset($match[4]) && strlen($match[4])){
                    // list
                    $ret[] = $data;
                }
            }
            if($num)
                return $ret;
            // a block without any tags/lists is regarded as a string (newline = space)
            $arr = explode("\n",$str);
            array_walk($arr,'YAYPARSER_trim');
            return trim(implode(' ',$arr));
        }
    }
  • #13 / Jun 08, 2008 12:25pm

    m4rw3r

    647 posts

    The rest of yayparser_helper.php:

    if(!function_exists('yayparser')){
        /**
         * YAYParser - Yet Another Yaml Parser.
         *
         * @param $str The string to be processed
         *
         * @return An array which contains data built from the string
         */
        function yayparser($str){
            if(!strlen($str))
                return array();
            $ret = array();
            // split the string into many documents according to the YAML standard
            preg_match_all('@(?:(?:^|\n)-{3}([\w\W]*?)(?:\n\.{3}(?:[\w\W]*?)(?=(?:\n-{3})|$))?(?=(?:\n-{3})|(?:$)))@',
                            "---\n".$str,        // make $str an explicit document by adding a --- in the beginning
                            $matches,PREG_SET_ORDER);
            foreach($matches as $match){
                $GLOBALS['YAYPARSER_anchors'] = array();
                if(isset($match[1]) && strlen(trim($match[1]))){
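                    $data = YAYPARSER_parse($match[1]);
                    // YAYPARSER_parse() may return a plain string, so only call count() on arrays
                    if(!is_array($data) || count($data))
                        $ret[] = $data;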
                }
            }
            if(count($ret) == 1)
                $ret = $ret[0];
            return $ret;
        }
    }
    if(!function_exists('YAYPARSER_substr')){
        /**
         * A substr wrapper for use with array_walk().
         */
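        // array_walk() passes the key and the extra argument by value,
        // so only $value is taken by reference
        function YAYPARSER_substr(&$value,$key,$len){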
            $value = substr($value,$len);
        }
    }
    if(!function_exists('YAYPARSER_trim')){
        /**
         * A trim wrapper for use with array_walk().
         */
        // array_walk() passes the key by value, so only $value is taken by reference
        function YAYPARSER_trim(&$value,$key){
            $value = trim($value);
        }
    }
    ?>
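
    The anchor handling in the helper keeps its state in `$GLOBALS['YAYPARSER_anchors']`, which is reset for every document. A rough sketch of the same store-then-look-up idea with the registry passed by reference instead of a global; `resolve_node` and its parameters are hypothetical names, not part of the helper:

```php
<?php
// Hypothetical sketch: $anchor is the name after "&", $alias the name after
// "*" (either may be ''), and $data is the already parsed value of the node.
function resolve_node($anchor, $alias, $data, array &$anchors)
{
    if ($anchor !== '') {
        // "&name value" remembers the value under that name.
        $anchors[$anchor] = $data;
    }
    if ($alias !== '') {
        // "*name" replaces the node with the remembered value, if there is one.
        if (array_key_exists($alias, $anchors)) {
            return $anchors[$alias];
        }
        trigger_error("The anchor $alias was not found.", E_USER_WARNING);
    }
    return $data;
}

$anchors = array();
resolve_node('base', '', array('host' => 'localhost'), $anchors);
$copy = resolve_node('', 'base', null, $anchors);
print_r($copy);
```

    Passing the registry explicitly keeps two parses from sharing state by accident, at the cost of one extra parameter.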
  • #14 / Jun 08, 2008 7:29pm

    evanwalsh

    22 posts

    You should make a wiki article

  • #15 / Jun 09, 2008 10:41am

    m4rw3r

    647 posts

    Done: Wiki
