<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-3528393554015655906</id><updated>2011-07-29T00:02:29.913-07:00</updated><category term='hack'/><category term='irrefutable patterns'/><category term='javascript'/><category term='human readable'/><category term='python'/><category term='encoding'/><category term='haskell'/><category term='FSF'/><category term='arc'/><category term='perl'/><category term='query languages'/><category term='utf-8'/><category term='elisp'/><category term='regular expression'/><category term='go language'/><category term='coreutils'/><category term='sort'/><category term='batch'/><title type='text'>The programmable programmer</title><subtitle type='html'>I bet you thought this was about lisp.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>18</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-3176124189350784902</id><published>2010-05-13T15:13:00.000-07:00</published><updated>2010-05-13T17:50:46.013-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='javascript'/><title type='text'>A Pattern for Non-Local Returns in Javascript</title><content type='html'>A week or two ago, someone on reddit was complaining about javascripts lack of non-local returns and stating the language was poor as it couldn't be extended to include them.&lt;br /&gt;&lt;br /&gt;So, for fun, I wrote a quick implementation of non-local returns in javascript.  Probably inefficient as hell, but functional nontheless.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;// javascript non-local return&lt;br /&gt;&amp;lt;html&amp;gt;&amp;lt;body&amp;gt;&amp;lt;script language="javascript"&amp;gt;&lt;br /&gt;&lt;br /&gt;var WithNonLocalReturn = function( block ){&lt;br /&gt;     var ex_t = function(){&lt;br /&gt;         var hidden = null ;&lt;br /&gt;&lt;br /&gt;         this.set = function( retval ){&lt;br /&gt;             hidden = retval ;&lt;br /&gt;         };&lt;br /&gt;&lt;br /&gt;         this.get = function( retval ){&lt;br /&gt;             return hidden ;&lt;br /&gt;         };&lt;br /&gt;     };&lt;br /&gt;&lt;br /&gt;     var ex = new ex_t() ;&lt;br /&gt;&lt;br /&gt;     var valid = true ;&lt;br /&gt;&lt;br /&gt;     var leave = function( retval ){&lt;br /&gt;         if( valid ){&lt;br /&gt;             ex.set( retval ) ;&lt;br /&gt;             throw ex ;&lt;br /&gt;         } else {&lt;br /&gt;             throw ( ""&lt;br /&gt;               + "It is invalid to call the non-local return"&lt;br /&gt;               + "outside the stack of the creating WithNonLocalReturn" &lt;br /&gt;               ) ;&lt;br /&gt;         }&lt;br /&gt;     };&lt;br /&gt;&lt;br /&gt;     try {&lt;br /&gt;         return block( leave ) ;&lt;br /&gt;     } catch( e ) {&lt;br /&gt;         if( e instanceof ex_t ){&lt;br /&gt;             return e.get() ;&lt;br /&gt;         } else {&lt;br /&gt;             throw e ;&lt;br /&gt;         }&lt;br /&gt;     } finally {&lt;br /&gt;         valid = false ;&lt;br /&gt;     }&lt;br /&gt; };&lt;br /&gt;&lt;br /&gt;var Each = function( iterable , fn ){&lt;br /&gt;    for( k in iterable ){&lt;br /&gt;        fn( iterable[ k ] ) ;&lt;br /&gt;    }&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;alert( WithNonLocalReturn( function( nonLocallyReturn ){&lt;br /&gt;            Each( [ 1, 2, 3, 4, 5, 6, 7, 8, 9] ,&lt;br /&gt;                  function( i ){&lt;br /&gt;                      alert( 'Testing : ' + i ) ;&lt;br /&gt;                      if( i == 6 ){&lt;br /&gt;                          nonLocallyReturn( 'Success : ' + i ) ;&lt;br /&gt;                      }&lt;br /&gt;                  });&lt;br /&gt;&lt;br /&gt;            return 'Failure' ;&lt;br /&gt;        }));&lt;br /&gt;&lt;br /&gt;&amp;lt;/script&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;For fun they're nestable and since you name the value-returning function, you can easily and readably select what level nesting to return to.  Or horrifically pass a non-local-return through as a sort of continuation down the stack of an otherwise rational bit of logic.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;alert( WithNonLocalReturn( function( dontDoThis ){&lt;br /&gt;            return 'lolwut'.replace( /lol/g , dontDoThis ) ;&lt;br /&gt;        }));&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;One should probably avoid doing that.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-3176124189350784902?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/3176124189350784902/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=3176124189350784902' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/3176124189350784902'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/3176124189350784902'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2010/05/pattern-for-non-local-returns-in.html' title='A Pattern for Non-Local Returns in Javascript'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-4556948387434147373</id><published>2010-03-08T15:59:00.000-08:00</published><updated>2010-03-08T16:09:45.224-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='go language'/><title type='text'>Go Language is lovely</title><content type='html'>I've been playing with Google's go language for the last week or so, and I have to say its a lovely language.  I generally grab the mailing list for anything new I get into, and today someone was looking for a way to wait for a set of go-routines to complete before continuing on in the main thread.  Seeing as how common this sort of thing is, I decided to write up a quick library for it.&lt;br /&gt;&lt;br /&gt;To use the library first issue a &lt;code&gt;dispatch.New()&lt;/code&gt; to acquire a new dispatch mechanism.  Then attach any number ( technically upto &lt;code&gt;uint&lt;/code&gt; ) of processes to it it via &lt;code&gt;dispatchInstance.Go( func(){ other_process_here() } )&lt;/code&gt;.  Any number of other processes ( still technically &lt;code&gt;uint&lt;/code&gt; ) can wait on the runners to finish before executing.&lt;br /&gt;&lt;br /&gt;Anyway, here it is, and hope it's useful for someone.&lt;br /&gt;&lt;br /&gt;In dispatch.go :&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;package dispatch&lt;br /&gt;&lt;br /&gt;import "sync"&lt;br /&gt;&lt;br /&gt;type Manager interface {&lt;br /&gt;        Go( func() )&lt;br /&gt;        Wait()&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;type manager struct {&lt;br /&gt;        lock sync.Mutex&lt;br /&gt;        running uint&lt;br /&gt;        waiting uint&lt;br /&gt;        wakeup chan bool&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;func New() *manager {&lt;br /&gt;        m := new(manager)&lt;br /&gt;        m.wakeup = make(chan bool)&lt;br /&gt;        return m&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;func (m *manager) Go( fn func() ) {&lt;br /&gt;        m.lock.Lock()&lt;br /&gt;        m.running++&lt;br /&gt;        m.lock.Unlock()&lt;br /&gt;&lt;br /&gt;        go func(){&lt;br /&gt;                fn()&lt;br /&gt;&lt;br /&gt;                m.lock.Lock()&lt;br /&gt;                m.running--&lt;br /&gt;                if (m.running == 0) &amp;&amp; (m.waiting &gt; 0) {&lt;br /&gt;                        oc := m.wakeup&lt;br /&gt;                        nc := make(chan bool)&lt;br /&gt;                        i := m.waiting&lt;br /&gt;                        go func(){&lt;br /&gt;                                for ; i &gt; 0 ; i-- {&lt;br /&gt;                                        oc &lt;- true&lt;br /&gt;                                }&lt;br /&gt;                        }()&lt;br /&gt;                        m.wakeup = nc&lt;br /&gt;                        m.waiting = 0&lt;br /&gt;                }&lt;br /&gt;                m.lock.Unlock()&lt;br /&gt;        }()&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;func (m *manager) Wait() {&lt;br /&gt;        wait := false&lt;br /&gt;&lt;br /&gt;        m.lock.Lock()&lt;br /&gt;        if m.running &gt; 0 {&lt;br /&gt;                m.waiting++&lt;br /&gt;                wait = true&lt;br /&gt;        }&lt;br /&gt;        m.lock.Unlock()&lt;br /&gt;&lt;br /&gt;        if wait {&lt;br /&gt;                &lt;- m.wakeup&lt;br /&gt;        }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;And some example usage in main.go :&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;package main&lt;br /&gt;&lt;br /&gt;import "fmt"&lt;br /&gt;import "rand"&lt;br /&gt;import "time"&lt;br /&gt;&lt;br /&gt;import "./dispatch"&lt;br /&gt;&lt;br /&gt;func main () {&lt;br /&gt;        w := dispatch.New()&lt;br /&gt;&lt;br /&gt;        for i := 0 ; i &lt; 100 ; i++ {&lt;br /&gt;                c := i&lt;br /&gt;                w.Go( func(){&lt;br /&gt;                        time.Sleep( rand.Int63n( 1000000000 ) )&lt;br /&gt;   fmt.Print( c , "\n" )&lt;br /&gt;                        w.Go( func(){&lt;br /&gt;                                time.Sleep( rand.Int63n( 1000000000 ) )&lt;br /&gt;                                fmt.Print( c , " - second effect\n")&lt;br /&gt;                        })&lt;br /&gt;                })&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt;        fmt.Print( "All Launched\n" )&lt;br /&gt;&lt;br /&gt;        w2 := dispatch.New()&lt;br /&gt;&lt;br /&gt;        for i := 0 ; i &lt; 5 ; i++ {&lt;br /&gt;                c := i&lt;br /&gt;  w2.Go( func(){&lt;br /&gt;                        w.Wait()&lt;br /&gt;                        time.Sleep( rand.Int63n( 1000000000 ) )&lt;br /&gt;                        fmt.Print("[ " , c , "] This should happen after the first set\n")&lt;br /&gt;                })&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        fmt.Print( "Second set all launched\n" )&lt;br /&gt;&lt;br /&gt;        w.Wait()&lt;br /&gt;&lt;br /&gt;        for i := 10 ; i &lt; 15 ; i++ {&lt;br /&gt;                c := i&lt;br /&gt;                w.Go( func(){&lt;br /&gt;                        time.Sleep( rand.Int63n( 1000000000 ) )&lt;br /&gt;                        fmt.Print("[ " , c , "] reusing first queue\n")&lt;br /&gt;                })&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        fmt.Print( "Main thread past first set\n" )&lt;br /&gt;&lt;br /&gt; w2.Wait()&lt;br /&gt;&lt;br /&gt;        fmt.Print( "Main thread past second set\n" )&lt;br /&gt;&lt;br /&gt;        w.Wait()&lt;br /&gt;&lt;br /&gt;        fmt.Print( "Main thread past reuse of first queue\n" )&lt;br /&gt;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;I really like this language.  It feels like somebody took many of the best facets of C, javascript and erlang and tucked them into a single package.  The static duck-typing is beautiful.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-4556948387434147373?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/4556948387434147373/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=4556948387434147373' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/4556948387434147373'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/4556948387434147373'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2010/03/go-language-is-lovely.html' title='Go Language is lovely'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-8019095454453889922</id><published>2009-12-20T19:36:00.001-08:00</published><updated>2009-12-20T19:44:50.490-08:00</updated><title type='text'>I'm pretty sure it was a joke.</title><content type='html'>HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA &lt;br /&gt;&lt;br /&gt;&lt;pre&gt;      &amp;lt;?php&lt;br /&gt;      &lt;br /&gt;      // This is a source file, it ends in ?&amp;gt; ok?&lt;br /&gt;      &lt;br /&gt;      print "ha ha ha HAHAAHAHAHAHAHAHA" ;&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA &lt;br /&gt;&lt;br /&gt;&lt;pre&gt;    $ php test.php &lt;br /&gt;      ok?&lt;br /&gt;      &lt;br /&gt;      print "ha ha ha HAHAAHAHAHAHAHAHA" ;&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;  &lt;br /&gt;HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA &lt;br /&gt;&lt;br /&gt;This is obviously a practical joke left in the parser to amuse those of use that dare comment out a line containing a string that starts an xml document.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;    //            echo '&amp;lt;?xml version="1.0" encoding="UTF-8" ?&amp;gt;'&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Oh PHP, you wonderful bastard.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-8019095454453889922?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/8019095454453889922/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=8019095454453889922' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/8019095454453889922'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/8019095454453889922'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2009/12/im-pretty-sure-it-was-joke.html' title='I&apos;m pretty sure it was a joke.'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-3929604600679235324</id><published>2009-07-13T13:37:00.001-07:00</published><updated>2010-05-07T19:24:01.967-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='utf-8'/><category scheme='http://www.blogger.com/atom/ns#' term='regular expression'/><category scheme='http://www.blogger.com/atom/ns#' term='encoding'/><title type='text'>Garbage to UTF-8</title><content type='html'>I had a problem last year with a legacy database filled with a mix of utf-8, windows cp-1252, extended and regular ascii.  I needed a way to clean up the information without losing any of the information contained in it.&lt;br /&gt;&lt;br /&gt;Being familiar with regular expressions, I looked up how UTF-8 was formatted, made a couple of assumptions about the malformations I would find therein, figured out which code points were the ones I should replace and the following function was born.&lt;br /&gt;&lt;br /&gt;Or something like that.  Anyway here is the code :&lt;br /&gt;&lt;br /&gt;&lt;code style="font-family:monospace;white-space:pre;"&gt;&lt;br /&gt;  function garbage_to_utf8_character_replacement_function( $matches )&lt;br /&gt;  {&lt;br /&gt;&lt;br /&gt;    // converts binary 10000000 -&amp;gt; 11111111 that do not appear&lt;br /&gt;    // as part of a unicode character into a unicode character&lt;br /&gt;    // under the assumption that a portion of them are windows&lt;br /&gt;    // cp-1252 characters, and the rest are exteneded ascii&lt;br /&gt;    // characters&lt;br /&gt;&lt;br /&gt;    $o = ord( $matches[ 0 ] ) ;&lt;br /&gt;&lt;br /&gt;    switch( $o )&lt;br /&gt;      {&lt;br /&gt;        // check for windows code page 1252 characters&lt;br /&gt;      case 130 : return "\xe2\x80\x94" ; // Single Low-9 Quotation Mark&lt;br /&gt;      case 131 : return "\xc6\x92" ;     // Latin Small Letter F With Hook&lt;br /&gt;      case 132 : return "\xe2\x80\x9e" ; // Double Low-9 Quotation Mark&lt;br /&gt;      case 133 : return "\xe2\x80\xa6" ; // Horizontal Ellipsis&lt;br /&gt;      case 134 : return "\xe2\x80\xa0" ; // Dagger&lt;br /&gt;      case 135 : return "\xe2\x80\xa1" ; // Double Dagger&lt;br /&gt;      case 136 : return "\xcb\x86" ;     // Modifier Letter Circumflex Accent&lt;br /&gt;      case 137 : return "\xe2\x80\xb0" ; // Per Mille Sign&lt;br /&gt;      case 138 : return "\xc5\xa0" ;     // Latin Capital Letter S With Caron&lt;br /&gt;      case 139 : return "\xe2\x80\xb9" ; // Single Left-Pointing Angle Quotation Mark&lt;br /&gt;      case 140 : return "\xc5\x92" ;     // Latin Capital Ligature OE&lt;br /&gt;        //gap&lt;br /&gt;      case 145 : return "\xe2\x80\x98" ; // Left Single Quotation Mark&lt;br /&gt;      case 146 : return "\xe2\x80\x99" ; // Right Single Quotation Mark&lt;br /&gt;      case 147 : return "\xe2\x80\x9c" ; // Left Double Quotation Mark&lt;br /&gt;      case 148 : return "\xe2\x80\x9d" ; // Right Double Quotation Mark&lt;br /&gt;      case 149 : return "\xe2\x80\xa2" ; // Bullet&lt;br /&gt;      case 150 : return "\xe2\x80\x93" ; // En Dash&lt;br /&gt;      case 151 : return "\xe2\x80\x94" ; // Em Dash&lt;br /&gt;      case 152 : return "\xcb\x9c" ;     // Small Tilde&lt;br /&gt;      case 153 : return "\xe2\x84\xa2" ; // Trade Mark Sign&lt;br /&gt;      case 154 : return "\xc5\xa1" ;     // Latin Small Letter S With Caron&lt;br /&gt;      case 155 : return "\xe2\x80\xba" ; // Single Right-Pointing Angle Quotation Mark&lt;br /&gt;      case 156 : return "\xc5\x93" ;     // Latin Small Ligature OE&lt;br /&gt;        //gap&lt;br /&gt;      case 159 : return "\xc5\xb8" ;     // Latin Capital Letter Y With Diaeresis&lt;br /&gt;&lt;br /&gt;      default :&lt;br /&gt;        return chr( 192 | ( 3 &amp;amp; ( $o &amp;gt;&amp;gt; 6 ) ) ) . chr( $o &amp;amp; 191 ) ;&lt;br /&gt;&lt;br /&gt;      }&lt;br /&gt;&lt;br /&gt;  }&lt;br /&gt;&lt;br /&gt;  function garbage_to_utf8( $text )&lt;br /&gt;  {&lt;br /&gt;&lt;br /&gt;    // locate all bytes with 0x80 set that are not a proper&lt;br /&gt;    // component of a unicode character.  pass them to&lt;br /&gt;    // garbage_to_utf8_character_replacement_function&lt;br /&gt;    // to convert them to unicode under the assumptions they&lt;br /&gt;    // are either windows characters or extended ascii&lt;br /&gt;&lt;br /&gt;  $bad_replace = ''&lt;br /&gt;    . '/('&lt;br /&gt;    .     '('&lt;br /&gt;    // find 1111110x not followed by 5 10xxxxxx&lt;br /&gt;    .        '[\\xFC-\\xFD](?![\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF])'&lt;br /&gt;    .     '|'&lt;br /&gt;    // find 111110xx not followed by 4 10xxxxxx&lt;br /&gt;    .        '[\\xF8-\\xFB](?![\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF])'&lt;br /&gt;    .     '|'&lt;br /&gt;    // find 11110xxx not followed by 3 10xxxxxx&lt;br /&gt;    .        '[\\xF0-\\xF7](?![\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF])'&lt;br /&gt;    .     '|'&lt;br /&gt;    // find 1110xxxx not followed by 2 10xxxxxx&lt;br /&gt;    .        '[\\xE0-\\xEF](?![\\x80-\\xBF][\\x80-\\xBF])'&lt;br /&gt;    .     '|'&lt;br /&gt;    // find 110xxxxx not followed by 1 10xxxxxx&lt;br /&gt;    .        '[\\xC0-\\xDF](?![\\x80-\\xBF])'&lt;br /&gt;    .     '|'&lt;br /&gt;    // find 10xxxxxx not part of code point&lt;br /&gt;    .       '(?&amp;lt;!'&lt;br /&gt;    .         '[\\xFC-\\xFD][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xFC-\\xFD][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xFC-\\xFD][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xFC-\\xFD][\\x80-\\xBF][\\x80-\\xBF]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xFC-\\xFD][\\x80-\\xBF]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xFC-\\xFD]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xF8-\\xFB][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xF8-\\xFB][\\x80-\\xBF][\\x80-\\xBF]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xF8-\\xFB][\\x80-\\xBF]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xF8-\\xFB]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xF0-\\xF7][\\x80-\\xBF][\\x80-\\xBF]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xF0-\\xF7][\\x80-\\xBF]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xF0-\\xF7]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xE0-\\xEF][\\x80-\\xBF]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xE0-\\xEF]'&lt;br /&gt;    .       '|'&lt;br /&gt;    .         '[\\xC0-\\xDF]'&lt;br /&gt;    .       ')'&lt;br /&gt;    .       '[\\x80-\\xBF]'&lt;br /&gt;    .     ')'&lt;br /&gt;    . ')/'&lt;br /&gt;    ;&lt;br /&gt;&lt;br /&gt;    return preg_replace_callback( $bad_replace ,&lt;br /&gt;                                  'garbage_to_utf8_character_replacement_function' ,&lt;br /&gt;                                  $text ) ;&lt;br /&gt;&lt;br /&gt;  }&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;A quick search shows I'm not the only one to have solved this using regular expressions.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://search.cpan.org/~grantm/Encoding-FixLatin-1.00/lib/Encoding/FixLatin.pm"&gt;FixLatin&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Too bad he didn't post sooner, it would've saved me having to figure out the encoding transformation on my own.  Ah well, at least I'm not the only one.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-3929604600679235324?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/3929604600679235324/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=3929604600679235324' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/3929604600679235324'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/3929604600679235324'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2009/07/garbage-to-utf-8.html' title='Garbage to UTF-8'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-45811913296517781</id><published>2009-05-14T19:12:00.001-07:00</published><updated>2009-05-14T19:15:17.418-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='FSF'/><category scheme='http://www.blogger.com/atom/ns#' term='human readable'/><category scheme='http://www.blogger.com/atom/ns#' term='coreutils'/><category scheme='http://www.blogger.com/atom/ns#' term='sort'/><title type='text'>Human Readable Sort</title><content type='html'>I was using sort the other day at work and got annoyed that it wouldn't sort by human-readable units.  I wrote a patch that night, signed up for coreutils mailing list the next day and emailed it in.&lt;br /&gt;&lt;br /&gt;I worked with one of the developers trading the patch back and forth the next two nights.&lt;br /&gt;&lt;br /&gt;Now, `du -hs * | sort -h` is ready to be added to the coreutils.  My first FSF contribution.&lt;br /&gt;&lt;br /&gt;http://www.nabble.com/Human-readable-sort-td23223205.html&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-45811913296517781?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/45811913296517781/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=45811913296517781' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/45811913296517781'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/45811913296517781'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2009/05/human-readable-sort.html' title='Human Readable Sort'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-7015165784413174008</id><published>2008-03-06T08:15:00.000-08:00</published><updated>2008-03-06T08:35:19.819-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='elisp'/><title type='text'>Hidden backup files with emacs</title><content type='html'>Ever since I started using emacs the directories full of `whatever~' files have annoyed me.  No more!  I put this into my .emacs ( along with a (require 'cl) ) and voila, backups for `whatever' end up in `.whatever~' in the same directory.  .emacs?  `..emacs~'.  Thus when I run ls my directory listing is clean.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;;; hidden backup files - i hate seeing them in listings ...                                                                                                                                                       &lt;br /&gt;;; prefix with a dot as well as postfix with a tilde                                                                                                                                                              &lt;br /&gt;(defun custom-make-backup-file-name ( file )&lt;br /&gt;  (let ((d (file-name-directory file))&lt;br /&gt;        (f (file-name-nondirectory file)))&lt;br /&gt;    (concat d "." f "~")))&lt;br /&gt;(setq make-backup-file-name-function 'custom-make-backup-file-name)&lt;br /&gt;&lt;br /&gt;(defun backup-file-name-p ( file )&lt;br /&gt;  (let ((letters (string-to-list (file-name-nondirectory file))))&lt;br /&gt;    (and (&gt; 2 (length letters))&lt;br /&gt;         (equal "." (first letters))&lt;br /&gt;         (equal "~" (last letters)))))&lt;br /&gt;&lt;br /&gt;(defun file-name-sans-versions ( file )&lt;br /&gt;  (if (not (backup-file-name-p file))&lt;br /&gt;      file&lt;br /&gt;    (let ((d (file-name-directory file))&lt;br /&gt;          (f (file-name-nondirectory file)))&lt;br /&gt;      (let ((letters (string-to-list f)))&lt;br /&gt;        (concat d (subseq letters 1 (- (length f) 1)))))))&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;While I'm busy dumping from my .emacs file, I like the truncated lines when I use ( C-x 3 ) to divide the display vertically except when I'm running a shell.  Then I want to see everything.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;;; do not truncate lines in shell                                                                                                                                                                                 &lt;br /&gt;(add-hook 'shell-mode-hook (lambda () (progn&lt;br /&gt;                                        (make-local-variable 'truncate-partial-width-windows)&lt;br /&gt;                                        (setq truncate-partial-width-windows nil))))&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-7015165784413174008?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/7015165784413174008/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=7015165784413174008' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/7015165784413174008'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/7015165784413174008'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2008/03/hidden-backup-files-with-emacs.html' title='Hidden backup files with emacs'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-1673605506990313215</id><published>2008-02-25T06:58:00.000-08:00</published><updated>2008-02-25T07:39:01.966-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='batch'/><category scheme='http://www.blogger.com/atom/ns#' term='hack'/><title type='text'>Batch-fu</title><content type='html'>I don't know how many avid windows batchers are out there ( I was one some years ago ), but perhaps a few of you can use / be horrified by this little helper.  Ever get annoyed because you can't easily reuse functions between scripts since they'll stomp all over each others environment variables?  Probably not.  Just in case, here's how to create lexically scoped batch file functions.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;::#&lt;br /&gt;:ServerName_Service&lt;br /&gt;setlocal&lt;br /&gt;&lt;br /&gt;:: blah blah do anything to namespace blah&lt;br /&gt;&lt;br /&gt;::now pass the full name / status back out of the setlocal&lt;br /&gt;for /F "usebackq tokens=1,2 delims=~" %%a in (`echo.%ServiceName%~"%Status%"`) do (&lt;br /&gt;  endlocal&lt;br /&gt;  set CACHE~%%a=%%b&lt;br /&gt;)&lt;br /&gt;goto :eof&lt;/pre&gt;&lt;br /&gt;Tada!&lt;br /&gt;&lt;br /&gt;Even better if you structure them such that the first argument is the name of the variable to receive the value from the function call and then write your exit similar to this :&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;::now pass the full name / status back out of the setlocal&lt;br /&gt;for /F "usebackq tokens=1,2 delims=~" %%a in (`echo.%ServiceName%~"%Status%"`) do (&lt;br /&gt;  endlocal&lt;br /&gt;  set %1=%%b&lt;br /&gt;)&lt;/pre&gt;&lt;br /&gt;BTW, I don't really recommend writing large programs in batch, but if draconian network policies make it all you've got, good luck.&lt;br /&gt;&lt;br /&gt;Remember that it is two phase, first variable expansion happens, then execution occurs.  Execution of lines starting with `:' makes these lines into labels.  Lines that start `::' are label errors and dropped ( making for better comments than rem, which executes and freaks out all to hell if special characters are in its argument list / comment area ).  Lines starting `%%en_var%%' where the environment variable `en_var' has the value `::' are label errors and dropped. ( This can be used to great effect.  I can't take credit for this hack though.  I found it on &lt;a href="http://www.robvanderwoude.com/"&gt;Rob van der Woude&lt;/a&gt;'s scripting site ).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-1673605506990313215?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/1673605506990313215/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=1673605506990313215' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/1673605506990313215'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/1673605506990313215'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2008/02/batch-fu.html' title='Batch-fu'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-7465503386171704900</id><published>2008-02-01T14:01:00.000-08:00</published><updated>2008-02-01T14:33:52.923-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='haskell'/><category scheme='http://www.blogger.com/atom/ns#' term='arc'/><title type='text'>Function chaining in arc.</title><content type='html'>I noticed some complaints about a lack of the equivalent of Haskell's point-free programming style for Arc, I decided to implement one of Haskell's more useful operators.&lt;br /&gt;&lt;br /&gt;The $ operator in Haskell breaks a statement into two statements.  It executes the second and then uses the output as the final variable to the first.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt; e.g. &lt;br /&gt;main = print $ "hello" ++ " " ++ "world"&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;is the same as &lt;br /&gt;&lt;br /&gt;&lt;pre&gt;main = print ( "hello" ++ " " ++ "world )&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Lacking infix operators I cannot recreate the functionality exactly.  Nor, given that Arc is a lisp and doing so would be decidedly unlispy, would I want to.  Instead I have created the following &lt;code&gt;chain&lt;/code&gt; macro that will allow a similar form of composition.  I do not believe it would be difficult to turn the following into a mutli-argument &lt;code&gt;compose&lt;/code&gt; macro either.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt; ;; chains similar to haskells $ operator.                                                                                                                                                                         &lt;br /&gt;(mac chain args&lt;br /&gt;  (if (is 0 (len args))&lt;br /&gt;        nil&lt;br /&gt;      (is 1 (len args))&lt;br /&gt;        (args 0)&lt;br /&gt;      t&lt;br /&gt;        (let n (uniq)&lt;br /&gt;          `(let ,n ,(eval `(chain ,@(cdr args)))&lt;br /&gt;             (eval (+ ',(args 0) (list ,n)))))))&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;This allows for the following :&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;arc&amp;gt;  (chain (+ 3) (+ 4 5) (- 2) (+ 6) 8)&lt;br /&gt;0&lt;br /&gt;arc&amp;gt; (chain)&lt;br /&gt;nil&lt;br /&gt;arc&amp;gt; (chain (+ 4))&lt;br /&gt;4&lt;br /&gt;arc&amp;gt; (chain (+ 4) (+ 4 8))&lt;br /&gt;16&lt;br /&gt;arc&amp;gt; (chain (+ 4) (+ 5 6) (- 5) 50)&lt;br /&gt;-30&lt;br /&gt;arc&amp;gt;  (chain (+ 3) ([/ _ 3]) ([- _ 2]) (+ 100) 12)&lt;br /&gt;119/3&lt;br /&gt;arc&amp;gt; &lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;If it didn't wrap a (list ...) around each return value and required each return value to be a list in its own right ( or only wrapped non-cons structures ) it would have similar semantics to perls execution model, compressing each list into the current argument chain.&lt;br /&gt;&lt;br /&gt;The chain macro needs all of the function calls to be wrapped in paren for it to operate properly, since it takes each value and appends it to the next function.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-7465503386171704900?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/7465503386171704900/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=7465503386171704900' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/7465503386171704900'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/7465503386171704900'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2008/02/function-chaining-in-arc.html' title='Function chaining in arc.'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-3308730395837757421</id><published>2008-01-22T07:10:00.000-08:00</published><updated>2008-01-22T08:06:25.159-08:00</updated><title type='text'>Irrefutable Pattern Love.</title><content type='html'>Irrefutable patterns are awesome.  If anyone tells you differently, well, they're probably right.  I'm 0 for 1 regarding challenging people on the inner workings of Haskell thus far.  Anyway, this exerpt from my code is the exact reason irrefutable patterns rock.&lt;br /&gt;&lt;pre&gt; rexn ns pps = let ( ~( xs , rps ) ,&lt;br /&gt;                          ~( ~( nxs ) ,&lt;br /&gt;                             ~( rxs , rrps ) ) ) = ( exn nxs pps ,&lt;br /&gt;                                                       case rps of&lt;br /&gt;                                                         ('?':'?':rr) -&gt; ( ( ns ) ,&lt;br /&gt;                                                                           ( ns ++ xs , rr ) )&lt;br /&gt;                                                         ('?':rr)     -&gt; ( ( ns ) ,&lt;br /&gt;                                                                           ( xs ++ ns , rr ) )&lt;br /&gt;                                                         ('*':'?':rr) -&gt; ( ( ns ++ xs ) ,&lt;br /&gt;                                                                           ( ns ++ xs , rr ) )&lt;br /&gt;                                                         ('*':rr)     -&gt; ( ( xs ++ ns ) ,&lt;br /&gt;                                                                           ( xs ++ ns , rr ) )&lt;br /&gt;                                                         ('+':'?':rr) -&gt; ( ( ns ++ xs ) ,&lt;br /&gt;                                                                           ( xs , rr ) )&lt;br /&gt;                                                         ('+':rr)     -&gt; ( ( xs ++ ns ) ,&lt;br /&gt;                                                                           ( xs , rr ) )&lt;br /&gt;                                                         _            -&gt; ( ( ns ) ,&lt;br /&gt;                                                                           ( xs , rps ) )&lt;br /&gt;                                                     )&lt;br /&gt;                    in ( rxs , rrps )&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;reading key :: xs - extracted nodes , rps - remaining-pattern-string , nxs - next-nodes ( what the nodes being created will use as the next step when executing the regex ) , rxs - the set of nodes generated for the section of pattern being interpreted , rrps - remaining remaining pattern string - the final amount after accounting for repitition operators.&lt;br /&gt;&lt;br /&gt;See those tildes?  Those tell the haskell compiler that I guarantee that the variables being bound there will eventually be filled in and that I want it to assume the patterns will match irrefutably.  Guess thats where they got the name.  This allows me to do some fun things.  If you stare into that code for a few seconds, you'll see that the function exn depends on the variable nxs, which is generated in the case statement, which determines its value based on the remaining pattern string ( rps ) output by exn.  For bonus points the next-nodes ( nxs ) sometimes contain the node being generated to create a loop.&lt;br /&gt;&lt;br /&gt;This sort function interdependance would explode the stack of any other language with ease.  Not Haskell.&lt;br /&gt;&lt;br /&gt;The source to the newest iteration of my regex engine is follows.  It is incomplete, but the parts that have been implemented seem to be functioning fine.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;-- regular expression engine -- (c) 2008 michael speer &lt;knomenet@gmail.com&gt;&lt;br /&gt;&lt;br /&gt;import Char ( isSpace )&lt;br /&gt;-- import Debug.Trace ( trace )&lt;br /&gt;&lt;br /&gt;xor :: Bool -&gt; Bool -&gt; Bool&lt;br /&gt;xor True  a = not a&lt;br /&gt;xor False a = a&lt;br /&gt;&lt;br /&gt;data RxToken = RxStart     -- matchable start of target string&lt;br /&gt;             | RxChar Char -- a literal character to match&lt;br /&gt;             | RxBound     -- inserted wherever alphanums touch whitespace&lt;br /&gt;             | RxEnd       -- matchable end of target string&lt;br /&gt;             | RxEOF       -- an additional token to push through to catch anything trying for RxEnd&lt;br /&gt;                           -- RxEOF is never matched.&lt;br /&gt;               deriving ( Show )&lt;br /&gt;&lt;br /&gt;rxTokenize tts = RxStart : case tts of&lt;br /&gt;                             []        -&gt; RxEnd : RxEOF : []&lt;br /&gt;                             tts@(t:_) -&gt; case not $ isSpace t of&lt;br /&gt;                                            True  -&gt; RxBound : rxt tts&lt;br /&gt;                                            False -&gt; rxt tts&lt;br /&gt;    where&lt;br /&gt;      rxt (t:[]) | not $ isSpace t = RxChar t : RxBound : RxEnd : RxEOF : []&lt;br /&gt;                 | otherwise       = RxChar t : RxEnd : RxEOF : []&lt;br /&gt;      rxt (t:ts@(t':_)) | isSpace t `xor` isSpace t' = RxChar t : RxBound : rxt ts&lt;br /&gt;                        | otherwise                  = RxChar t : rxt ts&lt;br /&gt;&lt;br /&gt;data RxTransform = RxTransform ( RxNode -&gt; RxToken -&gt; [ RxNode ] )&lt;br /&gt;                 | RxNullTransform&lt;br /&gt;&lt;br /&gt;data RxNode = RxActive { rxTransforms :: [RxTransform] , &lt;br /&gt;                         rxMatched    :: String ,&lt;br /&gt;                         rxNumSubs    :: Integer ,&lt;br /&gt;                         rxSubExprs   :: [ String ] }&lt;br /&gt;            | RxComplete { rxMatched :: String ,&lt;br /&gt;                           rxNumSubs :: Integer ,&lt;br /&gt;                           rxSubExprs :: [ String ] }&lt;br /&gt;&lt;br /&gt;instance Show RxNode where&lt;br /&gt;    show (RxComplete matched _ _) = "&amp;lt;rx|matched:[" ++ matched ++ "]&amp;gt;"&lt;br /&gt;&lt;br /&gt;data RxDepth = RxTop | RxSub deriving ( Show )&lt;br /&gt;&lt;br /&gt;rxCompile pps = let ( xs , rps ) = oexn [success] RxTop pps&lt;br /&gt;                in case length rps of&lt;br /&gt;                     0 -&gt; RxActive { rxTransforms = xs ,&lt;br /&gt;                                     rxMatched    = [] ,&lt;br /&gt;                                     rxNumSubs    =  0 ,&lt;br /&gt;                                     rxSubExprs   = [] }&lt;br /&gt;                     _ -&gt; error $ "Not all of pattern consumed : remains : " ++ rps&lt;br /&gt;    where&lt;br /&gt;      -- or together different expression sections -- (a|b|c)&lt;br /&gt;      oexn ns  RxTop [] = ( ns , [] )&lt;br /&gt;      oexn _  RxSub  [] = error "Pattern ended while still in sub expression"&lt;br /&gt;      oexn ns     d pps = let ( ~( xs  , rps  ) ,&lt;br /&gt;                                ~( nxs , nrps ) ) = ( aexn ns pps ,&lt;br /&gt;                                                      case rps of&lt;br /&gt;                                                        ('|':rr) -&gt; let ( inxs , irps ) = oexn ns d rr&lt;br /&gt;                                                                    in ( xs ++ inxs , irps )&lt;br /&gt;                                                        (')':rr) -&gt; case d of&lt;br /&gt;                                                                      RxTop -&gt; error "Erroneous close parenthesis in pattern "&lt;br /&gt;                                                                      RxSub -&gt; ( xs , rr )&lt;br /&gt;                                                        []       -&gt; case d of&lt;br /&gt;                                                                      RxTop -&gt; ( xs , [] )&lt;br /&gt;                                                                      RxSub -&gt; error "End of pattern while still in sub expression" )&lt;br /&gt;                          in ( nxs , nrps )&lt;br /&gt;      -- and together extracted nodes in a given expression segment -- abd?dfs&lt;br /&gt;      aexn ns pps = let ( ~( xs  , rps  ) ,&lt;br /&gt;                          ~( nxs , nrps ) ) = ( rexn nxs pps ,&lt;br /&gt;                                                case rps of&lt;br /&gt;                                                  ('|':_) -&gt; ( ns , rps )&lt;br /&gt;                                                  (')':_) -&gt; ( ns , rps )&lt;br /&gt;                                                  []      -&gt; ( ns , rps )&lt;br /&gt;                                                  _       -&gt; aexn ns rps )&lt;br /&gt;                    in ( xs , nrps )&lt;br /&gt;      -- replication application - weee!&lt;br /&gt;      rexn ns pps = let ( ~( xs , rps ) ,&lt;br /&gt;                          ~( ~( nxs ) ,&lt;br /&gt;                             ~( rxs , rrps ) ) ) = ( exn nxs pps ,&lt;br /&gt;                                                       case rps of&lt;br /&gt;                                                         ('?':'?':rr) -&gt; ( ( ns ) ,&lt;br /&gt;                                                                           ( ns ++ xs , rr ) )&lt;br /&gt;                                                         ('?':rr)     -&gt; ( ( ns ) ,&lt;br /&gt;                                                                           ( xs ++ ns , rr ) ) &lt;br /&gt;                                                         ('*':'?':rr) -&gt; ( ( ns ++ xs ) ,&lt;br /&gt;                                                                           ( ns ++ xs , rr ) )&lt;br /&gt;                                                         ('*':rr)     -&gt; ( ( xs ++ ns ) ,&lt;br /&gt;                                                                           ( xs ++ ns , rr ) )&lt;br /&gt;                                                         ('+':'?':rr) -&gt; ( ( ns ++ xs ) ,&lt;br /&gt;                                                                           ( xs , rr ) )&lt;br /&gt;                                                         ('+':rr)     -&gt; ( ( xs ++ ns ) ,&lt;br /&gt;                                                                           ( xs , rr ) )&lt;br /&gt;                                                         _            -&gt; ( ( ns ) ,&lt;br /&gt;                                                                           ( xs , rps ) )&lt;br /&gt;                                                     )&lt;br /&gt;                    in&lt;br /&gt;                      ( rxs , rrps )&lt;br /&gt;      -- extract node ( including an entire subexpression as a single node )&lt;br /&gt;      exn _  ('?':_)  = error "Bad question mark operator"&lt;br /&gt;      exn _  ('*':_)  = error "Bad splat operator"&lt;br /&gt;      exn _  ('+':_)  = error "Bad plus sign operator"&lt;br /&gt;      exn ns ('(':ps) = oexn ns RxSub ps&lt;br /&gt;      exn ns (p:ps)   = ( [ RxTransform ( \rxn k -&gt; case k of&lt;br /&gt;                                                      (RxChar c) -&gt; if c == p &lt;br /&gt;                                                                    then &lt;br /&gt;                                                                        [ RxActive { rxTransforms = ns ,&lt;br /&gt;                                                                                     rxMatched    = c : rxMatched rxn ,&lt;br /&gt;                                                                                     rxNumSubs    = rxNumSubs rxn ,&lt;br /&gt;                                                                                     rxSubExprs   = rxSubExprs rxn } ]&lt;br /&gt;                                                                    else&lt;br /&gt;                                                                        []&lt;br /&gt;                                                      (RxStart)  -&gt; [rxn]&lt;br /&gt;                                                      (RxBound)  -&gt; [rxn]&lt;br /&gt;                                                      _          -&gt; [] &lt;br /&gt;                                        ) ] ,&lt;br /&gt;                          ps )&lt;br /&gt;&lt;br /&gt;      exn ns []       = error "can this be reached?"&lt;br /&gt;&lt;br /&gt;      success = RxTransform ( \ rxn k -&gt; [ RxComplete { rxMatched  = reverse $ rxMatched rxn ,&lt;br /&gt;                                                        rxNumSubs  = rxNumSubs rxn , &lt;br /&gt;                                                        rxSubExprs = map reverse $ rxSubExprs rxn } ] )&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;rxExec n tts = iexec [n] (rxTokenize tts)&lt;br /&gt;    where&lt;br /&gt;      iexec (win@(RxComplete _ _ _ ):_) _      = Just win&lt;br /&gt;      iexec []                          _      = Nothing&lt;br /&gt;      iexec nns                         (k:ks) = iexec ( concatMap ( \n -&gt; case n of&lt;br /&gt;                                                                             (RxComplete _ _ _)   -&gt; [n]&lt;br /&gt;                                                                             a@(RxActive _ _ _ _) -&gt; concatMap (\xf -&gt; case xf of &lt;br /&gt;                                                                                                                         (RxTransform f)   -&gt; f a k &lt;br /&gt;                                                                                                                         (RxNullTransform) -&gt; []&lt;br /&gt;                                                                                                               ) (rxTransforms a) ) nns ) ks&lt;br /&gt;&lt;br /&gt;main = do&lt;br /&gt;  print $ rxTokenize "this is a test"&lt;br /&gt;  print $ rxExec (rxCompile "hello|world") "hello"&lt;br /&gt;  print $ rxExec (rxCompile "hello|world") "world"&lt;br /&gt;  print $ rxExec (rxCompile "abcde|ab")    "abcd"&lt;br /&gt;  print $ rxExec (rxCompile "ab?c")        "ac"&lt;br /&gt;  print $ rxExec (rxCompile "ab?c")        "abc"&lt;br /&gt;  print $ rxExec (rxCompile "ab*c")        "ac"&lt;br /&gt;  print $ rxExec (rxCompile "ab*c")        "abc"&lt;br /&gt;  print $ rxExec (rxCompile "ab*c")        "abbbc"&lt;br /&gt;  print $ rxExec (rxCompile "ab+c")        "ac"&lt;br /&gt;  print $ rxExec (rxCompile "ab+c")        "abc"&lt;br /&gt;  print $ rxExec (rxCompile "ab+c")        "abbbbbc"&lt;br /&gt;  print $ rxExec (rxCompile "(a|b)+")      "aaabbbabaaababaaabbbabbaba"&lt;br /&gt;  print $ rxExec (rxCompile "abc|")        "zyx"&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Sweet.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-3308730395837757421?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/3308730395837757421/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=3308730395837757421' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/3308730395837757421'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/3308730395837757421'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2008/01/irrefutable-pattern-love.html' title='Irrefutable Pattern Love.'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-6805492710404763262</id><published>2008-01-15T11:23:00.000-08:00</published><updated>2008-01-15T11:43:12.298-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='haskell'/><title type='text'>Twin naming golf</title><content type='html'>I saw &lt;a href="http://brad.livejournal.com/2354680.html"&gt;Brad Fitzpatrick's post on anagram twin names&lt;/a&gt; and decided to join in the golf.  Weeee!&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;import Control.Arrow ( first , second , (&amp;&amp;&amp;) , (***) )&lt;br /&gt;import List ( sortBy , groupBy , intersperse )&lt;br /&gt;&lt;br /&gt;main = interact $ unlines&lt;br /&gt;       . map ( uncurry (++) )&lt;br /&gt;       . map ( ( ++ " : " ) . fst . head &amp;&amp;&amp; concat . intersperse ", " . map snd )&lt;br /&gt;       . filter ( (&gt; 1) . length )&lt;br /&gt;       . groupBy ( curry $ uncurry (==) . (***) fst fst )&lt;br /&gt;       . sortBy (curry $ uncurry compare . (***) fst fst )&lt;br /&gt;       . map ( sortBy compare &amp;&amp;&amp; id )&lt;br /&gt;       . filter ( not . null )&lt;br /&gt;       . map ( fst . break (== ' ') )&lt;br /&gt;       . lines&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;hehehe&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-6805492710404763262?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/6805492710404763262/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=6805492710404763262' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/6805492710404763262'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/6805492710404763262'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2008/01/twin-naming-golf.html' title='Twin naming golf'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-5510396344221806236</id><published>2007-11-29T09:01:00.000-08:00</published><updated>2007-11-29T12:12:30.314-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='hack'/><title type='text'>A context manager for temporary memoization.</title><content type='html'>This is very much a fragile hack at the moment.  It's an interesting idea I think.  I was disappointed when I initially found that the with_statement syntax does not restore the value of the `as var` upon completion.&lt;br /&gt;&lt;br /&gt;This made doing something along the lines of&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;   with temporarily_memoized( func ) :&lt;br /&gt;     for datum in data :&lt;br /&gt;       func( datum )&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;unattainable.&lt;br /&gt;&lt;br /&gt;Well, just a lot hackier actually.&lt;br /&gt;&lt;br /&gt;Thus temporarily_memoized( ... ) was born :&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;#!/usr/bin/python&lt;br /&gt;&lt;br /&gt;# a context manager for temporarily turning any function into&lt;br /&gt;# a memoized version of itself.&lt;br /&gt;&lt;br /&gt;from __future__ import with_statement&lt;br /&gt;import contextlib , sys , types&lt;br /&gt;&lt;br /&gt;def memoize ( func ) :&lt;br /&gt;     """ In haskell mutability must be handled explicitly.&lt;br /&gt;     Only fair that Python do the same to transparent functionality&lt;br /&gt;     """&lt;br /&gt;     remembered = {}&lt;br /&gt;     def memoized ( *args ) :&lt;br /&gt;          """ memoized version of function """&lt;br /&gt;          if args in remembered :&lt;br /&gt;               return remembered[ args ]&lt;br /&gt;          else :&lt;br /&gt;               new = func( *args )&lt;br /&gt;               remembered[ args ] = new&lt;br /&gt;               return new&lt;br /&gt;     return memoized&lt;br /&gt;    &lt;br /&gt;@contextlib.contextmanager&lt;br /&gt;def temporarily_memoized ( func ) :&lt;br /&gt;     """ memoize the given function for the duration of this block&lt;br /&gt;     save anything in the local variables that gets in the way of&lt;br /&gt;     using this so that it can be restored afterward , the memoized&lt;br /&gt;     version is found in the locals.  use on actual functions only.&lt;br /&gt;     no members. """&lt;br /&gt;&lt;br /&gt;     # this is being called, there has to be a frame above it&lt;br /&gt;     frame = sys._getframe().f_back.f_back&lt;br /&gt;     &lt;br /&gt;     if func.func_name in frame.f_locals :&lt;br /&gt;          f = frame.f_locals[ func.func_name ]&lt;br /&gt;          frame.f_locals[ func.func_name ] = memoize( func )&lt;br /&gt;          try :&lt;br /&gt;               # this hack replaces whatever in the local scope&lt;br /&gt;               # has the name of the variable.  if you try to use&lt;br /&gt;               # the 'as whatever' portion of the syntax , you&lt;br /&gt;               # are doing it wrong&lt;br /&gt;               yield None &lt;br /&gt;          finally :&lt;br /&gt;               frame.f_locals[ f.func_name ] = f&lt;br /&gt;     else :&lt;br /&gt;          frame.f_locals[ func.func_name ] = memoize( func )&lt;br /&gt;          try :&lt;br /&gt;               yield None&lt;br /&gt;          finally :&lt;br /&gt;               del frame.f_locals[ func.func_name ]&lt;br /&gt;&lt;br /&gt;def fib(n):&lt;br /&gt;     """ biggus fibbus """&lt;br /&gt;     if n == 0 or n == 1:&lt;br /&gt;          return n&lt;br /&gt;     else:&lt;br /&gt;          return fib(n-1) + fib(n-2)&lt;br /&gt;&lt;br /&gt;if __name__ == '__main__' :&lt;br /&gt;     print fib.__doc__&lt;br /&gt;     with temporarily_memoized( fib ) :&lt;br /&gt;          print fib.__doc__&lt;br /&gt;          for i in xrange( 36 ) :&lt;br /&gt;               print "n=%d =&gt; %d" % (i, fib(i))&lt;br /&gt;          print fib.__doc__&lt;br /&gt;     print fib.__doc__&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;I've played around with replacing the func_code and related variables in order to create a true proxy, but the requirement that the func_code and func_closure agree exactly on the number of local variables and that they refuse to set independently or be tricked into simultaneous setting by calling types.FunctionType.__init__( target , proxy-code , ... ) has thwarted my attempts at this.  I considered creating a variety of proxy functions each requiring a different number of free variables and then selecting the appropriate one, which would probably work, but it seems too much the wrong way to do things.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-5510396344221806236?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/5510396344221806236/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=5510396344221806236' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/5510396344221806236'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/5510396344221806236'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2007/11/context-manager-for-temporary.html' title='A context manager for temporary memoization.'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-7387822383411304912</id><published>2007-11-28T13:26:00.000-08:00</published><updated>2007-11-28T19:32:10.366-08:00</updated><title type='text'>Don's benchmark.</title><content type='html'>&lt;blockquote&gt;&lt;br /&gt;  It appears the mob has spoken and that I am a wrong headed twit.&lt;br&gt;&lt;br /&gt;  My apologies to Don on the misunderstanding.  When I first saw the numbers I assumed that the speed had to be due to memoization.  It's not.&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;I saw a link earlier to a post by &lt;a href="http://cgi.cse.unsw.edu.au/~dons/blog/2007/11/29#smoking"&gt;Don Stewart&lt;/a&gt; in that he asserts that Haskell is 40 times faster than python for a given algorithm.&lt;br /&gt;&lt;br /&gt;He then goes on to point out how easy it is to drop hints to the code to parallelize the code.&lt;br /&gt;&lt;br /&gt;While I won't go into any threading issues, the core point that it takes python almost thirty seconds to generate each Fibonacci number up to 36 is misplaced.&lt;br /&gt;&lt;br /&gt;Haskell is a language built around the idea of functions that can be assumed to be functionally transparent.  If you give them the same set of arguments they'll give the same answer.&lt;br /&gt;&lt;br /&gt;The upside to this is that the compiler writers can automatically memoize every function if they want to unless its been explicitly marked as being otherwise.&lt;br /&gt;&lt;br /&gt;Python, being based upon the idea of dynamic objects rather than that of transparent functions has no need for such optimizing. The developers time is better spent ensuring rapid dictionary insertion and deletion over transparent function memoization.&lt;br /&gt;&lt;br /&gt;But, as he doesn't seem to mind having to annotate concurrancy, I'm sure that annotating transparency is just as fair.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;#!/usr/bin/python&lt;br /&gt;&lt;br /&gt;# Don's original python&lt;br /&gt;# def fib(n):&lt;br /&gt;#     if n == 0 or n == 1:&lt;br /&gt;#          return n&lt;br /&gt;#     else:&lt;br /&gt;#         return fib(n-1) + fib(n-2)&lt;br /&gt;#     &lt;br /&gt;# for i in range(36):&lt;br /&gt;#     print "n=%d =&gt; %d" % (i, fib(i))&lt;br /&gt;&lt;br /&gt;def memoize ( func ) :&lt;br /&gt;    """ In haskell mutability must be handled explicitly.&lt;br /&gt;    Only fair that Python do the same to transparent functionality&lt;br /&gt;    """&lt;br /&gt;    remembered = {}&lt;br /&gt;    def memoized ( *args ) :&lt;br /&gt;        if args in remembered :&lt;br /&gt;            return remembered[ args ]&lt;br /&gt;        else :&lt;br /&gt;            new = func( *args )&lt;br /&gt;            remembered[ args ] = new&lt;br /&gt;            return new&lt;br /&gt;    return memoized&lt;br /&gt;&lt;br /&gt;@memoize&lt;br /&gt;def fib(n):&lt;br /&gt;    if n == 0 or n == 1:&lt;br /&gt;         return n&lt;br /&gt;    else:&lt;br /&gt;        return fib(n-1) + fib(n-2)&lt;br /&gt;    &lt;br /&gt;for i in range(36):&lt;br /&gt;    print "n=%d =&gt; %d" % (i, fib(i))&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The above code runs at a comfortable&lt;br /&gt;&lt;pre&gt;  real    0m0.020s&lt;br /&gt;  user    0m0.012s&lt;br /&gt;  sys     0m0.008s&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;which compared to his times&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;  Ruby (1.8.5)   64.26s&lt;br /&gt;  Python (2.4) 25.16s&lt;br /&gt;  Haskell (GHC 6.8)  0.78s&lt;br /&gt;  Parallel Haskell (GHC 6.8)  0.67s&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;fairs reasonably.&lt;br /&gt;&lt;br /&gt;Since Python allows mutable variables a looping implementation would have been more idiomatic python.  Even better to use a generator.  Then we can start output before the sequence is complete.&lt;br /&gt;&lt;br /&gt;All his benchmark really showed was that memoization on transparent functions is really great.  Thanks guy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-7387822383411304912?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/7387822383411304912/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=7387822383411304912' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/7387822383411304912'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/7387822383411304912'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2007/11/dons-benchmark.html' title='Don&apos;s benchmark.'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-2801905796351932609</id><published>2007-11-13T06:40:00.000-08:00</published><updated>2007-11-13T06:49:58.951-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='query languages'/><title type='text'>Random notes on query languages.</title><content type='html'>I started to include this in my post concerning my new language, but it relly isn't very helpful to the article.  I want to keep the information around for myself later. Not all of these are generic information query languages, but all are examples of languages designed to ask for something. &lt;br /&gt;&lt;br /&gt;QL :&lt;br /&gt;Compiles to SQL.  A combination of Datalog and SQL. Looks like it has a form of list comprehension.  That could be useful.&lt;br /&gt;&lt;br /&gt;Common Query Language :&lt;br /&gt;This seems to be an attempt to standardize what thed user has to type into the innumerably implemented searches online and off.  Simplicity is always nice to consider.&lt;br /&gt;&lt;br /&gt;D : &lt;br /&gt;Apparently this was a truly relational language.  Google tells me concerning the site pointed at for it from wikipedia, "This site may harm your computer".  I think this one best left alone.&lt;br /&gt;&lt;br /&gt;Data Definition Language :&lt;br /&gt;This appears to be a fuzzy logic belief system buried in a syntax molded to look like SQL.&lt;br /&gt;&lt;br /&gt;Datalog :&lt;br /&gt;Logic programming. ancestor(X,Y) := Parent(X,Y).  The ancestor of X is Y if the Parent of X is Y.  Could be useful.  Looks rather constraining in what it could do easily.&lt;br /&gt;&lt;br /&gt;ERROL :&lt;br /&gt;This just led me to read about entity-relationship models, finally giving name to a project I created out of frustration with data manipulation about a year ago.  This is a great system for data, this particular language is a natural language version interface.  Better for people that machines to query with.&lt;br /&gt;&lt;br /&gt;ISBL :&lt;br /&gt;I found references to this but could not locate an example of its syntax.&lt;br /&gt;&lt;br /&gt;LDAP :&lt;br /&gt;This isn't a query language I hear you saying.  Well, you're right.  But I'm looking at data query, and this does query data, albeit not relational.  The manner LDAP uses to filter datasets is nice.  Very combinable, readable and easily created programatically.  On being nice, forcing everything into a strict hierarchy is not.&lt;br /&gt;&lt;br /&gt;MQL : &lt;br /&gt;Not so much a language as a search API for finding structures and substructures in molecular simulations.&lt;br /&gt;&lt;br /&gt;MDX : MultiDimensional eXpressions&lt;br /&gt;This appears to have syntax borrowed from SQL with the addition of prepartitioned data and some niceties in reaching it.  There are some interesting ideas at work here, I will have to look into them in the future.&lt;br /&gt;&lt;br /&gt;OQL : &lt;br /&gt;The binding of SQL tables and their associated columns to devices and their exposed attributes.  The interesting difference is that instead of just returning scalars it can also return objects that can be traversed using dot-member syntax.  The aggregation method differs from standard SQL in dividing items into groups that can then be referenced by values and subqueries using the keyword partition.&lt;br /&gt;&lt;br /&gt;OPath :&lt;br /&gt;A combination of API and string segment arguments that allow for locating items in the WinFS store.  Apparently there will be a natural language to OPath translator.  It looks like you acquire a query-object into a potion of the WinFS store programatically and then pass it a string specifying the boolean operations to indicate which items from the current partition of the store are desired.&lt;br /&gt;&lt;br /&gt;QUEL :&lt;br /&gt;SQL with a more complex from statement involving manipulating ranges and dropping the select keyword.&lt;br /&gt;&lt;br /&gt;SPARQL :&lt;br /&gt;This is a language designed for querying information found in RDF files.  I applaud them for not merely repurposing the syntax of SQL to the task.  It seems that many think this is what creating a query language means.  The interesting part of the query consists of the where clause.  Apparently the items returned are generated by specifying a series of interconnected assertions that whenever met return a clump of information consisting of all the specified variables that where intermixed with the assertions being bound to the values that allowed the assertions to match.  It looks neat, but it bothered me that just glancing at it the query did not produce a mental model of how it operated.  Maybe it was just me.&lt;br /&gt;&lt;br /&gt;SQL :&lt;br /&gt;The current cream of the crop in relational data manipulation, SQL has been allowing us to keep an external dataset and query and alter it in standard ways for more than thirty years.  Of course the primary part that seems to be standard is mostly in the fashion that queries are marked up and some general agreement on what terms to use to indicate the various ways of stitching together tables of data.  In general anything so clever as to be particularly interesting or useful is custom to only a given database and the few that might choose to copy them.  For a standard language, the implementation specific libraries do much to unstandardize it.  The primary limitation of the language in my experience is that anything you want to do must be expressed through the metaphor of stitching together tables of information and of course returning a single table of information resulting from this.  Many clever hacks are used to bypass these limitations in practice of coarse, but trying to do anything other than simple column binds can become very confusing to look at very quickly.&lt;br /&gt;&lt;br /&gt;WQL : &lt;br /&gt;Microsofts query only SQL for WMI.  A hidden SQL server on every server and workstation of a domain that only understands select queries.  Useful.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-2801905796351932609?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/2801905796351932609/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=2801905796351932609' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/2801905796351932609'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/2801905796351932609'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2007/11/random-notes-on-query-languages.html' title='Random notes on query languages.'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-55576682420883610</id><published>2007-11-11T13:53:00.000-08:00</published><updated>2007-11-13T08:16:58.990-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='query languages'/><title type='text'>Restructuring query languages.</title><content type='html'>SQL is the de facto language for the manipulation of relational data.  Designed in the early 70s and standardized in the mid-80s, SQL has dozens of commercial and open-source servers written for it and enjoys being a language with almost no major competitors in its field.&lt;br /&gt;&lt;br /&gt;It seems that all major relational databases are basically interpreters for various flavors of SQL.  I've looked around from time to time for a decent alternative to SQL.&lt;br /&gt;&lt;br /&gt;I haven't found anything interesting that was particularly useful out there.&lt;br /&gt;&lt;br /&gt;I'm going to present for comments some of the ideas I have come up with to fix this.  As you read below, know there is not yet an actual interpreter for my language.  There will be, but I am not yet finished with it.  I am writing an interpreter based on sqlite3 using lemon to create the parser and targeting the sqlite3 virtual machine which I will extend very minimally to handle some functionality I want that doesn't quite fit into their model.&lt;br /&gt;&lt;br /&gt;My language is very much based on the entity-relationship model.  SQL is often used to model this approach.  It can be clumsy to fit into the tables of data paradigm, but it can.&lt;br /&gt;&lt;br /&gt;The easiest way to explain my language of course will be to show a brief interpreter session.  So here it goes.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Interpreter for AsYetUnnamedProgrammingLanguage by MichaelSpeer v0.-1&lt;br /&gt;&lt;br /&gt;&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Let's begin by creating an entity to manipulate.  We'll use products, something nicely generic that are probably manipulated by everyone reading this at some point.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; declare &lt;br /&gt;.   Products&lt;br /&gt;.     :name PHRASE&lt;br /&gt;.     :price MONEY'USD&lt;br /&gt;. ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;There you are.  We have a database containing products, each of which has a member specifying the name ( PHRASE : a string with no newlines ) and a price ( MONEY : a numeric datatype insured to two decimal places without fear of truncation errors, rounding errors, poor approximations or anything else silly like that.  The tick after the MONEY datatype is something for which I do not yet have a permanent terminology, though I like it.  I'll call it a shadow type.  It is a small comment that must be present on anything that interacts with the particular datatype.  So &lt;span style="font-family:monospace"&gt;3'USD + :price&lt;/span&gt; is good, whereas &lt;span style="font-family:monospace"&gt;3 + :price&lt;/span&gt; is an error.  This is just an enforced helper to prevent &lt;span style="font-family:monospace"&gt;3'YEN + 3'USD&lt;/span&gt; from not being an error and giving a six.&lt;br /&gt;&lt;br /&gt;The syntax for creating some instances of the Products entity is as follows :&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; create Products [ :name , :price ] =&lt;br /&gt;.   [ "DVD Player" , 20'USD   ] ,&lt;br /&gt;.   [ "VCR"        , 15.5'USD ] ,&lt;br /&gt;.   [ "DVD"        , 7'USD    ] ,&lt;br /&gt;.   [ "VCR Tape"   , 4'USD    ]&lt;br /&gt;. ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So now we have Products in the system.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; Products ;&lt;br /&gt;[ 1 , 2 , 3 , 4 ]&lt;br /&gt;&lt;br /&gt;&amp;gt; Products:id ;&lt;br /&gt;[ 1 , 2 , 3 , 4 ]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;These are selectors.  Selectors select what information from where will be available for call-back to the application.&lt;br /&gt;&lt;br /&gt;As you can see, if you do not give an expressions_list or end the selector with a valuating member, then it will default to the id of whatever the current object is.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; Products[ :id , :name ] ;&lt;br /&gt;[ 1 , "DVD Player" ]&lt;br /&gt;[ 2 , "VCR"        ]&lt;br /&gt;[ 3 , "DVD"        ]&lt;br /&gt;[ 4 , "VCR Tape"   ]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;By giving the expressions_list, the values in the selected attributes are returned to the caller.&lt;br /&gt;&lt;br /&gt;So this doesn't allow anything interesting yet.  The path to usefulness starts with filters.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; Products( :price &amp;lt; 5'USD )[ :name ];&lt;br /&gt;[ "VCR Tape" ]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I mentioned earlier the difference between putting an expressions_list at the end of the selector vs placing an object_member there.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; Products( :price &amp;gt; 10'USD ):name ;&lt;br /&gt;[ "DVD Player" , "VCR" ]&lt;br /&gt;&lt;br /&gt;&amp;gt; Products( :price &amp;gt; 10'USD )[ :name ] ;&lt;br /&gt;[ "DVD Player" ]&lt;br /&gt;[ "VCR"        ]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Of course all of this would be a huge pain if there was no way to sort them.  Sorting expressions can be interpolated directly into the filters.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; Products( asc :name ):name ;&lt;br /&gt;[ "DVD" , "DVD Player" , "VCR" , "VCR Tape" ]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So far we've looked at how to manipulate a single object type.  Now let's add a new one to put these products into some categories.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; declare&lt;br /&gt;.   Categories&lt;br /&gt;.     :name PHRASE&lt;br /&gt;.     :sub-products -&amp;gt; Products:category single&lt;br /&gt;. ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The declaration of the Categories object-type and the :name member looks the same as last time.  :sub-products is the new item.  This line is declaring a relationship.  The `Products:category single` section indicates that the relationship will interconnect Categories ( the current item ) and Products.  The relationship can be referred to from a set of Categories using the :sub-products member and from a set of Products using the :category member.  Furthermore, the caveat `single` means that any instance of a Products object may only relate to a single Categories set object.  The caveat can appear on either or both sides.&lt;br /&gt;&lt;br /&gt;Setting relationships is accomplished by placing a selector in the respective slot of the expressions_list being used to set the members.  id est &lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; create Categories [ :name , :sub-products ] =&lt;br /&gt;.   [ "Video Players" , Products( :name in [ "DVD Player" , "VCR" ] ) ]&lt;br /&gt;. ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Here is part of what I like about this syntax.  The `:name` term above is a relative selector.  When a filter has a selector that isn't grounded it is assumed to be relative to the current object in the filters.  So the `:name` above is relative to Products.  A filter of that nature can be read as objects from Products where Products:name is "DVD Player" or Products:name is "VCR".  There isn't a limit to how many filters are applied either.&lt;br /&gt;&lt;br /&gt;If you specify a relationship member as part of the filters then at that point a unique set of items available at that point are selected.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; Products:category:name ;&lt;br /&gt;[ "Video Players" ]&lt;br /&gt;&lt;br /&gt;&amp;gt; Products[ :category:name ] ;&lt;br /&gt;[]&lt;br /&gt;[ "Video Players" ]&lt;br /&gt;[]&lt;br /&gt;[ "Video Players" ]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Using the expression_list causes a separate return for each item whereas selecting the attribute directly takes the unique set of values from that attribute.&lt;br /&gt;&lt;br /&gt;The reason that this happens is because in the first item the :category filter is selecting the set of all unique categories referred to by the set of current Products.  The second selector is selecting each of the current set of Products ( all of them since no filter is applied ) and then for each of them returning a list of the :name attributes on the Categories in the set returned by the :category entry.&lt;br /&gt;&lt;br /&gt;One of the things that will be immediately noticed is that I intend for my language to be able to represent and send out complex structures instead of just flat tables.   The instruction to process a sub-list is the alteration to the virtual machine that will need to be made.  I hope to link the language to python first planning to take advantage of the bindings that already exist in the builtin sqlite3 module.&lt;br /&gt;&lt;br /&gt;Here are some more useful examples.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; Products( :category ):name ; -- returns products that have a category&lt;br /&gt;[ "DVD Player" , "VCR" ]&lt;br /&gt;&lt;br /&gt;&amp;gt; Products( not :category ):name ; -- returns products lacking a category&lt;br /&gt;[ "DVD" , "VCR Tape" ]&lt;br /&gt;&lt;br /&gt;&amp;gt; Products( :id != 3 &amp;amp;&amp;amp; :id != 4 )[ :name ] ;&lt;br /&gt;[ "DVD Player" ]&lt;br /&gt;[ "VCR" ]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I will add the other two products to a category to demonstrate some more complex queries.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; create Categories [ :name , :sub-products ] =&lt;br /&gt;.   [ "Blank Media" , Products( not :category ) ]&lt;br /&gt;. ;&lt;br /&gt;&lt;br /&gt;&amp;gt; Products( :category( :sub-products( :name == "DVD" ) ) ):name ;&lt;br /&gt;[ "DVD" , "VCR Tape" ]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;That second query is returning the names of Products that have Categories that have products in them named "DVD".  The same query is entirely possible in SQL.  Just far more complex.&lt;br /&gt;&lt;br /&gt;This language doesn't do a whole lot that cannot be accomplished with SQL or other information engines.  It does it easier.  This, to me, is better.&lt;br /&gt;&lt;br /&gt;Before I conclude we'll raise the price of media by a dollar per item and add another entry to Categories.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; alter Products( :category( :name == "Media" ) )[ :price ] = [ :price += 1'USD ] ;&lt;br /&gt;&lt;br /&gt;&amp;gt; declare Categories:recommended -&amp;gt; :sub-products:recommended-by ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This declaration creates an attribute that will be allowed to point at any number of items that are found in the same items :sub-products list.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; alter Categories( :name == "Media" )[ :recommended ] = [ :sub-products( :name == "DVD" ) ] ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;There will be a caveat called `automatic` that will allow adding items not already in :sub-products to :recommended wherein the added item would also be added to :sub-products and removing the recommended item from :sub-products would remove it from :recommended.  Otherwise trying these things will result in an error.&lt;br /&gt;&lt;br /&gt;The + sign will be a union operator against sets and the - sign will give all of the first set except those in the second set.  As well union( ... ) and except( ... ) functions will likely exist.&lt;br /&gt;&lt;br /&gt;This should be enough information to gather thoughts from others on.  So, please leave any thoughts you have so I can roll them into the work I am doing.&lt;br /&gt;&lt;br /&gt;The current Categories entity type could have been declared initially as :&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; declare&lt;br /&gt;.   Categories&lt;br /&gt;.     :name PHRASE&lt;br /&gt;.     :sub-products -&gt; Products:category single&lt;br /&gt;.        :recommended -&gt; :sub-products:recommend-by&lt;br /&gt;. ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;That should be it for now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-55576682420883610?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/55576682420883610/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=55576682420883610' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/55576682420883610'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/55576682420883610'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2007/11/restructuring-query-languages.html' title='Restructuring query languages.'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-4383746273123457057</id><published>2007-11-01T13:08:00.000-07:00</published><updated>2007-11-16T08:50:28.541-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='perl'/><title type='text'>Multi-Argument map For perl</title><content type='html'>I've been playing with perl recently.  I avoiding the language for a long time because I hit python first and foolishly believed the hoards when they decried everything about perl.  It can be cryptic, but perl is far from being without merit.&lt;br /&gt;&lt;br /&gt;One of the things that kept bugging me when programming in perl was that there was the lack of a multi-argument grabbing map.  I'd want to chain against the values of a hash and found annoyingly that the syntax was unhelpful.&lt;br /&gt;&lt;br /&gt;A hash in list context dumps its keys and values into a zipped list.&lt;br /&gt;&lt;br /&gt;{ 'k1' : 'v1' , 'k2' : 'v2 , 'k3' : 'v3' } -&gt; [ 'k1' , 'v1' , 'k2' , 'v2' , 'k3' , 'v3' ]&lt;br /&gt;&lt;br /&gt;This pretty much screws any sane sort of action that can be taken against the data.  So instead you have to use `keys' to get a list of just keys in a `for' statement and then use the keys to access members of the hash.&lt;br /&gt;&lt;br /&gt;I didn't really appreciate this restriction, as it meant I couldn't easily massage data with an anonymous function by mapping a sub block across it.&lt;br /&gt;&lt;br /&gt;Thus was born mapper.&lt;br /&gt;&lt;br /&gt;mapper takes three arguments.  well, this being perl it takes one huge list of arguments.  But it treats them like (defun mapper ( n f &amp;amp;rest L ) ... ) where&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;n : the number of elements to peel off of the list and hand to the sub function&lt;/li&gt;&lt;br /&gt;&lt;li&gt;f : a sub function&lt;/li&gt;&lt;br /&gt;&lt;li&gt;L : the list of items&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;It isn't long.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;sub mapper&lt;br /&gt;{&lt;br /&gt; # n - number of items to pass at a time to the subroutine&lt;br /&gt; # s - the subroutine&lt;br /&gt; # r - the list of things to chunk and pass to the subroutine&lt;br /&gt; # x - number of items to chunk&lt;br /&gt; # c - index&lt;br /&gt; # l - mapped values&lt;br /&gt;&lt;br /&gt; local @_ = @_ ;&lt;br /&gt;&lt;br /&gt; my ( $n , $s , @r ) = @_ ;&lt;br /&gt; my $x = @r ;&lt;br /&gt; my $c = 0 ;&lt;br /&gt; my @l ;&lt;br /&gt;&lt;br /&gt; while( $x &gt; $c )&lt;br /&gt; {&lt;br /&gt;     push @l , &amp;amp;$s( @r[ $c .. ( ( $c + $n ) - 1 &gt;= $x ? $x - 1 : ( $c + $n ) - 1 ) ] ) ;&lt;br /&gt;     $c += $n ;&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt; return @l ;&lt;br /&gt;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;From this I derived a quick transposition function just to play with it ( demonstrating its use )&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;sub transpose&lt;br /&gt;{&lt;br /&gt;   local @_ = @_ ;&lt;br /&gt;   mapper 2 , sub { ( $_[1] , $_[0] ) } , @_ ;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The function is rather obfuscatory when nested ...&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;print join ' ' , mapper 3 , sub { '&lt;' . ( join '' , mapper 1 , sub { '[' . $_[0] . ']' } , @_ ) . '&gt;' } , qw( the quick brown fox jumped over the lazy brown dog ) ;&lt;br /&gt;print "\n" ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;You'll see from the last that mapper when presented with fewer than the expected number of items at the end of a list just passes along what it has, which is likely unhelpful for debugging, but helpful for applying n-arary functions across lists.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-4383746273123457057?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/4383746273123457057/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=4383746273123457057' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/4383746273123457057'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/4383746273123457057'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2007/11/multi-argument-map-for-perl.html' title='Multi-Argument map For perl'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-3710035589153417568</id><published>2007-07-26T07:53:00.000-07:00</published><updated>2007-07-26T13:48:23.627-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='haskell'/><category scheme='http://www.blogger.com/atom/ns#' term='irrefutable patterns'/><title type='text'>The impossible is possible : Irrefutable patterns.</title><content type='html'>Earlier this year I wrote about a regular expression code engine I was writing in Haskell that didn't work because of a recursive call problem. I was fairly sure the code could work if I had a way to inform the compiler of the needed laziness, a way to allow a number of different functions that each depended on the output of the each other as their inputs to execute, but believed that the way Haskell worked would permanently prevent this.  Earlier today, &lt;a href="http://programming.reddit.com/user/linuxer/"&gt;linuxer&lt;/a&gt; on &lt;a href="http://programming.reddit.com/"&gt;Reddit&lt;/a&gt; linked to an article concerning &lt;a href="http://therning.org/magnus/archives/311" rel="bookmark" title="Permanent Link: Irrefutable patterns for the ignorant"&gt;Irrefutable patterns&lt;/a&gt;, and I had what felt like an epiphany.&lt;br /&gt;&lt;br /&gt;Remembering the code, I went back and corrected it only by adding the needed tilde to specify that the patterns should be irrefutable in the oexn function definitions.  A few bugs fixes later, and the code works.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;-- rx : a regular expression engine for haskell&lt;br /&gt;--    : (c) 2007 by michael speer , released GPL v2.0 license or later&lt;br /&gt;&lt;br /&gt;import Maybe&lt;br /&gt;&lt;br /&gt;-- get-function : gather a nodes worth of pattern from a pattern-string&lt;br /&gt;--                returns the segment and the remaining pattern in a tuple&lt;br /&gt;gf :: [Char] -&gt; ( [Char] , [Char] )&lt;br /&gt;gf ('(':ps)     = let ( segment , remaining ) = gpf [] ps&lt;br /&gt;                 in ( '(':segment , remaining )&lt;br /&gt;    where&lt;br /&gt;      -- get-parenthesis'd fragment&lt;br /&gt;      gpf :: [Char] -&gt; [Char] -&gt; ( [Char] , [Char] )&lt;br /&gt;      gpf segment []       = ( segment , [] )&lt;br /&gt;      gpf segment (')':ps) = ( segment++")" , ps )&lt;br /&gt;      gpf segment ('(':ps) = let ( subsegment , remaining ) = gpf [] ps&lt;br /&gt;                             in gpf ( segment ++ ('(':subsegment) ) remaining&lt;br /&gt;      gpf segment (p:ps)   = gpf (segment ++ p:[]) ps&lt;br /&gt;gf ('\\':p:ps)  = ( '\\':p:[] , ps )&lt;br /&gt;gf ('?':'?':ps) = ( "??" , ps )&lt;br /&gt;gf ('*':'?':ps) = ( "*?" , ps )&lt;br /&gt;gf (p:ps)       = ( p:[] , ps )&lt;br /&gt;&lt;br /&gt;-- list-functions : creates a list of function-pattern-segments from a regular expression&lt;br /&gt;lfn :: [Char] -&gt; [[Char]]&lt;br /&gt;lfn []  = []&lt;br /&gt;lfn pps = let ( fn , remaining ) = gf pps&lt;br /&gt;          in fn : lfn remaining&lt;br /&gt;&lt;br /&gt;-- postfix-to-prefix : moves all repitition characters to before their targets&lt;br /&gt;--                     in addition running the pattern through this function&lt;br /&gt;--                     ensures that the pattern is legal.&lt;br /&gt;p2p :: [[Char]] -&gt; [[Char]]&lt;br /&gt;p2p []                 = []&lt;br /&gt;p2p (('?':_):_)        = error "Misplaced repitition operator `?' "&lt;br /&gt;p2p (('*':_):_)        = error "Misplaced repitition operator `*' "&lt;br /&gt;p2p ("\\":_)           = error "Escape sequence targetting nothing"&lt;br /&gt;p2p ((')':_):[])       = ")" : []                                          -- allows for a close-parum at end of pattern &lt;br /&gt;                                                                           -- later processing must test for this and throw error if it is unexpected ( eg top-level&lt;br /&gt;p2p ((')':_):_)        = error "Unmatched terminating group marker , `)' " -- catches mid pattern erroneous close-para&lt;br /&gt;p2p ("|":('?':_):_)    = error "Repitition operator `?' applied to procedural alternation operator `|' "&lt;br /&gt;p2p (s1:s2@("??"):ps)  = ('?':'?': s1) : p2p ps&lt;br /&gt;p2p (s1:s2@("?"):ps)   = ('?': s1) : p2p ps&lt;br /&gt;p2p (s1:s2@("*?"):ps)  = ('*':'?': s1) : p2p ps&lt;br /&gt;p2p (s1:s2@("*"):ps)   = ('*': s1) : p2p ps&lt;br /&gt;p2p (s1:ps)            = s1 : p2p ps&lt;br /&gt;&lt;br /&gt;-- group-split : break the contents of a parenthesized group into peices&lt;br /&gt;--               build a current list of segments until | is encountered&lt;br /&gt;--               then append the result of calling this again on the &lt;br /&gt;--               remainder&lt;br /&gt;data PatternDepth = Top | Sub -- Pattern depth&lt;br /&gt;gs :: PatternDepth -&gt; [ [ Char ] ] -&gt; [ [ Char ] ] -&gt; [ [ [ Char ] ] ]&lt;br /&gt;gs Top segments (")":[])   = error "Unmatched terminating group marker `)' at end of pattern"&lt;br /&gt;gs Sub segments (")":[])   = gs Sub segments []&lt;br /&gt;gs _   segments []         = if segments /= [] then&lt;br /&gt;                                         segments : []&lt;br /&gt;                                     else&lt;br /&gt;                                         [""] : []&lt;br /&gt;gs pd  segments ("|":rest) = segments : gs pd [] rest&lt;br /&gt;gs pd  segments (s:rest)   = gs pd (segments++[s]) rest &lt;br /&gt;&lt;br /&gt;-- many to single nodes&lt;br /&gt;m2s :: [ [ ( Maybe Char , Integer ) ] ] -&gt; Integer -&gt; ( ( Maybe Char , Integer ) , [ [ ( Maybe Char , Integer ) ] ] , Integer )&lt;br /&gt;m2s (g@(ni:[]):gs) n = ( ni                  , gs  , n   ) -- if the first list is one long, pop it off&lt;br /&gt;m2s ggs            n = ( ( Nothing , (n+1) ) , ggs , n+1 ) -- otherwise add a new member at the beginning to act as a pointer to it&lt;br /&gt;&lt;br /&gt;-- or-extracted-nodes&lt;br /&gt;oexn :: [ [ [ Char ] ] ] -&gt; Integer -&gt; Integer -&gt; ( [ ( Maybe Char , Integer ) ] , [ [ ( Maybe Char , Integer ) ] ] , Integer )&lt;br /&gt;oexn (g:[])    n l = let ( ~( ns , x ) ,&lt;br /&gt;                           ~( ni' , ns' , x' ) ) = ( aexn g x' l ,&lt;br /&gt;                                                     m2s ns n )&lt;br /&gt;                     in ( [ ni' ] , ns' , x )&lt;br /&gt;&lt;br /&gt;oexn (g:g':[]) n l = let ( ~( ns    , x            ) ,&lt;br /&gt;                           ~( ni'   , ns'   , x'   ) ,&lt;br /&gt;                           ~( ns''  , x''          ) ,&lt;br /&gt;                           ~( ni''' , ns''' , x''' ) ) = ( aexn g x' l ,&lt;br /&gt;                                                           m2s ns n ,&lt;br /&gt;                                                           aexn g' x''' l ,&lt;br /&gt;                                                           m2s ns'' x )&lt;br /&gt;                     in&lt;br /&gt;                       ( ni' : ni''' : [] , ns' ++ ns''' , (x''-1) )&lt;br /&gt;&lt;br /&gt;oexn (g:gs)    n l = let ( ~( ns   , x          ) ,&lt;br /&gt;                           ~( ni'  , ns'  , x'  ) ,&lt;br /&gt;                           ~( ni'' , ns'' , x'' ) ) = ( aexn g x' l ,&lt;br /&gt;                                                        m2s ns n ,&lt;br /&gt;                                                        oexn gs x l )&lt;br /&gt;                     in&lt;br /&gt;                       ( ni' : ni'' , ns' ++ ns'' , (x''-1) )&lt;br /&gt;&lt;br /&gt;-- and-extracted-nodes&lt;br /&gt;aexn :: [ [ Char ] ] -&gt; Integer -&gt; Integer -&gt; ( [ [ ( Maybe Char , Integer ) ] ] , Integer )&lt;br /&gt;aexn (b:[]) n l = exn b n l &lt;br /&gt;aexn (b:bs) n l = let ( ~( ns  , x  ) ,&lt;br /&gt;                        ~( ns' , x' ) ) = ( exn b n x ,&lt;br /&gt;                                            aexn bs x l )&lt;br /&gt;                  in&lt;br /&gt;                    ( ns ++ ns' , x' )&lt;br /&gt;&lt;br /&gt;-- subgroup&lt;br /&gt;exn :: [ Char ] -&gt; Integer -&gt; Integer -&gt; ( [ [ ( Maybe Char , Integer ) ] ] , Integer )&lt;br /&gt;exn ('(':cs) n l = let ~( ni , ns , x ) = oexn (gs Sub [] $ p2p $ lfn cs ) n l&lt;br /&gt;                   in&lt;br /&gt;                     ( [ ni ] ++ ns , x )&lt;br /&gt;&lt;br /&gt;-- non-greedy&lt;br /&gt;exn ('?':'?':cs) n l = let ~( ni , ns , x ) = oexn (gs Top [] $ p2p $ lfn cs ) (n+1) x&lt;br /&gt;                   in&lt;br /&gt;                     ( [ [ ( Nothing , l ) , ( Nothing , (n+1) ) ] ] ++ [ ni ] ++ ns , x )&lt;br /&gt;&lt;br /&gt;-- greedy&lt;br /&gt;exn ('?':cs) n l = let ~( ni , ns , x ) = oexn (gs Top [] $ p2p $ lfn cs ) (n+1) x&lt;br /&gt;                   in&lt;br /&gt;                     ( [ [ ( Nothing , (n+1) ) , ( Nothing , l ) ] ] ++ [ ni ] ++ ns , x )&lt;br /&gt;&lt;br /&gt;-- non-greedy&lt;br /&gt;exn ('*':'?':cs) n l = let ~( ni , ns , x ) = oexn (gs Top [] $ p2p $ lfn cs ) (n+1) n&lt;br /&gt;                   in&lt;br /&gt;                     ( [ [ ( Nothing , l ) , ( Nothing , (n+1) ) ] ] ++ [ ni ] ++ ns , x )&lt;br /&gt;&lt;br /&gt;-- greedy&lt;br /&gt;exn ('*':cs) n l = let ~( ni , ns , x ) = oexn (gs Top [] $ p2p $ lfn cs ) (n+1) n&lt;br /&gt;                   in&lt;br /&gt;                     ( [ [ ( Nothing , (n+1) ) , ( Nothing , l ) ] ] ++ [ ni ] ++ ns , x )&lt;br /&gt;&lt;br /&gt;exn (c:_)    n l = ( [ [ ( Just c , l ) ] ] , (n+1) )&lt;br /&gt;exn []       n l = ( [ [ ( Nothing , l ) ] ] , (n+1) )&lt;br /&gt;&lt;br /&gt;-- compile a regular expression into a DFA array&lt;br /&gt;rx_compile pps = let ~( ni , ns , x ) = oexn (gs Top [] $ p2p $ lfn pps ) 1 x&lt;br /&gt;                 in &lt;br /&gt;                   if ni == [] &lt;br /&gt;                   then&lt;br /&gt;                       ( ns , x )&lt;br /&gt;                   else&lt;br /&gt;                       ( ni : ns , x )  &lt;br /&gt;&lt;br /&gt;main = do&lt;br /&gt;  -- print $ test                      &lt;br /&gt;  -- print $ lfn test                  &lt;br /&gt;  -- print $ p2p $ lfn test&lt;br /&gt;  print $ rx_compile "ab?c" -- arrayed !&lt;br /&gt;  print $ rx_compile "()*" -- if you do something stupid, the system will not protect you.&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;What this will currently do is to convert a string like &lt;code&gt;"ab?c"&lt;/code&gt; into a list containing lists each item of which specify where to go next in matching.  For example, the given pattern would end up as&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;  ( [  [ (Just 'a',2)                ] ,&lt;br /&gt;       [ (Nothing ,3) , (Nothing,4)  ] ,&lt;br /&gt;       [ (Just 'b',4)                ] ,&lt;br /&gt;       [ (Just 'c',5)                ] &lt;br /&gt;    ] ,&lt;br /&gt;    5 &lt;br /&gt;   )&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;This list is returned in a tuple with a number representing one over the length of the list.  This is were the next item would go if the list were still being built.&lt;br /&gt;&lt;br /&gt;A parser that accepts a string and uses this structure as a ruleset in matching it should be trivial to make.&lt;br /&gt;&lt;br /&gt;Thanks for the link linuxer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-3710035589153417568?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/3710035589153417568/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=3710035589153417568' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/3710035589153417568'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/3710035589153417568'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2007/07/impossible-is-possible-irrefutable.html' title='The impossible is possible : Irrefutable patterns.'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-2132199981138325675</id><published>2007-06-11T11:02:00.000-07:00</published><updated>2007-11-16T08:51:09.476-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='haskell'/><title type='text'>The impossible is only possible sometimes.</title><content type='html'>I've been working at creating a regular expression engine in Haskell again and have come up with what feels like a fairly elegant solution.  Through a series of functions the following conversions happen :&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;  print $ test                        -- "a|b?c|d"&lt;br /&gt;  print $ lfn test                    -- ["a","|","b","?","c","|","d"]&lt;br /&gt;  print $ p2p $ lfn test              -- ["a","|","?b","c","|","d"]&lt;br /&gt;  print $ gs Top [] $ p2p $ lfn test  -- [["a"],["?b","c"],["d"]]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Now I will convert this last output into a series of rules for what to do at certain points of evaluation, using the information at &lt;a href="http://swtch.com/~rsc/regexp/"&gt;http://swtch.com/~rsc/regexp/&lt;/a&gt; as a guide to how good regular expressions work.&lt;br /&gt;&lt;br /&gt;I want to perform the following conversion :&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;[["a"],["?b","c"],["d"]] -&gt; ( [ [ (Just 'a', 4 ) , (Nothing , 2) , (Just 'd' , 4 ) ] ,&lt;br /&gt;                                [ (Just 'b', 3 ) , ( Nothing , 3 ) ] ,&lt;br /&gt;                                [ (Just 'c', 4 ) ]&lt;br /&gt;                                ]&lt;br /&gt;                              ,&lt;br /&gt;                              4 )&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;A parser for such a rule set should be simple enough, and once created I will use the Array module to make it go at a decent speed.&lt;br /&gt;&lt;br /&gt;I've coded up through the transformation into a ruleset, and have hit an interesting boundary.  When designing my solution I remembered an interesting article I had seen when initially starting to look at Haskell wherein the author was coding an assembler and passed the output in as one of the initial arguments.  The idea intrigued me.&lt;br /&gt;&lt;br /&gt;The algorithm I derived to make my transformation used this concept judiciously.  Unfortunately it does not work, though perhaps not for the reasons one might guess. &lt;br /&gt;&lt;br /&gt;Here is the code :&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;-- rx : a regular expression engine for haskell&lt;br /&gt;--    : (c) 2007 by michael speer , released GPL v2.0 license or later&lt;br /&gt;&lt;br /&gt;import Maybe&lt;br /&gt;&lt;br /&gt;-- get-function : gather a nodes worth of pattern from a pattern-string&lt;br /&gt;--                returns the segment and the remaining pattern in a tuple&lt;br /&gt;gf :: [Char] -&gt; ( [Char] , [Char] )&lt;br /&gt;gf ('(':ps)     = let ( segment , remaining ) = gpf [] ps&lt;br /&gt;                 in ( '(':segment , remaining )&lt;br /&gt;    where&lt;br /&gt;      -- get-parenthesis'd fragment&lt;br /&gt;      gpf :: [Char] -&gt; [Char] -&gt; ( [Char] , [Char] )&lt;br /&gt;      gpf segment []       = ( segment , [] )&lt;br /&gt;      gpf segment (')':ps) = ( segment++")" , ps )&lt;br /&gt;      gpf segment ('(':ps) = let ( subsegment , remaining ) = gpf [] ps&lt;br /&gt;                             in gpf ( segment ++ ('(':subsegment) ) remaining&lt;br /&gt;      gpf segment (p:ps)   = gpf (segment ++ p:[]) ps&lt;br /&gt;gf ('\\':p:ps)  = ( '\\':p:[] , ps )&lt;br /&gt;gf ('?':'?':ps) = ( "??" , ps )&lt;br /&gt;gf ('*':'?':ps) = ( "*?" , ps )&lt;br /&gt;gf (p:ps)       = ( p:[] , ps )&lt;br /&gt;&lt;br /&gt;-- list-functions : creates a list of function-pattern-segments from a regular expression&lt;br /&gt;lfn :: [Char] -&gt; [[Char]]&lt;br /&gt;lfn []  = []&lt;br /&gt;lfn pps = let ( fn , remaining ) = gf pps&lt;br /&gt;          in fn : lfn remaining&lt;br /&gt;&lt;br /&gt;-- postfix-to-prefix : moves all repitition characters to before their targets&lt;br /&gt;--                     in addition running the pattern through this function&lt;br /&gt;--                     ensures that the pattern is legal.&lt;br /&gt;p2p :: [[Char]] -&gt; [[Char]]&lt;br /&gt;p2p []                 = []&lt;br /&gt;p2p (('?':_):_)        = error "Misplaced repitition operator `?' "&lt;br /&gt;p2p (('*':_):_)        = error "Misplaced repitition operator `*' "&lt;br /&gt;p2p ("\\":_)           = error "Escape sequence targetting nothing"&lt;br /&gt;p2p ((')':_):[])       = ")" : []                                          -- allows for a close-parum at end of pattern &lt;br /&gt;                                                                           -- later processing must test for this and throw error if it is unexpected ( eg top-level&lt;br /&gt;p2p ((')':_):_)        = error "Unmatched terminating group marker , `)' " -- catches mid pattern erroneous close-para&lt;br /&gt;p2p ("|":('?':_):_)    = error "Repitition operator `?' applied to procedural alternation operator `|' "&lt;br /&gt;p2p (s1:s2@("??"):ps)  = ('?':'?': s1) : p2p ps&lt;br /&gt;p2p (s1:s2@("?"):ps)   = ('?': s1) : p2p ps&lt;br /&gt;p2p (s1:s2@("*?"):ps)  = ('*':'?': s1) : p2p ps&lt;br /&gt;p2p (s1:s2@("*"):ps)   = ('*': s1) : p2p ps&lt;br /&gt;p2p (s1:ps)            = s1 : p2p ps&lt;br /&gt;&lt;br /&gt;-- group-split : break the contents of a parenthesized group into peices&lt;br /&gt;--               build a current list of segments until | is encountered&lt;br /&gt;--               then append the result of calling this again on the &lt;br /&gt;--               remainder&lt;br /&gt;data PatternDepth = Top | Sub -- Pattern depth&lt;br /&gt;gs :: PatternDepth -&gt; [ [ Char ] ] -&gt; [ [ Char ] ] -&gt; [ [ [ Char ] ] ]&lt;br /&gt;gs Top segments (")":[])   = error "Unmatched terminating group marker `)' at end of pattern"&lt;br /&gt;gs Sub segments (")":[])   = gs Sub segments []&lt;br /&gt;gs _   segments []         = if segments /= [] then&lt;br /&gt;                                         segments : []&lt;br /&gt;                                     else&lt;br /&gt;                                         [""] : []&lt;br /&gt;gs pd  segments ("|":rest) = segments : gs pd [] rest&lt;br /&gt;gs pd  segments (s:rest)   = gs pd (segments++[s]) rest &lt;br /&gt;&lt;br /&gt;-- many to single nodes&lt;br /&gt;m2s :: [ [ ( Maybe Char , Integer ) ] ] -&gt; Integer -&gt; ( ( Maybe Char , Integer ) , [ [ ( Maybe Char , Integer ) ] ] , Integer )&lt;br /&gt;m2s (g@(ni:[]):gs) n = ( ni                  , gs  , n   ) -- if first group is one long : decrease the count by one so that popping the top item off mess up the counting&lt;br /&gt;m2s ggs            n = ( ( Nothing , (n+1) ) , ggs , (n+1) ) -- otherwise add a new member at the beginning to act as a pointer to it&lt;br /&gt;&lt;br /&gt;-- or-extracted-nodes&lt;br /&gt;oexn :: [ [ [ Char ] ] ] -&gt; Integer -&gt; Integer -&gt; ( [ ( Maybe Char , Integer ) ] , [ [ ( Maybe Char , Integer ) ] ] , Integer )&lt;br /&gt;oexn (g:[])    n l = let ( ns , x ) = ( aexn g x l )&lt;br /&gt;                     in ( [] , ns , x )&lt;br /&gt;&lt;br /&gt;oexn (g:g':[]) n l = let ( ( ns    , x            ) ,&lt;br /&gt;                           ( ni'   , ns'   , x'   ) ,&lt;br /&gt;                           ( ns''  , x''          ) ,&lt;br /&gt;                           ( ni''' , ns''' , x''' ) ) = ( aexn g x' l ,&lt;br /&gt;                                                          m2s ns n ,&lt;br /&gt;                                                          aexn g' x''' l ,&lt;br /&gt;                                                          m2s ns'' x )&lt;br /&gt;                     in&lt;br /&gt;                       ( ni' : ni''' : [] , ns' ++ ns''' , x'' )&lt;br /&gt;&lt;br /&gt;oexn (g:gs)    n l = let ( ( ns   , x          ) ,&lt;br /&gt;                           ( ni'  , ns'  , x'  ) ,&lt;br /&gt;                           ( ni'' , ns'' , x'' ) ) = ( aexn g x' l ,&lt;br /&gt;                                                       m2s ns n ,&lt;br /&gt;                                                       oexn gs x l )&lt;br /&gt;                     in&lt;br /&gt;                       ( ni' : ni'' , ns' ++ ns'' , x'' )&lt;br /&gt;&lt;br /&gt;-- and-extracted-nodes&lt;br /&gt;aexn :: [ [ Char ] ] -&gt; Integer -&gt; Integer -&gt; ( [ [ ( Maybe Char , Integer ) ] ] , Integer )&lt;br /&gt;aexn (b:[]) n l = exn b n l &lt;br /&gt;aexn (b:bs) n l = let ( ( ns  , x  ) ,&lt;br /&gt;                        ( ns' , x' ) ) = ( exn b n x ,&lt;br /&gt;                                           aexn bs x l )&lt;br /&gt;                  in&lt;br /&gt;                    ( ns ++ ns' , x' )&lt;br /&gt;&lt;br /&gt;-- extract-node&lt;br /&gt;exn :: [ Char ] -&gt; Integer -&gt; Integer -&gt; ( [ [ ( Maybe Char , Integer ) ] ] , Integer )&lt;br /&gt;exn ('(':cs) n l = let ( ni , ns , x ) = oexn (gs Sub [] $ p2p $ lfn cs ) n x&lt;br /&gt;                   in&lt;br /&gt;                     if ni == [] then&lt;br /&gt;                         ( ns , x )&lt;br /&gt;                     else&lt;br /&gt;                         ( [ ni ] ++ ns , x )&lt;br /&gt;exn (c:_)    n l = ( [ [ ( Just c , (n+1) ) ] ] , (n+1) )&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;test = "a|b?c|d"&lt;br /&gt;&lt;br /&gt;main = do&lt;br /&gt;  print $ test                      &lt;br /&gt;  print $ lfn test                  &lt;br /&gt;  print $ p2p $ lfn test            &lt;br /&gt;  print $ gs Top [] $ p2p $ lfn test&lt;br /&gt;  --print $ let ( ni , ns , x ) = oexn (gs Top [] $ p2p $ lfn test ) 1 (x+1)&lt;br /&gt;  --        in ( ni : ns , x )  &lt;br /&gt;  print $ let ( ns , x ) = aexn [ "a" , "b" , "c" , "d" ] 1 x&lt;br /&gt;          in ( ns , x )&lt;br /&gt;  --&lt;br /&gt;  --print $ let ( ( ns , x ) ,&lt;br /&gt;  --              ( ni' , ns' , x' ) ) = ( aexn [ "abc", "d" ] x' x ,&lt;br /&gt;  --                                       let ( (ggs@((g@(ni:ns)::[(Maybe Char,Integer)]):gs)) , a ) = ( [ns] , 1 )&lt;br /&gt;  --                                       in if ns == [] then&lt;br /&gt;  --                                              ( ni , gs , a )&lt;br /&gt;  --                                          else&lt;br /&gt;  --                                              ( ( Nothing , a+1 ) , ggs , a+1 ) )&lt;br /&gt;  --        in ( ns , x ) &lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Following the `print $ aexn ...' function above, you can see how the concept works.  The oexn portion fails however.&lt;br /&gt;&lt;br /&gt;I believe the reason is the way that the compiler handles the mutual dependence of the arguments and outputs of the two functions.  It appears that instead of passing a marker that a value is "in progress" when dealing with two separate functions, each one simply calls the other anew to try to generate it.  Since the functions rely on each others output to work, they loop trying to instantiate one another.&lt;br /&gt;&lt;br /&gt;A single function is allowed to accept as input its own output ( as is demonstrated in the above working code ) but trying to do this with mutually dependent functions causes them to enter an endless loop that smashes the stack.&lt;br /&gt;&lt;br /&gt;For any curious I compile the above with `The Glorious Glasgow Haskell Compilation System, version 6.6' on `Linux version 2.6.20-16-generic (root@terranova) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #2 SMP Thu Jun 7 20:19:32 UTC 2007'.&lt;br /&gt;&lt;br /&gt;I looked to see if there were options that might allow me to make the system realize that the calls need to be lazy to a single call of the function, but I do not see any.  Instead the system continues being confused into stack overflow by the dependence.&lt;br /&gt;&lt;br /&gt;The last function call ( commented out ) had me trying to inline the m2s function, but it caused the same error ( presumably because the compiler just converted the sample into a lambda that caused the same dissonance in gathering the data. &lt;br /&gt;&lt;br /&gt;Ah well.  Maybe in 6.7 ...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-2132199981138325675?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/2132199981138325675/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=2132199981138325675' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/2132199981138325675'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/2132199981138325675'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2007/06/impossible-is-only-possible-sometimes.html' title='The impossible is only possible sometimes.'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3528393554015655906.post-3679449850247721597</id><published>2007-05-09T18:15:00.000-07:00</published><updated>2007-05-09T19:20:31.089-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='haskell'/><category scheme='http://www.blogger.com/atom/ns#' term='regular expression'/><title type='text'>Initial code at regular expressions in Haskell</title><content type='html'>Haskell is an interesting language.  It is, perhaps, the most intelligent system that I have programmed in.  The type system is beautiful, the pattern matching makes function definition in other languages seem laborious, the high precedence for functions applying to their arguments just feels right, and the high barrier of correctness needed for compilation along with the errors given is great.&lt;br /&gt;&lt;br /&gt;I am posting in order to share the work I have done thus far in learning Haskell.  I decided I wanted to do something at least a little interesting, so I decided to begin implementing some rudimentary regular expression functions.&lt;br /&gt;&lt;br /&gt;Currently I only have the framework for what I imagined.  As written it operates as follows :&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;the characters from left to right are converted into a list of functions and functions that combine functions&lt;/li&gt;&lt;br /&gt;&lt;li&gt;the combinating functions are used to foldr together the matching functions in from right to left&lt;/li&gt;&lt;br /&gt;&lt;li&gt;a string can be handed into this function chain for matching.&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;-- takes a pattern string, returns two items                                                                                                                           &lt;br /&gt;--  1) a function that takes a target string and returns if it matches and how many characters                                                                         &lt;br /&gt;--  2) how much of the pattern was consumed to create this match function                                                                                              &lt;br /&gt;literal :: [Char] -&gt; ( [Char] -&gt; ( Bool , Int ) , Int )&lt;br /&gt;literal xxs = ( \yys -&gt; case ( xxs , yys ) of&lt;br /&gt;                          ( [] , _ ) -&gt; ( True , 0 )&lt;br /&gt;                          ( _ , [] ) -&gt; ( False, 0 )&lt;br /&gt;                          ((x:_),(y:_)) -&gt; if x == y then&lt;br /&gt;                                                ( True , 1 )&lt;br /&gt;                                            else&lt;br /&gt;                                                ( False , 0 )&lt;br /&gt;              , 1 )&lt;br /&gt;&lt;br /&gt;-- execute the match function to the left first, then the match function to the right second                                                                           &lt;br /&gt;leftCombine :: ( (([Char]-&gt;(Bool,Int)),Int) -&gt; (([Char]-&gt;(Bool,Int)),Int) -&gt; (([Char]-&gt;(Bool,Int)),Int) )&lt;br /&gt;leftCombine xf yf = let ( xmf , pc ) = xf  -- xfunction yfunction xmatchfunction patterncount yougettheidea                                                            &lt;br /&gt;                    in let ( ymf , pc' ) = yf&lt;br /&gt;                       in ( ( \ccs -&gt; let ( b , i ) = xmf ccs -- ccs is the eventual target string, note that \ :: ([Char]-&gt;(Bool,Int))                                &lt;br /&gt;                                      in if b == False then&lt;br /&gt;                                             ( False , 0 )&lt;br /&gt;                                         else&lt;br /&gt;                                             let ( b' , i' ) = ymf ( drop i ccs )&lt;br /&gt;                                             in if b' == False then&lt;br /&gt;                                                    ( False , 0 )&lt;br /&gt;                                                else&lt;br /&gt;                                                    ( True , i + i' ) )&lt;br /&gt;                          , pc + pc' )&lt;br /&gt;&lt;br /&gt;-- for each letter gather a matching function, and a function detailing how to attach the preceding match to it.                                                       &lt;br /&gt;-- right now it only has a single letter match with a straight left combine function                                                                                   &lt;br /&gt;-- future support for repitions will be handled by combining a 0matching true stub and a combinative                                                                   &lt;br /&gt;--   function that sets up the preceding match function to match as many elements to a limit as it can,                                                                &lt;br /&gt;--   backtracking and trying fewer matches on failures                                                                                                                 &lt;br /&gt;rxizer :: [Char] -&gt; [ ( (([Char]-&gt;(Bool,Int)),Int)-&gt;(([Char]-&gt;(Bool,Int)),Int)-&gt;(([Char]-&gt;(Bool,Int)),Int) , ([Char]-&gt;(Bool,Int),Int) ) ]&lt;br /&gt;rxizer [] = []&lt;br /&gt;rxizer pps@(p:_) = let ( cfn , ( fn , pc ) ) = case p of&lt;br /&gt;                                                 _ -&gt; ( leftCombine , literal pps )&lt;br /&gt;                   in ( cfn , ( fn , pc ) ) : rxizer ( drop pc pps )&lt;br /&gt;&lt;br /&gt;-- meshes together the series of generated match-functions by foldr'ing them together by their combinative-functions                                                   &lt;br /&gt;rx :: [Char] -&gt; [Char] -&gt; ( Bool , Int )&lt;br /&gt;rx [] = \ccs -&gt; (True,0)&lt;br /&gt;rx mms@(m:_) = fst $ snd $ foldr fnmerge truestub ( rxizer mms )&lt;br /&gt;    where&lt;br /&gt;      truestub = ( leftCombine , ( \c -&gt; ( True , 0 ) , 0 ) )&lt;br /&gt;      fnmerge x y = let ( cfn , fn ) = x&lt;br /&gt;                        ( cfn' , fn' ) = y&lt;br /&gt;                    in ( cfn , cfn' fn fn' )&lt;br /&gt;&lt;br /&gt;main = do&lt;br /&gt;  -- generate some matches                                                                                                                                             &lt;br /&gt;  --      pattern           target       -- output  &lt;br /&gt;  print $ "HelloWorld" `rx` "HelloWorld" -- (True,10)                                                                                                                  &lt;br /&gt;  print $ "Hello"      `rx` "HelloWorld" -- (True,5)                                                                                                                   &lt;br /&gt;  print $ "HelloWorld" `rx` "Hello"      -- (False,0)                                                                                                                  &lt;br /&gt;  print $ ""           `rx` ""           -- (True,0)                                                                                                                   &lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3528393554015655906-3679449850247721597?l=michaelspeer.knome.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://michaelspeer.knome.net/feeds/3679449850247721597/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3528393554015655906&amp;postID=3679449850247721597' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/3679449850247721597'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3528393554015655906/posts/default/3679449850247721597'/><link rel='alternate' type='text/html' href='http://michaelspeer.knome.net/2007/05/initial-code-at-regular-expressions-in.html' title='Initial code at regular expressions in Haskell'/><author><name>Michael Speer</name><uri>http://www.blogger.com/profile/05288923100813819592</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
