The programmable programmer: haskell

Showing posts with label haskell. Show all posts

20080201

Function chaining in arc.

I noticed some complaints about a lack of the equivalent of Haskell's point-free programming style for Arc, I decided to implement one of Haskell's more useful operators.

The $ operator in Haskell breaks a statement into two statements. It executes the second and then uses the output as the final variable to the first.

 e.g. 
main = print $ "hello" ++ " " ++ "world"

is the same as

main = print ( "hello" ++ " " ++ "world )

Lacking infix operators I cannot recreate the functionality exactly. Nor, given that Arc is a lisp and doing so would be decidedly unlispy, would I want to. Instead I have created the following chain macro that will allow a similar form of composition. I do not believe it would be difficult to turn the following into a mutli-argument compose macro either.

 ;; chains similar to haskells $ operator.                                                                                                                                                                         
(mac chain args
  (if (is 0 (len args))
        nil
      (is 1 (len args))
        (args 0)
      t
        (let n (uniq)
          `(let ,n ,(eval `(chain ,@(cdr args)))
             (eval (+ ',(args 0) (list ,n)))))))

This allows for the following :

arc>  (chain (+ 3) (+ 4 5) (- 2) (+ 6) 8)
0
arc> (chain)
nil
arc> (chain (+ 4))
4
arc> (chain (+ 4) (+ 4 8))
16
arc> (chain (+ 4) (+ 5 6) (- 5) 50)
-30
arc>  (chain (+ 3) ([/ _ 3]) ([- _ 2]) (+ 100) 12)
119/3
arc>

If it didn't wrap a (list ...) around each return value and required each return value to be a list in its own right ( or only wrapped non-cons structures ) it would have similar semantics to perls execution model, compressing each list into the current argument chain.

The chain macro needs all of the function calls to be wrapped in paren for it to operate properly, since it takes each value and appends it to the next function.

20080115

Twin naming golf

I saw Brad Fitzpatrick's post on anagram twin names and decided to join in the golf. Weeee!


import Control.Arrow ( first , second , (&&&) , (***) )
import List ( sortBy , groupBy , intersperse )

main = interact $ unlines
       . map ( uncurry (++) )
       . map ( ( ++ " : " ) . fst . head &&& concat . intersperse ", " . map snd )
       . filter ( (> 1) . length )
       . groupBy ( curry $ uncurry (==) . (***) fst fst )
       . sortBy (curry $ uncurry compare . (***) fst fst )
       . map ( sortBy compare &&& id )
       . filter ( not . null )
       . map ( fst . break (== ' ') )
       . lines

hehehe

20070726

The impossible is possible : Irrefutable patterns.

Earlier this year I wrote about a regular expression code engine I was writing in Haskell that didn't work because of a recursive call problem. I was fairly sure the code could work if I had a way to inform the compiler of the needed laziness, a way to allow a number of different functions that each depended on the output of the each other as their inputs to execute, but believed that the way Haskell worked would permanently prevent this. Earlier today, linuxer on Reddit linked to an article concerning Irrefutable patterns, and I had what felt like an epiphany.

Remembering the code, I went back and corrected it only by adding the needed tilde to specify that the patterns should be irrefutable in the oexn function definitions. A few bugs fixes later, and the code works.


-- rx : a regular expression engine for haskell
--    : (c) 2007 by michael speer , released GPL v2.0 license or later

import Maybe

-- get-function : gather a nodes worth of pattern from a pattern-string
--                returns the segment and the remaining pattern in a tuple
gf :: [Char] -> ( [Char] , [Char] )
gf ('(':ps)     = let ( segment , remaining ) = gpf [] ps
                 in ( '(':segment , remaining )
    where
      -- get-parenthesis'd fragment
      gpf :: [Char] -> [Char] -> ( [Char] , [Char] )
      gpf segment []       = ( segment , [] )
      gpf segment (')':ps) = ( segment++")" , ps )
      gpf segment ('(':ps) = let ( subsegment , remaining ) = gpf [] ps
                             in gpf ( segment ++ ('(':subsegment) ) remaining
      gpf segment (p:ps)   = gpf (segment ++ p:[]) ps
gf ('\\':p:ps)  = ( '\\':p:[] , ps )
gf ('?':'?':ps) = ( "??" , ps )
gf ('*':'?':ps) = ( "*?" , ps )
gf (p:ps)       = ( p:[] , ps )

-- list-functions : creates a list of function-pattern-segments from a regular expression
lfn :: [Char] -> [[Char]]
lfn []  = []
lfn pps = let ( fn , remaining ) = gf pps
          in fn : lfn remaining

-- postfix-to-prefix : moves all repitition characters to before their targets
--                     in addition running the pattern through this function
--                     ensures that the pattern is legal.
p2p :: [[Char]] -> [[Char]]
p2p []                 = []
p2p (('?':_):_)        = error "Misplaced repitition operator `?' "
p2p (('*':_):_)        = error "Misplaced repitition operator `*' "
p2p ("\\":_)           = error "Escape sequence targetting nothing"
p2p ((')':_):[])       = ")" : []                                          -- allows for a close-parum at end of pattern 
                                                                           -- later processing must test for this and throw error if it is unexpected ( eg top-level
p2p ((')':_):_)        = error "Unmatched terminating group marker , `)' " -- catches mid pattern erroneous close-para
p2p ("|":('?':_):_)    = error "Repitition operator `?' applied to procedural alternation operator `|' "
p2p (s1:s2@("??"):ps)  = ('?':'?': s1) : p2p ps
p2p (s1:s2@("?"):ps)   = ('?': s1) : p2p ps
p2p (s1:s2@("*?"):ps)  = ('*':'?': s1) : p2p ps
p2p (s1:s2@("*"):ps)   = ('*': s1) : p2p ps
p2p (s1:ps)            = s1 : p2p ps

-- group-split : break the contents of a parenthesized group into peices
--               build a current list of segments until | is encountered
--               then append the result of calling this again on the 
--               remainder
data PatternDepth = Top | Sub -- Pattern depth
gs :: PatternDepth -> [ [ Char ] ] -> [ [ Char ] ] -> [ [ [ Char ] ] ]
gs Top segments (")":[])   = error "Unmatched terminating group marker `)' at end of pattern"
gs Sub segments (")":[])   = gs Sub segments []
gs _   segments []         = if segments /= [] then
                                         segments : []
                                     else
                                         [""] : []
gs pd  segments ("|":rest) = segments : gs pd [] rest
gs pd  segments (s:rest)   = gs pd (segments++[s]) rest 

-- many to single nodes
m2s :: [ [ ( Maybe Char , Integer ) ] ] -> Integer -> ( ( Maybe Char , Integer ) , [ [ ( Maybe Char , Integer ) ] ] , Integer )
m2s (g@(ni:[]):gs) n = ( ni                  , gs  , n   ) -- if the first list is one long, pop it off
m2s ggs            n = ( ( Nothing , (n+1) ) , ggs , n+1 ) -- otherwise add a new member at the beginning to act as a pointer to it

-- or-extracted-nodes
oexn :: [ [ [ Char ] ] ] -> Integer -> Integer -> ( [ ( Maybe Char , Integer ) ] , [ [ ( Maybe Char , Integer ) ] ] , Integer )
oexn (g:[])    n l = let ( ~( ns , x ) ,
                           ~( ni' , ns' , x' ) ) = ( aexn g x' l ,
                                                     m2s ns n )
                     in ( [ ni' ] , ns' , x )

oexn (g:g':[]) n l = let ( ~( ns    , x            ) ,
                           ~( ni'   , ns'   , x'   ) ,
                           ~( ns''  , x''          ) ,
                           ~( ni''' , ns''' , x''' ) ) = ( aexn g x' l ,
                                                           m2s ns n ,
                                                           aexn g' x''' l ,
                                                           m2s ns'' x )
                     in
                       ( ni' : ni''' : [] , ns' ++ ns''' , (x''-1) )

oexn (g:gs)    n l = let ( ~( ns   , x          ) ,
                           ~( ni'  , ns'  , x'  ) ,
                           ~( ni'' , ns'' , x'' ) ) = ( aexn g x' l ,
                                                        m2s ns n ,
                                                        oexn gs x l )
                     in
                       ( ni' : ni'' , ns' ++ ns'' , (x''-1) )

-- and-extracted-nodes
aexn :: [ [ Char ] ] -> Integer -> Integer -> ( [ [ ( Maybe Char , Integer ) ] ] , Integer )
aexn (b:[]) n l = exn b n l 
aexn (b:bs) n l = let ( ~( ns  , x  ) ,
                        ~( ns' , x' ) ) = ( exn b n x ,
                                            aexn bs x l )
                  in
                    ( ns ++ ns' , x' )

-- subgroup
exn :: [ Char ] -> Integer -> Integer -> ( [ [ ( Maybe Char , Integer ) ] ] , Integer )
exn ('(':cs) n l = let ~( ni , ns , x ) = oexn (gs Sub [] $ p2p $ lfn cs ) n l
                   in
                     ( [ ni ] ++ ns , x )

-- non-greedy
exn ('?':'?':cs) n l = let ~( ni , ns , x ) = oexn (gs Top [] $ p2p $ lfn cs ) (n+1) x
                   in
                     ( [ [ ( Nothing , l ) , ( Nothing , (n+1) ) ] ] ++ [ ni ] ++ ns , x )

-- greedy
exn ('?':cs) n l = let ~( ni , ns , x ) = oexn (gs Top [] $ p2p $ lfn cs ) (n+1) x
                   in
                     ( [ [ ( Nothing , (n+1) ) , ( Nothing , l ) ] ] ++ [ ni ] ++ ns , x )

-- non-greedy
exn ('*':'?':cs) n l = let ~( ni , ns , x ) = oexn (gs Top [] $ p2p $ lfn cs ) (n+1) n
                   in
                     ( [ [ ( Nothing , l ) , ( Nothing , (n+1) ) ] ] ++ [ ni ] ++ ns , x )

-- greedy
exn ('*':cs) n l = let ~( ni , ns , x ) = oexn (gs Top [] $ p2p $ lfn cs ) (n+1) n
                   in
                     ( [ [ ( Nothing , (n+1) ) , ( Nothing , l ) ] ] ++ [ ni ] ++ ns , x )

exn (c:_)    n l = ( [ [ ( Just c , l ) ] ] , (n+1) )
exn []       n l = ( [ [ ( Nothing , l ) ] ] , (n+1) )

-- compile a regular expression into a DFA array
rx_compile pps = let ~( ni , ns , x ) = oexn (gs Top [] $ p2p $ lfn pps ) 1 x
                 in 
                   if ni == [] 
                   then
                       ( ns , x )
                   else
                       ( ni : ns , x )  

main = do
  -- print $ test                      
  -- print $ lfn test                  
  -- print $ p2p $ lfn test
  print $ rx_compile "ab?c" -- arrayed !
  print $ rx_compile "()*" -- if you do something stupid, the system will not protect you.

What this will currently do is to convert a string like "ab?c" into a list containing lists each item of which specify where to go next in matching. For example, the given pattern would end up as


  ( [  [ (Just 'a',2)                ] ,
       [ (Nothing ,3) , (Nothing,4)  ] ,
       [ (Just 'b',4)                ] ,
       [ (Just 'c',5)                ] 
    ] ,
    5 
   )

This list is returned in a tuple with a number representing one over the length of the list. This is were the next item would go if the list were still being built.

A parser that accepts a string and uses this structure as a ruleset in matching it should be trivial to make.

Thanks for the link linuxer.

20070611

The impossible is only possible sometimes.

I've been working at creating a regular expression engine in Haskell again and have come up with what feels like a fairly elegant solution. Through a series of functions the following conversions happen :


  print $ test                        -- "a|b?c|d"
  print $ lfn test                    -- ["a","|","b","?","c","|","d"]
  print $ p2p $ lfn test              -- ["a","|","?b","c","|","d"]
  print $ gs Top [] $ p2p $ lfn test  -- [["a"],["?b","c"],["d"]]

Now I will convert this last output into a series of rules for what to do at certain points of evaluation, using the information at http://swtch.com/~rsc/regexp/ as a guide to how good regular expressions work.

I want to perform the following conversion :


[["a"],["?b","c"],["d"]] -> ( [ [ (Just 'a', 4 ) , (Nothing , 2) , (Just 'd' , 4 ) ] ,
                                [ (Just 'b', 3 ) , ( Nothing , 3 ) ] ,
                                [ (Just 'c', 4 ) ]
                                ]
                              ,
                              4 )

A parser for such a rule set should be simple enough, and once created I will use the Array module to make it go at a decent speed.

I've coded up through the transformation into a ruleset, and have hit an interesting boundary. When designing my solution I remembered an interesting article I had seen when initially starting to look at Haskell wherein the author was coding an assembler and passed the output in as one of the initial arguments. The idea intrigued me.

The algorithm I derived to make my transformation used this concept judiciously. Unfortunately it does not work, though perhaps not for the reasons one might guess.

Here is the code :



-- rx : a regular expression engine for haskell
--    : (c) 2007 by michael speer , released GPL v2.0 license or later

import Maybe

-- get-function : gather a nodes worth of pattern from a pattern-string
--                returns the segment and the remaining pattern in a tuple
gf :: [Char] -> ( [Char] , [Char] )
gf ('(':ps)     = let ( segment , remaining ) = gpf [] ps
                 in ( '(':segment , remaining )
    where
      -- get-parenthesis'd fragment
      gpf :: [Char] -> [Char] -> ( [Char] , [Char] )
      gpf segment []       = ( segment , [] )
      gpf segment (')':ps) = ( segment++")" , ps )
      gpf segment ('(':ps) = let ( subsegment , remaining ) = gpf [] ps
                             in gpf ( segment ++ ('(':subsegment) ) remaining
      gpf segment (p:ps)   = gpf (segment ++ p:[]) ps
gf ('\\':p:ps)  = ( '\\':p:[] , ps )
gf ('?':'?':ps) = ( "??" , ps )
gf ('*':'?':ps) = ( "*?" , ps )
gf (p:ps)       = ( p:[] , ps )

-- list-functions : creates a list of function-pattern-segments from a regular expression
lfn :: [Char] -> [[Char]]
lfn []  = []
lfn pps = let ( fn , remaining ) = gf pps
          in fn : lfn remaining

-- postfix-to-prefix : moves all repitition characters to before their targets
--                     in addition running the pattern through this function
--                     ensures that the pattern is legal.
p2p :: [[Char]] -> [[Char]]
p2p []                 = []
p2p (('?':_):_)        = error "Misplaced repitition operator `?' "
p2p (('*':_):_)        = error "Misplaced repitition operator `*' "
p2p ("\\":_)           = error "Escape sequence targetting nothing"
p2p ((')':_):[])       = ")" : []                                          -- allows for a close-parum at end of pattern 
                                                                           -- later processing must test for this and throw error if it is unexpected ( eg top-level
p2p ((')':_):_)        = error "Unmatched terminating group marker , `)' " -- catches mid pattern erroneous close-para
p2p ("|":('?':_):_)    = error "Repitition operator `?' applied to procedural alternation operator `|' "
p2p (s1:s2@("??"):ps)  = ('?':'?': s1) : p2p ps
p2p (s1:s2@("?"):ps)   = ('?': s1) : p2p ps
p2p (s1:s2@("*?"):ps)  = ('*':'?': s1) : p2p ps
p2p (s1:s2@("*"):ps)   = ('*': s1) : p2p ps
p2p (s1:ps)            = s1 : p2p ps

-- group-split : break the contents of a parenthesized group into peices
--               build a current list of segments until | is encountered
--               then append the result of calling this again on the 
--               remainder
data PatternDepth = Top | Sub -- Pattern depth
gs :: PatternDepth -> [ [ Char ] ] -> [ [ Char ] ] -> [ [ [ Char ] ] ]
gs Top segments (")":[])   = error "Unmatched terminating group marker `)' at end of pattern"
gs Sub segments (")":[])   = gs Sub segments []
gs _   segments []         = if segments /= [] then
                                         segments : []
                                     else
                                         [""] : []
gs pd  segments ("|":rest) = segments : gs pd [] rest
gs pd  segments (s:rest)   = gs pd (segments++[s]) rest 

-- many to single nodes
m2s :: [ [ ( Maybe Char , Integer ) ] ] -> Integer -> ( ( Maybe Char , Integer ) , [ [ ( Maybe Char , Integer ) ] ] , Integer )
m2s (g@(ni:[]):gs) n = ( ni                  , gs  , n   ) -- if first group is one long : decrease the count by one so that popping the top item off mess up the counting
m2s ggs            n = ( ( Nothing , (n+1) ) , ggs , (n+1) ) -- otherwise add a new member at the beginning to act as a pointer to it

-- or-extracted-nodes
oexn :: [ [ [ Char ] ] ] -> Integer -> Integer -> ( [ ( Maybe Char , Integer ) ] , [ [ ( Maybe Char , Integer ) ] ] , Integer )
oexn (g:[])    n l = let ( ns , x ) = ( aexn g x l )
                     in ( [] , ns , x )

oexn (g:g':[]) n l = let ( ( ns    , x            ) ,
                           ( ni'   , ns'   , x'   ) ,
                           ( ns''  , x''          ) ,
                           ( ni''' , ns''' , x''' ) ) = ( aexn g x' l ,
                                                          m2s ns n ,
                                                          aexn g' x''' l ,
                                                          m2s ns'' x )
                     in
                       ( ni' : ni''' : [] , ns' ++ ns''' , x'' )

oexn (g:gs)    n l = let ( ( ns   , x          ) ,
                           ( ni'  , ns'  , x'  ) ,
                           ( ni'' , ns'' , x'' ) ) = ( aexn g x' l ,
                                                       m2s ns n ,
                                                       oexn gs x l )
                     in
                       ( ni' : ni'' , ns' ++ ns'' , x'' )

-- and-extracted-nodes
aexn :: [ [ Char ] ] -> Integer -> Integer -> ( [ [ ( Maybe Char , Integer ) ] ] , Integer )
aexn (b:[]) n l = exn b n l 
aexn (b:bs) n l = let ( ( ns  , x  ) ,
                        ( ns' , x' ) ) = ( exn b n x ,
                                           aexn bs x l )
                  in
                    ( ns ++ ns' , x' )

-- extract-node
exn :: [ Char ] -> Integer -> Integer -> ( [ [ ( Maybe Char , Integer ) ] ] , Integer )
exn ('(':cs) n l = let ( ni , ns , x ) = oexn (gs Sub [] $ p2p $ lfn cs ) n x
                   in
                     if ni == [] then
                         ( ns , x )
                     else
                         ( [ ni ] ++ ns , x )
exn (c:_)    n l = ( [ [ ( Just c , (n+1) ) ] ] , (n+1) )


test = "a|b?c|d"

main = do
  print $ test                      
  print $ lfn test                  
  print $ p2p $ lfn test            
  print $ gs Top [] $ p2p $ lfn test
  --print $ let ( ni , ns , x ) = oexn (gs Top [] $ p2p $ lfn test ) 1 (x+1)
  --        in ( ni : ns , x )  
  print $ let ( ns , x ) = aexn [ "a" , "b" , "c" , "d" ] 1 x
          in ( ns , x )
  --
  --print $ let ( ( ns , x ) ,
  --              ( ni' , ns' , x' ) ) = ( aexn [ "abc", "d" ] x' x ,
  --                                       let ( (ggs@((g@(ni:ns)::[(Maybe Char,Integer)]):gs)) , a ) = ( [ns] , 1 )
  --                                       in if ns == [] then
  --                                              ( ni , gs , a )
  --                                          else
  --                                              ( ( Nothing , a+1 ) , ggs , a+1 ) )
  --        in ( ns , x )

Following the `print $ aexn ...' function above, you can see how the concept works. The oexn portion fails however.

I believe the reason is the way that the compiler handles the mutual dependence of the arguments and outputs of the two functions. It appears that instead of passing a marker that a value is "in progress" when dealing with two separate functions, each one simply calls the other anew to try to generate it. Since the functions rely on each others output to work, they loop trying to instantiate one another.

A single function is allowed to accept as input its own output ( as is demonstrated in the above working code ) but trying to do this with mutually dependent functions causes them to enter an endless loop that smashes the stack.

For any curious I compile the above with `The Glorious Glasgow Haskell Compilation System, version 6.6' on `Linux version 2.6.20-16-generic (root@terranova) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #2 SMP Thu Jun 7 20:19:32 UTC 2007'.

I looked to see if there were options that might allow me to make the system realize that the calls need to be lazy to a single call of the function, but I do not see any. Instead the system continues being confused into stack overflow by the dependence.

The last function call ( commented out ) had me trying to inline the m2s function, but it caused the same error ( presumably because the compiler just converted the sample into a lambda that caused the same dissonance in gathering the data.

Ah well. Maybe in 6.7 ...

20070509

Initial code at regular expressions in Haskell

Haskell is an interesting language. It is, perhaps, the most intelligent system that I have programmed in. The type system is beautiful, the pattern matching makes function definition in other languages seem laborious, the high precedence for functions applying to their arguments just feels right, and the high barrier of correctness needed for compilation along with the errors given is great.

I am posting in order to share the work I have done thus far in learning Haskell. I decided I wanted to do something at least a little interesting, so I decided to begin implementing some rudimentary regular expression functions.

Currently I only have the framework for what I imagined. As written it operates as follows :

the characters from left to right are converted into a list of functions and functions that combine functions

the combinating functions are used to foldr together the matching functions in from right to left

a string can be handed into this function chain for matching.


-- takes a pattern string, returns two items                                                                                                                           
--  1) a function that takes a target string and returns if it matches and how many characters                                                                         
--  2) how much of the pattern was consumed to create this match function                                                                                              
literal :: [Char] -> ( [Char] -> ( Bool , Int ) , Int )
literal xxs = ( \yys -> case ( xxs , yys ) of
                          ( [] , _ ) -> ( True , 0 )
                          ( _ , [] ) -> ( False, 0 )
                          ((x:_),(y:_)) -> if x == y then
                                                ( True , 1 )
                                            else
                                                ( False , 0 )
              , 1 )

-- execute the match function to the left first, then the match function to the right second                                                                           
leftCombine :: ( (([Char]->(Bool,Int)),Int) -> (([Char]->(Bool,Int)),Int) -> (([Char]->(Bool,Int)),Int) )
leftCombine xf yf = let ( xmf , pc ) = xf  -- xfunction yfunction xmatchfunction patterncount yougettheidea                                                            
                    in let ( ymf , pc' ) = yf
                       in ( ( \ccs -> let ( b , i ) = xmf ccs -- ccs is the eventual target string, note that \ :: ([Char]->(Bool,Int))                                
                                      in if b == False then
                                             ( False , 0 )
                                         else
                                             let ( b' , i' ) = ymf ( drop i ccs )
                                             in if b' == False then
                                                    ( False , 0 )
                                                else
                                                    ( True , i + i' ) )
                          , pc + pc' )

-- for each letter gather a matching function, and a function detailing how to attach the preceding match to it.                                                       
-- right now it only has a single letter match with a straight left combine function                                                                                   
-- future support for repitions will be handled by combining a 0matching true stub and a combinative                                                                   
--   function that sets up the preceding match function to match as many elements to a limit as it can,                                                                
--   backtracking and trying fewer matches on failures                                                                                                                 
rxizer :: [Char] -> [ ( (([Char]->(Bool,Int)),Int)->(([Char]->(Bool,Int)),Int)->(([Char]->(Bool,Int)),Int) , ([Char]->(Bool,Int),Int) ) ]
rxizer [] = []
rxizer pps@(p:_) = let ( cfn , ( fn , pc ) ) = case p of
                                                 _ -> ( leftCombine , literal pps )
                   in ( cfn , ( fn , pc ) ) : rxizer ( drop pc pps )

-- meshes together the series of generated match-functions by foldr'ing them together by their combinative-functions                                                   
rx :: [Char] -> [Char] -> ( Bool , Int )
rx [] = \ccs -> (True,0)
rx mms@(m:_) = fst $ snd $ foldr fnmerge truestub ( rxizer mms )
    where
      truestub = ( leftCombine , ( \c -> ( True , 0 ) , 0 ) )
      fnmerge x y = let ( cfn , fn ) = x
                        ( cfn' , fn' ) = y
                    in ( cfn , cfn' fn fn' )

main = do
  -- generate some matches                                                                                                                                             
  --      pattern           target       -- output  
  print $ "HelloWorld" `rx` "HelloWorld" -- (True,10)                                                                                                                  
  print $ "Hello"      `rx` "HelloWorld" -- (True,5)                                                                                                                   
  print $ "HelloWorld" `rx` "Hello"      -- (False,0)                                                                                                                  
  print $ ""           `rx` ""           -- (True,0)

The programmable programmer