20070509

Initial code at regular expressions in Haskell

Haskell is an interesting language. It is, perhaps, the most intelligent system that I have programmed in. The type system is beautiful, the pattern matching makes function definition in other languages seem laborious, the high precedence for functions applying to their arguments just feels right, and the high barrier of correctness needed for compilation along with the errors given is great.

I am posting in order to share the work I have done thus far in learning Haskell. I decided I wanted to do something at least a little interesting, so I decided to begin implementing some rudimentary regular expression functions.

Currently I only have the framework for what I imagined. As written it operates as follows :


  • the characters from left to right are converted into a list of functions and functions that combine functions

  • the combinating functions are used to foldr together the matching functions in from right to left

  • a string can be handed into this function chain for matching.



-- takes a pattern string, returns two items
-- 1) a function that takes a target string and returns if it matches and how many characters
-- 2) how much of the pattern was consumed to create this match function
literal :: [Char] -> ( [Char] -> ( Bool , Int ) , Int )
literal xxs = ( \yys -> case ( xxs , yys ) of
( [] , _ ) -> ( True , 0 )
( _ , [] ) -> ( False, 0 )
((x:_),(y:_)) -> if x == y then
( True , 1 )
else
( False , 0 )
, 1 )

-- execute the match function to the left first, then the match function to the right second
leftCombine :: ( (([Char]->(Bool,Int)),Int) -> (([Char]->(Bool,Int)),Int) -> (([Char]->(Bool,Int)),Int) )
leftCombine xf yf = let ( xmf , pc ) = xf -- xfunction yfunction xmatchfunction patterncount yougettheidea
in let ( ymf , pc' ) = yf
in ( ( \ccs -> let ( b , i ) = xmf ccs -- ccs is the eventual target string, note that \ :: ([Char]->(Bool,Int))
in if b == False then
( False , 0 )
else
let ( b' , i' ) = ymf ( drop i ccs )
in if b' == False then
( False , 0 )
else
( True , i + i' ) )
, pc + pc' )

-- for each letter gather a matching function, and a function detailing how to attach the preceding match to it.
-- right now it only has a single letter match with a straight left combine function
-- future support for repitions will be handled by combining a 0matching true stub and a combinative
-- function that sets up the preceding match function to match as many elements to a limit as it can,
-- backtracking and trying fewer matches on failures
rxizer :: [Char] -> [ ( (([Char]->(Bool,Int)),Int)->(([Char]->(Bool,Int)),Int)->(([Char]->(Bool,Int)),Int) , ([Char]->(Bool,Int),Int) ) ]
rxizer [] = []
rxizer pps@(p:_) = let ( cfn , ( fn , pc ) ) = case p of
_ -> ( leftCombine , literal pps )
in ( cfn , ( fn , pc ) ) : rxizer ( drop pc pps )

-- meshes together the series of generated match-functions by foldr'ing them together by their combinative-functions
rx :: [Char] -> [Char] -> ( Bool , Int )
rx [] = \ccs -> (True,0)
rx mms@(m:_) = fst $ snd $ foldr fnmerge truestub ( rxizer mms )
where
truestub = ( leftCombine , ( \c -> ( True , 0 ) , 0 ) )
fnmerge x y = let ( cfn , fn ) = x
( cfn' , fn' ) = y
in ( cfn , cfn' fn fn' )

main = do
-- generate some matches
-- pattern target -- output
print $ "HelloWorld" `rx` "HelloWorld" -- (True,10)
print $ "Hello" `rx` "HelloWorld" -- (True,5)
print $ "HelloWorld" `rx` "Hello" -- (False,0)
print $ "" `rx` "" -- (True,0)

No comments:

(2) the subculture of the compulsive programmer, whose ethics prescribe that one silly idea and a month of frantic coding should suffice to make him a life-long millionaire. --ewd1036