The programmable programmer: November 2007

20071129

A context manager for temporary memoization.

This is very much a fragile hack at the moment. It's an interesting idea I think. I was disappointed when I initially found that the with_statement syntax does not restore the value of the `as var` upon completion.

This made doing something along the lines of


   with temporarily_memoized( func ) :
     for datum in data :
       func( datum )

unattainable.

Well, just a lot hackier actually.

Thus temporarily_memoized( ... ) was born :


#!/usr/bin/python

# a context manager for temporarily turning any function into
# a memoized version of itself.

from __future__ import with_statement
import contextlib , sys , types

def memoize ( func ) :
     """ In haskell mutability must be handled explicitly.
     Only fair that Python do the same to transparent functionality
     """
     remembered = {}
     def memoized ( *args ) :
          """ memoized version of function """
          if args in remembered :
               return remembered[ args ]
          else :
               new = func( *args )
               remembered[ args ] = new
               return new
     return memoized
    
@contextlib.contextmanager
def temporarily_memoized ( func ) :
     """ memoize the given function for the duration of this block
     save anything in the local variables that gets in the way of
     using this so that it can be restored afterward , the memoized
     version is found in the locals.  use on actual functions only.
     no members. """

     # this is being called, there has to be a frame above it
     frame = sys._getframe().f_back.f_back
     
     if func.func_name in frame.f_locals :
          f = frame.f_locals[ func.func_name ]
          frame.f_locals[ func.func_name ] = memoize( func )
          try :
               # this hack replaces whatever in the local scope
               # has the name of the variable.  if you try to use
               # the 'as whatever' portion of the syntax , you
               # are doing it wrong
               yield None 
          finally :
               frame.f_locals[ f.func_name ] = f
     else :
          frame.f_locals[ func.func_name ] = memoize( func )
          try :
               yield None
          finally :
               del frame.f_locals[ func.func_name ]

def fib(n):
     """ biggus fibbus """
     if n == 0 or n == 1:
          return n
     else:
          return fib(n-1) + fib(n-2)

if __name__ == '__main__' :
     print fib.__doc__
     with temporarily_memoized( fib ) :
          print fib.__doc__
          for i in xrange( 36 ) :
               print "n=%d => %d" % (i, fib(i))
          print fib.__doc__
     print fib.__doc__

I've played around with replacing the func_code and related variables in order to create a true proxy, but the requirement that the func_code and func_closure agree exactly on the number of local variables and that they refuse to set independently or be tricked into simultaneous setting by calling types.FunctionType.__init__( target , proxy-code , ... ) has thwarted my attempts at this. I considered creating a variety of proxy functions each requiring a different number of free variables and then selecting the appropriate one, which would probably work, but it seems too much the wrong way to do things.

20071128

Don's benchmark.

It appears the mob has spoken and that I am a wrong headed twit.

My apologies to Don on the misunderstanding. When I first saw the numbers I assumed that the speed had to be due to memoization. It's not.

I saw a link earlier to a post by Don Stewart in that he asserts that Haskell is 40 times faster than python for a given algorithm.

He then goes on to point out how easy it is to drop hints to the code to parallelize the code.

While I won't go into any threading issues, the core point that it takes python almost thirty seconds to generate each Fibonacci number up to 36 is misplaced.

Haskell is a language built around the idea of functions that can be assumed to be functionally transparent. If you give them the same set of arguments they'll give the same answer.

The upside to this is that the compiler writers can automatically memoize every function if they want to unless its been explicitly marked as being otherwise.

Python, being based upon the idea of dynamic objects rather than that of transparent functions has no need for such optimizing. The developers time is better spent ensuring rapid dictionary insertion and deletion over transparent function memoization.

But, as he doesn't seem to mind having to annotate concurrancy, I'm sure that annotating transparency is just as fair.

#!/usr/bin/python

# Don's original python
# def fib(n):
#     if n == 0 or n == 1:
#          return n
#     else:
#         return fib(n-1) + fib(n-2)
#     
# for i in range(36):
#     print "n=%d => %d" % (i, fib(i))

def memoize ( func ) :
    """ In haskell mutability must be handled explicitly.
    Only fair that Python do the same to transparent functionality
    """
    remembered = {}
    def memoized ( *args ) :
        if args in remembered :
            return remembered[ args ]
        else :
            new = func( *args )
            remembered[ args ] = new
            return new
    return memoized

@memoize
def fib(n):
    if n == 0 or n == 1:
         return n
    else:
        return fib(n-1) + fib(n-2)
    
for i in range(36):
    print "n=%d => %d" % (i, fib(i))

The above code runs at a comfortable

  real    0m0.020s
  user    0m0.012s
  sys     0m0.008s

which compared to his times

  Ruby (1.8.5)   64.26s
  Python (2.4) 25.16s
  Haskell (GHC 6.8)  0.78s
  Parallel Haskell (GHC 6.8)  0.67s

fairs reasonably.

Since Python allows mutable variables a looping implementation would have been more idiomatic python. Even better to use a generator. Then we can start output before the sequence is complete.

All his benchmark really showed was that memoization on transparent functions is really great. Thanks guy.

20071113

Random notes on query languages.

I started to include this in my post concerning my new language, but it relly isn't very helpful to the article. I want to keep the information around for myself later. Not all of these are generic information query languages, but all are examples of languages designed to ask for something.

QL :
Compiles to SQL. A combination of Datalog and SQL. Looks like it has a form of list comprehension. That could be useful.

Common Query Language :
This seems to be an attempt to standardize what thed user has to type into the innumerably implemented searches online and off. Simplicity is always nice to consider.

D :
Apparently this was a truly relational language. Google tells me concerning the site pointed at for it from wikipedia, "This site may harm your computer". I think this one best left alone.

Data Definition Language :
This appears to be a fuzzy logic belief system buried in a syntax molded to look like SQL.

Datalog :
Logic programming. ancestor(X,Y) := Parent(X,Y). The ancestor of X is Y if the Parent of X is Y. Could be useful. Looks rather constraining in what it could do easily.

ERROL :
This just led me to read about entity-relationship models, finally giving name to a project I created out of frustration with data manipulation about a year ago. This is a great system for data, this particular language is a natural language version interface. Better for people that machines to query with.

ISBL :
I found references to this but could not locate an example of its syntax.

LDAP :
This isn't a query language I hear you saying. Well, you're right. But I'm looking at data query, and this does query data, albeit not relational. The manner LDAP uses to filter datasets is nice. Very combinable, readable and easily created programatically. On being nice, forcing everything into a strict hierarchy is not.

MQL :
Not so much a language as a search API for finding structures and substructures in molecular simulations.

MDX : MultiDimensional eXpressions
This appears to have syntax borrowed from SQL with the addition of prepartitioned data and some niceties in reaching it. There are some interesting ideas at work here, I will have to look into them in the future.

OQL :
The binding of SQL tables and their associated columns to devices and their exposed attributes. The interesting difference is that instead of just returning scalars it can also return objects that can be traversed using dot-member syntax. The aggregation method differs from standard SQL in dividing items into groups that can then be referenced by values and subqueries using the keyword partition.

OPath :
A combination of API and string segment arguments that allow for locating items in the WinFS store. Apparently there will be a natural language to OPath translator. It looks like you acquire a query-object into a potion of the WinFS store programatically and then pass it a string specifying the boolean operations to indicate which items from the current partition of the store are desired.

QUEL :
SQL with a more complex from statement involving manipulating ranges and dropping the select keyword.

SPARQL :
This is a language designed for querying information found in RDF files. I applaud them for not merely repurposing the syntax of SQL to the task. It seems that many think this is what creating a query language means. The interesting part of the query consists of the where clause. Apparently the items returned are generated by specifying a series of interconnected assertions that whenever met return a clump of information consisting of all the specified variables that where intermixed with the assertions being bound to the values that allowed the assertions to match. It looks neat, but it bothered me that just glancing at it the query did not produce a mental model of how it operated. Maybe it was just me.

SQL :
The current cream of the crop in relational data manipulation, SQL has been allowing us to keep an external dataset and query and alter it in standard ways for more than thirty years. Of course the primary part that seems to be standard is mostly in the fashion that queries are marked up and some general agreement on what terms to use to indicate the various ways of stitching together tables of data. In general anything so clever as to be particularly interesting or useful is custom to only a given database and the few that might choose to copy them. For a standard language, the implementation specific libraries do much to unstandardize it. The primary limitation of the language in my experience is that anything you want to do must be expressed through the metaphor of stitching together tables of information and of course returning a single table of information resulting from this. Many clever hacks are used to bypass these limitations in practice of coarse, but trying to do anything other than simple column binds can become very confusing to look at very quickly.

WQL :
Microsofts query only SQL for WMI. A hidden SQL server on every server and workstation of a domain that only understands select queries. Useful.

20071111

Restructuring query languages.

SQL is the de facto language for the manipulation of relational data. Designed in the early 70s and standardized in the mid-80s, SQL has dozens of commercial and open-source servers written for it and enjoys being a language with almost no major competitors in its field.

It seems that all major relational databases are basically interpreters for various flavors of SQL. I've looked around from time to time for a decent alternative to SQL.

I haven't found anything interesting that was particularly useful out there.

I'm going to present for comments some of the ideas I have come up with to fix this. As you read below, know there is not yet an actual interpreter for my language. There will be, but I am not yet finished with it. I am writing an interpreter based on sqlite3 using lemon to create the parser and targeting the sqlite3 virtual machine which I will extend very minimally to handle some functionality I want that doesn't quite fit into their model.

My language is very much based on the entity-relationship model. SQL is often used to model this approach. It can be clumsy to fit into the tables of data paradigm, but it can.

The easiest way to explain my language of course will be to show a brief interpreter session. So here it goes.


Interpreter for AsYetUnnamedProgrammingLanguage by MichaelSpeer v0.-1

>

Let's begin by creating an entity to manipulate. We'll use products, something nicely generic that are probably manipulated by everyone reading this at some point.


> declare 
.   Products
.     :name PHRASE
.     :price MONEY'USD
. ;

There you are. We have a database containing products, each of which has a member specifying the name ( PHRASE : a string with no newlines ) and a price ( MONEY : a numeric datatype insured to two decimal places without fear of truncation errors, rounding errors, poor approximations or anything else silly like that. The tick after the MONEY datatype is something for which I do not yet have a permanent terminology, though I like it. I'll call it a shadow type. It is a small comment that must be present on anything that interacts with the particular datatype. So 3'USD + :price is good, whereas 3 + :price is an error. This is just an enforced helper to prevent 3'YEN + 3'USD from not being an error and giving a six.

The syntax for creating some instances of the Products entity is as follows :


> create Products [ :name , :price ] =
.   [ "DVD Player" , 20'USD   ] ,
.   [ "VCR"        , 15.5'USD ] ,
.   [ "DVD"        , 7'USD    ] ,
.   [ "VCR Tape"   , 4'USD    ]
. ;

So now we have Products in the system.


> Products ;
[ 1 , 2 , 3 , 4 ]

> Products:id ;
[ 1 , 2 , 3 , 4 ]

These are selectors. Selectors select what information from where will be available for call-back to the application.

As you can see, if you do not give an expressions_list or end the selector with a valuating member, then it will default to the id of whatever the current object is.


> Products[ :id , :name ] ;
[ 1 , "DVD Player" ]
[ 2 , "VCR"        ]
[ 3 , "DVD"        ]
[ 4 , "VCR Tape"   ]

By giving the expressions_list, the values in the selected attributes are returned to the caller.

So this doesn't allow anything interesting yet. The path to usefulness starts with filters.


> Products( :price < 5'USD )[ :name ];
[ "VCR Tape" ]

I mentioned earlier the difference between putting an expressions_list at the end of the selector vs placing an object_member there.


> Products( :price > 10'USD ):name ;
[ "DVD Player" , "VCR" ]

> Products( :price > 10'USD )[ :name ] ;
[ "DVD Player" ]
[ "VCR"        ]

Of course all of this would be a huge pain if there was no way to sort them. Sorting expressions can be interpolated directly into the filters.


> Products( asc :name ):name ;
[ "DVD" , "DVD Player" , "VCR" , "VCR Tape" ]

So far we've looked at how to manipulate a single object type. Now let's add a new one to put these products into some categories.


> declare
.   Categories
.     :name PHRASE
.     :sub-products -> Products:category single
. ;

The declaration of the Categories object-type and the :name member looks the same as last time. :sub-products is the new item. This line is declaring a relationship. The `Products:category single` section indicates that the relationship will interconnect Categories ( the current item ) and Products. The relationship can be referred to from a set of Categories using the :sub-products member and from a set of Products using the :category member. Furthermore, the caveat `single` means that any instance of a Products object may only relate to a single Categories set object. The caveat can appear on either or both sides.

Setting relationships is accomplished by placing a selector in the respective slot of the expressions_list being used to set the members. id est


> create Categories [ :name , :sub-products ] =
.   [ "Video Players" , Products( :name in [ "DVD Player" , "VCR" ] ) ]
. ;

Here is part of what I like about this syntax. The `:name` term above is a relative selector. When a filter has a selector that isn't grounded it is assumed to be relative to the current object in the filters. So the `:name` above is relative to Products. A filter of that nature can be read as objects from Products where Products:name is "DVD Player" or Products:name is "VCR". There isn't a limit to how many filters are applied either.

If you specify a relationship member as part of the filters then at that point a unique set of items available at that point are selected.


> Products:category:name ;
[ "Video Players" ]

> Products[ :category:name ] ;
[]
[ "Video Players" ]
[]
[ "Video Players" ]

Using the expression_list causes a separate return for each item whereas selecting the attribute directly takes the unique set of values from that attribute.

The reason that this happens is because in the first item the :category filter is selecting the set of all unique categories referred to by the set of current Products. The second selector is selecting each of the current set of Products ( all of them since no filter is applied ) and then for each of them returning a list of the :name attributes on the Categories in the set returned by the :category entry.

One of the things that will be immediately noticed is that I intend for my language to be able to represent and send out complex structures instead of just flat tables. The instruction to process a sub-list is the alteration to the virtual machine that will need to be made. I hope to link the language to python first planning to take advantage of the bindings that already exist in the builtin sqlite3 module.

Here are some more useful examples.


> Products( :category ):name ; -- returns products that have a category
[ "DVD Player" , "VCR" ]

> Products( not :category ):name ; -- returns products lacking a category
[ "DVD" , "VCR Tape" ]

> Products( :id != 3 && :id != 4 )[ :name ] ;
[ "DVD Player" ]
[ "VCR" ]

I will add the other two products to a category to demonstrate some more complex queries.


> create Categories [ :name , :sub-products ] =
.   [ "Blank Media" , Products( not :category ) ]
. ;

> Products( :category( :sub-products( :name == "DVD" ) ) ):name ;
[ "DVD" , "VCR Tape" ]

That second query is returning the names of Products that have Categories that have products in them named "DVD". The same query is entirely possible in SQL. Just far more complex.

This language doesn't do a whole lot that cannot be accomplished with SQL or other information engines. It does it easier. This, to me, is better.

Before I conclude we'll raise the price of media by a dollar per item and add another entry to Categories.


> alter Products( :category( :name == "Media" ) )[ :price ] = [ :price += 1'USD ] ;

> declare Categories:recommended -> :sub-products:recommended-by ;

This declaration creates an attribute that will be allowed to point at any number of items that are found in the same items :sub-products list.


> alter Categories( :name == "Media" )[ :recommended ] = [ :sub-products( :name == "DVD" ) ] ;

There will be a caveat called `automatic` that will allow adding items not already in :sub-products to :recommended wherein the added item would also be added to :sub-products and removing the recommended item from :sub-products would remove it from :recommended. Otherwise trying these things will result in an error.

The + sign will be a union operator against sets and the - sign will give all of the first set except those in the second set. As well union( ... ) and except( ... ) functions will likely exist.

This should be enough information to gather thoughts from others on. So, please leave any thoughts you have so I can roll them into the work I am doing.

The current Categories entity type could have been declared initially as :


> declare
.   Categories
.     :name PHRASE
.     :sub-products -> Products:category single
.        :recommended -> :sub-products:recommend-by
. ;

That should be it for now.

20071101

Multi-Argument map For perl

I've been playing with perl recently. I avoiding the language for a long time because I hit python first and foolishly believed the hoards when they decried everything about perl. It can be cryptic, but perl is far from being without merit.

One of the things that kept bugging me when programming in perl was that there was the lack of a multi-argument grabbing map. I'd want to chain against the values of a hash and found annoyingly that the syntax was unhelpful.

A hash in list context dumps its keys and values into a zipped list.

{ 'k1' : 'v1' , 'k2' : 'v2 , 'k3' : 'v3' } -> [ 'k1' , 'v1' , 'k2' , 'v2' , 'k3' , 'v3' ]

This pretty much screws any sane sort of action that can be taken against the data. So instead you have to use `keys' to get a list of just keys in a `for' statement and then use the keys to access members of the hash.

I didn't really appreciate this restriction, as it meant I couldn't easily massage data with an anonymous function by mapping a sub block across it.

Thus was born mapper.

mapper takes three arguments. well, this being perl it takes one huge list of arguments. But it treats them like (defun mapper ( n f &rest L ) ... ) where

n : the number of elements to peel off of the list and hand to the sub function

f : a sub function

L : the list of items

It isn't long.


sub mapper
{
 # n - number of items to pass at a time to the subroutine
 # s - the subroutine
 # r - the list of things to chunk and pass to the subroutine
 # x - number of items to chunk
 # c - index
 # l - mapped values

 local @_ = @_ ;

 my ( $n , $s , @r ) = @_ ;
 my $x = @r ;
 my $c = 0 ;
 my @l ;

 while( $x > $c )
 {
     push @l , &$s( @r[ $c .. ( ( $c + $n ) - 1 >= $x ? $x - 1 : ( $c + $n ) - 1 ) ] ) ;
     $c += $n ;
 }

 return @l ;

}

From this I derived a quick transposition function just to play with it ( demonstrating its use )


sub transpose
{
   local @_ = @_ ;
   mapper 2 , sub { ( $_[1] , $_[0] ) } , @_ ;
}

The function is rather obfuscatory when nested ...


print join ' ' , mapper 3 , sub { '<' . ( join '' , mapper 1 , sub { '[' . $_[0] . ']' } , @_ ) . '>' } , qw( the quick brown fox jumped over the lazy brown dog ) ;
print "\n" ;

You'll see from the last that mapper when presented with fewer than the expected number of items at the end of a list just passes along what it has, which is likely unhelpful for debugging, but helpful for applying n-arary functions across lists.

The programmable programmer