Sunday, May 4th, 2008

It occurs to me that there is a similarity between a zipper and the combination of an expression and an environment, as for evaluation. I've got a problem that seems almost solvable by use of this, and I would especially like to hear Planet Haskell readers' thoughts on the matter.

(The expressions given here are in E syntax.)

For example, let's start with the expression

  def x := 74
  x + 1

For those unfamiliar with E, the tree this expression parses into has structure like this (ignoring some extra features and names):

  sequence(define(var("x"), literal(74)),
           call(var("x"), "add", literal(1)))

If we move “down” zipperwise into the second subexpression of the sequence, then we get the expression “x + 1” with the context “def x := 74 before this in a sequence”. Similarly, an interpreter evaluating the sequence reaches an intermediate state of “x + 1” with the environment “x is bound to 74”.
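
To make the analogy concrete, here is a rough sketch in Python rather than E. The Node and Loc types and the constructor helpers are invented for illustration and are not E's real AST classes. Loc also keeps the whole parent location instead of a proper one-hole context, which is a simplification that is good enough for looking around but not for rebuilding the tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    kind: str             # "sequence", "define", "var", "literal", "call"
    children: tuple = ()  # child Nodes, in source order
    value: object = None  # variable name, literal value, or call verb


@dataclass(frozen=True)
class Loc:
    """A simplified zipper location: the focused node plus the way back up."""
    focus: Node
    parent: Optional["Loc"] = None
    index: int = 0        # position of focus among the parent's children

    def down(self, i):
        return Loc(self.focus.children[i], self, i)

    def up(self):
        return self.parent

    def before(self):
        """Siblings that precede the focus: the 'what came earlier' context."""
        if self.parent is None:
            return ()
        return self.parent.focus.children[:self.index]


# Hypothetical constructors mirroring the tree notation used above.
def var(name): return Node("var", (), name)
def literal(v): return Node("literal", (), v)
def define(pattern, rhs): return Node("define", (pattern, rhs))
def call(recv, verb, *args): return Node("call", (recv, *args), verb)
def sequence(*exprs): return Node("sequence", tuple(exprs))


expr = sequence(define(var("x"), literal(74)),
                call(var("x"), "add", literal(1)))
here = Loc(expr).down(1)  # focus: x + 1; here.before(): (def x := 74,)
```

An evaluator at the same point would hold the same focus, with the preceding define already folded into an environment instead of kept as syntax.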


With a slightly extended context, I'm trying to use this to express program analysis in an auditor. The zipper context now tracks a small amount of information about variables, as well as the enclosing expressions.

The first problem is to determine that all occurrences of a given variable are used in a particular fashion: e.g. for f, permitting “f(1)” but not “f("foo")” or “somethingElse(f)” (where the third case would permit f to be used arbitrarily since the auditor doesn't get to see somethingElse’s code). This is easily accomplished by walking the entire expression looking for occurrences of var("f"), and walking “up” from each of those occurrences to check that the enclosing expressions form a permitted usage.
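
A rough Python sketch of that walk follows. The types are made up for illustration, and the policy is only an example: "f may only be the receiver of a run call whose arguments are integer literals". I am assuming that f(1) shows up in the tree as a call with verb "run". A real auditor would also have to skip the pattern that binds f, which this sketch does not do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    kind: str
    children: tuple = ()
    value: object = None  # variable name, literal value, or call verb


@dataclass(frozen=True)
class Loc:
    focus: Node
    parent: Optional["Loc"] = None
    index: int = 0


def walk(loc):
    """Yield every location in the tree, each carrying its path upward."""
    yield loc
    for i, child in enumerate(loc.focus.children):
        yield from walk(Loc(child, loc, i))


def only_called_with_ints(root, name):
    """Illustrative policy check for every occurrence of var(name)."""
    for loc in walk(Loc(root)):
        node = loc.focus
        if node.kind != "var" or node.value != name:
            continue
        up = loc.parent  # walk "up" one step from the occurrence
        if up is None or up.focus.kind != "call" or loc.index != 0:
            return False  # used as something other than a call receiver
        if up.focus.value != "run":
            return False
        args = up.focus.children[1:]
        if not all(a.kind == "literal" and isinstance(a.value, int)
                   for a in args):
            return False
    return True


# Hypothetical constructors for the examples in the text.
def var(name): return Node("var", (), name)
def literal(v): return Node("literal", (), v)
def call(recv, verb, *args): return Node("call", (recv, *args), verb)


good = call(var("f"), "run", literal(1))                 # f(1)
bad_arg = call(var("f"), "run", literal("foo"))          # f("foo")
escapes = call(var("somethingElse"), "run", var("f"))    # somethingElse(f)
```

Here only_called_with_ints(good, "f") accepts, while the other two trees are rejected: one for the string argument, the other because f appears as an argument rather than as the receiver.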

The second problem is to determine that the variable in a particular position is a free variable, and that its guard is of a particular form. This can be done by extending the context objects to record which variable names are bound by any expression preceding the current location (and not inside a block of its own); walking up the contexts to the top level then determines whether the variable is free.
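
Here is a sketch of the freeness half of that check (the guard test is left out). It is Python with made-up types, and it assumes a deliberately crude scoping rule: bindings flow left to right through every node kind except a hypothetical "block" kind, which keeps its bindings to itself. Method parameters and object names are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    kind: str
    children: tuple = ()
    value: object = None


@dataclass(frozen=True)
class Loc:
    focus: Node
    parent: Optional["Loc"] = None
    index: int = 0


def down(loc, i):
    return Loc(loc.focus.children[i], loc, i)


def bound_names(node):
    """Names this node leaves bound for whatever follows it in its scope."""
    if node.kind == "block":
        return set()  # assumption: a block's bindings do not escape it
    names = {node.children[0].value} if node.kind == "define" else set()
    for child in node.children:
        names |= bound_names(child)
    return names


def is_free(loc, name):
    """Walk up to the top level, checking everything that precedes loc."""
    while loc.parent is not None:
        for sibling in loc.parent.focus.children[:loc.index]:
            if name in bound_names(sibling):
                return False
        loc = loc.parent
    return True


# Hypothetical constructors.
def var(name): return Node("var", (), name)
def literal(v): return Node("literal", (), v)
def define(pattern, rhs): return Node("define", (pattern, rhs))
def call(recv, verb, *args): return Node("call", (recv, *args), verb)
def sequence(*exprs): return Node("sequence", tuple(exprs))
def block(*exprs): return Node("block", tuple(exprs))


prog = sequence(define(var("x"), literal(74)),
                call(var("x"), "add", literal(1)))
use = down(down(Loc(prog), 1), 0)  # the x in "x + 1"

hidden = sequence(block(define(var("x"), literal(1))), var("x"))
later_use = down(Loc(hidden), 1)   # x after a block that defined its own x
```

With these trees, is_free(use, "x") is false because of the preceding define, while is_free(later_use, "x") is true because the only definition of x is shut inside the block.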

The third problem, I'm not sure what to do about. I have a situation roughly like this:

  def foo {
    method a() {
      def bar {
        method b() { foo }
      }
    }
  }

  object(var("foo"),
         method("a", object(var("bar"),
                            method("b", var("foo")))))

As part of the first problem, I have found the object “bar”; I then walk upward to determine that it is generated by a method of the object “foo”. (There are additional constraints on the structure that I haven't mentioned.) I need to determine that the reference to “foo” in “b” is in fact to the outer object, and not to some other intervening lexical binding.

I've thought of two possible solutions so far:

  1. Starting from method("a", ...), scan its descendants for occurrences of var("foo") and reject the program if any of them is a binding. This would reject more than it needs to.
  2. Switch to a pre-processing stage (instead of the incremental operation of a zipper) to assign an identity (or a mutable analysis-information field, equivalently) to each variable binding and all of its uses, and add upward references (like in the zipper) to each node; this would make the is-this-foo-that-foo test a simple comparison.
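
To get a feel for the second option, here is a sketch of such a pre-processing pass in Python. Everything in it is hypothetical: the Node class, the node kinds, and the scoping rules (an object's name is visible inside its own methods, method and block bodies keep their bindings to themselves, method parameters are ignored). It only shows the shape of the idea, not E's actual scope analysis.

```python
class Node:
    def __init__(self, kind, value=None, children=()):
        self.kind = kind
        self.value = value
        self.children = list(children)
        self.parent = None   # upward reference, filled in by resolve()
        self.binding = None  # for a var use: the pattern Node that binds it


def resolve(node, env, parent=None):
    """Annotate the tree in place; return the scope visible after node."""
    node.parent = parent
    if node.kind == "var":
        node.binding = env.get(node.value)  # None means free
        return env
    if node.kind == "define":
        pattern, rhs = node.children
        pattern.parent = node
        env = resolve(rhs, env, node)
        return {**env, pattern.value: pattern}
    if node.kind == "object":
        pattern, *methods = node.children
        pattern.parent = node
        inner = {**env, pattern.value: pattern}
        for m in methods:
            resolve(m, inner, node)
        return inner
    if node.kind in ("method", "block"):
        inner = env
        for child in node.children:
            inner = resolve(child, inner, node)
        return env  # bindings made in the body do not escape
    for child in node.children:
        env = resolve(child, env, node)
    return env


# Hypothetical constructors mirroring the tree notation used above.
def var(name): return Node("var", name)
def literal(v): return Node("literal", v)
def define(pattern, rhs): return Node("define", None, [pattern, rhs])
def seq(*exprs): return Node("sequence", None, exprs)
def method(name, body): return Node("method", name, [body])
def obj(pattern, *methods): return Node("object", None, [pattern, *methods])


# The example from the text: b's foo is the outer object.
foo_pat, use = var("foo"), var("foo")
tree = obj(foo_pat, method("a", obj(var("bar"), method("b", use))))
resolve(tree, {})

# A variant in which method a rebinds foo before defining bar.
outer_pat, inner_pat, shadowed_use = var("foo"), var("foo"), var("foo")
tree2 = obj(outer_pat, method("a", seq(
    define(inner_pat, literal(0)),
    obj(var("bar"), method("b", shadowed_use)))))
resolve(tree2, {})
```

After the pass, "is this foo that foo" is an identity comparison: use.binding is foo_pat holds for the first tree, whereas in the second tree shadowed_use.binding is inner_pat rather than outer_pat.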

At the moment, the second option seems attractive; tying uses to definitions of variables in a generic fashion should be useful for other types of analysis. And since this is strictly analysis, not transformation, I don't need the modify-without-mutation function of a zipper. The only reason I haven't done that yet is that I think zippers are neat (and there might be other uses for an established zipper over E ASTs).

What do you think I should do?