[personal profile] kpreid

It occurs to me that there is a similarity between a zipper and the combination of an expression and an environment, as for evaluation. I've got a problem that seems almost solvable by use of this, and I would especially like to hear Planet Haskell readers' thoughts on the matter.

(The expressions given here are in E syntax.)

For example, let's start with the expression

  def x := 74
  x + 1

For those unfamiliar with E, the tree this expression parses into has structure like this (ignoring some extra features and names):

  sequence(define(var("x"), literal(74)),
           call(var("x"), "add", literal(1)))

If we move “down” zipperwise into the second subexpression of the sequence, then we get the expressionx + 1” with the contextdef x := 74 before this in a sequence”. Similarly, an interpreter uses an intermediate result of “x + 1”, “x is bound to 74”.


With a slightly extended context, I'm trying to use this to express program analysis in an auditor. The zipper context now tracks a small amount of information about variables, as well as the enclosing expressions.

The first problem is to determine that all occurrences of a given variable are used in a particular fashion: e.g. for f, permitting “f(1)” but not “f("foo")” or “somethingElse(f)” (where the third case would permit f to be used arbitrarily since the auditor doesn't get to see somethingElse’s code). This is easily accomplished by walking the entire expression looking for occurrences of var("f"), and walking “up” from each of those occurrences to check that the enclosing expressions form a permitted usage.

The second problem is to determine that the variable in a particular position is a free variable, and that its guard is of a particular form. This can be done by extending the context objects to record what variable names are bound by any expression preceding the current location (that is not inside its own block); thus walking up the contexts to the top-level determines whether the variable is free.

The third problem, I'm not sure what to do about. I have a situation roughly like this:

  def foo {
    method a() {
      def bar {
        method b() { foo }
      }
    }
  }
  object(var("foo"),
         method("a", object(var("bar"), 
                            method("b", var("foo")))))

As part of the first problem, I have found the object “bar”; I then walk upward to determine that it is generated by a method of the object “foo”. (There are additional constraints on the structure that I haven't mentioned.) I need to determine that the reference to “foo” in “b” is in fact to the outer object, and not to some other intervening lexical binding.

I've thought of two possible solutions so far:

  1. Starting from method("a", ...), scan its children for occurrences of var("foo") and reject any which bind it. This would reject more than it needs to.
  2. Switch to a pre-processing stage (instead of the incremental operation of a zipper) to assign an identity (or a mutable analysis-information field, equivalently) to each variable binding and all of its uses, and add upward references (like in the zipper) to each node; this would make the is-this-foo-that-foo test a simple comparison.

At the moment, the second option seems attractive; tying uses to definitions of variables in a generic fashion should be useful for other types of analysis. And since this is strictly analysis, not transformation, I don't need the modify-without-mutation function of a zipper. The only reason I haven't done that yet is I think zippers are neat (and there might be other uses for an established zipper over E ASTs).

What do you think I should do?

Just a few drunken thoughts to consider...

Date: 2008-05-05 01:27 (UTC)
From: [identity profile] winterkoninkje.livejournal.com
I'd suggest that it might be worth reconsidering the structure of the code that the zipper is over (to the extent you can control this). It seems more natural to me to formulate "up" and "down" as entering and exiting block-scope entities (such as `def` blocks, `method` blocks, etc) and to have "left" and "right" cover the issue of ranging through statements within a block. The notion of environment is somewhat complicated by this (leftward in-block + upward) since you have a "tilted" tree instead of a regular tree, but it captures scoping issues more accurately.

So checking the foo, from the usage site we check to the left (nothing there), then we move up through b, and check to the left (nothing there), then move up through bar,... until we hit the definition of foo. Depending on how you've implemented your zipper you could either do this lookup scan each time, or maintain an auxilliary structure that tracks the current environment in order to jump back quickly (e.g. using a backtracking (http://okmij.org/ftp/Computation/monads.html#LogicT) monad), or your zipper could be set up with some sort of CPS which combines those two approaches (i.e. the backtracking monad is your encoding of the left/up-ward closure of the environment).
From: [identity profile] darius.livejournal.com
Maybe the problem is, once you've walked back from the use to the defining "def foo", cheaply determining that it's the *same* "def foo" you're looking for. So attach a unique id to each expression you may care about that way?
From: [identity profile] kpreid.livejournal.com
Yup, that's option 2, which I'm working on right now.
From: [identity profile] darius.livejournal.com
That's not quite what I meant -- you were at "{ foo }", then walked back to the enclosing object "def foo", then starting again from "{ foo }", you walk back to the binding expression for foo, which is a "def foo" again -- is it the same one? Let's compare their IDs.

I'm not saying that's better than option 2 explicitly linking bindings and uses.
From: [identity profile] kpreid.livejournal.com
Ah, sorry, I see the distinction now.

Not so drunk thinking...

Date: 2008-05-06 06:05 (UTC)
From: [identity profile] winterkoninkje.livejournal.com
If you can define an equality operation over zippers, then you walk from foo#1 to def foo#1, and from foo#2 to def foo#2, and then test if the zipper focused at def foo#1 is equal to the zipper focused at def foo#2.

The challenge, of course, is defining that innocuous equality function. A naive implementation would recurse through the upward triangle of the parse tree twice for each equality check, which is... suboptimal. One way to speed this equality check up is by interning zipper items, in which case you only need to check whether the unique ids are equal (hence the paths are equal, hence they're the same def). This is essentially the same thing as preprocessing to get unique ids, though with minor implementation differences for where the computations get pushed off to.

The language I'm working on now does interning automagically so I wasn't thinking about the fact that it'd have to be implemented for the problem in question. One nice thing about the interning approach is that the "preprocessing" only happens as you're walking over the tree and hence is interleaved with other computations. If you're only ever interested in def nodes, then interning zippers will label too much; but if you're interested in other sites as well, you may want those labels.

thanks much

Date: 2008-05-07 21:55 (UTC)
From: (Anonymous)
favorited this one, man

(no subject)

Date: 2008-05-15 16:20 (UTC)
From: [identity profile] eldridgge83.livejournal.com
Nice userpic!