Had a bit of a breakthrough in debugging facilities: I realized that there's no reason I shouldn't modify my local copy of the Cajita runtime to provide more debug information (and it doesn't even matter if that subtly breaks security, as long as Caja-CapTP remains compatible with the proper version).

Only a week or so left. I've discussed the matter with MarkM, and he pointed out (again) that I should work toward having useful-to-other-people components even if I haven't finished the one I intended to do.

Toward this end, I am going to work on Data-E (which is already part of the Caja-CapTP project) as a standalone serialization utility (for network and disk) as I understand it will be useful for Shakhar.

Todo items:

  • Implement generic literals (description) to make it more JavaScript-natural.
  • Implement cyclic object graph handling using Ref promise implementation already done.
  • Commit all in-progress work as reasonable changesets (even the tests that don't yet pass).
  • Document what works, design decisions, etc.
  • Continue to work on CapTP itself as time permits.

I intend to continue working on the Caja-CapTP project beyond GSoC; both simply because it is a project of mine, and to mitigate my failure to complete the planned work on schedule. However, for the next few months it will have to compete with my college work rather than my procrastination.

Rosetta Code is a wiki where a variety of programming problems and language features are demonstrated in many languages, allowing language learning and comparison of features and paradigms.

If you'd like to contribute Common Lisp code to the project, I've just completed a classification of tasks in CL (like my prior one for E), so that it's easy to find the kind of problem one wants to work on at the moment.

One thing I find interesting about working on Rosetta Code is that many tasks bring with them some particular perspective — someone's notion of how programming necessarily works — and it can be a challenge to figure out what the analogous way, or best way, to do it in your language is, and then to explain it.

I'm having severe getting-around-to-actually-doing-the-work problems; I am far behind schedule. I think the problem is framing this as “work” rather than just another of the projects that I've found interesting and tinkered with. (I also blame Team Fortress 2 and Rosetta Code for providing attractive distractions...)

I've ported the “Ref” facility from E; this is necessary as CapTP is built on Ref semantics (promises, proxies, broken refs). I hope to very soon get the actual CapTP connection module up, then (as I wrote before) “define, document and implement a CapTP-over-HTTP protocol”.

(I previously mentioned implementing the Surgeon (serialization tool) but I then remembered that that's actually irrelevant to CapTP.)

Also: working without Internet access removes a whole lot of potential distractions — and one’s access to online documentation. Luckily I had some source code around which provided examples.

Common Lisp provides compile-file, whose purpose is to convert a CL source file into an (implementation-defined) format which usually contains precompiled code and is designed for faster loading into a Lisp system.

compile-file takes the pathname of a textual Lisp source file. But what if you want to compile some Lisp code that's not already in a file, perhaps because you translated/compiled it from some input syntax not amenable to the Lisp reader, or because it contains unREADable literals? You can use a "universal Lisp file"; I know of two ways to create one (use whichever you find cleaner):

(cl:in-package :mypackage)
#.mypackage::*program*
or
(cl:in-package :mypackage)
(macrolet ((it () *program*))
  (it))

Suppose this is in "universal.lisp". Then to use it:

(defvar *program*)

(defun compile-to-file (form output-file)
  (let ((*program* form))
    (compile-file #p"universal.lisp"
                  :output-file output-file)))

This is just a minimal example; you'll also want to appropriately handle the return value from compile-file, provide an appropriate pathname to the universal file, etc. For example, here's an excerpt of the relevant code from E-on-CL, where I have used this technique to compile non-CL sources (emakers) into fasls:

(defparameter +the-asdf-system+ (asdf:find-system :e-on-cl))
(defvar *efasl-program*)
(defvar *efasl-result*)
...
(defun compile-e-to-file (expr output-file fqn-prefix opt-scope)
  ...
  (let* (...
         (*efasl-program*
           `(setf *efasl-result*
                  (lambda (...) ...))))
    (multiple-value-bind (truename warnings-p failure-p)
        (compile-file (merge-pathnames
                        #p"lisp/universal.lisp"
                        (asdf:component-pathname +the-asdf-system+))
                      :output-file output-file
                      :verbose nil
                      :print nil)
      (declare (ignore truename warnings-p))
      (assert (not failure-p) () "Compilation for ~A failed." output-file))))

(defun load-compiled-e (file env)
  (let ((*efasl-result* nil)
        ...)
    (load file :verbose nil :print nil)
    (funcall *efasl-result* env)))

Note that the pathname is computed relative to the ASDF system containing the universal file; also note the use of a variable *efasl-result* to simulate a "return value" from the compiled file, and the use of a lambda to provide a nonempty lexical environment, both of which are features not directly provided by the CL compiled file facility.

I'm not making progress as fast as I would like; mostly due to assorted distractions rather than Doing The Work.

That said, I've gotten both ends of serialization implemented, at least for non-circular cases: the deJSONTreeKit and deSubgraphKit have both builder and recognizer implementations. (For a terminology introduction, see the serialization paper.)

Next up is the surgeon (high-level serialization interface), and designing uncallers suitable for JavaScript. I definitely need to increase my rate of progress to get this done on schedule.

I completed enough of the E-on-JavaScript improvements, and wrote the beginnings of one Data-E kit in Cajita, together with Updoc tests for it and a Makefile to cajole the JavaScript and "animate" the Updoc, i.e. convert it into an HTML page that actually runs the tests.

I also improved various readme files and pages, hopefully such that someone else can get it all installed on their own system starting from my project page without too much prior knowledge.

Near-term plan: Put aside the EoJS, and get the Data-E kits working; then the surgeon; then the CapTPConnection; then define, document and implement a CapTP-over-HTTP protocol.

GSoC project update

Thursday, May 28th, 2009 10:29

I have begun the work on my GSoC project, Caja-CapTP.

I am currently working on improving E-on-JavaScript in order to be able to use its Updoc facility as the test framework for the Caja-CapTP code.

(If you don't know, Updoc is a test system which works by rerunning a recorded or made-up interactive session and confirming that the output is the same as before. The advantage of this system is that you can write test cases very simply as sequential code without gobs of assertFoo() for every particular attribute you think of testing.)

You can see my commits to EoJS at CIA.vc.

I am also learning more about exactly how Cajita works in areas such as library loading; the documentation in this area needs improvement, which I am going to work on myself and/or ask the Caja developers to clarify.

So. In JavaScript, an “object” is a set of “properties”: associations from strings to values. A method is just a property whose value is a function. Functions are called like “foo()”, properties are accessed like “bar.foo”, and methods are called like “bar.foo()”. Looks straightforward enough, right?

Now, how does a method access the state of its object? Without inheritance, you could just have the method functions of a given object all close over some variables; but JavaScript does have prototype inheritance, so the necessary access is provided by binding the variable “this”, in the method body, to the object the method was invoked on.

And when does this happen? When you use the method call syntax. bar.foo() is not the same as

var m = bar.foo;
m();

(It is the same as m.call(bar); call is a method on function objects which invokes them with this bound, but that's beside the point.)

So, the syntax is non-compositional.

Not only that, but it is enthusiastically so: bar.foo() is the same as (bar.foo)() — the parentheses do not break up the method call construct!
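The distinction above can be demonstrated directly. A minimal sketch (bar here is just a made-up object; any object with a function-valued property behaves the same):

```javascript
// A method that reports whether `this` is bound to the object it lives on.
var bar = {
  foo: function () { return this === bar; }
};

console.log(bar.foo());     // true: method-call syntax binds `this` to bar
console.log((bar.foo)());   // true: the parentheses change nothing
var m = bar.foo;
console.log(m());           // false: a plain call does not bind `this` to bar
console.log(m.call(bar));   // true: call() supplies the binding explicitly
```

(In non-strict code the plain call binds `this` to the global object, and under strict mode to undefined; either way it is not bar.)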

Something I've wished for several times recently is a database-document program.

By "document" I mean that the database is a single file, which I can move, copy, etc., as opposed to living in a database server which has to stay up, uses accounts and ACLs, needs special backup procedures, and so on. It doesn't need to support humongous data sets — fits-in-memory and even linear searches are fine.

I am aware that people use spreadsheets for such purposes, but I would like to have named, typed, and homogeneous columns, easy sorting/filtering/querying, etc. which I assume I'm not going to find there. Relational would be nice too.

It must be GUI, and run on Mac OS X, but it doesn't have to be thoroughly native — I can stand the better sort of Java or perhaps even X11 app.

And finally, it should have a file format that either is obvious how to parse, or has a specification, or is supported by many other programs.

Does such a thing exist?

(If not, I might write it.)

A dreadful thing

Saturday, October 4th, 2008 10:32
some type *foo;
size_t count = ...;

...

foo = malloc(count * sizeof * foo);

(no subject)

Sunday, August 31st, 2008 13:11

“Language-independent” just means they invented a new language.

Apple's Sampler is a profiler based on the principle of periodically collecting the entire call stack of the executing threads, then summarizing these stacks to show what occurs frequently; primarily, as a tree, rooted at the bottom of the stack, where each node shows the number of times that call sequence was found on the stack.

SBCL's sb-sprof is a profiler which also collects call stacks, but its summary report is much less useful to me as it does not provide the per-branch counting; just top-of-stack frequencies and a caller/callee graph.

Therefore, I examined Sampler's file format and wrote code to generate it from sb-sprof's record.

Screenshot of Sampler with SB-SPROF data

The file is mixed text/binary, LF line endings. The grammar, as far as I've determined it, is:

  "@supersamplerV1.0" LF
  "@symboltableV1.1" LF
  (TAB int32<id> TAB int32<unknown> 
   TAB text<symbol> 
   TAB text<library-path> TAB text<library-path> LF)*
  "@end" LF
  (
    "@threadV1.0" TAB int16Hex<thread-id> LF
    (
      TAB int32<1> int32<0> int32<1> int32<count of stack-frame> (int32<stack-frame>)* LF
    )*
  )*
  "@end" LF

where by "int32" I mean a big-endian 32-bit (unsigned?) integer (i.e. four not-necessarily-ASCII bytes), and by "int16Hex" I mean a 16-bit integer in hexadecimal (i.e. four ASCII bytes).

"id" is an arbitrary identifier for this symbol. "unknown" is occasionally nonzero, but I don't know what it means. "symbol" is the name of a function/method found on the stack. "library-path" is the pathname to the object file it was loaded from (relative in the case of a standard framework, e.g. "Carbon.framework/HIToolbox.framework/HIToolbox").

"thread-id" is an identifier for the thread, which should occur as an "id" in the symbol table; the upper 16 bits evidently must be 0. Thread symbol table entries have a name and library path which is the string ("Thread_" int16<thread-id>); I have not confirmed whether this is necessary.

Each entry in a @thread block is one sampling of the whole stack of that thread. I do not know what the 1, 0, and 1 mean, but the fourth integer is the number of frames on the stack; immediately after are that many integers, each of which is an id from the symbol table.

Files generated from this structure are accepted by Sampler, but not always by Shark; I don't know why, and my attempt at tracking it down made it seem to depend on the size of the trace file.
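To make the grammar concrete, here is a sketch (in JavaScript, using Node's Buffer API; not the code linked below) of emitting a minimal trace: one function symbol, one thread symbol, and one one-frame sample. All names and paths are invented; the "unknown" int32 is written as 0, the three leading sample int32s as the observed 1, 0, 1, and the thread symbol's name is assumed to use the hex spelling of the thread-id, which the format description leaves ambiguous.

```javascript
const TAB = Buffer.from("\t");
const LF = Buffer.from("\n");
const text = (s) => Buffer.from(s, "ascii");
// big-endian 32-bit integer, four raw bytes
const int32 = (n) => {
  const b = Buffer.alloc(4);
  b.writeUInt32BE(n);
  return b;
};
// 16-bit integer as four ASCII hex digits
const int16Hex = (n) => text(n.toString(16).padStart(4, "0"));

// One @symboltableV1.1 entry: id, unknown (0), symbol, two library paths.
const symbolEntry = (id, name, path) =>
  Buffer.concat([TAB, int32(id), TAB, int32(0),
                 TAB, text(name),
                 TAB, text(path), TAB, text(path), LF]);

// One stack sample: 1, 0, 1, frame count, then the frame ids.
const sample = (frameIds) =>
  Buffer.concat([TAB, int32(1), int32(0), int32(1),
                 int32(frameIds.length),
                 ...frameIds.map(int32), LF]);

const trace = Buffer.concat([
  text("@supersamplerV1.0"), LF,
  text("@symboltableV1.1"), LF,
  symbolEntry(1, "someFunction", "/usr/lib/libSystem.B.dylib"),
  symbolEntry(2, "Thread_0002", "Thread_0002"),  // thread-id 0x0002
  text("@end"), LF,
  text("@threadV1.0"), TAB, int16Hex(2), LF,
  sample([1]),
  text("@end"), LF,
]);
```

This is only a transcription of the grammar as given; I have not verified that Sampler accepts a file this small.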

Here is code to generate such a file from sb-sprof data; it should be loaded in the SB-SPROF package: (SB-SPROF to Sampler)

This code generates a noninteractive Sampler-style tree report from SB-SPROF data. (SB-SPROF tree report)

Re: Zippers

Thursday, May 29th, 2008 22:18

For those wondering about the problem I was trying to solve with a zipper: After all the advice and some consideration, I decided to go with “option #2”: a whole-tree processing step which adds equal labels to corresponding variable uses and bindings. I’ve implemented all of the analysis I described in the original post using this system.

(However, the code isn't quite finished (as I discovered some problems with a different aspect of it) so it's not in any public repository yet.)

It occurs to me that there is a similarity between a zipper and the combination of an expression and an environment, as for evaluation. I've got a problem that seems almost solvable by use of this, and I would especially like to hear Planet Haskell readers' thoughts on the matter.

(The expressions given here are in E syntax.)

For example, let's start with the expression

  def x := 74
  x + 1

For those unfamiliar with E, the tree this expression parses into has structure like this (ignoring some extra features and names):

  sequence(define(var("x"), literal(74)),
           call(var("x"), "add", literal(1)))

If we move “down” zipperwise into the second subexpression of the sequence, then we get the expression “x + 1” with the context “def x := 74 before this in a sequence”. Similarly, an interpreter reaches an intermediate state consisting of the expression “x + 1” and the environment “x is bound to 74”.


With a slightly extended context, I'm trying to use this to express program analysis in an auditor. The zipper context now tracks a small amount of information about variables, as well as the enclosing expressions.

The first problem is to determine that all occurrences of a given variable are used in a particular fashion: e.g. for f, permitting “f(1)” but not “f("foo")” or “somethingElse(f)” (where the third case would permit f to be used arbitrarily since the auditor doesn't get to see somethingElse’s code). This is easily accomplished by walking the entire expression looking for occurrences of var("f"), and walking “up” from each of those occurrences to check that the enclosing expressions form a permitted usage.

The second problem is to determine that the variable in a particular position is a free variable, and that its guard is of a particular form. This can be done by extending the context objects to record what variable names are bound by any expression preceding the current location (that is not inside its own block); thus walking up the contexts to the top-level determines whether the variable is free.

The third problem, I'm not sure what to do about. I have a situation roughly like this:

  def foo {
    method a() {
      def bar {
        method b() { foo }
      }
    }
  }
  object(var("foo"),
         method("a", object(var("bar"), 
                            method("b", var("foo")))))

As part of the first problem, I have found the object “bar”; I then walk upward to determine that it is generated by a method of the object “foo”. (There are additional constraints on the structure that I haven't mentioned.) I need to determine that the reference to “foo” in “b” is in fact to the outer object, and not to some other intervening lexical binding.

I've thought of two possible solutions so far:

  1. Starting from method("a", ...), scan its children for occurrences of var("foo") and reject any which bind it. This would reject more than it needs to.
  2. Switch to a pre-processing stage (instead of the incremental operation of a zipper) to assign an identity (or a mutable analysis-information field, equivalently) to each variable binding and all of its uses, and add upward references (like in the zipper) to each node; this would make the is-this-foo-that-foo test a simple comparison.

At the moment, the second option seems attractive; tying uses to definitions of variables in a generic fashion should be useful for other types of analysis. And since this is strictly analysis, not transformation, I don't need the modify-without-mutation function of a zipper. The only reason I haven't done that yet is I think zippers are neat (and there might be other uses for an established zipper over E ASTs).

What do you think I should do?

GraphLife

Tuesday, August 28th, 2007 17:50

This is a somewhat odd cellular automaton implementation in Haskell. It encodes the shape of the world, and the stepping rule, in the data structure. Briefly:

data Cell v n = Cell {
  value     :: v,
  neighbors :: n (Cell v n),
  future    :: Cell v n }

-- Build a cell and its chain of futures, given its rule, similar neighbors,
-- and initial value.
mkCell :: Functor n => (v -> n v -> v) -> v -> n (Cell v n) -> Cell v n
mkCell rule val nei = initial
  where initial = Cell val nei (calcFuture initial)
        calcFuture c = c'
          where c' = Cell (rule (value c) (fmap value (neighbors c)))
                          (fmap future (neighbors c))
                          (calcFuture c')

data Moore c = Moore c c c c c c c c

-- Conway's Life
evalLife :: Bool -> Moore Bool -> Bool
evalLife v n = f v (sumN fromEnum n)
  where f _    3 = True
        f True 2 = True
        f _    _ = False

The remainder of the code is routines for setting up worlds and displaying them.

Timer project

Sunday, July 22nd, 2007 23:02

This is my first nontrivial electronics project. The goal is to create a better kitchen timer; over the years I’ve had many break, and the model I’m currently using has an annoying UI and is usually too loud.

Ben Jackson sent me a PIC16F877 microcontroller and a bunch of parts, and I've finally gotten around to doing something with them. (Some of the parts in the photo I already had.)

I've written a separate page with more details on the timer prototype and my plans.

(Steps to get to this point)

The big things remaining are a multi-digit LCD display, calibration to real time, better input devices, and building it into a case.

(Please let me know if you'd like more information about some part.)

I've updated my quines page with a new Haskell quine and tags indicating whether the quine is for a REPL or writes output itself.

On Sunday, I was reading about arrows in Haskell, and I noticed that these diagrams of the primitive arrow functions looked rather like diagrams of data flow in concatenative (stack-based) languages.

This suggested that they might be related, or transformable.

(IRC log of the early discussion)

I have now found that simple concatenative code is quite easily expressed with arrows, but there are apparent limitations depending on the value representation.

  • If you use tuples (i.e. (a, (b, ()))), then the standard arrow combinators can be used for some nice definitions (e.g. call and m1_0 in the code below), and you automatically get Haskell's static typing for stack elements, but recursion and general quotation calling is not possible because the entire "stack", even the items not touched by the current "word", needs a specific type. I'd think this could be solved by existential types but I haven't seen a way to do so yet.
  • If you use lists (i.e. a : b : []), then general calls and recursion are possible, but the stack is homogeneous, and the type system can't see anything interesting about words' stack usage.

Below are my (rather disorganized) experiments so far. If you see a way to remove the restrictions of the tuple version, please let me know!

(Concat.hs)

I wanted a little bit of assistance with a tile-flipping puzzle, so I wrote this little bit of almost-Haskell code.

let zero = replicate 4 $ replicate 4 $ False
let gshow g = putStrLn $ unlines [[if v then '#' else '.' | v <- r] | r <- g]
let flips = [(-1,0),(0,0),(1,0),(0,-1),(0,1)]
let gflip (x,y) g = [ [ ((vi-x,ri-y) `elem` flips) /= v | (v,vi) <- zip r [0..]] | (r,ri) <- zip g [0..]]
let grid l = foldr gflip zero l

It doesn't compute solutions given a puzzle; rather, it shows the result of a series of flips.

Prelude> gshow $ grid [(1,1)]
.#..
###.
.#..
....

Prelude> gshow $ grid [(2,0),(1,2),(2,3)]
.###
.##.
##..
..##

I tried a version using 2-dimensional arrays instead of lists, but gshow and gflip turned out uglier than they already are. Suggestions welcome.