One of the nice things about Common Lisp is the pervasive use of (its notion of) symbol objects for names. For those unfamiliar, I'll give a quick introduction to the relevant parts of their semantics before going on to my actual proposal for a “good parts version”.

A CL symbol is an object (value, if you prefer). A symbol has a name (which is a string). A CL package is a map from strings to symbols (and the string key is always equal to the symbol's name). A symbol may be in zero or more packages. (Note in particular that symbol names need not be unique except within a single package.)

Everywhere in CL that something is named — a variable, a function, a class, etc. — the name is a symbol object. (This is not impractical because the syntax makes it easy to write symbols; in fact, easier than writing strings, because they are unquoted.)

The significance of this is that the programmer never needs to assign meaning to characters within a string name (prefixes and the like) in order to avoid collisions. Namespacing of explicitly written symbols is handled by packages; namespacing of programmatically generated symbols is handled by simply never putting them in any package (thus, they are accessible only by passing references); these are known as gensyms.
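For the unfamiliar, a quick REPL sketch (the packages P1 and P2 are made up purely for illustration; everything used here is standard CL):

(eq 'foo 'foo)                      ; => T, same text, same package, same object
(defpackage :p1) (defpackage :p2)
(eq (intern "FROB" :p1)
    (intern "FROB" :p2))            ; => NIL, same name but different packages
(symbol-package (gensym "TEMP"))    ; => NIL, a gensym belongs to no package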

Now, I don't mean to say that CL is perfect; it fails by way of conflating too many different facilities on a single symbol (lexical variables, dynamic variables, global non-lexical definitions, ...), and some of the multiple purposes motivate programmers to use naming conventions. But I think that there is value in the symbol system because it discourages the mistake of providing an interface which requires inventing unique string names.

(Someone thinking along capability lines might ask: why use names rather than references at all? Narrowly, think about method names (selectors, for the Smalltalk/ObjC fans) and module exports; broadly, think about distribution and bootstrapping.)


So, here’s my current thought on a “good parts version”, specifically designed for an E-style language with deep equality/immutability and no global mutable state.

There is a notion of name, which includes three concrete types:

  1. A symbol is an object which has a string-valued name, and whose identity depends solely on that string.
  2. A gensym also has a name, but has a unique identity (selfish, in E terms). Some applications might reject gensyms since they are not data.
  3. A space-name holds two names and its identity depends solely on that combination. (That is, it is a “pair” or “cons” specifically of names.)

Note that these three kinds of objects are all immutable, and use no table structures, and yet provide the characteristics of names which I described above. (For implementation, the identity of a name as defined above can be turned into pointer identity using hash consing, a generalization of interning; a rough sketch in Lisp follows the notes below.) Some particular examples and notes:

  • A CL symbol in a package corresponds to a space-name pairing two symbols, or perhaps a gensym and a symbol. This correspondence is not exact, of course. (In particular, there is no notion here of the set of exported symbols in a package. But that's the sort of thing you have to be willing to give up to obtain a system without global mutable state. And you can still imagine 'linting' for unexpected symbols.)
  • The space-name type means that names can be arbitrary binary trees. If we consistently give the left side a “namespace” interpretation and the right side a “local name” one, then we have a system, I think, where people can carve out all sorts of namespaces without ever fearing collisions or conflicts, should it become necessary. Which probably means it's massively overdesigned (cf. "worse is better").
  • Actual use case example: Suppose one wishes to define (for arbitrary use) a subtype of some well-known interface, which adds one method. There is a risk that your choice of name for that method conflicts with someone else's different subtype. Under this system, you can construct a space-name whose two components are a large random number (i.e. a unique ID) acting as the namespace, and a symbol which is your chosen simple name. One can imagine syntax and tools which make it easy to forget about the large random number and merely use the simple name.
  • It's unclear to me how these names would be used inside the lexical variable syntax of a language, if they would be at all; I suspect the answer is that they would not be, or would be mostly confined to machine-generated-code cases. The primary focus here is improving the default characteristics of a straightforwardly written program which uses a map from names to values in some way.
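To make that concrete, here's a rough sketch of the three kinds of names in Common Lisp (not the hypothetical E-style language; every identifier here is invented, and the hash table is purely implementation-side hash-consing machinery, not semantic state):

(defstruct (sym (:constructor %make-sym (text))) text)
(defstruct (space-name (:constructor %make-space-name (space local))) space local)
(defstruct (gen (:constructor make-gen (text))) text)  ; selfish identity; never interned

(defvar *name-table* (make-hash-table :test 'equal))   ; hash-consing table only

(defun find-sym (text)
  "Return the one SYM whose name is TEXT."
  (or (gethash text *name-table*)
      (setf (gethash text *name-table*) (%make-sym text))))

(defun find-space-name (space local)
  "Return the one SPACE-NAME pairing SPACE and LOCAL."
  ;; SPACE and LOCAL are themselves unique objects, so EQUAL on a list of
  ;; them compares them by identity.  (A serious version would hash better.)
  (let ((key (list space local)))
    (or (gethash key *name-table*)
        (setf (gethash key *name-table*) (%make-space-name space local)))))

;; Identity then reduces to pointer identity:
;;   (eq (find-sym "frob") (find-sym "frob"))                   => T
;;   (eq (find-space-name (find-sym "ns") (find-sym "frob"))
;;       (find-space-name (find-sym "ns") (find-sym "frob")))   => T
;;   (eq (make-gen "tmp") (make-gen "tmp"))                      => NIL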

(This is all very half-baked — I'm just publishing it on the grounds described in my previous post: in the long run I'll have more ideas than I ever implement, and this is statistically likely to be one of them, so I might as well publish it and hope someone else finds some use for it; if nothing else, I can stop feeling any obligation to remember it in full detail.)

“:”

Thursday, February 21st, 2013 21:34

When Larry Wall was designing Perl 6, he started with lots of community proposals, from which he made the following observation:

I also discovered Larry's First Law of Language Redesign: Everyone wants the colon.

When I was recently trying to redesign E, I found that this holds true even if only one person is involved in the process. One of the solutions considered was having “:” and “ :” be two different tokens…

I really haven't been posting very much, have I? It's mostly the job absorbing my “creative energy”, but I've also been doing a little bit of this and that and never quite finishing anything to the point of feeling like writing it up.

On the programming-projects front, I'm attempting to extract two reusable libraries from Cubes for the benefit of other web-based games.

  • Measviz takes performance-measurement data (frames per second and whatever else you want) and presents (in HTML) a compact widget with graphs. My excuse for not announcing it is that the API needs revision and I haven't thought of a good toy example to put in the documentation-and-demo page I'm writing; but if you're willing to deal with later upgrades, it's ready to use now.
  • The other library is a generalized keybinding library (generalized in that it also handles gamepads/joysticks, which are completely different). You define the commands in your application, and it handles feeding events into them. Commands can be polled, or you can receive callbacks on press and release, with optional independent autorepeat. It's currently in need of a good name, and also of API cleanup.

I've been making some sketches towards a redesign of E (list archive pointer: starting here), basically to take into account everything we've learned over the years without being constrained by compatibility, but it hasn't gotten very far, partly because language syntax is hard — all options are bad. (The current E syntax is pretty good for usability, but it has some particularly verbose/sea-of-punctuation corner cases, and I'd also like to see a simpler syntax, with more facilities moved into code libraries.)

I’ve uploaded almost all of my published Git repositories (previously hosted on a git-only server on switchb.org, which is down at the moment) to my account on GitHub. Please update your remote URLs if you have any git clones.

The motivation for this change is simply that GitHub offers better visibility — an automatic web presence for each project, including viewing repository contents. I am not intending to depend on GitHub’s continued existence, of course; I still have local copies of each project, and additionally I plan to arrange for switchb.org to automatically mirror my GitHub repositories.

What I've just uploaded to GitHub also includes a project which I have not previously mentioned, timeline-ui:

A user interface experiment. Multiple types of time-series data, variously static/interactive, historical/future, etc. are displayed in a single view. (This was an idea I had floating around and which I used in 2010 for a class project; there is a lot more that could be done with it.) Written in Java.

I was going to write more about the concept, but I never got around to it; this will have to do.

List of projects just moved to GitHub:

Google again

Saturday, February 12th, 2011 09:28

It’s now confirmed that I'm going to be doing the same thing this summer as last summer: working at Google on Caja.

All comments, congratulations, housing recommendations, and activity suggestions are welcome.

One bit of advice sometimes given to the novice programmer is: don't ever compare floating-point numbers for equality. The reasoning is that floating-point calculations are inexact, so one should test against a small epsilon (allowable error) instead, e.g. if (abs(value - 1.0) < 0.0001).

This advice is actually wrong, or rather, overly strong. There is a situation in which it is 100% valid to compare floats: a cache, or anything else which compares a float not with a specific constant (in which case the epsilon notion is appropriate) but with a previous value from the same source. Floating-point numbers may be approximations of exact arithmetic, but that doesn't mean you won't get the same result from the same inputs.
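For instance, a cache along these lines (a sketch in Common Lisp; expensive-f and the cache variable are invented for illustration):

(defvar *f-cache* (make-hash-table :test 'eql))  ; EQL matches floats exactly

(defun cached-f (x)
  ;; If the very same float X shows up again, EXPENSIVE-F would compute a
  ;; bit-identical result anyway, so an exact lookup is the right thing.
  (multiple-value-bind (value present-p) (gethash x *f-cache*)
    (if present-p
        value
        (setf (gethash x *f-cache*) (expensive-f x)))))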

So, don't get any bright ideas about outlawing aFloat == anotherFloat.

Unfortunately, there's a case in which the common equality on floats isn't what you want for previous-value comparison anyway: under most definitions of ==, NaN != NaN. This definition makes sense for numerics (and conforms to the IEEE floating-point specification), because NaN is “not a number”: it's an error marker, provided as an alternative to exceptions (or rather, floating-point error signals/traps/what-have-you), which propagates to the end of your calculation rather than aborting it and requiring immediate error handling; that can be advantageous for both code simplicity and efficiency. So if you think of yourself as calculating within the space of “numbers”, then NaN is outside of that space. But if you're working in the space of “results of calculations”, then you probably want NaN == NaN to hold, and that may not be what you get.

Mathematically, the floating-point comparison is not an equivalence relation, because it is not reflexive on NaN.

(It's also typically the case that 0 == -0, even though positive and negative zero are distinct values. Oh, and NaNs carry data, but I'm not talking about that.)

What to do about it, in a few languages:

JavaScript

Even the === operator compares numeric values rather than identities, so if you want NaN to match itself you have to handle it as a special case. Google Caja handles it this way:

/**
 * Are x and y not observably distinguishable?
 */
function identical(x, y) {
  if (x === y) {
    // 0 === -0, but they are not identical
    return x !== 0 || 1/x === 1/y;
  } else {
    // NaN !== NaN, but they are identical.
    // NaNs are the only non-reflexive value, i.e., if x !== x,
    // then x is a NaN.
    return x !== x && y !== y;
  }
}
Common Lisp

The = operator generally follows the IEEE comparison (if the implementation has NaN at all) and the eql operator does the identical-object comparison.

E

The == operator is guaranteed to be reflexive and to return false for distinguishable objects, so it is appropriate for the “cache-like” use cases; the <=> operator does conventional floating-point comparison, in which !(NaN <=> NaN) and 0.0 <=> -0.0.

One of the things I’ve been procrastinating on (ah, I mean: not had the time to do, being busy with school and other projects) is announcing and working on a job search for this summer. I have posted my resume, but I didn’t even get around to mentioning that. The process really doesn’t excite me that much — it’s essentially research, comparison shopping, which I have never been very fond of.

But, last October, I was contacted out of the blue by a recruiter asking if I was interested in opportunities at — Google. After checking that it wasn’t a spoof I naturally said yes, and after a number of rounds of information exchange and interviews,

This summer, I will be (well, subject to my completing the process of accepting the offer) working as a Software Engineering Intern at Google, with the Caja team, in Mountain View, CA.

So — whoa and yay and other such cheerful words. And thanks to my friends at Google who referred me and nudged the process along.*


The most uncertain remaining step is finding housing in or near Mountain View (could be as far as San Francisco or San Jose; Google runs a shuttle bus and is convenient to public transportation). Google has provided some general advice-for-interns, but I’d like to hear input from my readers and friends who already live in the area.

Some parameters:

  • I would consider living with other people, but I wouldn’t want to take a chance on a complete unknown. (So if you are someone or know someone with a room...)
  • Speaking of taking chances, make the chance of being mugged on the way home in the evening very small, please.
  • I am traveling from the east coast, probably by train, so I don’t want to have to transport a lot of stuff, or buy items that I’ll use for only three months — so a furnished space is better.
  • I do not own a car, but I know how to drive one.
  • I do not own a bicycle, but I used to know how to ride one.
  • This will be the first time I have lived outside of my home city for longer than a week’s visit/vacation.

*Y’know how job search advice is big on saying you should be “networking”? If you’ve thought you’re too much of the non-face-to-face-social non-polite-small-talk would-rather-talk-to-people-through-the-computer sort for that — take me as an example. This opportunity came to me because of other people who knew me entirely through my work on open source projects (E, and thus Caja-CapTP) — I didn’t do anything that I wouldn’t have done for other reasons anyway. I’m not saying you shouldn't do any of the other stuff you might be thinking of — I’m saying this stuff counts.

Normal People Things:

  • Gone to a party unrelated to my family.

Never thought I'd do:

  • Worn a t-shirt with text on it.
  • Written an essay structured using a gratuitous extended metaphor.

Extra nerd points:

  • Programmed a number type which carries units and error values, to reduce the tedium of lab reports.
  • Learned to write all my assignments in LaTeX.

The previously-mentioned darcs repositories (which are e-aui, e-modules, factor, norsmu) have been moved to my new server at http://switchb.org/darcs/. As far as I recall, I haven't published any links to these except privately and to mailing lists, so I haven’t updated any links for the new location; but if you had the old location, now you know where to go.

A machine I used to use to host some web services, bots, and repositories became no longer accessible from the Internet, as a result of which I've had to move what I was serving from it; some to switchb.org, some to personal machines.

I took the opportunity to clean things up a bit, as a result of which I now have better backups and more polished services, and know a little bit more about configuring Apache — though not as much as I perhaps should.

  • My Subversion repositories are now served over HTTP, and therefore browsable; and they are now backed up daily (using svnsync triggered by a cron job) to my laptop, and thence to all its backups.

    (I wasted several minutes on remembering that cron will ignore the last line of a crontab file if it doesn't end with a newline; after listening to me grumbling about this, someone suggested ending the file with a comment, so that the last line is harmless whether ignored or not, and also serves as a reminder of the issue.)

    If you have a working copy of one of my repositories (E-on-CL, E-on-JavaScript, MudWalker, Den, etc.), here's a guide to the changed URLs.

  • My other Tahoe-LAFS volunteer grid storage node is now residing on a machine on my home LAN.

  • Finally, some simple data-querying web services I wrote for Waterpoint's word games have now been moved to switchb.org; I also took the time to prettify their URLs (no cgi-bin or .cgi) and write documentation.

I haven't yet gotten to working on the bots, darcs repositories, or miscellaneous other stuff I had there.

(Pondering moving my blog over to switchb.org as well so as to not have ads, especially now that I found I can still have LJ-friends by way of OpenID. (Hm, but reading friends-locked posts over RSS might not work since there's no username+password for LJ to accept. Anyone have experience with that situation?))

Apache configuration questions:

  1. If I have multiple variants of a document (e.g. foo.html foo.pdf foo.txt) handled by MultiViews, so the canonical URL of the document is extensionless (“foo”), how do I properly control the default variant to serve in the event that the client does not express a preference via the Accept header? (Without doing so, I found that it would serve the .txt version, whereas I would prefer the HTML.) All that I found that worked was to create a type map file named “foo” with quality values, and force it to be interpreted as a type map using <FilesMatch>. This seems kludgy to me.
  2. What is the right way to serve CGIs, not in a designated cgi-bin directory, and without any .cgi extension in the URL? I initially tried to apply mod_rewrite, but I couldn't get it to work such that /foo was internally handled by foo.cgi while /foo.cgi redirected to /foo. I resorted to another <FilesMatch> explicitly listing the name of each CGI and doing SetHandler cgi-script.
  3. What is the right way to handle “testing” vs. “deployment” configurations, where the relevant Directory, Location, etc may be different depending on which server or subdirectory the site is set up on? I see that one may use environment variables — should I just set up variables containing the path prefixes for the particular host before including the generic configuration file?

Late update on Caja-CapTP:

Google Summer of Code is over. I passed based on revised goals, but I'm not happy with the state of the code and I did not complete any significant part of the original plan.

Regarding the items mentioned in my last update:

  • Write more documentation/comments.
  • Commit as much of the work-in-progress as I can.
    • ...including the incomplete CapTPConnection, even though its tests don't pass yet, so that the partial work can be counted.

I committed CapTPConnection, and found and fixed enough infrastructure (whenResolved, CommTable, SwissTable, deSubgraphKit, etc.) bugs that it starts up and can do a basic operation or two. It's not useful for anything, but it's a lot closer to running than it was at the time of my last update.

Also, I removed dependencies on 'window' so in principle it will operate on a non-browser (server) JavaScript implementation. This has not been exercised because there is no browserless driver for the test scripts yet.

I stated that I would continue working on Caja-CapTP past the GSoC work period; however, with the time occupied by schoolwork, I have not had time or brain space to do so yet. I am not going to abandon the project.

Stuff done:

Stuff to do:

  • Write more documentation/comments.
  • Commit as much of the work-in-progress as I can.
    • ...including the incomplete CapTPConnection, even though its tests don't pass yet, so that the partial work can be counted.

Had a bit of a breakthrough in debugging facilities: I realized that there's no reason I shouldn't modify my local copy of the Cajita runtime to provide more debug information (and it doesn't even matter if it subtly breaks security, as long as Caja-CapTP is still compatible with the proper version).

Only a week or so left. I've discussed the matter with MarkM, and he pointed out (again) that I should work toward having useful-to-other-people components even if I haven't finished the one I intended to do.

Toward this end, I am going to work on Data-E (which is already part of the Caja-CapTP project) as a standalone serialization utility (for network and disk), as I understand it will be useful for Shakhar.

Todo items:

  • Implement generic literals (description) to make it more JavaScript-natural.
  • Implement cyclic object graph handling using the Ref promise implementation already done.
  • Commit all in-progress work as reasonable changesets (even the tests that don't yet pass).
  • Document what works, design decisions, etc.
  • Continue to work on CapTP itself as time permits.

I intend to continue working on the Caja-CapTP project beyond GSoC; both simply because it is a project of mine, and to mitigate my failure to complete the planned work on schedule. However, for the next few months it will have to compete with my college work rather than my procrastination.

I'm having severe getting-around-to-actually-doing-the-work problems; I am far behind schedule. I think the problem is framing this as “work” rather than just another of the projects that I've found interesting and tinkered with. (I also blame Team Fortress 2 and Rosetta Code for providing attractive distractions...)

I've ported the “Ref” facility from E; this is necessary as CapTP is built on Ref semantics (promises, proxies, broken refs). I hope to very soon get the actual CapTP connection module up, then (as I wrote before) “define, document and implement a CapTP-over-HTTP protocol”.

(I previously mentioned implementing the Surgeon (serialization tool) but I then remembered that that's actually irrelevant to CapTP.)

Also: working without Internet access removes a whole lot of potential distractions — and one’s access to online documentation. Luckily I had some source code around which provided examples.

Common Lisp provides compile-file, whose purpose is to convert a CL source file into an (implementation-defined) format which usually contains precompiled code and is designed for faster loading into a Lisp system.

compile-file takes the pathname of a textual Lisp source file. But what if you want to compile some Lisp code that's not in a file already, perhaps because you translated/compiled it from some not-amenable-to-the-Lisp-reader input syntax, or because it contains unREADable literals? You can use a "universal Lisp file"; I know of two ways to create one (use whichever you find cleaner):

(cl:in-package :mypackage)
#.mypackage::*program*   ; read-time evaluation splices in the form currently held by *program*
or
(cl:in-package :mypackage)
(macrolet ((it () *program*))   ; the macro expands, at compile time, into *program*'s value
  (it))

Suppose this is in "universal.lisp". Then to use it:

(defvar *program*)

(defun compile-to-file (form output-file)
  (let ((*program* form))
    (compile-file #p"universal.lisp"
                  :output-file output-file)))
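
A hypothetical use, to round out the example (the form and filenames here are invented, and this assumes *program* above is the same variable universal.lisp references):

(compile-to-file '(defun greet () (format t "hello from a fasl~%"))
                 #p"greet.fasl")
(load #p"greet.fasl")   ; defines GREET from the compiled file
(greet)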

This is just a minimal example; you'll also want to appropriately handle the return value from compile-file, provide an appropriate pathname to the universal file, etc. For example, here's an excerpt of the relevant code from E-on-CL, where I have used this technique to compile non-CL sources (emakers) into fasls:

(defparameter +the-asdf-system+ (asdf:find-system :e-on-cl))
(defvar *efasl-program*)
(defvar *efasl-result*)
...
(defun compile-e-to-file (expr output-file fqn-prefix opt-scope)
  ...
  (let* (...
         (*efasl-program*
           `(setf *efasl-result*
                  (lambda (...) ...))))
    (multiple-value-bind (truename warnings-p failure-p)
        (compile-file (merge-pathnames
                        #p"lisp/universal.lisp"
                        (asdf:component-pathname +the-asdf-system+))
                      :output-file output-file
                      :verbose nil
                      :print nil)
      (declare (ignore truename warnings-p))
      (assert (not failure-p) () "Compilation for ~A failed." output-file))))

(defun load-compiled-e (file env)
  (let ((*efasl-result* nil)
        ...)
    (load file :verbose nil :print nil)
    (funcall *efasl-result* env)))

Note that the pathname is computed relative to the ASDF system containing the universal file; also note the use of a variable *efasl-result* to simulate a "return value" from the compiled file, and the use of a lambda to provide a nonempty lexical environment, both of which are features not directly provided by the CL compiled file facility.

I'm not making progress as fast as I would like, mostly due to assorted distractions rather than Doing The Work.

That said, I've gotten both ends of serialization implemented, at least for non-circular cases: the deJSONTreeKit and deSubgraphKit have both builder and recognizer implementations. (For a terminology introduction, see the serialization paper.)

Next up is the surgeon (high-level serialization interface), and designing uncallers suitable for JavaScript. I definitely need to increase my rate of progress to get this done on schedule.

I completed enough of the E-on-JavaScript improvements, and wrote the beginnings of one Data-E kit in Cajita, together with the Updoc tests for it, and a Makefile to cajole the JavaScript and "animate" the Updoc, that is, convert it into an HTML page that actually runs the tests.

I also improved various readme files and pages, hopefully such that someone else can get it all installed on their own system starting from my project page without too much prior knowledge.

Near-term plan: Put aside the EoJS, and get the Data-E kits working; then the surgeon; then the CapTPConnection; then define, document and implement a CapTP-over-HTTP protocol.

GSoC project update

Thursday, May 28th, 2009 10:29

I have begun the work on my GSoC project, Caja-CapTP.

I am currently working on improving E-on-JavaScript in order to be able to use its Updoc facility as the test framework for the Caja-CapTP code.

(If you don't know, Updoc is a test system which works by rerunning a recorded or made-up interactive session and confirming that the output is the same as before. The advantage of this system is that you can write test cases very simply as sequential code without gobs of assertFoo() for every particular attribute you think of testing.)

You can see my commits to EoJS at CIA.vc.

I am also learning more about exactly how Cajita works in areas such as library loading; the documentation in this area needs improvement, which I am going to work on myself and/or ask the Caja developers to clarify.

Re: Zippers

Thursday, May 29th, 2008 22:18

For those wondering about the problem I was trying to solve with a zipper: After all the advice and some consideration, I decided to go with “option #2”: a whole-tree processing step which adds equal labels to corresponding variable uses and bindings. I’ve implemented all of the analysis I described in the original post using this system.

(However, the code isn't quite finished (as I discovered some problems with a different aspect of it) so it's not in any public repository yet.)