[personal profile] kpreid

One of the nice things about Common Lisp is the pervasive use of (its notion of) symbol objects for names. For those unfamiliar, I'll give a quick introduction to the relevant parts of their semantics before going on to my actual proposal for a “good parts version”.

A CL symbol is an object (value, if you prefer). A symbol has a name (which is a string). A CL package is a map from strings to symbols (and the string key is always equal to the symbol's name). A symbol may be in zero or more packages. (Note in particular that symbol names need not be unique except within a single package.)

Everywhere in CL that something is named — a variable, a function, a class, etc. — the name is a symbol object. (This is not impractical because the syntax makes it easy to write symbols; in fact, easier than writing strings, because they are unquoted.)

The significance of this is that the programmer need never give significance to characters within a string name in order to avoid collisions. Namespacing of explicitly written symbols is handled by packages; namespacing of programmatically generated symbols is handled by simply never putting them in any package (thus, they are accessible only by passing references); these are known as gensyms.

Now, I don't mean to say that CL is perfect; it fails by way of conflating too many different facilities on a single symbol (lexical variables, dynamic variables, global non-lexical definitions, ...), and some of the multiple purposes motivate programmers to use naming conventions. But I think that there is value in the symbol system because it discourages the mistake of providing an interface which requires inventing unique string names.

(One thinking along capability lines might ask — why use names rather than references at all? Narrowly, think about method names (selectors, for the Smalltalk/ObjC fans) and module exports; broadly, distribution and bootstrapping.)

So, here’s my current thought on a “good parts version”, specifically designed for an E-style language with deep equality/immutability and no global mutable state.

There is a notion of name, which includes three concrete types:

  1. A symbol is an object which has a string-valued name, and whose identity depends solely on that string.
  2. A gensym also has a name, but has an unique identity (selfish, in E terms). Some applications might reject gensyms since they are not data.
  3. A space-name holds two names and its identity depends solely on that combination. (That is, it is a “pair” or “cons” specifically of names.)

Note that these three kinds of objects are all immutable, and use no table structures, and yet can produce the same characteristics of names which I mentioned above. (For implementation, the identity of a name as above defined can be turned into pointer identity using hash consing, a generalization of interning.) Some particular examples and notes:

  • A CL symbol in a package corresponds to a pair of two symbols, or perhaps a gensym and a symbol. This correspondence is not exact, of course. (In particular, there is no notion here of the set of exported symbols in a package. But that's the sort of thing you have to be willing to give up to obtain a system without global mutable state. And you can still imagine 'linting' for unexpected symbols.)
  • The space-name type means that names can be arbitrary binary trees. If we consistently give the left side a “namespace” interpretation and the right side a “local name” one, then we have a system, I think, where people can carve out all sorts of namespaces without ever fearing collisions or conflicts, should it become necessary. Which probably means it's massively overdesigned (cf. "worse is better").
  • Actual use case example: Suppose one wishes to define (for arbitrary use) a subtype of some well-known interface, which adds one method. There is a risk that your choice of name for that method conflicts with someone else's different subtype. Under this system, you can construct a space-name whose two components are a large random number (i.e. a unique ID) acting as the namespace, and a symbol which is your chosen simple name. One can imagine syntax and tools which make it easy to forget about the large random number and merely use the simple name.
  • It's unclear to me how these names would be used inside the lexical variable syntax of a language, if they would at all; I suspect the answer is that they would not be, or mostly confined to machine-generated-code cases. The primary focus here is improving the default characteristics of a straightforwardly written program which uses a map from names to values in some way.

(This is all very half-baked — I'm just publishing it on the grounds described in my previous post: in the long run I'll have more ideas than I ever implement, and this is statistically likely to be one of them, so I might as well publish it and hope someone else finds some use for it; if nothing else, I can stop feeling any obligation to remember it in full detail.)

(no subject)

Date: 2013-07-24 05:18 (UTC)
From: [identity profile] juan-gandhi.livejournal.com
This seems like a good step in the seemingly right direction.
Somehow reminds me RDF.

(no subject)

Date: 2013-07-24 15:30 (UTC)
From: [identity profile] kpreid.livejournal.com
Heh, perhaps, but it's interesting to note that RDF actually fails at this. There are several things you can't do in RDF without minting string identifiers; particularly, you can't have an anonymous predicate. You can use local "#foo" URIs, but in principle those are relative to the base URI of the RDF document itself, which is also a string.