Saturday, September 12th, 2020

I've now been programming in Rust for over a month (since the end of July). Some thoughts:

  • It feels a lot like Haskell. Of course, Rust has no mechanism for enforcing/preferring lack of side effects, but the memory management, which avoids using a garbage collection algorithm in favor of statically analyzable object lifetimes, gives a very similar feeling of being a force which shapes every aspect of your program. Instead of having to figure out how to, at any given code location, fit all the information you want to preserve for the future into a return value, you instead get to store it somewhere with a plain old side effect, but you have to prove that that side effect won't conflict with anything else.

    And, of course, there are algebraic data types and type classes, er, traits.

  • It's nice to be, for once, living in a world where there's a library for everything and you can just use them by declaring a dependency on them and recompiling. Of course, there are risks here (unvetted code, libraries that might be using unsafe unsoundly, unmaintained libraries you get entangled with), but I haven't had a chance to have this experience at all before.

  • The standard library design sure is a fan of short names like we're back in the age of “linker only recognizes 8 characters of symbol name”. I don't mind too much, and if it helps win over C programmers, I'm all in favor.

  • They (mostly) solved the method chaining problem! (This got long, so it's another post.)

In my previous post, I said “Rust solved the method chaining problem!” Let me explain.

It's popular these days to have “builders” or “fluent interfaces”, where you write code like

let house = HouseBuilder()
    .bedrooms(2)
    .bathrooms(2)
    .garage(true)
    .build();

The catch here is that (in a “conventional” memory-safe object-oriented language, not Rust) each of the methods here has the option of:

  1. Mutating self/this/recipient of the message (I'll say self from here on), and then returning self.
  2. Returning a different object with the new configuration.
  3. “Both”: returning a new object which wraps self, and declaring it a contract violation for the caller to use self further (with or without actually documenting that contract).

The problem, in my opinion, with the fluent interface pattern by itself is that it's underconstrained in this way: in a type 1 case, which is often the simplest to implement, the caller is free to completely ignore the return values,

let hb = HouseBuilder();
hb.bedrooms(2);
hb.bathrooms(2);
hb.garage(true);
let house = hb.build();

but this means that the fluent interface cannot change from a type 1 implementation to a type 2 or 3, even if this is a non-breaking change to the intended usage pattern. Or to look at it from the “callee misbehaves” angle rather than “caller misbehaves”, the builder is free to return something other than self, thus causing the results to differ depending on whether the caller used chained calls or not.

(Why is this a problem? From my perspective on software engineering, it is highly desirable to, whenever possible, remove unused degrees of freedom so that the interaction between two modules contains no elements that were not consciously designed in.)


Now here's the neat thing I noticed about Rust in this regard: Rust prevents this confusion from happening by default!

In Rust, there is no garbage collector and no arbitrary object-reference graph: by default, everything is either owned (stored in memory belonging to the caller, like a non-pointer variable or field in C) or borrowed (referred to by a “reference” which is statically checked not to outlive the owned object it points to). The consequence of this is that every method must explicitly take an owned or borrowed self, and this means you can't equivocate between writing a setter and writing a chaining method:

impl HouseBuilder {
    /// This is a setter. It mutates the builder passed by reference.
    fn set_bedrooms(&mut self, bedrooms: usize) {
        self.bedrooms = bedrooms;
    }

    /// This is a method that consumes self and returns a new object of
    /// the same type; “is it the same object” is not a meaningful question.
    /// Notice the lack of “&”, meaning by-reference, on “self”.
    fn bedrooms(mut self, bedrooms: usize) -> HouseBuilder {
        // This assignment mutates the *local variable* “self”, which the
        // caller cannot observe because the value was *moved* out of the
        // caller's ownership.
        self.bedrooms = bedrooms;
        self                       // return value
    }
}
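
To see what the consuming version means for the caller, here is a minimal sketch. It reuses the HouseBuilder struct with its single bedrooms field from the end of this post; the function name unchained is just made up for illustration. The statement-by-statement style still works, but only if each return value is rebound:

```rust
struct HouseBuilder {
    bedrooms: usize,
}

impl HouseBuilder {
    fn new() -> Self {
        HouseBuilder { bedrooms: 0 }
    }

    // Consuming, chain-style method, same as in the snippet above.
    fn bedrooms(mut self, bedrooms: usize) -> HouseBuilder {
        self.bedrooms = bedrooms;
        self
    }

    fn build(self) -> String {
        format!("Home sweet {}br home!", self.bedrooms)
    }
}

// Hypothetical helper: builds a house without writing a method chain.
fn unchained() -> String {
    let hb = HouseBuilder::new();
    // `hb` is moved into the call, so the result has to be rebound.
    // Writing `hb.bedrooms(2);` as a bare statement and then calling
    // `hb.build()` would be rejected as a use of a moved value.
    let hb = hb.bedrooms(2);
    hb.build()
}

fn main() {
    println!("{}", unchained());
}
```

So the caller can't silently depend on “it mutated my object”: either the result is used, or the builder is gone.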

Now, it's possible to write a setter that can be used in chaining fashion:

    fn set_bedrooms(&mut self, bedrooms: usize) -> &mut HouseBuilder {
        self.bedrooms = bedrooms;
        self
    }

But because references have to refer to objects owned by something, a method with this signature cannot just decide to return a different object instead. Well, unless it decides to return some object that's global, allocated-and-leaked, or present in some larger but non-global context. (And, having such a method will contaminate the entire rest of the builder interface with the obligation to either take &mut self everywhere or make the builder an implicitly copyable type, both of which would look funny.)
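
As a rough sketch of the well-behaved case: with a by-reference chaining setter, the chained and unchained styles end up doing the same thing. The bathrooms field, the set_bathrooms method, and a build that takes &self are my additions for this example, not part of the snippets above; build borrows here precisely because of the “&mut self everywhere” obligation just mentioned.

```rust
struct HouseBuilder {
    bedrooms: usize,
    bathrooms: usize, // assumed extra field, for illustration only
}

impl HouseBuilder {
    fn new() -> Self {
        HouseBuilder { bedrooms: 0, bathrooms: 0 }
    }

    fn set_bedrooms(&mut self, bedrooms: usize) -> &mut HouseBuilder {
        self.bedrooms = bedrooms;
        self
    }

    // Hypothetical second setter with the same shape.
    fn set_bathrooms(&mut self, bathrooms: usize) -> &mut HouseBuilder {
        self.bathrooms = bathrooms;
        self
    }

    // Takes a reference (an assumption of this sketch) so that it can
    // terminate a chain of `&mut HouseBuilder` values.
    fn build(&self) -> String {
        format!("{}br/{}ba", self.bedrooms, self.bathrooms)
    }
}

fn chained() -> String {
    let mut hb = HouseBuilder::new();
    let house = hb.set_bedrooms(2).set_bathrooms(1).build();
    house
}

fn unchained() -> String {
    let mut hb = HouseBuilder::new();
    // Ignoring the returned references is harmless here, because they
    // can only point back at `hb` itself.
    hb.set_bedrooms(2);
    hb.set_bathrooms(1);
    let house = hb.build();
    house
}

fn main() {
    assert_eq!(chained(), unchained());
    println!("{}", chained());
}
```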

So this isn't a perfect guarantee that everything that looks like a method chain/fluent interface is nonsurprising. But it's pretty neat, I think.
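
For the curious, here is roughly what the allocated-and-leaked loophole could look like. This is a deliberately misbehaving sketch, not something I'd expect to find in a real builder; the function names via_chain and via_original are made up for the demonstration.

```rust
struct HouseBuilder {
    bedrooms: usize,
}

impl HouseBuilder {
    fn new() -> Self {
        HouseBuilder { bedrooms: 0 }
    }

    // Same signature as the chainable setter, but it never touches `self`.
    // Box::leak hands back a reference that is valid for any lifetime,
    // so it satisfies the elided lifetime tied to `&mut self`.
    fn set_bedrooms(&mut self, bedrooms: usize) -> &mut HouseBuilder {
        Box::leak(Box::new(HouseBuilder { bedrooms }))
    }
}

// Reads the value through the returned reference (chained style).
fn via_chain() -> usize {
    let mut hb = HouseBuilder::new();
    hb.set_bedrooms(3).bedrooms
}

// Ignores the return value and reads the original builder instead.
fn via_original() -> usize {
    let mut hb = HouseBuilder::new();
    hb.set_bedrooms(3);
    hb.bedrooms
}

fn main() {
    // The two usage styles disagree, which is exactly the surprise
    // the ownership rules otherwise make hard to write by accident.
    println!("chained: {}, original: {}", via_chain(), via_original());
}
```

Note that you have to go out of your way (and leak memory) to get this behavior.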


Here's the rest of the code you'd need to compile and play with the snippets above (keep only one of the two set_bedrooms definitions at a time, since they share a name):

struct HouseBuilder {
    bedrooms: usize,
}

impl HouseBuilder {
    fn new() -> Self {
        HouseBuilder {
            bedrooms: 0
        }
    }

    fn build(self) -> String {
        format!("Home sweet {}br home!", self.bedrooms)
    }
}

fn main() {
    let h = HouseBuilder::new()
        .bedrooms(3)
        .build();
    println!("{}", h);
}