Why Rust closures are somewhat hard

Animats · on Jan 21, 2020

There's a lot of confusion about this.

A lambda is just a function without a name. (This feature tends to come with special syntax, although it doesn't have to.)

A nested function is a function defined inside another function which can access the variables of the enclosing function. (A nested function can be a lambda, but it doesn't have to be. Some languages have named nested functions. A lambda doesn't have to be a nested function; it doesn't have to pull in any variables from an outer scope.)

A closure is a nested function which can outlive the outer function, keeping its data alive. (A closure can be named, the usual case in Python and Javascript, or anonymous. So a closure need not be a lambda.)

Closures are easy to implement in garbage collected languages, but hard in explicitly allocated ones, because extending the lifetime of the imported data gets complicated.

So the options are:

- Lambda without any external data access -- typical use, comparison function for a sort.

- Lambda with external data access, but not outliving its enclosing function - typical use, iteration expression

- Named function with no external data access. Typical use, a local function in languages that don't do local functions well, such as C.

- Named function with external data access, not outliving its enclosing function. Typical use, internal function within a function to avoid passing extra parameters.

- Named function with external data access, outliving its enclosing function. A true closure, but not a lambda. Typical use, saving state for a callback in Javascript by passing the function to something that will save it and invoke it later. An object, really. This was how LISP did objects.

- Lambda function with external data access, outliving its enclosing function. A true closure. Same uses as above, but in different languages.

Most languages offer some subset of these six options.

jarjoura · on Jan 22, 2020

I think the OP makes it clear that this nuance is actually quite tricky in a native systems language like Rust.

A good example in C++ would be:

- A function (non capturing/lambda) is created with

  auto f = [] { ... };

- A function (capturing/closure) is created with

  auto s = "...";
  auto f = [&] { s; ... };

In source code both of those look similar and you can even define them with the same type: std::function<void()>

However, the resulting assembly for both couldn't be more different. The Lambda case is as raw as any normal C function type, which the closure case creates a C++ class that closes over the outer method's state. If you're not a seasoned C++ engineer, this nuance will be lost on you.

It's fair to say that everyone should understand Lambda Calculus rules, but that makes the language less accessible to new-comers. In the case for non-systems languages, you can blur the lines with ease, but something Rust and C++ just cannot afford to do. That makes understanding the nuance important to be effective.

vlovich123 · on Jan 22, 2020

I’m pretty sure that the generated assembly for both your examples written as-is is the same. The optimizer should see through the lambda sugar unless the example gets weird or you type-erase via std::function.

I think your broader point still holds but perhaps could do with a clearer example

t3hz0r · on Jan 21, 2020

Note that the workaround to force `change_x` out of scope in the section 'Implications of “Closures are Structs”' is no longer necessary with non-lexical lifetimes. So Rust closures are (somewhat) less hard now :)

cobbzilla · on Jan 21, 2020

The fact that you can even do stuff like this in Rust is amazing. It’s simultaneously: relatively clean code, very efficient and type-safe. I like it.

capableweb · on Jan 21, 2020

Not sure I agree with "relatively clean code" when one of the examples show "fn compose <T>(f1: impl Fn(T)->T, f2: impl Fn(T)->T) -> impl Fn(T)->T {" which is just a mix-match of keywords and other things, with tons of syntax embedded in just one line.

But as always, depends on where you come from. I mostly deal with lisp languages nowadays, so guessing it's just my view that the line quoted above seems complex enough to not be interested one bit in Rust.

aratakareigen · on Jan 21, 2020

That line is extremely simple to me. What makes it seem like a "mix-match" of symbols to you?

The equivalent in Haskell would be "compose :: (a -> a) -> (a -> a) -> a -> a", which lacks "impl" because it boxes all closures (each closure in Rust has its own type), and lacks "Fn" because it doesn't care about what kind of access the closure needs to its context (Rust distinguishes between Fn, FnMut and FnOnce). Still, the Rust is only a tiny bit more complicated than the Haskell.

the8472 · on Jan 21, 2020

Trait aliases will make some of those unwieldly declarations a bit easier on the eyes

    #![feature(trait_alias)]

    trait Map<T> = Fn(T) -> T;
    fn compose <T>(f1: impl Map<T>, f2: impl Map<T>) -> impl Map<T> {
      |n| f1(f2(n))
    }

The_rationalist · on Jan 21, 2020

This is a nice feature, do you know any other language that has something similar?

Psyonic · on Jan 22, 2020

Scala's type aliases are pretty similar.

jcelerier · on Jan 22, 2020

looks doable with C++ using :

template<typename T> using Map = std::function<T (T)>;

imtringued · on Jan 22, 2020

Perhaps you should pick your examples more carefully. That function just repeats impl Fn(T)->T three times which is exactly what yo would expect from the compose function. If there was a mix of multiple different types of functions you could demonstrate the additional complexity of Rust. What you did is just prove that Rust is the same as most languages.

iknowstuff · on Jan 22, 2020

    fn compose<T>(
        f1: impl Fn(T)->T,
        f2: impl Fn(T)->T
    ) -> impl Fn(T)->T
    {
        // both arguments as well as the returned value 
        // are functions which take and return the generic type T.
        // (to be precise, they implement the Fn trait).
    }
    
    let returned_function = compose(|val| val == 1, |val| val < 0);
    let value: bool = returned_function(true);

nynx · on Jan 21, 2020

That line can be simplified to:

  fn compose<T, F: Fn(T)->T>(f1: F, f2: F) -> F {

tomjakubowski · on Jan 21, 2020

Lambdas have unique types, and you can't use a generic parameter in a return type (which is impl Trait's raison d'etre), so I think it would have to be this:

    fn compose<T, F: Fn(T) -> T, G: Fn(T) -> T>(f: F, g: G) -> impl Fn(T) - > T

or:

    fn compose<T, F, G>(f: F, g: G) -> impl Fn(T) -> T
      where F: Fn(T) -> T, G: Fn(T) -> T

Diggsey · on Jan 21, 2020

Maybe you just miscommunicated, but you absolutely can use generic parameters in a return type:

https://doc.rust-lang.org/nightly/std/iter/trait.Iterator.ht...

... and `impl Trait` exists to be able to have return types which are difficult, verbose, or impossible to name.

Sean1708 · on Jan 22, 2020

It's impossible to write a compose function with the declaration

  fn compose<T, F: Fn(T) -> T>(f1: F, f2: F) -> F

As far as I'm aware the only way to write a function like that is

  fn compose<T, F: Fn(T) -> T>(f1: F, f2: F) -> F {
    f1
  }

and even then you can only call it like this

  let clos = |x| 2 * x;
  compose(clos, clos);

YatoRust · on Jan 21, 2020

Yes, I think he meant that it would be a bad idea to lock all of the type parameters to the same type, because all closures have different types. So you wouldn't be able to do something like this

    compose(|x| x, |x| x)

because the closures have the same type.

lmm · on Jan 22, 2020

Because the closures don't have the same type, if I'm following you correctly.

mathw · on Jan 22, 2020

Because the parameters demand the same type (and for the return value), but the supplied parameters are not the same type because the only way to get that is to pass in exactly the same instance of the closure.

It's a fun little thing, and a good example of why impl Trait in argument position is a really nice addition to Rust, even though based on what we were originally excited about (impl Trait in return position for returning Iterators and other such things), the argument position form didn't seem so important.

aratakareigen · on Jan 22, 2020

Return position impl Trait also exists so that you can change the actual return type without making a breaking change.

optimuspaul · on Jan 21, 2020

While that may be an extreme example I agree. nothing clean about that code. I've only just begun to dive into rust, but I feel like I'd need a concordance to navigate the meaning of that snippet alone.

Bt then again, maybe it depends on where you come from.

bschwindHN · on Jan 22, 2020

Takes a week or two of studying the language. It's well worth it, in my opinion.

elfexec · on Jan 21, 2020

> The fact that you can even do stuff like this in Rust is amazing.

Stuff like this? You mean closures? That's what you find amazing? What is considered ordinary in other programming languages is considered amazing in Rust. Amazing.

> It’s simultaneously: relatively clean code, very efficient and type-safe. I like it.

You like it? Do you work for mozilla? If there is a tech evangelist of the year award, I will vote for you. I've never seen someone take absolutely nothing and try to spin it into something positive.

Edit: Instadownvotes by the evangelists. I like it!

dang · on Jan 22, 2020

This comment breaks quite a whole lot of the site guidelines. Please don't post in the flamewar style to HN.

If you'd please review https://news.ycombinator.com/newsguidelines.html and take the intended spirit of this place more to heart, we'd be grateful.

wtetzner · on Jan 24, 2020

The reason it's amazing is that it's being done without a GC, while still giving you control over allocation, and even being able to inline the lambdas in many places.

It's not that lambdas themselves are hard, it's that they made it work with all of the other constraints of the language.

chrismorgan · on Jan 21, 2020

> With Rust 1.26, there’s a simpler but completely equivalent notation:

  fn new_invoke(f: impl Fn(f64)->f64, x: f64) -> f64 {
      f(x)
  }

That actually isn’t completely equivalent. With the former example, `invoke::<_>(x)` works, but with impl, `new_invoke::<_>(x)` doesn’t work. This is a deliberate aspect of the design of impl in argument position, and part of the reason why it can be better to avoid it in libraries.

In the case of functions, this difference probably doesn’t matter, because you normally can’t name the type of a function anyway, and are extremely unlikely to wish to; but in other cases being able to type the turbofish can matter for ergonomics.

As an arbitrary example, take std::convert::Into::into: the type parameter is on the trait rather than the method, so you can’t do `x.into::<T>()`, but if you need to constrain the type you must do so otherwise, e.g. `let y: T = x.into();` or `<_ as Into<T>>::into(x)`. (In that case in particular, you’d write `T::from(x)` instead, but there won’t always be such an ergonomic replacement.)

rafaelvasco · on Jan 21, 2020

I feel most things in Rust are harder to do, at least at first sight. A compromise for the enhanced safeness I guess. Could give it a try again sometime.

taberiand · on Jan 21, 2020

It seems to me that it's not that Rust is hard, it's that all programming is hard but Rust exposes the complexity so we can make a more informed decision about the trade offs of safety and performance etc, while other languages tend to hide the complexity (null reference, race conditions, etc).

efaref · on Jan 22, 2020

I've found that once you fully get this, Rust is actually easier to write code with. Especially refactoring. You can make sweeping changes to complex code and be satisfied that once it compiles it will most likely work.

Fearless refactoring is one of the prime benefits of Rust.

rafaelvasco · on Jan 21, 2020

Yeah. Makes sense. Rust just explicitly exposes certain features so that the programmer has more control.

stefs · on Jan 21, 2020

something i've read times and times again: programmers of other languages who learned rust tend to discover potentially problematic code in their earlier work.

hoseja · on Jan 22, 2020

That's just normal for learning to be a better programmer.

saghm · on Jan 21, 2020

In general, it's not the safety alone, but the combination of safety and eschewing implicit runtime costs. It would be 100% safe for Rust to always implicitly box closures; it just wouldn't be as efficient in the cases where boxing them isn't needed.

justwalt · on Jan 21, 2020

In my experience that’s true. But once you code in it consistently for some appreciable amount of time, you start to think in ways that the compiler will be happy with and that make sense from a safety standpoint. It’s getting to that point that requires a bit of work.

zelly · on Jan 22, 2020

Everything that's hard in Rust could be solved by GC.

With Rust, programmers have to spend most of their mental energy worrying about management of memory, which has largely been automated already.

I always go back to this quote from Andrei Alexandrescu (creator of D):

  A disharmonic personality. Reading any amount of Rust code evokes the
  joke "friends don't let friends skip leg day" and the comic imagery of
  men with hulky torsos resting on skinny legs. Rust puts safe, precise
  memory management front and center of everything. Unfortunately,
  that's seldom the problem domain, which means a large fraction of the
  thinking and coding are dedicated to essentially a clerical job (which
  GC languages actually automate out of sight). Safe, deterministic
  memory reclamation is a hard problem, but is not the only problem or
  even the most important problem in a program. Therefore Rust ends up
  expending a disproportionately large language design real estate on
  this one matter. It will be interesting to see how Rust starts bulking
  up other aspects of the language; the only solution is to grow the
  language, but then the question remains whether abstraction can help
  the pesky necessity to deal with resources at all levels.

RAII and borrow checking is a crutch. You just limited the set of programs you can write to those that can be written in the block-lifetime-scoped manner, which is smaller than the set of good programs (see: pretty much any graph-heavy data structures). The limitations of this programming model show up everywhere, like in the way closures have to be implemented.

There will be more innovation in GC that will make manual memory management even more useless. In a lot of cases the JVM does a better job of freeing memory than a programmer. I don't want to spend my time programming worrying about the same thing (memory) that K&R did in the 70s. I don't want to bet against innovation and technology.

jcranmer · on Jan 22, 2020

> You just limited the set of programs you can write to those that can be written in the block-lifetime-scoped manner, which is smaller than the set of good programs (see: pretty much any graph-heavy data structures).

Given that most of my Rust programs have involved graph-heavy data structures, I'd like to see you explain why it's impossible for me to have written what I have written.

zelly · on Jan 22, 2020

Not impossible, just harder. Ceteris paribus, it would have been easier to write those programs in Python or Java.

It's not like this is not well-known. Rust users admit it. GUIs are hard to write in Rust because of the limitations of RAII+borrow checker based memory. You have to rely on ref counted pointers which can become very complicated to deal with in cyclic dependencies and long-lived references common in GUIs. This problem is not unique to Rust either. C++ makes it hard to write GUIs too, but hacks are used to get around the language like Qt's moc or Microsoft's old CLI extensions to C++.

All non-GC languages are crippled for application programming. Sure, there are domains where GC cannot ever work, like signal processing or kernel dev. In these places I'd much rather see Rust used than C++.

im_down_w_otp · on Jan 22, 2020

This indictment makes little sense to me considering that, outside the Rust compiler itself, the largest early and still ongoing use case for Rust was/is Servo. A web browser engine, which is predominantly a thing that's dealing with multitudes of graph-shaped problems.

:confused:

wahern · on Jan 22, 2020

Have you looked at the issue tracker for Servo? The bugs and pain points are enlightening. Plus, Servo isn't even used in production! The strategy has been to move parts of it. AFAIU, the most complex piece so far has been the CSS engine. That's nothing to sneeze at, but still. I recently wrote a full PKIX X.509 DER decoder in LPeg in a couple nights of hacking; have fun doing that (or writing anything equivalent to LPeg[1], for that matter) in a language without GC. It's absolutely possible, just like it's possible to implement it in ATS or Coq, but that's not the point of debate.

That said, it's one thing to admit the costs of Rust's model. It's another to argue about the implications of that cost--i.e. the degree to which it makes Rust non-viable long-term. On that point I have no opinion, except to say that for complex application development I've been very happy using Lua as a glue language, and many others have been happy using GC'd languages for complex logic. But that's more of a neutral factoid as it both diminishes Rust's relative costs and benefits. Whether it augurs in favor of Rust or not depends on your starting point. Plus, language success is less dependent on merit than we'd like to believe.

[1] Note that LPeg is far more than a PEG engine (like Rust pest), as strict PEG engines are incapable of parsing DER, which is context-sensitive. In addition to its extensions (e.g. runtime matches), what makes LPeg so simple is that you don't even need intermediate graph representations. You can transform subtrees in the grammar inline, mixing function transforms and intermediate captures to your hearts content. This is difficult to explain and more difficult to appreciate until you see and use it in action.

vlovich123 · on Jan 22, 2020

The challenge with gc is that it itself isn’t a complete solution and actually creates new problems that are even harder to solve. Everyone worries about memory because it’s the most common resource but all resources have this problem (eg files, network handles, dB connections, audio handles, etc). So then you end right back to having GC languages add back scope-based ownership for that (python with, java closeable, etc). With RAII-style management like C++ or Rust all resources are treated uniformly.

Furthermore the challenge with concurrent GC is that max memory utilization is difficult to contain and difficult to make performance deterministic. So yes it has advantages. It also has severe disadvantages for the types of low-level efficient things Rust is meant for. Remember. Rust is meant to replace the existing problems that people reach for C/C++. That’s the space it’s competing in - embedded programming, operating systems, browsers, etc. There are problems that other languages are better suited for. Python, Ruby, Java are easier to work with not only because of GC. Java in particular pays an even larger cost due to not supporting value types and is still struggling afaik to integrate them properly into the GC which inhibits programmer-aided optimization of memory usage/layout.

Also, I’ll note that Swift and Objective-C are pretty easy to work with for memory management (on the level of Java) despite not having GC and get quite close to the performance of C/C++/Swift.

The challenges with GC that you handwave away has been what’s been said about GC ever since the beginning. Perhaps it’s time to admit that GC isn’t the silver bullet you think it is?

sdegutis · on Jan 22, 2020

> Swift and Objective-C are pretty easy to work with

> and get quite close to the performance of C/C++/Swift.

In some cases, identical performance. ;)

millstone · on Jan 22, 2020

It’s a real issue. Consider (as you say) the Rust compiler. It has an IR phase called MIR, which is a tree, and elements in the tree need to know how to find themselves in the parent. For example, a function has a set of basic blocks, and each BB needs to know about its function. This is a very typical IR; LLVM is the same.

Backreferences are hard in Rust, so a BB instead maintains its index into a Vec<BB>, owned by the function. But this index is just a number: ownership is not modeled, it is not statically checked, and it may fall out of sync. It is effectively a slow, weird (though sandboxed) raw pointer.

You can write this stuff in Rust, but it is awkward and Rust cannot bring its strengths to bear. Rust assumes a tree-like ownership model and if you fall off that path, it can’t help much. Graphs aren’t trees so Rust is less helpful here.

Sean1708 · on Jan 22, 2020

> It is effectively a slow ... raw pointer.

In my experience (unless I've misunderstood what you're trying to say) this is the fastest way to write graphs because of how much more cache-friendly it is than having to chase a load of pointers.

zelly · on Jan 22, 2020

True, but you can do the same thing more efficiently with pointers.

Suppose the parent maintains an array of children

  Node*

Then each child has a

  Node**

member which points to itself in that array. That's one dereference to get up to the parent context.

In the Rust example, to do the same would take an extra few steps and probably miss the cache. I think speculative execution would favor the pointer method.

vlovich123 · on Jan 22, 2020

Do you have benchmarks? An array offset isn’t speculative and I suspect you’ll have a very hard time showing that vec[i] is slower than *ptr. One challenge is if vec[i] involves a bounds check which it might in native rust.

zelly · on Jan 22, 2020

I stand corrected. The contiguous vector is better and more cache-friendly.

  #include <chrono>
  #include <iostream>
  #include <memory>
  
  static constexpr size_t MAX_NODES = 1 << 27;
  
  struct Node {
   size_t index_self;
   Node** ptr_self = nullptr;
  };
  
  struct TopLevel {
   Node** ptr_children;
   size_t n = 0;
   explicit TopLevel() : ptr_children(new Node*[MAX_NODES]) {}
   ~TopLevel() { delete[] ptr_children; }
   void add(const std::shared_ptr<Node>& descendant) {
    if (n < MAX_NODES) {
     ptr_children[n] = descendant.get();
     descendant->index_self = n;
     descendant->ptr_self = &ptr_children[n];
     n += 1;
    }
   }
  };
  
  int main(void) {
   TopLevel top_level{};
   for (int i = 0; i < MAX_NODES; ++i) {
    auto node = std::make_shared<Node>();
    top_level.add(node);
   }
   // access by pointer
   {
    auto t_0 = std::chrono::high_resolution_clock::now();
    Node** it = &top_level.ptr_children[0];
    for (int i = 0; i < MAX_NODES; ++i) {
     Node* via = *(*it)->ptr_self;
     it += 1;
    }
    auto t_1 = std::chrono::high_resolution_clock::now();
    std::cout
        << "Via pointer: "
        << std::chrono::duration_cast<std::chrono::nanoseconds>(
        t_1 - t_0)
        .count()
        << " ns\n";
   }
   // access by index
   {
    auto t_0 = std::chrono::high_resolution_clock::now();
    Node* it = top_level.ptr_children[0];
    auto idx = it->index_self;
    for (int i = 0; i < MAX_NODES; ++i) {
     Node* via = top_level.ptr_children[idx];
     idx += 1;
    }
    auto t_1 = std::chrono::high_resolution_clock::now();
    std::cout
        << "Via index: "
        << std::chrono::duration_cast<std::chrono::nanoseconds>(
        t_1 - t_0)
        .count()
        << " ns\n";
   }
   return 0;
  }

Results:

  % clang++ -O3 bench.cc && ./a.out
  Via pointer: 90 ns
  Via index: 41 ns

jcelerier · on Jan 22, 2020

Note that moc does not do anything related to ownership, its main point is to provide runtime reflection abilities to your own classes. If you write Qt code without creating your own types you don't even need it.

lifthrasiir · on Jan 22, 2020

First of all, Andrei Alexandrescu is not the creator of D.

> With Rust, programmers have to spend most of their mental energy worrying about management of memory, which has largely been automated already.

This is a common misunderstanding. I think the word "automated" is misleading, and suggest that we should look at two phases of memory management: the plan and the execution.

In traditional low-level languages like C, you plan how to manage memory and you write code to execute that plan yourself. Your code may not match the plan, thus all sort of problems ensues.

In languages with GC, you are allowed to not think how to manage memory and GC will execute their own inferred plan... until it is not. At some point you are forced to think how to manage memory and also tune GC and/or code to fit to your belated plan. This point may not be reached (simple scripts) or can be delayed much further (good modern GC), of course, so it still remains very useful.

In (a typical use case of) Rust, you plan how to manage memory but give that plan to the compiler to check if the plan is executed correctly. The compiler also generate code for common executions that you don't have to. So the execution is mostly automated (or rather, delegated), while planning is not.

A common theme here is that you can't avoid at least the planning once you've got past a threshold. If GC would supplant even that planning, you won't need value types at all; isn't that what the generation hypothesis is all about? But many languages with GC but not value types now get them. While it can be argued that Rust's affine typing is not sufficient to capture some common memory management patterns [1], that fact doesn't diminish the value of explicit memory management planning, and thus Rust's.

[1] And I think it is actually true! Indeed, I believe affine typing plus GC can be actually better than affine typing or GC alone (and value types mimic a good subset of this). There are some Rust libraries for tracing GC (e.g. [2]) as a starting point.

[2] https://github.com/Manishearth/rust-gc

TheDong · on Jan 22, 2020

"You just limited the set of programs you can write to those that can be written ..."

Yes! That's awesome! I want the set of programs I can write to be small, but still large enough to contain correct programs.

For example, I really like statically typed languages! But what is static typing? It's rules that reduce the set of valid programs. In a dynamically typed language, 'def double(x): x*2; double(os.Open("foo"))' or other such nonsense is a valid program that errors out at runtime reliably. In a statically typed language, that program is not valid. Woo! That's a whole class of programs that do bad stuff which aren't valid anymore!

If rust restricts the set of valid programs to those that structure the ownership of data more rigidly, I'll take it!

I do realize that rust has escape hatches that let you model those things when you need to, but I'll gladly take a language that pushes programmers away from difficult to reason about data-ownership-models even if it doesn't prevent all such cases.

otabdeveloper4 · on Jan 22, 2020

If you want a garbage-collected language, there's already literally a thousand of them out there. Including dozens that are more popular than Rust.

imtringued · on Jan 22, 2020

Gargabe collection is a much bigger crutch. Once you use it your entire program is contaminated by the performance characteristics of a single improperly written function. You cannot mix real time code with non real time code. If you do your real time code will be delayed by the non real time code. The unpredictability of the GC has become a property of your entire application.

zelly · on Jan 22, 2020

Yes, but let's be real, 95% of programmers do not build real-time systems.

Especially not the type of programmers Rust is appealing to. They seem to be more excited by WebAssembly than the stuff that actual low latency people worry about, like tuning obscure FPGAs and vectorizing math code. These are industries like audio processing, SpaceX, industrial equipment manufacturers, self-driving cars, high frequency trading. Definitely not code targeting x86_64 cloud VMs or iPhones.

In hard real-time environments they do not even do memory management. They just statically allocate the whole heap and never malloc or free while the program is running. So even if Rust is better at memory management, it won't make them switch from C because it wasn't an issue.

zozbot234 · on Jan 22, 2020

Yes, but you can have other kinds of cycle-aware memory management, which integrate better with their surrounding code than plain tracing GC. E.g. something like https://github.com/artichoke/cactusref which can collect cycles deterministically, and does not have to trace the whole heap to do so (which implies a lot of hidden overhead in plain GC, because all heap-allocated structures must then be legible to the tracing routine).

openfuture · on Jan 23, 2020

I think Gluon is a good example of how Rust can be 'bulked up'.

The idea is to have a DSL inside Rust that essentially has haskell semantics. This allows you to write performance sensitive code in Rust and then marshall it into Gluon where you can do high level manipulations with type safety before you unmarshall into a performance sensitive section again.

To me the failure of Rust is more that it didn't go full-on-haskell syntax wise and (to a lesser extent) that there isn't a specification (or some standards) beyond the test suite.

wtetzner · on Jan 24, 2020

> You just limited the set of programs you can write ... which is smaller than the set of good programs

That's true of all static type systems.

Also, the whole point of Rust is to cover cases where performance and memory management are important. If you can use a GC'd langauge for you program, then pick one of the many fantastic existing ones.

shmerl · on Jan 22, 2020

Every GC is a trade off. You can't replace RAII with GC without losing some control. No matter how much you innovate in it, you have to pay with something. That's the whole point of GC after all.

Here is a good read about it: https://blog.plan99.net/modern-garbage-collection-911ef4f8bd...

pjmlp · on Jan 22, 2020

GC enabled systems programing languages don't replace RAII with GC, they offer both and its up to the developer to make the best use of them.

shmerl · on Jan 22, 2020

If the language works without GC, you can implement optional GC for it. Even Rust had something like that in the early days (though it was dropped). But languages where GC is expected by design, don't allow you to disable it (like Java).

pjmlp · on Jan 22, 2020

Except I wasn't speaking about Java, rather D, Nim, Modula-3, Active Oberon, System C#, Swift.

hechang1997 · on Jan 22, 2020

This article does a great job explaining the subtleties about rust closures.

A subtle point often missed is that if you want the anonymous structure of your closure itself to own a value, so that you can have a closure with a `'static` lifetime (which lives forever) that still could be called multiple times, you must use move keyword. If you just move the values into the closure manually, your closure will be a `FnOnce` since the compiler will mistakenly believe that you want to move that value each time the closure gets invoked, which can only happen once if the moved value doesn't implement `Copy`. In constrast, if you use the `move` keyword, compiler will correctly move the value into the anonymous structure of the closure itself during its construction, and the closure will be a `FuMut` and/or `Fn` as long as you don't consume the said value in the closure.

hechang1997 · on Jan 22, 2020

https://www.snoyman.com/blog/2018/11/rust-crash-course-05-ru...

jimbob45 · on Jan 21, 2020

I don't understand the intuition of closures and they turn me off to languages immediately. They feel like a hack from someone who didn't want to store a copy of a parent-scope variable within a function. The idea that I can touch variables that have gone out of scope (and that have ostensibly been GC'd) makes me feel that it is impossible to reason about variable lifetimes when dealing with closures. Is there some perspective I'm missing here? Is it literally just the language invisibly adding the parent-scope variables and their values to the top of my function?

merlincorey · on Jan 21, 2020

There are indeed some nuances you are missing as well as at least one misconception you are holding:

> The idea that I can touch variables that have gone out of scope (and that have ostensibly been GC'd)

You can't touch variables that have been garbage collected, indeed. A variable captured in a closure will have a reference to it and therefore not be garbage collected until the closure itself goes out of scope and is garbage collected.

Closures allow encoding some state into a function, which is especially useful for higher order functions that return specialized functions, for example. This is, for example, how decorators in Python work - a function is captured in a closure, which is then referenced and called by a new function wrapping the original function.

chubot · on Jan 21, 2020

I don't have a problem with the intuition but I'm with you on programming style.

It's sort of a contrary opinion but I prefer to be explicit about state, and closures are sort of silently bundling up state for you behind the scenes. It's not very apparent from the syntax.

I think languages should have a lighter-weight syntax for classes and that would subsume many use cases for closures.

-----

While I've never programmed in PHP, it appears PHP actually has a pretty nice solution for this problem! You declare the variables captured using 'use':

https://stackoverflow.com/questions/1065188/in-php-what-is-a...

    $callback =
        function ($quantity, $product) use ($tax, &$total)

In other languages the capture of 'tax' and 'total' are implicit.

smilekzs · on Jan 21, 2020

C++ lambda has explicit capture list as well.

kragen · on Jan 26, 2020

Yeah, I came to the same conclusion about closures when I wrote Ur-Scheme. In mainstream modern languages, you have the remarkable situation that, when you are writing the code, you know which variables you intend your closure to capture, but you don't note that explicitly in the code. Then the compiler needs some extra complexity to calculate the set of free variables in the body of the lambda — and which binding contours they are being captured from, although that's usually pretty trivial. And then the guy who is changing the code next year also needs to reverse-engineer that same set of variables and binding contours in order to modify it successfully. So it seems sort of perverse to leave that information out of the code!

Since then, though, I've come to doubt my conclusion, for two reasons:

1. The same thing is true of, for example, static types; but using type inference or dynamic typing often makes your code easier rather than harder to read. (Some wag said something to the effect that dynamic typing is what you do when it's simultaneously so trivial to see that your program has no type errors that you can do it in your head and so difficult that you don't want to spend the time to do the proof.)

2. There isn't a really compelling difference to me between the closure in

    ^(x < 2 ifTrue: [ 1 ] ifFalse: [ (x - 1) factorial ])

capturing x and the closure in

    fetch(json).then(r => widget.displayJson(r.json()))

capturing widget, or the block in

    (let ((overlay (make-overlay beg end)))
      (overlay-put overlay 'face (or face 'highlight)))

capturing face.

That is, inner blocks of control structures implicitly capture variables from their outer scope all the time, and this is mostly not a problem; and closures are a useful technique to make it possible to extend the set of control structures. (As it happens, let in Emacs Lisp is implemented as a built-in special form, but in Scheme it's normally implemented as a macro that puts the inner block into a closure.)

Maybe part of the reason is the lifetime: the example with fetch() is in fact socking away a reference to widget until after the HTTP response comes back, so aliasing bugs and space leaks are possible, while the other two examples aren't. This might be a reason implicit closures are so much more popular in functional programming languages: aliasing bugs are not a problem for immutable data, and nobody expects to be able to predict how much memory their functional program will need anyway.

mrkeen · on Jan 21, 2020

> Is it literally just the language invisibly adding the parent-scope variables and their values to the top of my function?

I believe that would be 'lambda-lifting', i.e. taking a lambda which has free variables, adding those free variables to its input variables, moving the lambda out into the top- level, and modifying the call site to pass in the no-longer-free-variables. That should give you completely reasonable lifetimes and only require the stack.

It gets trickier when you want functions to be able to return functions. For instance:

  add(a, b) {
    a + b
  }

  myFun() {
    var add5 = x -> add(5, x);
    var add6 = x -> add(6, x);
    add5(1) + add6(2)
  }

In this case I've made two variants of add. I can't just run add5 on the stack, get rid of it, and then run add6 afterwards, because they both need to be alive at the same time.

saghm · on Jan 21, 2020

> The idea that I can touch variables that have gone out of scope (and that have ostensibly been GC'd)

I think this might be due to your mental model of GC not quite matching the way it's usually implemented (at least, in traditional GC, as opposed to reference counting); most GC gives no guarantees that memory will immediately be cleaned up once the last reference goes out of scope, just that it will happen some time in the future (usually the next time it checks). Languages like Rust (and C++, if you use destructors) will inline the memory cleanup at the end of the scope, but this is pretty explicitly not what will happen in a garbage-collected language.

Note also that this isn't really any different than the fact that a variable can go out of scope and not be cleaned up if another reference to it exists. Consider the following Java-esque code:

    void setNewChild(Parent parent) {
        var child = new Child();
        parent.child = child;
    }

The `Child` created in the method body will not be garbage collected even though the variable goes out of scope because it's still referenced somewhere else (i.e. in the Parent object). The key insight to understanding closures is that using the local variable in another function makes another reference to it, so naturally it would be kept alive until that function is gone.

Sean1708 · on Jan 22, 2020

> most GC gives no guarantees that memory will immediately be cleaned up once the last reference goes out of scope

But in this example the last reference hasn't gone out of scope, the last reference is in the closure. If the GC ignored closures and acted as if last reference had gone out of scope then there's a chance that that variable would be cleaned up before the closure gets called, which would make closures completely useless.

saghm · on Jan 23, 2020

I think we're saying the same thing! The point I was trying to make with the second half of my comment was that a variable going out of scope isn't the same thing as the object going out scope, since other references can still be alive afterwards. The example I gave was intended to demonstrate this without using closures (since the person I was responding to had indicated that they found them confusing), and then went on to attempt to explain that the example with just objects was in fact doing the same thing as if the object were referenced in a closure rather than another object. The first half of my comment was just an attempt to respond to the statement about the object having "ostensibly been GC'd", which isn't necessarily the case even if there wasn't a closure referencing the object; in your phrasing, there's a chance that it would be cleaned up, and I wanted to make sure that it was clear that it was only a chance, not a guarantee.

sdegutis · on Jan 21, 2020

In JavaScript and similar languages, “yes.” It’s really a lot simpler than you’re making it. Closures are great in these languages and incredibly useful for rapid prototyping. Especially in Node.js and in browsers.

dnautics · on Jan 21, 2020

not just rapid prototyping. Often times, when i write unit tests and the like, i use closures. In the system i work in, asserting an exception requires spawning a new thread and then watching it burn, which is most easily accomplished with a closure. But also writing one-off tasks to be run in a new thread (even in prod) can take closures.

wffurr · on Jan 21, 2020

In the article they discuss that Rust closures are just sugar for structs that work exactly as you describe.

As with any other syntactic sugar, it's easy to reason about the behavior once you know what the syntax desugars into.

One classic example I see with beginning programmers all the time is the order of the parts of a for loop.

e.g.

    for(initialization statement; bounds check expression; increment statement) {
      ...
    }

is just syntactic sugar for:

    {
      initialization statement;
      while (bounds check expression) {
        ...
        increment statement;
      }
    }

But once you understand what a for-loop desugars into, it's trivial to use and reason about.

Closures aren't much different. In Rust, if anything, they're less magic, because you have to be explicit about the lifetime of the closure relative to the references it's borrowing.

cesarb · on Jan 21, 2020

> is just syntactic sugar for [...]

There's just a subtle detail to keep in mind here: both "continue" and "break" are syntactic sugar for "goto", and when converting a "for" loop into a "while" loop, you have to also convert "continue" to a "goto" to a label just before the increment statement.

wwright · on Jan 22, 2020

I know you've already got a storm of replies, but I'd like to illustrate part of what I love about closure (especially when combined with anonymous objects like in JavaScript): you can replace classes, private vs public, "this", and other features with a simpler, smaller number of features.

    const Counter = () => {
      let counts = new Map();

      return {
        reset: () => {
          counts = new Map();
        },

        count: (key) => {
          const count = counts.get(key) + 1;
          counts.set(key, count);
          return count;
        }
      }
    }

    const items = ['apple', 'apple', 'banana', 'canteloupe'];
    const counter = Counter()

    items.forEach(item => {
      console.log(item + ": count is", counter.count(item));
    })

    counter.reset();
    // do some more counting now

IMO, this is much simpler [and prettier ;)] than the alternative using classes: you only need to know closures and objects, and the rules apply the same as they do in all other contexts. Classes in most languages usually come with their own twists and surprises.

samatman · on Jan 22, 2020

I've got just the paper for you! The Implementation of Closures in Lua:

https://pdfs.semanticscholar.org/73a2/e3c03f799956aa5a3188e4...

This is just one way to do it, but it's an efficient way, and easy to understand.

kxyvr · on Jan 21, 2020

In many ways, closures can be more difficult when there is not garbage collection because we precisely do not want to refer to values that have been freed or collected. For example, in C++, we can write code like:

  #include <functional>
  #include <iostream>
  auto foo(double a) -> std::function <double(double)> {
      return [&a](auto x) {
          return x + a;
      };
  }
  int main() {
      auto f = foo(2.0);
      std::cout << f(1.0) << std::endl;
      std::cout << f(1.0) << std::endl;
  }

This code is bad because the function returned by `foo` depends on a reference to `a`, which is freed at the end of `foo`. Running this gives:

  3
  1

which is bad. Rust has the advantage of finding such errors. For example, the following Rust code:

  fn foo(a: f64) -> impl Fn(f64) -> f64 {
      |x: f64| -> f64 { x + a }     
  }          
  fn main() {                
      let f = foo(2.0);      
      println!("{}", f(1.0));                                                     
      println!("{}", f(1.0));                                                     
  }

fails to compile with the error:

  error[E0597]: `a` does not live long enough
   --> test01.rs:2:27
    |
  1 | fn foo(a: f64) -> impl Fn(f64) -> f64 {
    |                   ------------------- opaque type requires that `a` is borrowed for `'static`
  2 |     |x: f64| -> f64 { x + a }
    |     ---------------       ^ borrowed value does not live long enough
    |     |
    |     value captured here
  3 | }
    |  - `a` dropped here while still borrowed

That forces us to correct the code by adding the move annotation to the the closure, which moves ownership of a to the closure itself:

  fn foo(a: f64) -> impl Fn(f64) -> f64 {
      move |x: f64| -> f64 { x + a }
  }          
  fn main() {                
      let f = foo(2.0);      
      println!("{}", f(1.0));                                                     
      println!("{}", f(1.0));                                                     
  }

This gives the correct result:

  3
  3

Of course, all of this nuance can be avoided in a garbage collected language. For example, in OCaml, we can write:

  let foo a =
      fun x -> x +. a
  in
  let f = foo 2.0 in
  Printf.printf "%f\n" (f 1.0);
  Printf.printf "%f\n" (f 1.0);

which also returns the correct result:

  3.000000
  3.000000

More generally, closures are nice because it's a useful feature to return a function from a function. For example, we may want call a function parameterized on some parameters repeatedly and we don't want to have to pass all of the parameters each time we invoke the function. Functional programming and closures allow for a variety of other programming techniques as well.

Beyond that, my point is that a garbage collector can track the use of variables and memory extremely well, which makes the process of constructing closures moderately easy. We can have closures without garbage collection, but then we must be careful about who and what owns this memory. C++ does not enforce this, but Rust does, which is one of its benefits.

phibz · on Jan 22, 2020

In your rust example could you give "a" a lifetime that is the same as the lifetime of the impl Fn that foo() returns? That way "a" lives as long as the impl Fn.

phibz · on Jan 22, 2020

I tried changing a to a reference and doing:

fn foo<'a>(a: &'a f64) -> impl Fn(f64) -> f64 + 'a { ....

but it doesn't work

ufo · on Jan 21, 2020

> Is it literally just the language invisibly adding the parent-scope variables and their values to the top of my function?

Pretty much. When implementing closures, the variables that are going to be used by the nested functions are allocated on the heap instead of on the stack. The nested function is then set up to receive a pointer to the outer variables as an additional parameter.

If you want to read more about it, the word to search for would be "closure conversion"

http://matt.might.net/articles/closure-conversion/

It's unfortunate the the Wikipedia article for closure conversion now redirects to the article about Lambda Lifting. They are related but aren't quite the same thing.

zozbot234 · on Jan 22, 2020

> Is it literally just the language invisibly adding the parent-scope variables and their values to the top of my function?

Yup. A "first class" closure is just reifying some arguments of a future function call and bundling them up nicely in a struct/record. This is especially clear in languages that support currying, since that means you'll literally be building that record step-by-step - only after a full set of arguments is provided does a true function call really occur.

lmm · on Jan 22, 2020

It feels like the part that's missing is the ability to abstract over these different types of function, polymorphically. At least, that seems to be where things fall down when we try to talk about e.g. implementing a Functor trait in Rust.

As a concrete example, the `compose` method given should clearly be the same for `Fn`, `FnOnce`, and `FnMut`. Can we write it once and reuse it for `Fn`, `FnOnce` and `FnMut`? If not, why not?

kccqzy · on Jan 22, 2020

Because Rust doesn't have higher-kinded types. Well it does but it doesn't allow you to define or implement traits for them.

Also, what happens when you compose one function with FnMut and another with just Fn? The answer is clearly FnMut. These three traits are related by a subtyping relationship. I don't know enough Rust to answer the question of whether such a hypothetical compose function can return the right one from this hierarchy.

Koshkin · on Jan 22, 2020

Well, looks like everything in Rust is somewhat hard.