[personal profile] kpreid

I have several times heard that one should not rely on finalizers (that is, code invoked after some object becomes garbage) to reclaim external resources (file descriptors, temporary files, etc.), on the grounds that there is no guarantee they will be promptly reclaimed and therefore one might run out.

Certainly for resources whose use has semantic significance to an outside system (e.g. a network connection or a locked file) or if there is a potential shortage of resources affecting other processes (e.g. free disk space), one should free them promptly whenever possible. (Finalizers are still important for error recovery unless you’re programming completely without nonlocal exits and extremely carefully, in which case you’re probably writing C and don’t have finalizers.)

But if the concern is for limited internal resources (most prominently, the per-process limit on the number of open file descriptors), and the process is entirely managed by the GC, would it not suffice to force a garbage collection and retry whenever opening a file fails for lack of file descriptors, just as a runtime forces a collection and retries when an allocation would otherwise run out of memory?

No!

Date: 2011-05-25 12:10 (UTC)
From: (Anonymous)
"would it not suffice to force a garbage collection ..."

Not formally, no: the GC may know that you are far from exhausting your currently available memory and so decide to do nothing. Even worse, it might work on your machine because of how garbage collection happens to be implemented in the runtime you're using, while your users run a different runtime that collects differently.

Remember that "Garbage collection is simulating a computer with an infinite amount of memory": http://blogs.msdn.com/b/oldnewthing/archive/2010/08/09/10047586.aspx

(no subject)

Date: 2011-05-25 13:18 (UTC)
From: [identity profile] juan-gandhi.livejournal.com
Good point; but for file descriptors specifically, an alternative solution would be to set an appropriate ulimit.

Re: No!

Date: 2011-05-25 13:47 (UTC)
From: [identity profile] kpreid.livejournal.com
OK, so we specifically need a GC aware of this intent, or at least one that has a force-GC operation which actually does (or has an option to) collect everything collectable.

That doesn't seem to be a large burden considering that doing this right also requires GC-and-retry code on every operation which allocates a file descriptor (or whatever).
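Since every descriptor-allocating operation needs the same collect-and-retry treatment, the pattern generalizes naturally to a wrapper. A sketch in Python (again purely illustrative; `retry_after_gc` is a hypothetical name, and which errnos count as resource exhaustion is an assumption):

```python
import errno
import functools
import gc

def retry_after_gc(resource_errnos=(errno.EMFILE, errno.ENFILE)):
    """Decorator: if the wrapped call fails with a resource-exhaustion
    errno, force a full garbage collection and retry the call once."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except OSError as e:
                if e.errno not in resource_errnos:
                    raise
                gc.collect()  # finalize unreachable holders of descriptors
                return fn(*args, **kwargs)
        return wrapper
    return decorate

# Usage: wrap any descriptor-allocating operation once, use it everywhere.
guarded_open = retry_after_gc()(open)
```

This only helps, of course, if the forced collection in your runtime really does collect everything collectable, as discussed above.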

Re: No!

Date: 2011-05-25 22:32 (UTC)
From: [identity profile] qedragon.livejournal.com
That logic doesn't hold water. One could just as easily say that a GC simulates a machine with an infinite number of file descriptors!

Re: No!

Date: 2011-05-26 12:56 (UTC)
From: (Anonymous)
The CLR garbage collector only considers memory, and thus simulates a machine with an infinite amount of memory. The proposal here is a generalisation of "traditional" garbage collectors from only considering memory, to considering all limited resources within the program's control; effectively, a runtime with a garbage collector of the form suggested is simulating a machine with no finite resources at all.

(no subject)

Date: 2011-05-26 18:33 (UTC)
From: (Anonymous)
In many cases, the resource exhaustion will be encountered by external library code which is not prepared to trigger a garbage collection and retry. If you control the complete stack all the way down, your proposal is an option, but this is the exception.

Research topic :)

Date: 2011-06-13 09:40 (UTC)
From: [identity profile] dutherenverseauborddelatable.wordpress.com (from livejournal.com)
I have studied this kind of thing in my PhD. You can take a look at an abridged version in one of my papers: Towards a Resource-Safe Erlang (http://www.univ-orleans.fr/lifo/Members/David.Teller/publications/colsec2007.pdf).

Re: Research topic :)

Date: 2011-06-13 15:35 (UTC)
From: [identity profile] kpreid.livejournal.com
I've scanned your paper, and I agree that we're both looking at the question of avoiding resource exhaustion, but at rather different aspects of it.

Your paper is about showing that a program will not run out of resources given arbitrary/malicious clients; my post is about allowing the programmer to manage resources less explicitly without losing reliability (that is, not introducing cases/orderings where the program fails despite having sufficient resources).