an instancemethod puzzle

2013.May.02

Here’s a simple puzzle with some strange results and some interesting implications.

The Python `is` operator compares object identity. In CPython terms, this means that the two PyObject* pointers are equal.

static PyObject *
cmp_outcome(int op, register PyObject *v, register PyObject *w)
{
    int res = 0;
    switch (op) {
    case PyCmp_IS:
        res = (v == w);
        break;
    case PyCmp_IS_NOT:
        res = (v != w);
        break;
    /* ... SNIP ... */
}

So why do we get the following results?

class Foo(object):
	def bar(self):
		pass

foo = Foo()

assert Foo is Foo # the class object is the class object (expected)
assert foo is foo # an instance is the instance (expected)
assert Foo.bar is not Foo.bar # the unbound method is not object-identical (unexpected!)
assert foo.bar is not foo.bar # the bound method is not object-identical (unexpected!)

This tells us that every evaluation of Foo.bar or foo.bar gives us a new object.
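Identity is not the only way to compare methods. A minimal sketch, assuming the documented behaviour that two bound methods compare equal with == when they wrap the same function and the same instance (first, second and the flag names are only illustrative):

```python
class Foo(object):
    def bar(self):
        pass

foo = Foo()

first = foo.bar
second = foo.bar

# two separate method objects...
distinct_objects = first is not second
# ...that wrap the same instance and the same function
same_instance = first.__self__ is second.__self__
same_function = first.__func__ is second.__func__
# so they compare equal even though they are not identical
equal_methods = first == second
```

So if the question is "do these refer to the same method on the same instance?", == is the comparison to reach for.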

We know that, in CPython, the built-in id function gives us the pointer value of a Python object (i.e., the memory address on the heap where the object resides). As a result, comparing id(x) == id(y) should be equivalent to comparing x is y.

Strangely enough:

class Foo(object):
	def bar(self):
		pass

foo = Foo()

# this gives different results in PyPy
assert id(Foo.bar) == id(Foo.bar)
assert id(foo.bar) == id(foo.bar)

This seems to be the opposite of what we saw above.

However, we can reconcile the two results by realising that each evaluation of Foo.bar or foo.bar instantiates a new object, and nothing captures a reference to it. Since nothing holds a reference, the object is free to be garbage collected as soon as id returns. The next Foo.bar or foo.bar then happens to be allocated at the same memory address. This isn’t as coincidental as it seems: CPython has an optimisation, not present in PyPy, that reuses freed bound and unbound method objects instead of paying for a fresh deallocation and allocation each time.
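To check this explanation, we can hold on to both objects so that neither can be freed. This is only a sketch (first, second and the flag names are illustrative), but while both methods are alive at once they cannot share an id:

```python
class Foo(object):
    def bar(self):
        pass

foo = Foo()

# keep a strong reference to each temporary method object
first = foo.bar
second = foo.bar

# both objects are alive at the same time, so their ids cannot collide
ids_differ = id(first) != id(second)
# and comparing ids agrees with `is` again
agrees_with_is = (id(first) == id(second)) == (first is second)
```

The id comparison is only meaningful while both objects are alive at the same time.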

Now, say we have an application where some component maintains a callback.

The above tells us that references are captured in the opposite direction from what we might expect: instance methods hold a strong reference to their instance, but instances do not hold a strong reference to their instance methods (since instance methods are created as temporaries on each access).

In other words:

class Foo(object):
	def bar(self):
		pass
	def __del__(self):
		print('Foo.__del__')

foo = Foo()
foo_bar = foo.bar

del foo
# foo will still linger until we `del foo_bar`
#   because the instance method captures a strong
#   reference to the instance
del foo_bar

# however, foo_bar will get deleted the moment
#   nothing references it
# foo still being alive won't have any bearing on
#   this, because foo.bar returns a new, temporary
#   object every time
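The same behaviour can be observed without __del__. A small sketch, using a weakref.ref purely as a probe (foo_probe and the two flags are illustrative names, and the gc.collect() calls are only there for implementations that don't use reference counting):

```python
import gc
import weakref


class Foo(object):
    def bar(self):
        pass


foo = Foo()
foo_probe = weakref.ref(foo)  # lets us ask "is foo still alive?"
foo_bar = foo.bar             # the bound method holds foo strongly

del foo
gc.collect()
# the instance survives, kept alive only by the bound method
alive_while_method_held = foo_probe() is not None

del foo_bar
gc.collect()
# with the method gone, nothing references the instance any more
alive_after_release = foo_probe() is not None
```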

When we store our callbacks, we may want to hold only weak references to them: we may not want the mere presence of a callback to force our objects to linger when nothing else references them. A natural way to do this is to keep the callbacks in a weakref.WeakSet.

However, if we try to capture a bound instance method as a callback, we’ll see some puzzling behaviour.

Unless we capture a strong reference to this instance method somewhere else in our programme (and manually manage the lifetime of that reference to exactly match that of the instance itself), we’ll see the instance method freed immediately.

class Foo(object):
	def bar(self):
		pass

foo = Foo()

from weakref import WeakSet
callbacks = WeakSet()
callbacks.add(foo.bar)
# foo.bar has no further references, and
#   the callbacks set captures only weak
#   references, therefore it will get freed
#   immediately

# we have no callbacks registered, despite foo still being alive!
assert foo
assert not list(callbacks) # nothing here!

# one simple workaround is to capture our own
#   strong reference
ref = foo.bar
callbacks.add(ref)

assert foo
assert list(callbacks) == [ref]
# except this defeats the purpose of having garbage collection,
#   because we have to manually manage the lifetime of
#   `ref` to match that of `foo`
# otherwise, `foo` may linger

There are a number of solutions to this problem.

There is a code recipe for creating an instance method wrapper that reverses the direction in which references are captured (so that the instance method captures only a weak reference to the instance).
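A rough sketch of the idea, not the recipe itself: WeakBoundMethod is a hypothetical name, and quietly returning None once the instance is gone is just one possible choice. (Python 3.4 and later also ship weakref.WeakMethod for a similar purpose.)

```python
import gc
import weakref


class WeakBoundMethod(object):
    """Hypothetical wrapper: instance held weakly, function held strongly."""

    def __init__(self, bound_method):
        # take the bound method apart so we never keep the method object
        self._instance_ref = weakref.ref(bound_method.__self__)
        self._func = bound_method.__func__

    def is_alive(self):
        return self._instance_ref() is not None

    def __call__(self, *args, **kwargs):
        instance = self._instance_ref()
        if instance is None:
            # assumption for this sketch: a dead callback is a no-op
            return None
        return self._func(instance, *args, **kwargs)


class Foo(object):
    def bar(self):
        return 'called'


foo = Foo()
callback = WeakBoundMethod(foo.bar)
result_alive = callback()   # foo is alive, so bar runs

del foo
gc.collect()
result_dead = callback()    # the wrapper did not keep foo alive
```

The wrapper itself can be held strongly by whatever owns the callbacks, since it no longer forces the instance to linger.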

Tomorrow, I’ll blog about a different way to solve this problem.

Here’s the full, raw answer to our puzzle in CPython. This is the function that a Python function’s __get__ (the descriptor hook invoked on attribute access) calls to instantiate a new method. We can see that we are creating a new object every time, and that CPython keeps a free list of previously deallocated method objects so that it doesn’t have to do a full memory allocation/deallocation each time.

/* Instance method objects are used for two purposes:
   (a) as bound instance methods (returned by instancename.methodname)
   (b) as unbound methods (returned by ClassName.methodname)
   In case (b), im_self is NULL
*/

PyObject *
PyMethod_New(PyObject *func, PyObject *self, PyObject *klass)
{
    register PyMethodObject *im;
    im = free_list;
    if (im != NULL) {
        free_list = (PyMethodObject *)(im->im_self);
        PyObject_INIT(im, &PyMethod_Type);
        numfree--;
    }
    else {
        im = PyObject_GC_New(PyMethodObject, &PyMethod_Type);
        if (im == NULL)
            return NULL;
    }
    im->im_weakreflist = NULL;
    Py_INCREF(func);
    im->im_func = func;
    Py_XINCREF(self);
    im->im_self = self;
    Py_XINCREF(klass);
    im->im_class = klass;
    _PyObject_GC_TRACK(im);
    return (PyObject *)im;
}
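We can watch the same machinery from Python by calling __get__ by hand. A minimal sketch (plain_function, by_hand and the flag names are illustrative):

```python
class Foo(object):
    def bar(self):
        pass

foo = Foo()

# the class dictionary stores a plain function, not a method
plain_function = Foo.__dict__['bar']

# binding it by hand is what attribute access does for us
by_hand = plain_function.__get__(foo, Foo)

bound_to_foo = by_hand.__self__ is foo
wraps_function = by_hand.__func__ is plain_function
# equal to what foo.bar produces, but still a separate object
equal_to_attribute = by_hand == foo.bar
fresh_object = by_hand is not foo.bar
```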