# the variance of an arbitrary Iterable

## 2013.Apr.23

I’ve asked the following interview question to a number of candidates. This question is based on an actual problem I have had to solve in a real report (find the variance of an Iterable, where the Iterable may not be a Sequence.)

The question starts simply:

Write a function to calculate the average of a list of numbers, `xs`.

This is the only part of the question where I generally don’t hold the candidate’s hand.

Each part of this question can lead to follow-up discussions.

For example, a follow-up discussion on errors & error reporting:

What if `xs` is an empty list? Should `average([]) == 0`? How is that distinguished from `average([0])`? Should the function raise an exception? (It already will raise `ZeroDivisionError`: do we translate this into a `ValueError`?) Where might this function be used? What are the user's expectations when calling this function? What implications does that have on the way we might handle errors?

An easy follow-up discussion on complexity:

What is the asymptotic time behaviour of the above? Can we do any better? Why not? What is the asymptotic memory usage of the above? Can we do any better? Why?

The following two discussions can be repeated for each step of the question, though only the answer to the latter changes (when we write the constant memory usage formulation with a generator expression.)

Additionally, there is a follow-up discussion to be had on what the candidate thinks of the problem and the solution.

Do you like the solution? Is it something you’d do in real code? Are there any qualifications on when, where, and how you might solve these problems?

There exist the following functions in the global scope (`__builtins__`):

Rewrite `average` using the `__builtins__`, `sum` and `len`.

Simple enough!

Now, Python has list comprehensions:

Can we write `variance` that finds the variance of the list and makes use of list comprehensions?

Python also has generator expressions:

`len` works on any object that has `__len__` defined.

(More precisely, in CPython it works on any object that has `obj->ob_type->tp_as_sequence->sq_length` or `obj->obj_type->tp_as_mapping->mp_length.`)

Can we write our own `len` that will work on any finite Iterable, even a finite generator?

Can we do this using generator expression syntax?

Writing this using generator syntax requires an intuitive leap: that the expression yielded does not have to be a function of the loop variable.

Additionally, candidates with C++ or Java background will try to solve this with an explicit mutation of the count `(count++ for _ in xs)`, but integers in Python are non-mutable, and the purpose of this question is to flirt with a more functional approach.

Using this result, can we rewrite `average`?

We have to be able to calculate the average of any Iterable (including non-materialised generators)? This means that we can only iterate over the Iterable once.

This requires another intuitive leap: that we can capture two values in the generator expression.

A cute observation that avoids the use of `sumtup` by using a complex number.

Finally, can we use all of these results to rewrite `variance`?

For the sheer don’t use this code of it, I will avoid the use of `sumtup`.

`sum` is merely a fold (`reduce` in Python terms) on addition.

The purpose of this question is to start a discussion.

I have recommended candidates who didn’t get any of the answers on the first try but were able to clearly demonstrate their skill, knowledge, and experience in the surrounding discussion.

I’m just about the easiest interviewer out there, and I don’t hold candidates to getting the exact answer on the first try. In fact, many parts of this question rely on making a single intuititve leap, and I don’t think it’s fair to ask riddles in interviews.

In fact, I like giving this question to people who don’t know Python or who don’t know Python very well. It allows me to show them something and see how they approach putting it into use.

What kind of questions to they ask? Are they engaged in the discussion? What experience or knowledge can they bring to bear in answering it?

In the end, I think that what matters most to me is whether the candidate can have a reasonable discussion about the process of creating software. Are they engaged in their work? Are they easy to work with? Can they hold a productive discussion on software development? Can they reconcile differences of opinion to negotiate a practical, reasonable consensus?