I’ve asked the following interview question to a number of candidates. This question is based on an actual problem I have had to solve in a real report (find the variance of an Iterable, where the Iterable may not be a Sequence.)
The question starts simply:
Write a function to calculate the average of a list of numbers,
1 2 3 4 5 6 7 8 9 10 11 from __future__ import division # allow for true division # ref: http://www.python.org/dev/peps/pep-0238/ def average(xs): total, count = 0, 0 for x in xs: total += x count += 1 return total / count assert average([1,2,3,4,5,6]) == 21/6 == 3.5
This is the only part of the question where I generally don’t hold the candidate’s hand.
Each part of this question can lead to follow-up discussions.
For example, a follow-up discussion on errors & error reporting:
xsis an empty list? Should
average() == 0? How is that distinguished from
average()? Should the function raise an exception? (It already will raise
ZeroDivisionError: do we translate this into a
ValueError?) Where might this function be used? What are the user's expectations when calling this function? What implications does that have on the way we might handle errors?
An easy follow-up discussion on complexity:
What is the asymptotic time behaviour of the above? Can we do any better? Why not? What is the asymptotic memory usage of the above? Can we do any better? Why?
The following two discussions can be repeated for each step of the question, though only the answer to the latter changes (when we write the constant memory usage formulation with a generator expression.)
Additionally, there is a follow-up discussion to be had on what the candidate thinks of the problem and the solution.
Do you like the solution? Is it something you’d do in real code? Are there any qualifications on when, where, and how you might solve these problems?
There exist the following functions in the global scope (
1 2 3 4 5 # sum: sum an Iterable of non-strings assert sum([1,2,3]) == 6 # len: length of some Sized (usu. Sequence) assert len([1,2,3,4]) == 4
1 2 3 4 5 6 from __future__ import division def average(xs): return sum(xs) / len(xs) assert average([1,2,3,4,5,6]) == 21/6 == 3.5
Now, Python has list comprehensions:
1 2 3 4 5 6 7 8 9 # [ expression for loop-variable in Iterable if predicate ] # ---------- ------------- -------- --------- # e.g., the squares of all the odd numbers between [0,10) xs = [x**2 for x in xrange(10) if x % 2] assert xs == [1**2, 3**2, 5**2, 7**2, 9**2] == [1,9,25,49,81] # xs is a materialised list, taking O(N) memory assert type(xs) is list
Can we write
variancethat finds the variance of the list and makes use of list comprehensions?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 from __future__ import division # we'll use the following formulation of variance (provided): # E[(x - avg)**2] def average(xs): return sum(xs) / len(xs) def variance(xs): avg = average(xs) ys = [(x - avg)**2 for x in xs] return average(ys) data = [1,2,3,4,5,6] assert average(data) == 21/6 == 3.5 assert variance(data) == 18.5/6
Python also has generator expressions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 # ( expression for loop-variable in Iterable if predicate ) # ---------- ------------- -------- --------- # e.g., the squares of all the odd numbers between [0,10) xs = (x**2 for x in xrange(10) if x % 2) assert list(xs) == [1**2, 3**2, 5**2, 7**2, 9**2] == [1,9,25,49,81] # xs is NOT materialised; it yields values as it is # iterated over, and, by itself, requires O(1) memory assert type(xs) is not list # a generator expression can, as a result, represent a list # of any size, even a countably infinite one! # e.g., the squares of all the odd numbers from [0, inf) from itertools import count xs = (x**2 for x in count() if x % 2) # we can step through the generator with next() assert next(xs) == 1**2 == 1 assert next(xs) == 3**2 == 9 assert next(xs) == 5**2 == 25 # note that the generator is a one-way stream: # once we step through and yield a value, we can never revisit that value! # in a future post, I'll discuss the difference between generators and generator expressions
len works on any object that has
(More precisely, in CPython it works on any object that has
Can we write our own
lenthat will work on any finite Iterable, even a finite generator?
1 2 3 4 5 6 7 def len(xs): count = 0 for x in xs: count += 1 return count assert len([1,2,3,4]) == 4
Can we do this using generator expression syntax?
Writing this using generator syntax requires an intuitive leap: that the expression yielded does not have to be a function of the loop variable.
Additionally, candidates with C++ or Java background will try to solve this with an explicit mutation of the count
(count++ for _ in xs), but integers in Python are non-mutable, and the purpose of this question is to flirt with a more functional approach.
1 2 def len(xs): return sum(1 for _ in xs)
Using this result, can we rewrite
We have to be able to calculate the average of any Iterable (including non-materialised generators)? This means that we can only iterate over the Iterable once.
This requires another intuitive leap: that we can capture two values in the generator expression.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 from __future__ import division # we'll presume we have access to the following # with standard Python tuples, x + y means concatenation # for the purposes of this example, we'll assume we have a tuple # wherein x + y means pairwise addition, e.g., # (a,b) + (c,d) == (a+c, b+d) class sumtup(tuple): def __add__(self, other): # note: (a,b) + (c,d,e) == (a+c, b+d) return type(self)(sum(p) for p in zip(self,other)) def __eq__(self, other): return all(a == b for a,b in zip(self, other)) assert sum([sumtup([1,2]), sumtup([3,4])], sumtup([0,0])) == (1+3,2+4) == (4,6) def average(xs): xs_tot, count = sum((sumtup([x, 1]) for x in xs), sumtup([0,0])) return xs_tot/count data = [1,2,3,4,5,6] assert average(data) == 21/6
A cute observation that avoids the use of
sumtup by using a complex number.
Recall that for a complex number:
1 2 3 4 5 6 7 8 from __future__ import division def average(xs): result = sum(x+1j for x in xs) return result.real/result.imag data = [1,2,3,4,5,6] assert average(data) == 21/6
Finally, can we use all of these results to rewrite
For the sheer don’t use this code of it, I will avoid the use of
sum is merely a fold (
reduce in Python terms) on addition.
1 2 3 4 5 6 7 8 from __future__ import division variance = lambda xs: (lambda (xs_sq_tot, xs_tot, count): (xs_sq_tot - xs_tot**2/count)/count)( reduce(lambda acc,ys: type(ys)(sum(y) for y in zip(acc,ys)), ((x**2, x, 1) for x in xs))) data = [1,2,3,4,5,6] assert round(variance(data),3) == round(17.5/6,3) # round, because we lose a little precision
The purpose of this question is to start a discussion.
I have recommended candidates who didn’t get any of the answers on the first try but were able to clearly demonstrate their skill, knowledge, and experience in the surrounding discussion.
I’m just about the easiest interviewer out there, and I don’t hold candidates to getting the exact answer on the first try. In fact, many parts of this question rely on making a single intuititve leap, and I don’t think it’s fair to ask riddles in interviews.
In fact, I like giving this question to people who don’t know Python or who don’t know Python very well. It allows me to show them something and see how they approach putting it into use.
What kind of questions to they ask? Are they engaged in the discussion? What experience or knowledge can they bring to bear in answering it?
In the end, I think that what matters most to me is whether the candidate can have a reasonable discussion about the process of creating software. Are they engaged in their work? Are they easy to work with? Can they hold a productive discussion on software development? Can they reconcile differences of opinion to negotiate a practical, reasonable consensus?