flattening lists

2013.Apr.22

Here are a few ways we can take a nested sequence and flatten it.

In the following examples, I’ll assume we’re looking at a list-of-lists: a structure that needs to be flattened once to produce a list.

I’ll call the list-of-lists nested and each list within that will be designated xs and each element within that will be designated x.

Let’s start with the simplest, most naïve approach.

1
2
3
4
5
6
7
8
9
def flatten(nested):
	temp = []
	for xs in nested:
		for x in xs:
			temp.append(x)
	return temp

data = [[1,2,3],[4,5,6]]
assert flatten(data) == [1,2,3,4,5,6]

The above code will work not only for a list of lists but also for a generator of lists or a generator of generators. (In fact, all of the examples provided here will work for generators of generators or lists of lists or structures in-between.)

If we look through the standard library, we’ll come across itertools.chain which allows us the following formulations.

1
2
3
4
5
from itertools import chain
flatten = lambda nested: chain(*nested)

data = [[1,2,3],[4,5,6]]
assert list(flatten(data)) == [1,2,3,4,5,6]

Conveniently enough, chain allows us to provide an iterable, so we don’t have to do explicit unpacking with the asterisk/argument unpacking operator.

1
2
3
4
5
from itertools import chain
flatten = lambda nested: chain.from_iterable(nested)

data = [[1,2,3],[4,5,6]]
assert list(flatten(data)) == [1,2,3,4,5,6]

These two formulations have the advantage that they do not prematurely materialise our flattened structure. This might be useful if we have memory constraints or if we wanted to feed the result into another data structure (a set or some other Container) without first putting it into a list.

Interestingly enough, we can perform this flattening without any libraries.

Comprehension syntax allows us muliple for clauses, and we can use these to iterate over the elements of the nested structure.

1
2
3
4
5
6
7
8
9
10
11
12
13
from itertools import chain
flatten = lambda nested: (x for xs in nested for x in xs)

data = [[1,2,3],[4,5,6]]
assert list(flatten(data)) == [1,2,3,4,5,6]

# or, if we want to materialise eagerly:

from itertools import chain
flatten = lambda nested: [x for xs in nested for x in xs]

data = [[1,2,3],[4,5,6]]
assert flatten(data) == [1,2,3,4,5,6]

Now, these are all well and good, but I think a lot of folks came here to see code they seriously shouldn’t ever use.

To that effect, I present the following list flattening function (Python 3.3< only):

1
2
3
4
5
6
7
from sys import version_info
assert version_info.major == 3 and version_info.minor >= 3, 'Python 3.3< only!'

flatten = lambda nested: (None for xs in nested if (yield from xs) and False)

data = [[1,2,3],[4,5,6]]
assert list(flatten(data)) == [1,2,3,4,5,6]

Did you know that you can put an explicit yield in a generator expression?

Next time: * why do explicit yield or yield from in generator expressions work? * what is yield from, anyway? * what does the following do, why, and how does it work? (((yield x), (yield x)) for x in xrange(10)) * should I write or publish a library function for flattening higher order sequences of sequences? (quick answer: probably not) * why are strings kind of weird? (quick answer: they are structures with infinite, nested iterability; there is no constituent scalar element type.)