generating a secure, random password

2013.Apr.16

I was going to write about weakrefs and RAII tonight, but I’ll save that for later.

The NYC Python meetup hosted a fantastic event this evening. We heard talks from Chad Whitacre about Gittip which is written using the Aspen web framework and from Jesse Jiryu Davis about async programming in Python using Tornado and Tulip


I was recently working with a student studying for a programming 101 midterm. He came across an interesting problem in his practice materials.

Write a function that generates an 8-character random password containing at least one uppercase letter, at least one lowercase letter, at least one digit, and at least one punctuation symbol.

The student came up with the following, simple solution.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
from string import ascii_uppercase, ascii_lowercase, digits, punctuation
from random import choice

all_symbols = ascii_uppercase, ascii_lowercase, digits, punctuation

# a valid password contains at least one element from each set of symbols:
#    uppercase, lowercase, digits, and punctuation
def valid_pass(password):
	return all(set(password) & set(x) for x in all_symbols)

# generate passwords by picking from the list of all symbols
#   until a valid password is generated
def generate(n):
	if n < len(all_symbols):
		raise ValueError('must be at least {}'
 		                 ' characters long!'.format(len(all_symbols)))
	password = ''
	while not valid_pass(password):
		password = ''.join(choice(''.join(all_symbols)) for _ in xrange(n))
	return password

This is a great, straightforward solution to the problem. It’s easy to read, and it works.

However, it’s non-deterministic: we cannot statically predict how many loops the while in generate will take.

We can easily construct an algorithm that builds this password in a constant number of steps.

One such algorithm would be to generate all of the valid passwords and then randomly choose one.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
from string import ascii_uppercase, ascii_lowercase, digits, punctuation
from itertools import permutations
from random import choice

all_symbols = ascii_uppercase, ascii_lowercase, digits, punctuation

def valid_pass(password):
	return all(set(password) & set(x) for x in all_symbols)

# generate every possible, valid password
def all_passwords(n):
	symbols = ''.join(all_symbols)
	return (x for x in permutations(symbols,n) if valid_pass(x))

# pick from the set of all valid passwords
def generate(n):
	if n < len(all_symbols):
		raise ValueError('must be at least {}'
 		                 ' characters long!'.format(len(all_symbols)))
	return choice(list(all_passwords(n)))

This is not efficient!

Another approach is to build a probability/decision tree. This tree would encode the proces of picking each symbol, and would represent which of the types of symbols we were allowed to pick for each position, given what we’ve already picked. e.g., if we’ve picked only lowercase letters so far, and we have only three symbols left in the password, we can only pick an uppercase letter, digit, or punctuation to fill those.

Once we have the tree, we can randomly pick a path through it by randomly branching at each junction. An example result might look like: 1. pick a lowercase letter 2. then, pick an uppercase letter 3. then, pick another lowercase letter 4. then, pick another lowercase letter 5. then, pick another lowercase letter 6. then, pick another lowercase letter 7. then, pick a digit (we could not have picked another lowercase or uppercase letter here) 8. then, pick a punctuation mark (we could not have picked a lowercase, uppercase, or digit here)

Let’s try building the entire tree, then pruning invalid branches. An invalid branch is a branch which would lead us to generating an invalid password.

Once we have built this decision tree, we can generate a password by traversing the tree and making the approriate random choices at each step. For the above example, that would mean: 1. lowercase letter: a 2. uppercase letter: J 3. lowercase letter: f 4. lowercase letter: w 5. lowercase letter: i 6. lowercase letter: p 7. digit: 8 8. punctuation: !

Our password is: aJfwip8!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
from string import ascii_uppercase, ascii_lowercase, digits, punctuation
from itertools import product
from random import choice

all_symbols = ascii_uppercase, ascii_lowercase, digits, punctuation

# check if a path is valid: it must contain
#   a selection from each of the above sets of symbols
def valid_path(path):
	return all(x in path for x in all_symbols)

# generate all possible trees and prune this
#   list to only trees that contain valid paths
def tree(n):
	return [x for x in product(all_symbols, repeat=n) if valid_path(x)]

# pick a random path (where the path is of the form, e.g.,
#   uppercase -> lowercase -> digit -> lowercase -> digi -> punctuation -> uppercase) 
#   and go through that path picking randomly from each set of symbols
def generate(n):
	if n < len(all_symbols):
		raise ValueError('must be at least {}'
 		                 ' characters long!'.format(len(all_symbols)))
	return ''.join(choice(x) for x in choice(tree(n)))

Let’s see if we can do this more cleverly.

Instead of generating all of the possible paths through the decision tree and then pruning the invalid ones, we’ll generate only valid paths.

We can do this recursively, where each branch is aware of all of the branches leading to it. With this information, we can know when the branches must allow only certain symbols, because they haven’t been used yet, and we’re nearing the end of the password.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
from string import ascii_uppercase, ascii_lowercase, digits, punctuation
from random import choice

all_symbols = ascii_uppercase, ascii_lowercase, digits, punctuation

# recursively generate all valid paths
def tree(n):
	def recurse(state, depth):
		if not depth:
			return [state]

		# if we have `depth` steps left
		#   and we have the same number of unused
		#   symbols, then we have to pick from those only	
		symbols = all_symbols
		if len(set(all_symbols) - set(state)) == depth:
			symbols = list(set(all_symbols) - set(state))

		# take each possible branch
		tree = []
		for s in symbols:
			tree += recurse(state + [s], depth-1)
		return tree
	return recurse([], n)

def generate(n):
	if n < len(all_symbols):
		raise ValueError('must be at least {}'
 		                 ' characters long!'.format(len(all_symbols)))
	return ''.join(choice(x) for x in choice(tree(n)))

Now, in the above two cases, we generated the entire probability tree ahead of time. Can we dynamically generate just our single choosen path?

Yes, we can create a generator that records the state of the branches taken (in state.) It will first yield a random choice of which set of symbols (uppercase, lowercase, digit, or punctuation) to use for the first characters. Then, it will yield a random choice of the allowed remaining sets of symbols for the last characters (e.g., if we haven’t made a choice from the uppercase letters yet, you must make that choice within the last three symbols fof the password.)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
from string import ascii_lowercase, ascii_uppercase, digits, punctuation
from random import choice

all_symbols = ascii_lowercase, ascii_uppercase, digits, punctuation

# generate one particular path through the tree
#   as we step through it
def tree(n):
	state = []

	# we can pick freely for the first few branches
	for _ in xrange(n,len(all_symbols),-1):
		state.append(choice(all_symbols))
		yield state[-1] 

	# we have constrained choice here
	for depth in xrange(len(all_symbols),0,-1):
		# if we have `depth` steps left
		#   and we have the same number of unused
		#   symbols, then we have to pick from those only	
		symbols = all_symbols
		if len(set(all_symbols) - set(state)) == depth:
			symbols = list(set(all_symbols) - set(state))
		state.append(choice(symbols))
		yield state[-1] 

def generate(n):
	if n < len(all_symbols):
		raise ValueError('must be at least {}'
 		                 ' characters long!'.format(len(all_symbols)))
	return ''.join(choice(x) for x in tree(n))

Is there any way to refactor the above? Well, we can restructure this using two coroutines to separate the “rules” driving the construction of the decision tree.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
from string import ascii_lowercase, ascii_uppercase, digits, punctuation
from random import choice

all_symbols = ascii_lowercase, ascii_uppercase, digits, punctuation

# coroutines are somewhat inconvenient
#   to use in Python; this helps a little bit
#   by pre-priming/pre-pumping our coroutine
from functools import wraps
def nopump(gen):
	@wraps(gen)
	def wrapper(*args, **kwargs):
		g = gen(*args, **kwargs)
		next(g)
		return g.send
	return wrapper

# this coroutine generates a decision tree
#   by continuously branching using
#   a function that decides what branch
#   to take given all previous state
# note if branch raises a StopIteration
#   this will (correctly) cause tree to terminate
def tree(branch):
	state = []
	while True:
		state.append(branch(state)) # branch depending on state
		yield state[-1]

# this coroutine decides to take a branch
#   given some state allowing for our
#   two password rules: it takes only n branches
#   and as we near the terminal branches, we
#   ensure that select from only unused symbols
@nopump
def password_rules(n):
	state = yield
	for depth in xrange(n,0,-1):
		symbols = all_symbols
		if len(set(all_symbols) - set(state)) == depth:
			symbols = list(set(all_symbols) - set(state))
		state = yield choice(symbols)

def generate(n):
	if n < len(all_symbols):
		raise ValueError('must be at least {}'
 		                 ' characters long!'.format(len(all_symbols)))
	return ''.join(choice(x) for x in tree(password_rules(8)))

Don’t use this code.

By popular request, here’s another short, simple way to solve this problem. It’s a pretty good approach, and it models itself very closely to the structure of the password rules.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
from string import ascii_uppercase, ascii_lowercase, digits, punctuation
from random import choice, shuffle

all_symbols = ascii_uppercase, ascii_lowercase, digits, punctuation

def generate(n):
    if n < len(all_symbols):
        raise ValueError('must be at least {}'
                         ' characters long!'.format(len(all_symbols)))

    # pick the constrained symbols, then pick the unconstrained remaining
    password = [choice(x) for x in all_symbols] + \
               [choice(choice(all_symbols)) for _ in xrange(n-len(all_symbols))]

	# get a random permutation
    shuffle(password)

    return ''.join(password)

We can see the bug if we consider the question: how many valid passwords are there of length N that satisfy our requirements?

In the above example, we are generating a valid password by: 1. first, pick the characters we know we must have: pick one lowercase, pick one uppercase, pick one digit, and pick one punctuation mark 2. then, pick all the remaining characters we’re allowed: pick any type of symbol 3. randomly shuffle these

This maps nicely to the combinatoric solution. The number of valid passwords is the product of the independent events: 1. the number of ways we can pick one lowercase, one uppercase, one digit, one punctuation 2. the number of ways we can pick all the remaining characters 3. the number of ways we can permute the password

This comes to:

Unfortunately, however, the number of permtuations of the password is a fairly complex itself. Consider that if your password is aA0!aaaa, there are only 336 instead of 40320 distinct permutations!

The number of distinct permutations of a sequence with n elements and repeated elements m1 to mk is:

So the number of permutations will be the number of ways we can have repeated symbols by the number of permutations for each of these ways.

Does itertools.permutations give you unique permutations if there are repeated elements? No, you would have to do: set(permutations(xs)).

Does random.shuffle give you a uniformly distributed shuffling if there are repeated elements? Yes, because every sequence can appear the same number of repeated times, so the sample space is linearly scaled, and the probability of any given sequence is unaffected.

Next time, how would you extend the above approaches to handle: * a blacklist of specific, forbidden passwords: e.g., Pa$sw0rd * complex rules: e.g., the password cannot contain any symbol repeated more than twice in a row * what is the exact solution for the number of valid passwords given our initial rules?