Published 12 Aug 2011
I recently came across an amazing technique for manipulating Python iterators in a recipe by Raymond Hettinger.
The technique consists in defining an iterator as a function of itself, thus making a cyclical definition. The recipe hinges on using itertools.tee to split the to-be iterator, which is forward-referenced in a function before it is built, and feeding one (or more) of the tee’d iterators into the other.
|— The Ouroboros comes to mind|
|(By AnonMoos [Public domain], via Wikimedia Commons)|
Bear with me, I’ll explain in a moment.
Cyclical iterators can be quite useful when defining infinite iterators for recursive sequences. The problem solved by the recipe (generating 5-smooth numbers) has this very recursive notion, in the sense that, given a set of these numbers, mutiplying each by 2, 3 and 5 yields a larger set of such numbers.
Or, to explore the inductive side of the problem, given an iterator for the first n 5-smooth numbers, you can define a larger iterator for 5-smooth numbers by exhausting the first one and multiplying each number it outputs by 2, 3 and 5 (and eliminating duplicates).
And that is exactly what the recipe does: starting with a seed value of 1 (the first 5-smooth number), it builds an iterator that swallows itself, multiplying each already-generated number by 2, 3 and 5, sorting increasingly and eliminating duplicates.
I came up with this different (simpler) example while getting my head around the technique, which I’ll explain in greater detail:
1 import operator 2 import itertools 3 4 def factorials(): 5 def delayed_output(): 6 for i in output: 7 yield i 8 9 result, feedback = itertools.tee(delayed_output()) 10 seeded = itertools.chain(, feedback) 11 output = itertools.imap(operator.mul, 12 seeded, 13 itertools.count(1)) 14 15 return result 16 17 if __name__ == "__main__": 18 print list(itertools.islice(factorials(), 10))
This returns a factorial iterator, which outputs sequentially 1!, 2!, 3! and so on. The inductive nature of the problem presents itself more clearly this time: given an iterator for the first n factorials, we can exhaust it and multiply the last value it generated by n+1 to obtain an iterator for the first n+1 factorials.
Let’s walk by it one line at a time.
Lines 1 to 3 aren’t of much interest. They only make a few imports we’ll need and define the function factorials(), which returns the factorial iterator.
Things start to get interesting in lines 04 to 07 as we define a generator inside factorials() which simply iterates through some output variable and yields every item in it. This is a delayed reference to output: we don’t want to evaluate it right now because it points to nowhere (that is, it’s not defined), but we want to use it anyway. Hence, we wrap output in a generator, so that it only gets evaluated when the generator is being traversed.
Then on line 9 we split the iterator returned by delayed_output() in two: one result, which we will return from factorials(), and a feedback iterator, which we will feed into itself.
This is the picture we have so far:
|— Diagram for line 9|
Note that output still points to nowhere, and if we get it to point to feedback somehow, then we will be making feedback indirectly point to itself.
Also note that I represented output twice. This emphasizes that, once we tee’d the delayed reference to it, we can have result and feedback pointing to different parts of the iteration. More on that later.
In line 10 we seed the cyclical iterator with the first value, 1!, which is just 1. We do that by defining another iterator, seeded, which will first yield the seed value, then proceed with the values it finds in feedback. Here’s the diagram:
|— Diagram for line 10|
And finally, on line 11, we define yet another iterator, which grabs one value from seeded, our inductive factorial generator, and one value from the infinite sequence 1,2,3,4…, multiplies them and yields the result. This is where the iterator for n factorials becomes an iterator for n+1 factorials.
But wait, what is that iterator called? output! The Python bites its own tail as we close the cycle and make feedback point to itself indirectly. Check out the diagram:
|— Diagram for line 11|
And then we return result, which remained unchanged since line 09. It helps to think that result always points to one value before seeded. That is, every time a new factorial is requested from result, seeded moves one position ahead, on to the next factorial.