Python Generators
What is a Generator?
A Python generator is a function that produces a sequence of results. It works by maintaining its local state, so that the function can resume again exactly where it left off when called subsequent times. Thus, you can think of a generator as something like a powerful iterator.
The state of the function is maintained through the use of the keyword yield, which has the following syntax:
yield [expression_list]
This Python keyword works much like return, but it has some important differences, which we'll explain throughout this article.
Generators were introduced in PEP 255, together with the yield statement. They have been available since Python version 2.2.
How do Python Generators Work?
In order to understand how generators work, let’s use the simple example below:
# generator_example_1.py
def numberGenerator(n):
    number = 0
    while number < n:
        yield number
        number += 1

myGenerator = numberGenerator(3)

print(next(myGenerator))
print(next(myGenerator))
print(next(myGenerator))
The code above defines a generator named numberGenerator, which receives a value n as an argument and uses it as the limit value in a while loop. In addition, it defines a variable named number and assigns the value zero to it.
Calling the next() function on the generator object (myGenerator) runs the generator code until the first yield statement, which yields 0 in this case.
Even after handing a value back to us, the generator keeps the value of the variable number. The next time next() is called, execution picks up right where it left off: number is increased by one, and the loop runs until the next yield.
Calling next() two more times provides us with the next two numbers in the sequence, as seen below:
$ python generator_example_1.py
0
1
2
If we had called next() on this generator a fourth time, we would have received a StopIteration exception, since the generator had completed and returned from its internal while loop.
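As a minimal sketch of that behavior, the snippet below repeats numberGenerator so it runs on its own, drains it with three next() calls, and then catches the StopIteration raised by the fourth call. The names values and exhausted are introduced here purely for illustration.

```python
def numberGenerator(n):
    number = 0
    while number < n:
        yield number
        number += 1

gen = numberGenerator(3)
values = [next(gen), next(gen), next(gen)]  # the three yielded values

try:
    next(gen)            # nothing is left to yield
    exhausted = False
except StopIteration:
    exhausted = True     # the generator has finished

print(values, exhausted)
```

In everyday code you rarely catch StopIteration yourself, because for loops and functions such as list() handle it for you.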
This functionality is useful because we can use generators to create iterables on the fly. If we were to wrap a fresh generator with list(), as in list(numberGenerator(3)), then we'd get back a list of numbers (like [0, 1, 2]) instead of a generator object, which is a bit easier to work with in some applications. Note that a generator can only be consumed once: wrapping the already exhausted myGenerator would give an empty list.
The Difference Between return and yield
The keyword return returns a value from a function, at which point the function loses its local state. Thus, the next time we call that function, it starts over from its first statement.
On the other hand, yield maintains the state between calls, and the generator resumes from where it left off when we call next() again. So if a yield is executed in the generator, then the next time the same generator is resumed we'll pick right back up after that yield statement.
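To make the contrast concrete, here is a small sketch with two hypothetical functions (the names are made up for this illustration): one uses return inside a loop, the other uses yield.

```python
def first_square_with_return(n):
    for i in range(n):
        return i * i   # leaves the function on the first pass; state is lost

def squares_with_yield(n):
    for i in range(n):
        yield i * i    # pauses here; i is remembered for the next resume

# Every call to the return-based function starts from scratch.
a = first_square_with_return(3)
b = first_square_with_return(3)

# The generator continues from where it paused.
gen = squares_with_yield(3)
c = next(gen)
d = next(gen)

print(a, b, c, d)
```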
Using return in a Generator
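A generator can use a return statement to finish early. When generators were first introduced, return was only allowed without a value, that is in the form:
return
Since Python 3.3 (PEP 380), a generator may also return a value, which is attached to the StopIteration exception raised when the generator finishes.
When the generator reaches the return statement, it stops producing values, and any further next() call raises StopIteration.
As PEP 255 states: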
Note that return means "I'm done, and have nothing interesting to return", for both generator functions and non-generator functions.
Let's modify our previous example by adding an if-else clause, so that the generator only yields numbers when n is less than 20. The code is as follows:
# generator_example_2.py
def numberGenerator(n):
    if n < 20:
        number = 0
        while number < n:
            yield number
            number += 1
    else:
        return

print(list(numberGenerator(30)))
In this example, since 30 is not less than 20, our generator returns immediately without yielding any values, so the result is an empty list. The return statement here ends the generator early, much as a break statement ends a loop.
This can be seen below:
$ python generator_example_2.py
[]
If we had passed a value less than 20, the results would have been similar to the first example.
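In Python 3.3 and later, a return statement inside a generator may also carry a value, which ends up on the StopIteration exception as its value attribute. The sketch below uses a hypothetical countdown generator to show where that value goes.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1
    return "liftoff"   # stored on StopIteration.value (Python 3.3+)

g = countdown(2)
values = [next(g), next(g)]

try:
    next(g)
except StopIteration as stop:
    final = stop.value  # the value given to return

print(values, final)
```

Functions such as list() and for loops discard this value, so it is mostly useful together with yield from or when driving a generator by hand.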
Using next() to Iterate through a Generator
We can retrieve the values yielded by a generator one at a time using the next() function, as seen in the first example. This function tells the generator to produce only its next value and then pause.
For example, the following code will print on the screen the values 0 to 9.
# generator_example_3.py
def numberGenerator(n):
    number = 0
    while number < n:
        yield number
        number += 1

g = numberGenerator(10)

counter = 0

while counter < 10:
    print(next(g))
    counter += 1
The code above is similar to the previous ones, but retrieves each value yielded by the generator with the function next(). In order to do this, we must first create a generator object g, which holds our generator state.
When the function next() is called with the generator as its argument, the generator function is executed until it reaches a yield statement. Then, the yielded value is returned to the caller and the state of the generator is saved for later use.
Running the code above will produce the following output:
$ python generator_example_3.py
0
1
2
3
4
5
6
7
8
9
Note: There is, however, a syntax difference between Python 2 and 3. The code above uses the built-in next() function, which works in Python 3 and in Python 2.6 and later. In Python 2 only, you can also call the generator's next() method directly:
print(g.next())
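In practice, a for loop is often the simplest way to consume a generator, since it calls next() for us and stops cleanly when the generator is exhausted. The sketch below also shows the optional second argument of next(), which is returned instead of raising StopIteration. The names collected, first, and second are only for illustration.

```python
def numberGenerator(n):
    number = 0
    while number < n:
        yield number
        number += 1

# A for loop calls next() behind the scenes and handles StopIteration.
collected = []
for value in numberGenerator(3):
    collected.append(value)

# next() accepts a default that is returned once the generator is exhausted.
g = numberGenerator(1)
first = next(g)            # the only value the generator yields
second = next(g, "done")   # exhausted, so the default comes back

print(collected, first, second)
```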
What is a Generator Expression?
Generator expressions are like list comprehensions, but they return a generator instead of a list. They were proposed in PEP 289, and have been part of Python since version 2.4.
The syntax is similar to list comprehensions, but instead of square brackets, they use parentheses.
For example, our code from before could be modified using generator expressions as follows:
# generator_example_4.py
g = (x for x in range(10))
print(list(g))
The result contains the same values, 0 to 9, as the previous example, this time collected in a list:
$ python generator_example_4.py
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Generator expressions are useful when using reduction functions such as sum(), min(), or max(), as they reduce the code to a single line. They're also much shorter to type than a full Python generator function. For example, the following code will sum the numbers 0 to 9:
# generator_example_5.py
g = (x for x in range(10))
print(sum(g))
After running this code, the result will be:
$ python generator_example_5.py
45
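When a generator expression is the only argument of a function call, the extra pair of parentheses can be dropped. A short sketch, with made-up sample data in words:

```python
# The generator expression is passed directly as the sole argument.
total_of_squares = sum(x * x for x in range(10))

words = ["generator", "yield", "send"]   # sample data for illustration
longest = max(len(word) for word in words)

print(total_of_squares, longest)
```

No intermediate list is built in either call; each value is produced and consumed one at a time.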
Managing Exceptions
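One historical restriction is worth knowing: before Python 2.5, the yield keyword was not permitted in the try part of a try/finally construct. PEP 342 lifted this restriction, so in current Python versions yield can appear in try, except, and finally clauses alike.
Generators should still allocate resources with caution, because a finally clause only runs once the generator finishes, is closed, or is garbage collected.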
For example, we could have created the following code:
# generator_example_6.py
def numberGenerator(n):
    try:
        number = 0
        while number < n:
            yield number
            number += 1
    finally:
        yield n

print(list(numberGenerator(10)))
In the code above, as a result of the finally clause, the number 10 is included in the output, and the result is a list of numbers from 0 to 10. This normally wouldn't happen, since the loop condition is number < n. This can be seen in the output below:
$ python generator_example_6.py
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
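A more typical use of try/finally in a generator is cleanup. The sketch below assumes a hypothetical generator read_items that records its cleanup in a list named log, and shows that closing the generator early with close() still runs the finally clause.

```python
def read_items(items, log):
    try:
        for item in items:
            yield item
    finally:
        # Runs when the generator finishes, is closed, or is garbage collected.
        log.append("cleanup")

log = []
g = read_items([1, 2, 3], log)
first = next(g)   # start the generator and take one item
g.close()         # raises GeneratorExit inside the generator at the paused yield

print(first, log)
```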
Sending Values to Generators
Generators have a powerful tool in the send() method for generator-iterators. This method was defined in PEP 342, and has been available since Python version 2.5.
The send() method resumes the generator and sends in a value, which becomes the result of the yield expression where the generator was paused. The method returns the next value yielded by the generator.
The syntax is send(value). The method takes exactly one argument, and calling send(None) is equivalent to calling next(). On a generator that has just been created, either call advances its execution to the first yield expression.
If the generator exits without yielding a new value (for example by using return), the send() method raises StopIteration.
The following example illustrates the use of send(). In the first and third lines of the generator body, we assign to the variable number the result of a yield expression, that is, the value sent in by the caller. After the generator function, we create the generator, and in the next line we advance it to its first yield by calling the next function. Finally, in the last line we send the value 5, which the generator receives as the result of that first yield.
# generator_example_7.py
def numberGenerator(n):
    number = yield
    while number < n:
        number = yield number
        number += 1

g = numberGenerator(10)  # Create our generator
next(g)                  # Advance to the first yield
print(g.send(5))
Note: A newly created generator has not yet reached a yield expression that could receive a value. Before using send() with a value, we must therefore advance the generator to its first yield using next() or send(None). In the example above, we execute the next(g) line for just this reason, otherwise we'd get an error saying "TypeError: can't send non-None value to a just-started generator".
When run, the program prints the value 5, which is what we sent to it:
$ python generator_example_7.py
5
The third line of our generator body also shows a Python feature introduced in the same PEP: yield expressions. This feature allows yield to be used on the right side of an assignment statement. The value of a yield expression is None, unless the generator was resumed with send(value), in which case it is the value that was sent.
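As a further sketch of send(), the hypothetical running_average generator below keeps a running mean of every value sent to it. Each send() call delivers a new number and gets the updated average back.

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # hand back the current average, wait for a value
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)   # advance to the first yield before sending values

results = [avg.send(v) for v in (10, 20, 30)]
print(results)
```

The local variables total and count survive between calls, which is what lets the generator act as a small stateful object without a class.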
Connecting Generators
Since Python 3.3, a generator can delegate part of its work to a sub-generator.
The new expression is defined in PEP 380, and its syntax is:
yield from <expr>
where <expr> is an expression evaluating to an iterable, from which the delegating generator takes its values.
Let's see this with an example:
# generator_example_8.py
def myGenerator1(n):
    for i in range(n):
        yield i

def myGenerator2(n, m):
    for j in range(n, m):
        yield j

def myGenerator3(n, m):
    yield from myGenerator1(n)
    yield from myGenerator2(n, m)
    yield from myGenerator2(m, m+5)

print(list(myGenerator1(5)))
print(list(myGenerator2(5, 10)))
print(list(myGenerator3(0, 10)))
The code above defines three different generators. The first, named myGenerator1, has an input parameter, which is used to specify the limit in a range. The second, named myGenerator2, is similar to the previous one, but takes two input parameters, which specify the two limits of the range of numbers. After this, myGenerator3 delegates to myGenerator1 once and to myGenerator2 twice to yield their values.
The last three lines of code print three lists, one produced from each of the three generators. As we can see when we run the program below, myGenerator3 passes along the values yielded by its sub-generators one after another, producing a single list of the numbers 0 to 14. Because n is 0 in this call, myGenerator1(0) contributes no values, and the two calls to myGenerator2 supply 0 to 9 and 10 to 14.
The example also shows an important application of generators: the capacity to divide a long task into several separate parts, which can be useful when working with big sets of data.
$ python generator_example_8.py
[0, 1, 2, 3, 4]
[5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
As you can see, thanks to the yield from syntax, generators can be chained together for more dynamic programming.
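Delegation with yield from also works recursively. As an illustrative sketch, the hypothetical flatten generator below walks a nested list and delegates to itself for every inner list it finds.

```python
def flatten(items):
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)   # delegate to a sub-generator for the inner list
        else:
            yield item

flat = list(flatten([1, [2, [3, 4]], 5]))
print(flat)
```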
Benefits of Generators
- Simplified code
As seen in the examples shown in this article, generators simplify code in a very elegant manner. This simplification is even more evident in generator expressions, where a single line of code replaces an entire block of code.
- Better performance
Generators produce values lazily, that is, on demand. This results in two advantages. The first is lower memory consumption, since the whole sequence never has to exist in memory at once. This saving pays off mainly if we use the values only once. If we need them several times, it may be worthwhile to generate them all at once and keep them for later use.
The second advantage is that we never generate values that won't be used, which would otherwise be wasted cycles. Your program can also start using the first values right away, without having to wait until all of them have been generated.
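Both points can be sketched in a few lines. The hypothetical squares generator below is infinite, yet taking five values from it with itertools.islice finishes immediately. The size comparison uses sys.getsizeof, which only measures the container object itself, and the exact numbers depend on the Python implementation.

```python
import sys
from itertools import islice

def squares():
    n = 0
    while True:          # infinite: values are only computed when requested
        yield n * n
        n += 1

first_five = list(islice(squares(), 5))

as_list = [x * x for x in range(100_000)]   # all values stored up front
as_gen = (x * x for x in range(100_000))    # nothing computed yet

list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)

print(first_five, gen_bytes < list_bytes)
```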
When to use Generators
Generators are an advanced tool present in Python. There are several programming cases where generators can increase efficiency. Some of these cases are:
- Processing large amounts of data: generators provide calculation on-demand, also called lazy evaluation. This technique is used in stream processing.
- Piping: stacked generators can be used as pipes, in a manner similar to Unix pipes.
- Concurrency: generators can be used to simulate concurrency, since each one can pause and later resume its work.
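The piping case can be sketched as follows. The three stage functions and the sample data in raw are made up for illustration; each stage consumes the previous one lazily, so a line flows through the whole pipeline before the next line is read.

```python
def read_lines(lines):
    for line in lines:
        yield line.strip()      # stage 1: remove surrounding whitespace

def non_empty(lines):
    for line in lines:
        if line:                # stage 2: drop blank lines
            yield line

def to_upper(lines):
    for line in lines:
        yield line.upper()      # stage 3: transform each line

raw = ["alpha\n", "\n", "beta\n", "  \n", "gamma\n"]
pipeline = to_upper(non_empty(read_lines(raw)))
result = list(pipeline)

print(result)
```

Because raw could just as well be an open file or a network stream, the same structure applies to inputs that do not fit in memory.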
Wrapping Up
Generators are a type of function that generates a sequence of values. As such, they can act in a similar manner to iterators. Their use results in more elegant code and, in many cases, improved performance.
These aspects are even more evident in generator expressions, where one line of code can summarize a sequence of statements.
Generators' working capacity has been improved with new methods, such as send(), and enhanced statements, such as yield from.
As a result of these properties, generators have many useful applications, such as generating pipes, concurrent programming, and helping in creating streams from large amounts of data.
Features like these are part of what makes Python a popular choice for data-heavy work such as data science.