Python — Iterators and Generators

A simple introduction to memory-efficient iteration in Python

Ihor Lukianov
7 min read · Nov 5, 2023

This article focuses on one of the essential concepts of data processing in Python: iteration. In simple terms, iteration helps us work around limited memory by letting us consume data one item at a time, as and when needed, thereby enabling lazy execution. All standard collections in Python are iterable, and iterators power the following operations (see the short sketch after the list):

  • for loops
  • list, dict, and set comprehensions
  • unpacking assignments
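
To make this concrete, here is a minimal sketch showing all three contexts with the same list:

x = [1, 2, 3]

# for loop
for item in x:
    print(item)

# list comprehension
squares = [item ** 2 for item in x]

# unpacking assignment
a, b, c = x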

But why should we care about iterators and generators?

That’s a great question. Although we could perform the same operations without generators, managing memory becomes difficult in programs that process large amounts of data. This is where generators come in handy: they use significantly less memory, making them a more efficient option. To begin, let’s explore how iterators function and then delve into the equally important topic of generators.

Iter function

In Python, a lot of processing occurs behind the scenes, and iterators are used more frequently than one might imagine. Whenever we iterate over a sequence, Python automatically calls the function iter(x). We can either call this function explicitly or let a for-loop call it for us; the outcome is identical either way.

x = [1, 2, 3, 4, 5]

for i in x:  # the function iter is called implicitly
    print(i)

for i in iter(x):
    print(i)

# both variants produce the same output

So what happens in the background when we call this function? We can iterate through an iterator object manually by using the next() function. When the iterator has no more items left, trying to get the next one raises a StopIteration exception. This is an important behavior to keep in mind while working with iterators in Python.

x = [1, 2, 3, 4, 5]

it = iter(x)

for i in range(5):
    print(next(it))
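
Once all five items have been consumed, one more next() call raises StopIteration. A minimal sketch of that behavior:

x = [1, 2, 3, 4, 5]
it = iter(x)

# consume all five items
for _ in range(5):
    next(it)

# the iterator is now exhausted, so the next call raises StopIteration
try:
    next(it)
except StopIteration:
    print('iterator is exhausted')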

If we need to create an iterator for one of our own classes, we can achieve this by defining a separate iterator class. You might wonder why we can’t add this code to the existing class. The answer is a bit subtle: we want our class to be iterable (to support iteration) but not to be an iterator (the object that actually carries it out). This distinction might be clearer with an example. Consider a situation where we receive numbers in a string, separated by spaces. We can make our class iterable so that we can use it in iterations. You can also experiment with the instance and the next() function to get a better understanding of this concept, as the sketch after the example shows.

class NumberFromSequence:
    # we get numbers daily in a string
    def __init__(self, daily_results: str) -> None:
        self.daily_results = daily_results
        self.results_separated = daily_results.split(' ')

    def __iter__(self):
        return NumbersIterator(self.results_separated)

class NumbersIterator:
    def __init__(self, numbers) -> None:
        self.numbers = numbers
        self.index = 0

    def __next__(self):
        try:
            number = self.numbers[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return number

    def __iter__(self):
        return self

nums = NumberFromSequence('20 31 21 54 90')
for i in nums:  # we iterate over the class instance, not the list nums.results_separated
    print(i)

# 20
# 31
# 21
# 54
# 90
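
As suggested above, you can also drive this iterator by hand; a minimal sketch of such an experiment:

it = iter(nums)  # __iter__ returns a NumbersIterator
print(next(it))  # 20
print(next(it))  # 31
# after five next() calls, the iterator raises StopIteration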

Generators

Let’s now shift our focus to the main topic of this article — generators. The concept behind generators is quite similar. In general, any function that makes use of the yield keyword is considered to be a generator function. When we call such a function, it returns a generator object. Here’s a simple example to help you understand how this works.

def basic_gen():
    yield 1
    yield 2

print(basic_gen())
# <generator object basic_gen at 0x000002680F04A810>

for i in basic_gen():
    print(i)
# 1
# 2

# we can also use next() with generator objects
gen = basic_gen()  # bind the generator object to a new name so the function stays callable
print(next(gen))
# 1

There is a clear difference between using yield and return in a function. Most notably, with yield a function can produce multiple outputs over time, which gives us more control over data processing. In contrast, a regular function returns its output only after all the calculations are done, usually as a list. With yield, values are handed out one by one, so the caller can start consuming them immediately instead of waiting for the whole result.

Image: comparison of a function with return and a generator with yield. Source: https://www.logilax.com/python-yield-vs-return/
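
As a minimal sketch of the same contrast (the function names below are mine, purely for illustration):

def squares_return(numbers):
    # builds the whole result list in memory before returning anything
    result = []
    for n in numbers:
        result.append(n ** 2)
    return result

def squares_yield(numbers):
    # hands values out one at a time, as the caller asks for them
    for n in numbers:
        yield n ** 2

print(squares_return([1, 2, 3]))       # [1, 4, 9]
print(list(squares_yield([1, 2, 3])))  # [1, 4, 9]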

Let’s come back to our example from the previous section and implement the same thing using a generator. It’s very simple, and we don’t have to add unnecessary functionality to our class (or create a second one).

class NumberFromSequence:
    # we get numbers daily in a string
    def __init__(self, daily_results: str) -> None:
        self.daily_results = daily_results

    def __iter__(self):
        # lazier: the list of numbers is not stored as an attribute
        for number in self.daily_results.split(' '):
            yield number  # no explicit return needed

nums = NumberFromSequence('20 31 21 54 90')
for i in nums:
    print(i)

As we can see, this implementation is far shorter and less complex. This is also how you can understand the difference between an iterator and a generator: with an iterator we implement the __next__ method ourselves, while a generator handles this automatically through the yield keyword.

What’s the matter with memory?

Previously, I mentioned that generators can help us reduce memory usage for operations. However, this might seem contradictory as the amount of output data received is the same as in regular for-loops. To clarify this, let’s compare the sizes of two generators that have different numbers of inputs.

import sys

def gen_list(numbers):
    yield from numbers

# small case
numbers = list(range(10))
gen_1 = gen_list(numbers)
size_1 = sys.getsizeof(gen_1)

# more numbers in the list
numbers = list(range(10000000))
gen_2 = gen_list(numbers)
size_2 = sys.getsizeof(gen_2)

print(size_1, size_2)
# 192 192

Regardless of the size of the input data, the generator object itself occupies 192 bytes of memory. While this may not make a dramatic difference for small problems, it becomes increasingly important for larger datasets. Imagine you have a file that is 20GB in size: in such a case, it is not practical to load the entire file into memory at once. By using a generator, you can avoid this issue and save on memory usage.
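
A minimal sketch of that idea, assuming a hypothetical file name:

def read_lines(path):
    # yield one line at a time; the whole file never sits in memory
    with open(path) as f:
        for line in f:
            yield line.rstrip('\n')

# 'huge_dataset.txt' is a hypothetical file, used here for illustration only
line_count = sum(1 for _ in read_lines('huge_dataset.txt'))
print(line_count)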

itertools library

When discussing generators in Python, it is essential to mention and understand the itertools library. This is a built-in library, so there is no need to install it separately. The library currently offers around 20 generator functions, all of which can be found in the documentation. We will cover only a few of them as examples.

With the itertools.count function we can create an arithmetic progression generator. This tool creates an infinite sequence, so you can keep taking values from it for as long as you need.

from itertools import count

# create a generator
gen = count(100, 10)

# the progression itself is infinite; here we take only the first five values
for i in range(5):
    print(next(gen))
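
If you prefer not to call next() in a loop, itertools.islice can take a finite slice of the infinite sequence; a minimal sketch:

from itertools import count, islice

gen = count(100, 10)

# islice cuts a finite slice out of the infinite progression
for i in islice(gen, 5):
    print(i)
# 100, 110, 120, 130, 140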

Let me provide you with an example of a terminating generator that offers a limited number of outputs. There’s a very useful function itertools.batched that can help us with this. We need to provide a sequence and an integer n, and this function will yield tuples of size n. However, the last output might be shorter if the remainder is less than n. It’s important to note that this function is relatively new and is only available starting from Python version 3.12.

from itertools import batched

data = list(range(0, 30, 3))  # creates a list of 10 items
gen = batched(data, 4)  # build a generator

for i in gen:
    print(i)
# (0, 3, 6, 9)
# (12, 15, 18, 21)
# (24, 27)
# the last output has only 2 items - all that remained of the starting list
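
If you are on a Python version older than 3.12, a rough equivalent can be sketched with islice (the name batched_compat is mine):

from itertools import islice

def batched_compat(iterable, n):
    # yield tuples of size n; the last one may be shorter
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch

for i in batched_compat(range(0, 30, 3), 4):
    print(i)
# same output as above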

Using the yield from syntax for subgenerators

It is worth noting that we can simplify the syntax for using the yield keyword in loops. This feature was added back in Python 3.3. To illustrate, let’s revisit the previous example, where we used a loop in our __iter__ method.

class NumberFromSequence:
    # we get numbers daily in a string
    def __init__(self, daily_results: str) -> None:
        self.daily_results = daily_results
        self.results_separated = daily_results.split(' ')

    def __iter__(self):
        # now it's written in one line
        yield from self.results_separated

nums = NumberFromSequence('20 31 21 54 90')
for i in nums:
    print(i)
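
Beyond flattening a loop, yield from also lets a generator delegate to other generators; a minimal sketch (the function names are mine):

def numbers():
    yield from range(3)

def letters():
    yield from 'ab'

def combined():
    # delegate to two subgenerators, one after the other
    yield from numbers()
    yield from letters()

print(list(combined()))
# [0, 1, 2, 'a', 'b']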

Generic Iterable Types

When we define a generator function, it can be useful to specify the type of its output. To achieve this, we can make use of collections.abc.Iterator. Let’s take the example of a simple Fibonacci generator. This approach can also be applied to classes that define a __next__ method. While collections.abc.Generator can be used for this purpose as well, it is only needed when we also want to declare send and return types; a plain generator that just yields values is itself an iterator, so Iterator is enough.

from collections.abc import Iterator

def fibonacci() -> Iterator[int]:
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
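
Since the generator is infinite, here is a short usage sketch that takes only the first ten values with itertools.islice:

from itertools import islice

# take the first ten Fibonacci numbers from the infinite generator
print(list(islice(fibonacci(), 10)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]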

Conclusion

I hope you found the information on working with iterators and generators in Python useful. It’s a simple concept that you may have encountered before, but utilizing generators and the itertools library can improve your code in numerous ways.

Iteration is a fundamental aspect of Python, and almost all data processing programs rely on it. Therefore, I recommend exploring more efficient methods of these processes and incorporating the yield keyword in your future code.

I can be found on LinkedIn, and I am looking forward to connecting with you. Let’s engage in discussions about the intricate world of data science and data engineering.
