Python: Pitfalls With Variable Capture

Python has a few surprising gotchas. One that most developers will eventually come across is the behavior of variable capture. Variable capture is the mechanism by which a function, lambda or class uses variables (non-local or global) that were defined outside its own scope, possibly far from where the capturing object is eventually used. Capturing variables extends the lifetime of the variables beyond the natural scope in which they were defined. Variables can be captured in the definitions of lambdas, functions, methods, properties and classes.
Creating objects that make use of variables defined outside of those objects’ scope must be done with care, lest you end up with unexpected results. This post will demonstrate a few pitfalls and methods by which you can avoid them.
LOOPING OVER LAMBDAS
Consider the following code that creates a list of lambdas, each of which returns 2 raised to the power of some number:
#-------------------------------------------------------------------
# Test: Capturing with lambdas
def make_function_list():
    result = []
    for i in range(10):
        result.append(lambda: 2**i)
    return result

print([f() for f in make_function_list()])
#-------------------------------------------------------------------
# Output
# [512, 512, 512, 512, 512, 512, 512, 512, 512, 512]
The value i ranges from 0 to 9, and on each iteration a lambda is appended to result. One might assume that because the lambda is an expression passed as an argument to append, the value of i is captured by a unique lambda on each iteration of the loop.
The truth is, that’s only half true.
The result might surprise you if you were expecting values ranging from 1 to 512 instead of identical values of 512. What’s going on? Didn’t we create a new lambda object on each iteration? Well, yes we did. We can prove that by checking that no two lambdas in the list are the same object:
#-------------------------------------------------------------------
# Test: lambda inspection
uniques = []
lambdas = make_function_list()
for f in lambdas:
    if f not in uniques:
        uniques.append(f)
print(len(lambdas) == len(uniques))
#-------------------------------------------------------------------
# Output
# True
In the above snippet, we iterate over the generated lambdas and see that, indeed, each lambda object is unique. The number of unique lambdas is equal to the length of the entire list. Let’s return to the original question: What’s going on?
Let’s take a closer look at the code from above:
    for i in range(10):
        result.append(lambda: 2**i)
    return result
Here, every lambda captures the variable i itself, not the value i held when the lambda was created. There is only one i in make_function_list, so all ten lambdas share it, and its value is looked up only when a lambda is called. By the time the lambdas are called in succession, the loop has finished and i holds 9, the last value assigned in the loop. Each call therefore returns 2**9, namely 512.
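The sharing can be observed directly. The following is a minimal sketch, not part of the original example, that assumes CPython's `__closure__` attribute and its `cell_contents` field; the names `cells` and `shared` are made up for illustration.

```python
def make_function_list():
    result = []
    for i in range(10):
        result.append(lambda: 2**i)
    return result

lambdas = make_function_list()

# Each lambda has exactly one captured variable, 'i', stored in a
# closure cell. Collect that cell from every lambda.
cells = [f.__closure__[0] for f in lambdas]

# All ten lambdas hold the very same cell object.
shared = all(cell is cells[0] for cell in cells)
print(shared)

# The cell holds whatever 'i' was when the loop ended.
print(cells[0].cell_contents)
```

If the assumptions hold, this prints True and then 9, which is why every call evaluates 2**9.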
How do we get this code to do what we want, i.e. generate a unique value based on the iteration? For that, we need to create a new i each time! To address this, we’ll define a lambda-generating function and redefine make_function_list to call it:
#-------------------------------------------------------------------
# Test: Create a lambda that captures a unique index variable.
def make_function_list():
    result = []
    for i in range(10):
        def lambda_generator(i):  #<-- A different 'i'!
            return lambda: 2**i
        result.append(lambda_generator(i))
    return result

lambdas_2 = make_function_list()
print([f() for f in lambdas_2])  #<-- call each generated lambda.
#-------------------------------------------------------------------
# Output
# [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
Crazy, ain’t it? Well, it’s not too crazy if you understand what’s going on under the covers. Naively, you might assume that we’d have the same problem with i. After all, it’s still a reference. This time, though, the i that is the parameter of the nested lambda_generator function is a different variable, created afresh on every call. That i becomes a non-local variable to the lambda and is captured uniquely. We can use the inspect module to verify that the i variables that were captured (and their values) are indeed unique:
#-------------------------------------------------------------------
# Test: Inspect the lambdas' captured 'i' variable.
from inspect import getclosurevars as closure_vars

# A method to inspect the captured values:
def captured_variable(function, name):
    return closure_vars(function).nonlocals[name]

# Looking for the value of the variable 'i' captured by
# each lambda:
print([captured_variable(f, 'i') for f in lambdas_2])
#------------------------------------------------------------------
# Output
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
The getclosurevars function from the inspect module will allow us to see the variables the lambda has captured. Capturing occurs when a lambda needs a variable from the executing context to perform its operation. In this case, it needs i, a non-local variable defined in the lambda_generator scope.
The function will return a named tuple ClosureVars with attributes (nonlocals, globals, builtins, unbound). The i value we were looking for exists in a dictionary within this tuple called nonlocals. The function we defined, captured_variable, will extract the value of any captured, non-local variable by name.
What have we learned? Captured values are not copied and assigned to new variables owned by the lambda. They remain independent, non-local (or even global) variables that a lambda holds a reference to. In that light, the lambda does conform to Python’s design philosophy: a name is resolved when the code that uses it runs, not when that code is defined.
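A commonly used alternative to the helper function, not covered in the original example, is to bind the current value as a default argument. Default values are evaluated once, when the lambda is created, so each lambda gets its own parameter i. This is only a sketch; the name `powers` is made up for illustration.

```python
def make_function_list():
    result = []
    for i in range(10):
        # 'i=i' evaluates the loop variable right now and stores the
        # value as the default for the lambda's own parameter 'i'.
        result.append(lambda i=i: 2**i)
    return result

powers = [f() for f in make_function_list()]
print(powers)
```

One trade-off to keep in mind: the lambda now accepts an argument, so a caller who passes one (for example `f(3)`) overrides the stored value.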
FOOLING WITH FUNCTIONS
Functions, as you might have come to expect, have the same issue as lambdas. If you create them in a loop, you’d better create a function to create your function, if you want to capture variables uniquely!
#-------------------------------------------------------------------
# Test: Capturing variables in functions
function_list = []
for i in range(10):
    def value():
        return 2**i
    function_list.append(value)

print([value() for value in function_list])
#-------------------------------------------------------------------
# Output
# [512, 512, 512, 512, 512, 512, 512, 512, 512, 512]
By now, you should have expected that this was going to be the result. The solution is the same as with lambdas: we have to create a function that returns our function, however odd that may seem!
#-------------------------------------------------------------------
# Test: Defining functions to create functions to capture variables
function_list = []
for i in range(10):
    def make_value_function(x):
        def value():
            return 2**x
        return value
    function_list.append(make_value_function(i))

print([value() for value in function_list])
#-------------------------------------------------------------------
# Output
# [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
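When the function body does not need to be defined inside the loop at all, the standard library's functools.partial offers another way to get the same effect. The sketch below is an alternative to the nested-function approach, not the method used above; `power_of_two` and `values` are hypothetical names.

```python
from functools import partial

def power_of_two(x):
    return 2**x

# partial() stores the current value of 'i' as a fixed argument at
# the moment it is called, so nothing is looked up later.
function_list = [partial(power_of_two, i) for i in range(10)]

values = [f() for f in function_list]
print(values)
```

Because the argument is stored when partial is called, each entry in function_list remembers its own exponent.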
Success! Let’s take a look at one more situation.
CAREFULLY CRAFTING CLASSES
Now that we know to look out for this behavior from lambdas and functions, are there any other quirks we should know about? At least one other comes to mind. Nested classes behave the same way. In Python, virtually everything is an object, including class and member function definitions.
Just as lambdas can be created anew within functions, classes and functions can as well. Consider the following:
#-------------------------------------------------------------------
# Test: Class variable uniqueness
class_list = []
for i in range(5):
    class A:
        value = i  #<-- Assignment, not capture.
        def get_value():
            return i  #<-- Capture of loop's non-local `i`
    class_list.append(A)

print([(Class.value, Class.get_value()) for Class in class_list])
#-------------------------------------------------------------------
# Output
# [(0, 4), (1, 4), (2, 4), (3, 4), (4, 4)]
Of course the same issue exists here, but the difference is how i is dereferenced depending on whether it is assigned to a class variable or used within a class’s method. With the class variable A.value, the value of i is assigned when the class body executes, on that iteration of the loop. In contrast, the function A.get_value() captures the variable i and looks it up only when it is called, after the loop has finished. As a result, we see different values, depending on how i is used.
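The timing difference can be isolated without a loop. In this small sketch (the name `snapshot` is made up for illustration), i is rebound after the class is defined: the class attribute keeps the old value, while the function sees the new one.

```python
i = 1

class A:
    value = i      # evaluated immediately, while the class body runs

    def get_value():
        return i   # looked up each time get_value is called

i = 99             # rebind the variable after the class exists

snapshot = (A.value, A.get_value())
print(snapshot)
```

Under these assumptions the result is (1, 99): the assignment happened early, the lookup happened late.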
To fix this, we’ll apply the same technique, creating the class inside a helper function, in this case make_class:
#-------------------------------------------------------------------
# Test: Fixing Class variable uniqueness
class_list = []
for i in range(5):
    def make_class(i):
        class A:
            value = i  #<-- Assignment, not capture.
            def get_value():
                return i  #<-- Captured make_class's non-local `i`
        return A
    class_list.append(make_class(i))

print([(cls.value, cls.get_value()) for cls in class_list])
#-------------------------------------------------------------------
# Output
# [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
In this example, the function make_class was added to create a local variable i that holds the value of the loop’s i passed in at each iteration. It is important to note that the i that was captured is not the same i defined in the loop!
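Since make_class does not depend on anything inside the loop, it could also be defined once, outside of it. The variation below is a sketch under that assumption. It additionally marks get_value with @staticmethod, a choice not made in the original code, so the function can be called on instances as well as on the class; `pairs` is a made-up name.

```python
def make_class(i):
    class A:
        value = i  # assigned while the class body runs

        @staticmethod
        def get_value():
            return i  # captures make_class's own 'i'

    return A

# Each call to make_class creates a fresh 'i' for its class.
class_list = [make_class(i) for i in range(5)]

pairs = [(cls.value, cls.get_value()) for cls in class_list]
print(pairs)
```

Defining the factory once also avoids re-creating the helper function on every iteration.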
FINAL THOUGHTS
I hope you enjoyed this article. This particular issue is something that I believe most Python programmers will run into at some point early in their Python experience. When it happens, it’s a head-scratcher to be sure. Hopefully, armed with the knowledge presented here, you’ll be confidently capturing variables from the get-go!