Python Under The Hood: Part 2 — Interpretation Phase

Eng. Malek | مَالِكْ
Python in Plain English
5 min read · Jun 21, 2021


In this article, the second part of the “Python Under the Hood” series, I will talk about the interpretation phase of the Python language and describe key features of Python bytecode and CPython. If you haven’t read the first part, you can refer to it here.

Let’s start with a quick recap. As we saw in the first part, Python is actually both a compiled and an interpreted language. Compilation happens first to convert your source code into intermediate, machine-independent bytecode; interpretation then happens to actually execute your code on the host machine.
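As a minimal sketch of these two phases, the built-in compile() and exec() functions let you run them by hand. The source string and the "<example>" file name are made up for illustration.

```python
# Phase 1: compile source text into a code object that holds bytecode.
source = "result = 6 * 7"
code_object = compile(source, "<example>", "exec")

# Phase 2: let the interpreter execute that bytecode in a namespace.
namespace = {}
exec(code_object, namespace)

print(type(code_object).__name__)  # code
print(namespace["result"])         # 42
```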

So what is the Python Byte Code?

Python bytecode is a set of instructions that tells the interpreter how to execute your code. The implementation details of the bytecode change between versions, so what you see in this article may not be valid for other Python versions.
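To get a feel for how version-dependent this is, the following sketch prints a few facts about the interpreter that runs it. The output is deliberately not shown because it differs from one Python version to the next.

```python
import sys
import dis
from importlib.util import MAGIC_NUMBER

print(sys.version_info[:3])        # the interpreter running this snippet
print(MAGIC_NUMBER)                # 4-byte tag stamped into .pyc files; it changes with the bytecode format
print(len(dis.opmap))              # number of named opcodes in this version
print("BINARY_ADD" in dis.opmap)   # True on older versions; CPython 3.11+ uses BINARY_OP instead
```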

An example of a byte string and its hexadecimal values
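A byte string and its hexadecimal values can also be inspected directly in Python. This small sketch uses the first four bytes of the co_code value that appears later in this article.

```python
raw = b"d\x01}\x01"   # first four bytes of the co_code value shown later

hex_values = [f"{byte:02x}" for byte in raw]
print(hex_values)      # ['64', '01', '7d', '01']
print(raw.hex())       # 64017d01
print(list(raw))       # [100, 1, 125, 1]
```

Printable bytes such as 0x64 are displayed as their ASCII character (d), which is why a bytes literal mixes letters with \x escapes.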

What is an interpreter?

An interpreter is a kind of program that implements a process virtual machine. This program’s responsibility is to execute your code on the host OS and, at the same time, to provide a platform-independent environment for your code. There are primarily two ways to implement such a machine: stack-based and register-based. Python’s virtual machine is stack-based.
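To make the difference concrete, here is a toy sketch of both styles computing 1 + 2. The instruction names (PUSH, LOAD, ADD) and both functions are invented for illustration; this is not how CPython itself is written.

```python
def run_stack(program):
    """Stack-based style: operands live on a stack, ADD names no operands."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()


def run_register(program, register_count=3):
    """Register-based style: every instruction names its registers."""
    registers = [None] * register_count
    for op, *args in program:
        if op == "LOAD":
            target, value = args
            registers[target] = value
        elif op == "ADD":
            target, left, right = args
            registers[target] = registers[left] + registers[right]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return registers


stack_result = run_stack([("PUSH", 1), ("PUSH", 2), ("ADD",)])
register_result = run_register([("LOAD", 0, 1), ("LOAD", 1, 2), ("ADD", 2, 0, 1)])
print(stack_result)         # 3
print(register_result[2])   # 3
```

The stack version needs no operand addresses in its ADD instruction, which keeps instructions small; the register version spells out where every value lives.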

Before diving into the details of the bytecode and how the interpreter works, an important topic to talk about is the code object. Everything in Python is an object; those objects are stored in heap memory and manipulated through references to them on the stack. The code object contains not only the bytecode but also other information that CPython needs to run the bytecode.

Let’s try to examine the code object of a function and check some of its attributes:

def f(x):
    z = 3
    t = 5
    def g(y):
        return t*x + y
    return g

a = 5
h = f(a)

To access the code object, we use the __code__ attribute:

>> f.__code__
<code object f at 0x7f44f2c679c0, file "<ipython-input-68-062f597fc1f3>", line 2>

co_consts returns a tuple containing the literals used by the bytecode:

>> f.__code__.co_consts
(None, 3, 5, <code object g at 0x7f44f2c7a5d0, file "<ipython-input-53-062f597fc1f3>", line 5>, 'f.<locals>.g')

co_varnames returns a tuple containing the names of the function’s local variables, starting with its parameters. (Names of global variables, functions, classes, and attributes loaded from objects are stored separately in co_names.)

>> f.__code__.co_varnames
('x', 'z', 'g')
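The difference between co_varnames and co_names is easiest to see on a function that uses both a local variable and a global name. The circle_area function below is a made-up example, not part of the original code.

```python
import math

def circle_area(radius):
    squared = radius * radius
    return math.pi * squared

code = circle_area.__code__
print(code.co_varnames)  # ('radius', 'squared')  -> parameters and other locals
print(code.co_names)     # ('math', 'pi')         -> global and attribute names
```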

co_name returns the name of the function:

>> f.__code__.co_name
'f'

co_cellvars returns a tuple containing the names of the function’s local variables that are also accessed by its inner functions:

>> f.__code__.co_cellvars
('t', 'x')

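co_freevars returns a tuple containing the names of free variables. Free variables are the local variables of an outer function that are accessed by its inner function, so they appear on the code object of the inner function. Here h refers to the inner function g (for f itself, co_freevars is an empty tuple):

>> h.__code__.co_freevars
('t', 'x')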
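The relationship between cell variables and free variables can be checked on the same f and g. This sketch also peeks at h.__closure__, which holds the captured values; sorted() is used because the order of the names in the tuples is an implementation detail.

```python
def f(x):
    z = 3
    t = 5
    def g(y):
        return t*x + y
    return g

h = f(5)

print(sorted(f.__code__.co_cellvars))   # ['t', 'x']  (locals of f that g uses)
print(f.__code__.co_freevars)           # ()          (f has no enclosing function)
print(sorted(h.__code__.co_freevars))   # ['t', 'x']  (the same names, seen from g)

# h.__closure__ holds one cell per free variable, in the same order as co_freevars.
captured = {
    name: cell.cell_contents
    for name, cell in zip(h.__code__.co_freevars, h.__closure__)
}
print(sorted(captured.items()))   # [('t', 5), ('x', 5)]
print(h(1))                       # 26
```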

co_code returns the raw bytecode stored in the code object, as a bytes string:

>> f.__code__.co_code
b'd\x01}\x01d\x02\x89\x00\x87\x00\x87\x01f\x02d\x03d\x04\x84\x08}\x02|\x02S\x00'
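Those raw bytes can be decoded by hand. The sketch below assumes CPython 3.6 or later, where every instruction occupies two bytes (an opcode followed by an argument), and uses a made-up add function. The printed listing depends on your Python version, so no output is shown.

```python
import dis

def add(a, b):
    return a + b

raw = add.__code__.co_code

# Assumption: CPython 3.6+, one opcode byte followed by one argument byte.
decoded = []
for offset in range(0, len(raw), 2):
    opcode, argument = raw[offset], raw[offset + 1]
    decoded.append((offset, dis.opname[opcode], argument))

for offset, name, argument in decoded:
    print(offset, name, argument)
```

On CPython 3.11 and later, some of the two-byte slots are inline CACHE entries rather than real instructions, which is one more reason the listing varies between versions.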

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

To understand the holy world of Python bytecode, let’s try to access it and see what we can find. To do so, I will be using the built-in function compile(), in addition to a helper function from the Understanding Python Bytecode article.

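s = '''a=1
b=2
c=a+b'''
c = compile(s, "", "exec")
# disassemble() is the helper from that article;
# dis.dis(c) from the standard library prints a comparable listing.
disassemble(c)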

and we get:

  1           0 LOAD_CONST               0 (1)
              2 STORE_NAME               0 (a)

  2           4 LOAD_CONST               1 (2)
              6 STORE_NAME               1 (b)

  3           8 LOAD_NAME                0 (a)
             10 LOAD_NAME                1 (b)
             12 BINARY_ADD
             14 STORE_NAME               2 (c)
             16 LOAD_CONST               2 (None)
             18 RETURN_VALUE
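If you don’t have the helper function at hand, the standard library’s dis module gives a comparable listing. This is a minimal sketch; the "<example>" file name is arbitrary, and on recent Python versions the opcode names differ from the listing above (for example, BINARY_OP instead of BINARY_ADD).

```python
import dis

source = "a=1\nb=2\nc=a+b"
code_object = compile(source, "<example>", "exec")

# dis.dis prints line numbers, offsets, opcode names and arguments.
dis.dis(code_object)

# dis.get_instructions yields the same information as objects.
opnames = [instruction.opname for instruction in dis.get_instructions(code_object)]
print(code_object.co_names)      # ('a', 'b', 'c')
print("STORE_NAME" in opnames)   # True
```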

The first instruction is LOAD_CONST 0. LOAD_CONST consti pushes the value co_consts[consti] onto the stack, so here we are pushing co_consts[0] (which is equal to 1).

Strictly speaking, what gets pushed is a reference to the object, not the object or its value. The same thing happens when an object is popped off the stack: again, it is its reference that is popped. The interpreter knows how to retrieve or store the object’s data using these references to the actual objects on the heap.
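One way to observe that references, not copies, move through the stack: load a list through one name, store it under another, and check that both names point to the same object. The source string here is a made-up example.

```python
namespace = {}
exec(compile("x = [1, 2]\ny = x\ny.append(3)", "<example>", "exec"), namespace)

print(namespace["x"] is namespace["y"])   # True: both names refer to one list object
print(namespace["x"])                     # [1, 2, 3]
```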

The instruction STORE_NAME 0 pops the element on top of the stack (which is 1) and binds it to the name co_names[0], which is a. These two instructions are the bytecode equivalent of a=1 in the source code. b=2 is converted similarly, and now the interpreter has created the names a and b. The last line of the source code is c=a+b. The two LOAD_NAME instructions push the values of a and b onto the stack. The instruction BINARY_ADD then pops the top two elements of the stack (1 and 2), adds them together, and pushes the result (3) onto the stack, so 3 is now on top. After that, STORE_NAME 2 pops the top of the stack and binds it to the name c. The instruction LOAD_CONST 2 pushes co_consts[2], which is None, onto the stack, and the instruction RETURN_VALUE returns the top of the stack to the caller. None is therefore the final result that remains on top of the stack. Figure 1 shows all the bytecode operations with offsets 0 to 14. (Again, it should be noted that references to the objects are pushed onto the stack, not the objects or their values.)
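The walkthrough can be replayed with a toy interpreter. This is a simplified sketch, not CPython’s actual evaluation loop: the run function and the way instructions are encoded as tuples are assumptions made for illustration, while the opcode names, co_consts, and co_names come from the listing above.

```python
co_consts = (1, 2, None)
co_names = ("a", "b", "c")

# (opname, argument) pairs copied from the listing;
# BINARY_ADD and RETURN_VALUE take no argument.
instructions = [
    ("LOAD_CONST", 0), ("STORE_NAME", 0),
    ("LOAD_CONST", 1), ("STORE_NAME", 1),
    ("LOAD_NAME", 0), ("LOAD_NAME", 1),
    ("BINARY_ADD", None), ("STORE_NAME", 2),
    ("LOAD_CONST", 2), ("RETURN_VALUE", None),
]

def run(instructions, co_consts, co_names):
    stack, names = [], {}
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            stack.append(co_consts[arg])
        elif opname == "STORE_NAME":
            names[co_names[arg]] = stack.pop()
        elif opname == "LOAD_NAME":
            stack.append(names[co_names[arg]])
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            return stack.pop(), names
    raise RuntimeError("program ended without RETURN_VALUE")

result, names = run(instructions, co_consts, co_names)
print(result)   # None
print(names)    # {'a': 1, 'b': 2, 'c': 3}
```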

Figure 1: Illustration of the bytecode operations

It is worth mentioning here that there are different kinds of stacks, each with a different role. To check them out, please refer to this article.

Final thoughts:

As we can see, there is a lot going on under the hood when you run even a few simple lines of Python. As programmers, we are not required to know the nuts and bolts of Python; however, knowing them can set you apart from the crowd 💚

If you want to dig deeper into this subject, I suggest a nice book by Obi Ike-Nwosu.

Thank you for reading, and stay tuned for the next blogs! 🙋

References

https://opensource.com/article/18/4/introduction-python-bytecode

http://beyondthegeek.com/2016/08/14/stack-based-vm-vs-register-based-vm
