An Introduction to Python Data Structures — Hash-map, Tree, Graph
In this article, we will cover non-linear data structures in Python such as Tree, Hash-map, and Graph.
Prerequisite
You should have basic knowledge of the Python programming language.
The entire code for this article is available in my GitHub repo.
In the first part of this article, we covered the basics of data structures and linear data structures such as Stack, Queue, Linked-List, Arrays.
Now in this article, we will cover non-linear data structures such as Tree, Hash-map, and Graph.
What are Non-linear Data Structures?
A data structure in which data is not stored in a linear format like arrays or stack. This type of data structure is known as non-linear data structures.
Non-linear data structure is a recursive type of data structure.
In non-linear data structures, we cannot traverse the data structure in a linear format; there are multiple levels for storing the data.
The implementation of non-linear data structures is a little bit more complex than linear data structures.
There are various types of non-linear data structuresbut we will cover —
- Hash-map
- Tree
- Graph
Non-Linear Data Structures
1. Hash-Map —
Basically, Hash-map is a key-value pair type of data structure in which the key is calculated using the Hash function and then that data is stored as key (calculated by the Hash function).
Now, first we have to understand what is a hash function. So the hash function is the function that will calculate a key using any mathematical function so that we can store data at the key position. Here, the mathematical function could be anything. There is no fixed function for finding the key using a hash function.
But the main disadvantage of this hash function is that the hash function will return the same key for two different data. So there will be a collision at that key position. This is the major disadvantage of hash functions.
There are various ways to handle this collision. We will handle it by creating a list at that key if the hash function returns the same key for different data. And then append data to the list of that particular key.
Hash tables are generally implemented using arrays.
Basic operations you can perform on Hash-map
- Insert — Insert Item into the hash-map
- Delete — Delete item from the hash-map
- Get Item — get item from hash-map using keys
Implementation of Hash-map using Arrays —
class Hashmap:
def __init__(self):
self.MAX = 100
self.arr = [[] for i in range(self.MAX)]
def get_hash(self,key):
hash_key = 0
for char in key:
hash_key += ord(char)
return hash_key % 100 def add(self,key,value):
hash_key = self.get_hash(key)
found = False
for index,element in enumerate(self.arr[hash_key]):
if element[0] == key and len(element) == 2:
self.arr[hash_key][index] = (key,value)
found = True
break
if not found:
self.arr[hash_key].append((key,value))
def get(self,key):
hash_key = self.get_hash(key)
for element in self.arr[hash_key]:
if element[0] == key:
return element[1]
def __delitem__(self,key):
hash_key = self.get_hash(key)
for index,element in enumerate(self.arr[hash_key]):
if element[0] == key:
del self.arr[hash_key][index]
def print_hm(self):
values = [i for i in self.arr if i != []]
return values
Code explanation —
- First of all, we have to create a class named
Hashmap
and in that class, we have to initialize a list calledself.arr
of sizeself.MAX
. And in thatself.arr
, we will have an empty nested list (List in list, so ifself.MAX
is 10 there will be 10 empty lists inself.arr
). - Now, we will implement the most important method in Hash-map data structures which is
get_hash
. This method will take the key as an argument and calculate and returnhash_key
usingord
(this function will return numeric value for given string key) function. So using this method, all thehash_key
gets calculated. - Now the next method is
add
which is used to add the item to the hash map. This method will takedata
andkey
as a parameter. First, this method will calculate the hash value(or hash key) for that givenkey
parameter usingget_hash
function. After that, we will traverse all the lists to find thathash_key
position and check if there are any elements added at thathash_key
before this data. And if there is an element at the givenhash_key
list, then we will append newdata
as a form of tuple in that list. This tuple will includekey
anddata
. And if there is no data at that hash_key, then we will simply add that data to thathash_key
list. - Now, the next method is
get
which is used to fetch items from the hash map. This method will takekey
as a parameter. First, this method will calculate the hash value(orhash_key
) for that given key parameter usingget_hash
function. After that, we will traverse through the list of a givenhash_key
position and check if there are any elements in that list and if there is(key, data)
tuple available, then it will returndata
for the givenkey
parameter. - Next is
__delitem__
. This method will take the key as a parameter and just like all other methods, it will calculatehash_key
usingget_hash
method. After that, it will traverse through the list at givenhash_key
and find the tuple with the given parameterkey
and delete that item. - And the
print_hm
method is self-explanatory so no need to describe that.
2. Tree—
Tree data structure is a hierarchical type of data structure in which there is a main root node which has two fields — data and children. In the data field, it stores data and in children field, it stores its children nodes.
So mainly, a tree has two main parts — root node and sub-tree.
As the name suggests, trees have the same structure as a real tree but only upside-down.
Trees are a very broad data structure and I cannot describe everything about them on this blog.
There are many types of Trees —
- General Tree (this is the tree which parent node can have as many children as we want to add).
- Binary Tree (in this type of tree a parent node can only have 2 or less than 2 child).
- Binary Search Tree (this tree is the same as binary tree; the only difference is the left child node of the parent can have a value less than its parent and the right child node can have a value greater than its parent).
There are many other Trees such as Red-black tree, AVL tree, N-ary Tree, etc.
Basic operations you can perform on a Tree —
- Add Child (Add the child to the node)
- Delete Child (Delete child from the node)
- Print (Print whole Tree)
- Get Level (Get the level of the node)
class TreeNode:
def __init__(self,data):
self.data = data
self.parent = None
self.children = []# For Adding child to tree
def add_child(self,child):
child.parent = self
self.children.append(child)
# For Getting level to any node
def get_level(self):
level = 0
p=self.parent
while p:
p=p.parent
level+=1
return level# For Printinf whole treedef print_tree(self):
spaces = " " * self.get_level() * 3
prefix = spaces+"|---" if self.parent else ''
print(prefix+self.data)
if self.children:
for child in self.children:
child.print_tree() # For Deleting child from tree
def delete_child(self,child):
if child in self.children:
self.children.remove(child)
print(f"{child.data} Removed from f{child.parent.data}")
else:
print(f'There is no {child.data} in Tree')#Example for making code explanation Easy
def electronincs_tree():
root = TreeNode('Electronics')
hp = TreeNode('HP')
mac = TreeNode('Mac-Book')
dell = TreeNode('Dell') laptops = TreeNode('Laptops')
laptops.add_child(hp)
laptops.add_child(mac)
laptops.add_child(dell)
motorola = TreeNode('Motorola')
apple = TreeNode('Apple')
nokia = TreeNode('Nokia') mobiles = TreeNode('Mobiles')
mobiles.add_child(motorola)
mobiles.add_child(apple)
mobiles.add_child(nokia)
reconnect = TreeNode('Reconnect')
lg = TreeNode('LG')
samsung = TreeNode('Samsung') tvs = TreeNode('TVs')
tvs.add_child(reconnect)
tvs.add_child(lg)
tvs.add_child(samsung)
root.add_child(laptops)
root.add_child(mobiles)
root.add_child(tvs)
root.print_tree()
tvs.delete_child(lg)
root.print_tree()
electronincs_tree()
Code explanation —
- So in this code explanation, I have taken used an example as otherwise it would be very difficult to explain using only code.
- So firstly, we will create a class for the tree node. In this, we will initialize three variables
self.data
,self.parent
,self.children
using the__init__
method. Here,self.data
will store data of that tree node.self.parent
will store the reference to the parent of that tree node andself.children
will store the list of all the child tree nodes of the current tree node. - Now, the first method that we will implement is
add_child
which has one parameter calledchild
. This method will simply add a new child to the parent tree node (for example, inelectronincs_tree
method, first we have created a few instances of the tree node and then we are adding that tree node to their parents using theadd_child
method). - The second method we will implement is
delete_child
which also has one parameter calledchild
. Firstly, this method will check if that parent Tree Node has any child that we want to delete; and if yes, then it will remove it from theself.children
list. And if there is no child that we want to delete, then it will print that “there is no child like this in theself.children
list” (for example, in theelectronincs_tree
method, we are removinglg
tree node fromtvs
parent tree node). - Now we will implement
get_level
method. This method does not take any parameters. This method will return the level of tree node by which we are calling this method. So what this method will do is create one variable calledlevel
and give value to0
and then we will create awhile
loop and loop through all the parent of the given tree node and if tree node has a parent tree node, thenlevel
value will be increased by 1. And loop will terminate when we reach the top of the tree, that is the root tree node of the tree. And then, we will return thatlevel
. - Now the last method is the
print_tree
method. This method simply just prints all the node hierarchicies so we can visualize the whole tree. This method will decide hierarchy by theget_level
method. And then assign the appropriate prefix to data to print them in hierarchy as per the level of the tree node.
3. Graph—
Graph is a non-linear data structure that consists of vertex and edge. Here the vertex is also called a node. In Graph data structure, vertex or node contains data or some value to be stored. And edges connect those vertices. And by combining both one or more vertex and zero or more edges, we get the graph data structure.
The graph and tree data structures are the most common and favourite data structures asked in interview questions.
There are so many problems that we can solve using the Graph data structure.
For example:
- Finding the shortest path between point A to point B.
- Finding the connection between two users on social media (Facebook, LinkedIn).
- Finding the best route in computer networks.
- Finding the best route between two cites or places.
There are many more problems that can be solved using the graph data structure.
Also, there are many types of graphs like Null Graph, Infinite Graph, but we can not cover everything about graphs in this article so here is the link to learn more about it.
Code explanation —
- Now in this code, we have created a class called
Graph
which takesedges
as parameters as a list of tuples. Here theroutes
variable is an example of edges variable which has a list of tuples in which the first value of the tuple represents the start of the edge and the second value represents the end of that particular edge ((“Mumbai”, “Paris”)
Mumbai is the start of that edge and Paris is the end of that edge; and all the cities are represented as nodes). - Now in the
__init__
method of this class, we will create agraph_dict
variable which is a representation of the graph as a dictionary. So, in this method, we will loop through all the tuples of theedges
list and create the graph accordingly. It will check if any other entry of the start of the edge is added tograph_dict
variable or not, and if it is added, then it will append end value tograph_dict[start]
. And if the value of start is added for the first time tograph_dict
, then it will create a new key-value pair. - Now the second method we have implemented in the Graph class is the
get_paths
method which takes three arguments as parameters —start
,end
,paths
. Here,start
represents the start of the path from where we want to find the path andend
represents the end of the path to which path should terminate or end. This method will find all the paths from start to end. - First, this method will check if
start
andend
variables are the same and if so, then it will simply returnstart
variable as a path. This method will also check ifstart
node ingraph_dict
’s keys list or not. And ifstart
node is not ingraph_dict
’s keys list, then it means that there is no path fromstart
toend
. So, at that time, this method will simply return an empty list. Otherwise, this method will create an empty list calledpaths
. In thefor
loop section of this method, we will go through all the adjacent nodes of thestart
node and on that adjacent node, we will use recursion to find the path from the adjacent node to theend
node and after that, append it to the mainpath
variable. And it will append that path to thepaths
list and at the end of the method, we will return thepaths
list that contains all the paths fromstart
toend
.
Conclusion
So here in this article, we have covered the basics of data structures and non-liners data structures.
In case you missed the first part of this article, here is the link.
Hope you guys have enjoyed reading and learning.
If you liked this article, feel free to leave a clap and share it with others so that they can learn from it too.