A Complete Beginner-Level Python Course to Learn Data Science and Machine Learning
Day 5: Pandas
This is the fifth day of our journey to learn all the Python we need for Machine Learning and Data Science. All of my new code builds on the code from the earlier parts of this series, which you can find below.
Day 05
Pandas Series
Definition
A pandas Series is a kind of one-dimensional array, similar to the arrays in the NumPy module, that can hold any data type. The main difference is that each value in a Series is associated with an index label. The default index of a Series runs from 0 to n-1, or you can specify your own index.
A pandas Series is comparable to a single column in an Excel sheet. In a sheet with Name, Age, and Designation columns, for example, each column represents a Series.
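To make the default and custom index concrete, here is a minimal sketch. The values and the labels 'x', 'y', 'z' are made up for illustration and do not come from the examples in this post.

```python
import pandas as pd

# Without an explicit index, pandas labels the values 0 to n-1.
default_indexed = pd.Series([10, 20, 30])
print(list(default_indexed.index))  # [0, 1, 2]

# With an explicit index, each value gets the label we choose.
custom_indexed = pd.Series([10, 20, 30], index=['x', 'y', 'z'])
print(list(custom_indexed.index))   # ['x', 'y', 'z']
print(custom_indexed['y'])          # 20
```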
Purpose
A Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). The axis labels are collectively called the index.
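As a rough illustration of "data of any type", the sketch below builds three Series and inspects the kind of dtype pandas picks for each. The variable names and values are hypothetical.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
floats = pd.Series([0.5, 1.5, 2.5])
# Mixed Python objects are stored with the generic object dtype.
mixed = pd.Series([1, 'two', 3.0, {'four': 4}])

# dtype.kind is 'i' for integers, 'f' for floats, and 'O' for Python objects.
print(ints.dtype.kind, floats.dtype.kind, mixed.dtype.kind)
```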
Importance
The Series is a very important data structure in pandas. Like a column in an Excel sheet or a database table, it stores data in a labeled form so that the data remains categorized and readable.
Series are used in many ML algorithms to hold different types of data and to perform different operations on it.
Strengths
It stores data in a tabular form that is easy to reuse and read, and the Series data type is used in many algorithms.
Some algorithms built into Python libraries require their data in Series format, because data in a Series is easy to manipulate.
It is very easy to index a Series and get the relevant subset of information, even when the Series contains a large amount of data.
It is easy to copy a Series, make changes to the copy, and update it.
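The last two points can be sketched as follows. This is a minimal example, not the only way to do it; the name `marks` is hypothetical, and the values and labels are borrowed from the set example in Example 01.

```python
import pandas as pd

marks = pd.Series([3.2, 3.06, 0.29, 0.36, 1.0],
                  index=['s1', 's2', 's3', 's4', 's5'])

# Label-based slicing with .loc includes both end labels.
subset = marks.loc['s2':'s4']

# copy() gives an independent Series, so updating it leaves the original intact.
updated = marks.copy()
updated['s3'] = 2.5

print(list(subset.index))          # ['s2', 's3', 's4']
print(marks['s3'], updated['s3'])  # 0.29 2.5
```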
Weakness
I would not say the Series has a real weakness, but its data can only be manipulated through the functions that pandas and other libraries provide for it.
Example 01
Task
Create a pandas Series, inspect its index and values, and use slicing to get some data out of it.
Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Five random numbers from the standard normal distribution
np.random.randn(5)

# A Series with custom index labels
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
s
s.index
s.values

# A Series with the default index 0 to n-1
pd.Series(np.random.randn(5))

# studentsDict is a dict of student records; its definition is not shown in this post
studentsSeries = pd.Series(studentsDict)
studentsSeries

# studentsResult is reconstructed from the output below; its original definition is not shown
studentsResult = [('Muhammad Umair', 3.06), ('Hamdad Ijaz', 2.8), ('Muhammad Abdullah Tahir', 2.7)]
studensMarks = pd.Series(studentsResult)
studensMarks

# A set is unordered, so this line raises a TypeError
pd.Series({3.2,3.06,0.29,0.36,1},index=['s1','s2','s3','s4','s5'])

# First element of the first row, then a slice of the first three rows
studensMarks[0][0]
studensMarks[0:3]
Output
array([ 0.58100127, -0.08284663, -1.10031443, -1.3100632 ,  1.690669  ])

a    1.363077
b   -0.957584
c    0.192168
d   -0.441061
e   -0.904848
dtype: float64

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

array([ 1.36307667, -0.95758415,  0.19216809, -0.44106068, -0.90484761])

0    0.363901
1   -0.013482
2    1.105250
3    0.300175
4    0.867615
dtype: float64

Student1    {'name': 'Muhammad Umair', 'age': 20, 'Departm...
Student3    {'name': 'Muhammad Abdullah Tahir', 'age': 20,...
Student2    {'name': 'Hamdan Ijaz', 'age': 22, 'Department...
dtype: object

0            (Muhammad Umair, 3.06)
1                (Hamdad Ijaz, 2.8)
2    (Muhammad Abdullah Tahir, 2.7)
dtype: object

TypeError                                 Traceback (most recent call last)
<ipython-input-168-57b600e94cf5> in <module>
----> 1 pd.Series({3.2,3.06,0.29,0.36,1},index=['s1','s2','s3','s4','s5'])

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    272                 pass
    273             elif isinstance(data, (set, frozenset)):
--> 274                 raise TypeError(f"'{type(data).__name__}' type is unordered")
    275             elif isinstance(data, ABCSparseArray):
    276                 # handle sparse passed here (and force conversion)

TypeError: 'set' type is unordered

'Muhammad Umair'

0            (Muhammad Umair, 3.06)
1                (Hamdad Ijaz, 2.8)
2    (Muhammad Abdullah Tahir, 2.7)
dtype: object
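The TypeError in this example comes from passing a set: a set has no defined order, so pandas cannot match its values to the index labels. Here is a minimal sketch of two ordered alternatives, a list and a dict, using the same values. The variable names are hypothetical and not part of the original notebook.

```python
import pandas as pd

labels = ['s1', 's2', 's3', 's4', 's5']

# A list keeps its order, so each value lines up with its label.
marks_from_list = pd.Series([3.2, 3.06, 0.29, 0.36, 1], index=labels)

# A dict carries its own labels as keys.
marks_from_dict = pd.Series(
    {'s1': 3.2, 's2': 3.06, 's3': 0.29, 's4': 0.36, 's5': 1}
)

# A set is rejected because it is unordered.
try:
    pd.Series({3.2, 3.06, 0.29, 0.36, 1}, index=labels)
except TypeError as err:
    print(err)

print(marks_from_list.equals(marks_from_dict))
```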
Example 02
Task
Get data out of a Series based on a condition.
Code
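# Keep only the values greater than the median
s[s > s.median()]
# Select rows by position; .iloc makes the positional lookup explicit
s.iloc[[4, 3, 1]]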
Output
a    1.363077
c    0.192168
dtype: float64

e   -0.904848
d   -0.441061
b   -0.957584
dtype: float64
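The conditional selection works in two steps: the comparison builds a boolean Series, and indexing with that boolean Series keeps only the rows marked True. The sketch below uses fixed values, rounded from the output above, so the result is reproducible. The names `s_fixed`, `mask`, and `above_median` are hypothetical.

```python
import pandas as pd

s_fixed = pd.Series([1.36, -0.96, 0.19, -0.44, -0.90],
                    index=['a', 'b', 'c', 'd', 'e'])

# Step 1: the comparison yields one True/False per label.
mask = s_fixed > s_fixed.median()   # the median of these values is -0.44

# Step 2: indexing with the mask keeps only the labels marked True.
above_median = s_fixed[mask]
print(list(above_median.index))     # ['a', 'c']

# Positional selection returns the rows in the order requested.
by_position = s_fixed.iloc[[4, 3, 1]]
print(list(by_position.index))      # ['e', 'd', 'b']
```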
Example 03
Task
Get the exponential of the data using the exp function, then read, update, and look up values by label.
Code
np.exp(s)
s['a']
s['e'] = 12.
s
'e' in s
# 'f' is not in the index, so this lookup raises an error
s['f']
Output
a    3.908199
b    0.383819
c    1.211874
d    0.643354
e    0.404604
dtype: float64

1.3630766691037222

a     1.363077
b    -0.957584
c     0.192168
d    -0.441061
e    12.000000
dtype: float64

True

TypeError                                 Traceback (most recent call last)
~/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4410             try:
-> 4411                 return libindex.get_value_at(s, key)
   4412             except IndexError:
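Because a lookup with a missing label fails, it can help to check membership first or to catch the error. The sketch below is one possible pattern under simple assumptions: the helper `lookup` and the Series `s_fixed` are hypothetical, and the values are chosen so the results are easy to verify.

```python
import numpy as np
import pandas as pd

s_fixed = pd.Series([0.0, 1.0, 2.0], index=['a', 'b', 'c'])

# NumPy functions apply element by element and keep the index labels.
exp_s = np.exp(s_fixed)
print(list(exp_s.index))   # ['a', 'b', 'c']

def lookup(series, label):
    """Return the value stored under label, or None when the label is absent."""
    try:
        return series[label]
    except KeyError:
        return None

print('a' in s_fixed)        # True: 'in' checks the index labels
print('f' in s_fixed)        # False
print(lookup(s_fixed, 'f'))  # None
```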
Example 04
Task
Get data out of a Series using the get function, and apply the +, *, and ** operators.
Code
s.get('f')
s.get('f', np.nan)
s+s
s*2
s**2
s = pd.Series(np.random.randn(5), name='something')
s
Output
nan

a     2.726153
b    -1.915168
c     0.384336
d    -0.882121
e    24.000000
dtype: float64

a     2.726153
b    -1.915168
c     0.384336
d    -0.882121
e    24.000000
dtype: float64

a      1.857978
b      0.916967
c      0.036929
d      0.194535
e    144.000000
dtype: float64

0    1.278812
1   -0.416320
2    1.495156
3    0.313534
4   -1.240909
Name: something, dtype: float64
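The same operations can be checked on fixed values. This is a small sketch with a hypothetical Series `s_fixed`: get returns None for a missing label unless a default is given, and the arithmetic operators work element by element.

```python
import numpy as np
import pandas as pd

s_fixed = pd.Series([1.0, -2.0, 3.0], index=['a', 'b', 'c'])

missing = s_fixed.get('f')            # None, because 'f' is not a label
fallback = s_fixed.get('f', np.nan)   # nan, the default we supplied

doubled = s_fixed + s_fixed           # same result as s_fixed * 2
squared = s_fixed ** 2

print(missing, fallback)
print(doubled.tolist())   # [2.0, -4.0, 6.0]
print(squared.tolist())   # [1.0, 4.0, 9.0]
```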
Example 05
Task
Use the name attribute of the Series, and use the rename function to change its name.
Code
s.name
s2 = s.rename("different")
s2.name
Output
'something'
'different'
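Note that rename returns a new Series and leaves the original name untouched, which is why the result is assigned to a second variable. A minimal sketch with hypothetical names:

```python
import pandas as pd

s_named = pd.Series([1, 2, 3], name='something')

# rename with a plain string returns a new Series carrying the new name.
s_renamed = s_named.rename('different')

print(s_named.name)     # something
print(s_renamed.name)   # different
```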