Data Extraction: Parse a 3-Nested JSON Object and Convert it to a pandas dataframe

Tejeswini
Python in Plain English
4 min readJun 6, 2021

--

Reference: Jason Bourne Poster

JSON, or JavaScript Object Notation, is a human-readable text-based format for data exchange between a server and web application, as an alternative to XML.

Querying API in Python with JSON Output

As a follow-up to my previous blog which mentions rapidapi.com as one of the data sources to extract OHLC (Open, High, Low and Close price) data for cryptocurrency, let's look at querying an API in Python.

We can use the requests package for this purpose. To get the data as a JSON output, we call the request.get method. The response variable in the below code, stores the returned values, and the JSON output is parsed using the json( ) method.

API request to rapidapi.com

The request.get method has the following parameter values: requests.get(url, params={key: value}, args).

Detailed documentation can be looked at here:

What is a Nested-JSON ?

A JSON object is a collection of key and value pairs.

For example, { ‘data’ : [ {‘screen_ID’ : ‘63’, in this instance ‘data’ and ‘screen_ID’ are the keys, and ‘63’ is the value. Keys must be strings, and values must be a valid JSON data type (string, number, object, array, boolean or null).

So, what is a Nested-JSON? A Nested-JSON is a JSON object which has other JSON objects or Javascript arrays as its values.

Nested-JSON Types (Source:www.digitalocean.com)

The API request to rapidapi.com produces a nested JSON output with Javascript arrays, structured as follows:

JSON output of API request to rapidapi.com

JSON Output to Pandas Dataframe

Each nested JSON object has a unique access path. To get first-level keys, we can use the json.keys( ) method. In this case, it returns ‘data’ which is the first level key and can be seen from the above image of the JSON output.

pd.json_normalize is a function of pandas that comes in handy in flattening the JSON output into a datatable. So, using the first level key in the following code format returns a datatable like below:

df stores flattened table of json output

Our problem statement here is to extract the keys and corresponding values of ‘color’, ‘date’, ‘high’, ‘low’, ‘open’, ‘perc_chg’, ‘price’, ‘vol’ to a data table format. These values are contained in ‘screen_data.data’( To access nested fields, concatenate the field names with a . (dot) as separator) column of the df, dataframe, using below code we extract this column to a datatable:

dt stores df[‘screen_data.data’]

But here all the values are stored as a list. To transform each element of this list to a row in the dataframe, we can use pd.dataframe.explode. This returns exploded lists as rows of the subset columns; index will be duplicated for these rows as below.

Here, dt stores exploded dataframe, dt

Now that we have a dictionary in each row of the datatable dt, we can use pandas.Series to convert the dictionaries to a Pandas series format and apply pandas.Dataframe to further convert it into type dataframe.

Final datafram df_final

“Data extraction through API requests and web scraping is especially useful for analyzing or automating data service on web-based applications.”

The JSON format is already compatible in itself and pandas functions add extra flexibility in making the parsing and readability of this data more user-friendly.

Follow me on Medium for more articles on cryptocurrency and topics related to Data Analysis, AI, Machine Learning, Design and Technology

Please feel free to share your thoughts, comments and feedback

Follow me here on Medium or connect with me on LinkedIn to collaborate and share ideas.

More content at plainenglish.io

--

--