Data Extraction: Parse a 3-Nested JSON Object and Convert it to a pandas dataframe
JSON, or JavaScript Object Notation, is a human-readable text-based format for data exchange between a server and web application, as an alternative to XML.
Querying API in Python with JSON Output
As a follow-up to my previous blog which mentions rapidapi.com as one of the data sources to extract OHLC (Open, High, Low and Close price) data for cryptocurrency, let's look at querying an API in Python.
We can use the requests package for this purpose. To get the data as a JSON output, we call the request.get method. The response variable in the below code, stores the returned values, and the JSON output is parsed using the json( ) method.
The request.get method has the following parameter values: requests.get(url, params={key: value}, args).
Detailed documentation can be looked at here:
What is a Nested-JSON ?
A JSON object is a collection of key and value pairs.
For example, { ‘data’ : [ {‘screen_ID’ : ‘63’, in this instance ‘data’ and ‘screen_ID’ are the keys, and ‘63’ is the value. Keys must be strings, and values must be a valid JSON data type (string, number, object, array, boolean or null).
So, what is a Nested-JSON? A Nested-JSON is a JSON object which has other JSON objects or Javascript arrays as its values.
The API request to rapidapi.com produces a nested JSON output with Javascript arrays, structured as follows:
JSON Output to Pandas Dataframe
Each nested JSON object has a unique access path. To get first-level keys, we can use the json.keys( ) method. In this case, it returns ‘data’ which is the first level key and can be seen from the above image of the JSON output.
pd.json_normalize is a function of pandas that comes in handy in flattening the JSON output into a datatable. So, using the first level key in the following code format returns a datatable like below:
Our problem statement here is to extract the keys and corresponding values of ‘color’, ‘date’, ‘high’, ‘low’, ‘open’, ‘perc_chg’, ‘price’, ‘vol’ to a data table format. These values are contained in ‘screen_data.data’( To access nested fields, concatenate the field names with a . (dot) as separator) column of the df, dataframe, using below code we extract this column to a datatable:
But here all the values are stored as a list. To transform each element of this list to a row in the dataframe, we can use pd.dataframe.explode. This returns exploded lists as rows of the subset columns; index will be duplicated for these rows as below.
Now that we have a dictionary in each row of the datatable dt, we can use pandas.Series to convert the dictionaries to a Pandas series format and apply pandas.Dataframe to further convert it into type dataframe.
“Data extraction through API requests and web scraping is especially useful for analyzing or automating data service on web-based applications.”
The JSON format is already compatible in itself and pandas functions add extra flexibility in making the parsing and readability of this data more user-friendly.
Follow me on Medium for more articles on cryptocurrency and topics related to Data Analysis, AI, Machine Learning, Design and Technology
Please feel free to share your thoughts, comments and feedback
Follow me here on Medium or connect with me on LinkedIn to collaborate and share ideas.
More content at plainenglish.io