Python in Plain English


How I improved RMSE on Big Mart competition question using CatBoost

Crystal X
4 min read · Nov 28, 2020


Analytics Vidhya’s Big Mart Sales practice problem was one of my earliest attempts at scoring well in a data science competition. At that time I still knew very little about data science, but I decided to have a go at it with my limited skills. After experimenting with different models, I achieved a score of 1154 using Keras with TensorFlow a few months ago.

Having learned a few more techniques and models, I decided to look at this problem again with a view to improving the algorithm and obtaining a better score.

The datasets for this practice problem can be found on the Analytics Vidhya website: https://datahack.analyticsvidhya.com/contest/practice-problem-big-mart-sales-iii/#MySubmissions

Excerpts from the Big Mart practice problem statement read as follows:

“The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. Also, certain attributes of each product and store have been defined. The aim is to build a predictive model and predict the sales of each product at a particular outlet.

Using this model, BigMart will try to understand the properties of products and outlets which play a key role in increasing sales.

Please note that the data may have missing values as some stores might not report all the data due to technical glitches. Hence, it will be required to treat them accordingly.”

I wrote the program for this practice problem on Google Colab because it is free and has most of the libraries already installed, so they only need to be imported into the notebook. Pandas and NumPy are the standard libraries for working with tabular data and numeric arrays respectively, and are imported in most Python data science programs. Matplotlib and seaborn are libraries for plotting and visualising data.

After the libraries have been imported, the train, test and sample submission files need to be loaded and read into the program.
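The loading step described above might look roughly like the sketch below. It is an illustration and not the code from my notebook: the helper name `load_bigmart`, the column names, and the tiny in-memory CSVs are all assumptions made so the example runs without the real files. On Colab you would pass the paths of the downloaded files instead (for example `"train.csv"`, a hypothetical file name).

```python
import io

import pandas as pd


def load_bigmart(train_src, test_src, sample_src):
    """Read the train, test and sample submission CSVs into DataFrames.

    Each argument can be a file path or a file-like object,
    since pd.read_csv accepts both.
    """
    train = pd.read_csv(train_src)
    test = pd.read_csv(test_src)
    sample = pd.read_csv(sample_src)
    return train, test, sample


# Made-up stand-in data (assumed column names, not the real dataset)
demo_train = io.StringIO(
    "Item_Identifier,Item_Weight,Item_Outlet_Sales\n"
    "A1,9.3,3735.1\n"
    "B2,,443.4\n"
)
demo_test = io.StringIO(
    "Item_Identifier,Item_Weight\n"
    "A1,9.3\n"
    "C3,\n"
)
demo_sample = io.StringIO(
    "Item_Identifier,Item_Outlet_Sales\n"
    "A1,0\n"
    "C3,0\n"
)

train, test, sample = load_bigmart(demo_train, demo_test, demo_sample)

# Quick sanity checks: sizes of the frames and missing values per column
print(train.shape, test.shape, sample.shape)
print(train.isnull().sum())
```

Counting the nulls straight after loading is a convenient first check, since the problem statement warns that some stores did not report all of their data.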

