Adidas Sales: A Data-Driven Dive into the Sportswear Giant

Zeh Brien
Python in Plain English
9 min read · Nov 6, 2023


Introduction

Adidas is one of the world’s leading sportswear brands, with a rich history and a loyal customer base. In recent years, however, Adidas has faced increasing competition from other sportswear brands as well as from online retailers. To maintain its competitive edge, Adidas must understand its customers and their needs, and identify new opportunities for growth.

Stock Adidas Logo

This data analysis and visualization project aims to provide Adidas with valuable insights into its sales data, including:

  • The top-selling Adidas products and categories
  • The most popular Adidas retailers
  • Sales trends over time and by region

These insights will help Adidas to make informed decisions about its product development, marketing, and retail strategies.

The project will use a variety of data analysis and visualization techniques, including:

  • Data cleaning and preprocessing
  • Exploratory data analysis (EDA) to identify patterns and trends in the data
  • Data visualization to communicate the findings of the analysis in a clear and concise way

Without further ado, let’s dive right in.

Importing necessary libraries and data

Google Colab was the main tool used for this project because notebooks live in the cloud and can be opened from any device. This is a key advantage over a local Jupyter Notebook installation, which can be cumbersome to keep in sync across multiple machines.
Even though Jupyter Notebook is more popular, Google Colab is a powerful tool that can be especially useful for data scientists who work on several devices. That’s just my personal opinion, don’t take it too seriously 😅.

Moving on, the code below was used to import the libraries and load the data into the notebook.

# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


# Importing dataset
from google.colab import files
upload = files.upload()

# Loading the dataset and previewing the first 100 rows
df = pd.read_excel('Adidas US Sales Datasets.xlsx')
df.head(100)
Adidas dataset

The dataset is very messy: it has wrong column names and is full of NaN values. It needs to be cleaned before we can proceed with our analysis. Let’s do that.
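Before cleaning anything, it helps to quantify the mess. A small sketch (assuming the file loaded as above) showing the shape of the raw frame and the fraction of missing values per column:

# Quick look at how messy the raw import is
print(df.shape)           # number of rows and columns
print(df.isna().mean())   # fraction of NaN values per column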

Data cleaning and preprocessing

We’ll start by dropping the first column: ‘Unnamed: 0’. It contains only NaN values and is therefore pretty useless for us.

  • Dropping the ‘Unnamed: 0’ column
# Dropping the unnamed: 0 column
df.drop('Unnamed: 0', axis = 1, inplace = True)
df.head()

We can see that the other columns are also named incorrectly. The real column names are stored in the row at index 3, so we need to remove the first three rows and then promote that row to the column headers. Let’s go.

# Deleting the first three rows
df = df.drop(df.index[0:3])

# Reset the index
df = df.reset_index(drop=True)

# Setting the values of the first row as column headers
df.columns = df.iloc[0]
df = df[1:].reset_index(drop=True)

df.head()
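As a side note, the same cleanup can often be done at load time. A minimal alternative sketch, assuming the real header sits in the fifth row of the sheet (the exact skiprows value depends on the sheet’s layout):

# Alternative: skip the junk rows while reading the file
df_alt = pd.read_excel('Adidas US Sales Datasets.xlsx', skiprows=4)
df_alt = df_alt.drop(columns=['Unnamed: 0'], errors='ignore')
df_alt.head()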

The dataset covers Adidas’ sales in the USA over a two-year period from 2020 to 2022. Information about the columns can be seen below.

# information about the various columns of the dataset
df.info()
Dataset information

The invoice date is not stored as a date type. We need to convert it to datetime before we can proceed with our analysis. Let’s do that.

  • Fixing Invoice date
# Convert the column to datetime format
df['Invoice Date'] = pd.to_datetime(df['Invoice Date'])
df.info()
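One more thing worth checking here: because the headers were promoted from a data row, df.info() may still report the numeric columns as object dtype, which would break the sums and plots later on. A minimal sketch, using the column names above, that coerces them to numbers:

# Coercing the numeric columns to proper numeric dtypes
# (values that cannot be parsed become NaN)
numeric_cols = ['Price per Unit', 'Units Sold', 'Total Sales',
                'Operating Profit', 'Operating Margin']
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors='coerce')
df.info()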

Now that we’ve fixed the invoice date column, let’s look for null values and duplicates in the dataset.

  • Looking for null values and duplicates
# Looking for null values
df.isnull().any()

# Looking for duplicates
df.duplicated().any()

There were no null values or duplicates in the dataset.
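Had either check come back True, a minimal cleanup sketch would look like the following (dropping the offending rows is an assumption about the preferred strategy; imputation is also an option):

# Only needed if the checks above return True
df = df.dropna()           # drop rows with missing values
df = df.drop_duplicates()  # drop exact duplicate rows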

Dataset description

The retail sales dataset under examination provides a detailed snapshot of the sales operations of Adidas in North America. It captures essential information about retailers, invoices, geographical locations, products, pricing, and financial performance. Let’s take a closer look at the dataset’s columns:

  • Retailer: Represents the name or identifier of the retailer involved in the sales transaction.
  • Retailer ID: An alphanumeric identifier assigned to each retailer for tracking and identification purposes.
  • Invoice Date: The date when the sales transaction took place, providing a temporal perspective for analysis.
  • Region: Denotes the broader geographical region where the sales activity occurred, facilitating regional analysis.
  • State: Specifies the state or province associated with the sales transaction, enabling localized insights.
  • City: Indicates the city where the sales took place, allowing for analysis based on specific urban areas.
  • Product: Describes the name or identifier of the product being sold, reflecting the diverse product portfolio.
  • Price per Unit: Represents the unit price of the product sold, a crucial factor influencing sales revenue.
  • Units Sold: Signifies the quantity of units sold for a particular product in a single transaction, contributing to overall sales volume.
  • Total Sales: The total monetary value generated by the sales transaction, calculated by multiplying the price per unit and units sold.
  • Operating Profit: Refers to the profit earned from the sales transaction after accounting for various operating expenses.
  • Operating Margin: Represents the profitability ratio, calculated by dividing the operating profit by the total sales (a quick sanity check of this formula and the Total Sales one appears right after this list).
  • Sales Method: Indicates the method or channel through which the sales transaction was conducted, such as online, in-store, or through a third-party platform.
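Two of the columns above are derived from others, so it is worth verifying that the data actually obeys the stated formulas before trusting them in the analysis. A minimal sketch, assuming the numeric coercion shown earlier; note that Operating Margin may be stored as a percentage rather than a fraction, which is why both variants are checked:

# Sanity check: Total Sales should equal Price per Unit × Units Sold
computed_sales = df['Price per Unit'] * df['Units Sold']
print((computed_sales - df['Total Sales']).abs().max())

# Sanity check: Operating Margin should equal Operating Profit / Total Sales
computed_margin = df['Operating Profit'] / df['Total Sales']
print((computed_margin - df['Operating Margin']).abs().max())        # if stored as a fraction
print((computed_margin * 100 - df['Operating Margin']).abs().max())  # if stored as a percentage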

Exploratory Data Analysis

The project’s objectives encompass five key areas: sales performance evaluation, product analysis, pricing and profitability analysis, sales channel evaluation, and market segmentation. The visualizations below cover all of these topics. Let’s begin!

  • Daily sales trend
# Grouping data by date and calculate the total sales for each date
daily_sales = df.groupby('Invoice Date')['Total Sales'].sum()

# line plot
plt.figure(figsize=(10,5))
plt.plot(daily_sales.index, daily_sales.values)

plt.xlabel('Date')
plt.ylabel('Total Sales')
plt.title('Daily Sales Trend')

plt.show()
Daily sales trend
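The daily curve is quite noisy, so a coarser view can make the trend easier to read. A minimal sketch that resamples the same series to monthly totals (this works because the groupby above left us with a datetime index):

# Resampling the daily series to monthly totals for a smoother trend
monthly_sales = daily_sales.resample('M').sum()

plt.figure(figsize=(10,5))
plt.plot(monthly_sales.index, monthly_sales.values)

plt.xlabel('Month')
plt.ylabel('Total Sales')
plt.title('Monthly Sales Trend')

plt.show()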
  • Sales performance by retailer
# Group the data by the retailers
grouped_data = df.groupby('Retailer')['Total Sales'].sum()

# Bar plot
plt.figure(figsize=(10,5))
plt.bar(grouped_data.index, grouped_data.values)

plt.xlabel('Retailer')
plt.ylabel('Total Sales')
plt.title('Sales Performance by Retailer')

# Rotating the x-axis labels for better readability (optional)
plt.xticks(rotation=45)

plt.show()
Sales performance by retailer
  • Sales distribution by product category
# Grouping the data by product category and calculate the total sales for each category
category_sales = df.groupby('Product')['Total Sales'].sum()

# Pie chart
plt.figure(figsize=(10,5))
plt.pie(category_sales.values, labels=category_sales.index, autopct='%1.1f%%', shadow=True)

plt.title('Sales Distribution by Product Category')

plt.show()

Sales distribution by product category
  • Sales and product Performance by sales method
# Grouping the data by sales method and product
grouped_data = df.groupby(['Sales Method', 'Product'])['Total Sales'].sum().reset_index()

# Stacked bar plot
plt.figure(figsize=(10,5))
sns.barplot(data=grouped_data, x='Sales Method', y='Total Sales', hue='Product')

plt.xlabel('Sales Channel')
plt.ylabel('Total Sales')
plt.title('Sales and Product Performance by Sales Method')

plt.show()
Sales and Product Performance by sales method
  • Sales and Product performance by region
# Grouping the data by region
grouped_data = df.groupby(['Region', 'Product'])['Total Sales'].sum().reset_index()

# Stacked bar plot
plt.figure(figsize=(10,5))
sns.barplot(data=grouped_data, x='Region', y='Total Sales', hue='Product')

plt.xlabel('Region')
plt.ylabel('Total Sales')
plt.title('Sales and Product Performance by Region')

plt.show()
Sales and product performance by region
  • Sales and product performance by retailer
# Grouping the data by retailers
grouped_data = df.groupby(['Retailer', 'Product'])['Total Sales'].sum().reset_index()

# Stacked bar plot
plt.figure(figsize=(10,5))
sns.barplot(data=grouped_data, x='Retailer', y='Total Sales', hue='Product')

plt.xlabel('Retailer')
plt.ylabel('Total Sales')
plt.title('Sales and Product Performance by Retailer')

plt.show()
Sales and product performance by retailer
  • Top 5 states by sales

# Grouping data and keeping the five states with the highest sales
# (.head(5) alone would return the first five states alphabetically)
grouped_data = df.groupby('State')['Total Sales'].sum().reset_index().nlargest(5, 'Total Sales')

# Barplot
plt.figure(figsize=(10,5))
sns.barplot(data = grouped_data, x='State', y='Total Sales')

plt.xlabel('States')
plt.ylabel('Total Sales')
plt.title('Top 5 States by sales')

plt.show()

Top 5 states by sales
  • Units sold by product

# Extracting the product data
product_data = df.groupby('Product')['Units Sold'].sum()

# Bar plot
plt.figure(figsize=(10,5))
product_data.plot(kind='bar')

plt.xlabel('Product')
plt.ylabel('Units Sold')
plt.title('Comparison of Units Sold by Product')

plt.show()
Units sold by product
  • Histogram of price per unit
# Extracting the price per unit data
price_per_unit_data = df['Price per Unit']

# Histogram
plt.figure(figsize=(10,5))
plt.hist(price_per_unit_data, bins=10)

plt.xlabel('Price per Unit')
plt.ylabel('Frequency')
plt.title('Histogram of Price per Unit')

plt.show()
Histogram of price per unit
  • Price per Unit distribution across product categories
# Extracting the price per unit data grouped by product category
price_per_unit_data = df.groupby('Product')['Price per Unit'].apply(list)

# box plot
plt.figure(figsize=(10,5))
plt.boxplot(price_per_unit_data)

plt.xlabel('Product Category')
plt.ylabel('Price per Unit')
plt.title('Distribution of Price per Unit across Product Categories')
# Labeling the boxes in the same order as the grouped data
product_categories = price_per_unit_data.index
plt.xticks(range(1, len(product_categories) + 1), product_categories, rotation=45)

plt.show()
Price per unit distribution across Product categories
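As a design note, seaborn can draw the same comparison directly from the dataframe, which sidesteps the manual tick-label alignment entirely. A minimal equivalent sketch:

# Equivalent box plot drawn straight from the dataframe
plt.figure(figsize=(10,5))
sns.boxplot(data=df, x='Product', y='Price per Unit')

plt.title('Distribution of Price per Unit across Product Categories')
plt.xticks(rotation=45)

plt.show()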
  • Operating profit per retailer
# Grouping the data by retailer and calculate the total operating profit
operating_profit_by_retailer = df.groupby('Retailer')['Operating Profit'].sum()

# bar plot
plt.figure(figsize=(10,5))
plt.bar(operating_profit_by_retailer.index, operating_profit_by_retailer)

plt.xlabel('Retailer')
plt.ylabel('Operating Profit')
plt.title('Operating Profit by Retailer')
plt.xticks(rotation=45)

plt.show()
Operating profit by retailer
  • Operating margins per region
# Creating a dictionary to store operating margin data by region
operating_margin_by_region = {}

# Iterating over each unique region
for region in df['Region'].unique():
    # Filtering the data for the current region
    region_data = df[df['Region'] == region]

    # Adding the operating margins for the region to the dictionary
    operating_margin_by_region[region] = region_data['Operating Margin'].tolist()

# list of operating margin values for each region
operating_margin_values = list(operating_margin_by_region.values())

# list of region labels
region_labels = list(operating_margin_by_region.keys())

# box plot
plt.figure(figsize=(10,5))
plt.boxplot(operating_margin_values, labels=region_labels)

plt.xlabel('Region')
plt.ylabel('Operating Margin')
plt.title('Operating Margin by Region')

plt.show()
Operating margin per region
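The dictionary loop above works, but the same figure can be produced in a single call. A minimal sketch using seaborn, which does the per-region grouping internally:

# Equivalent box plot without the manual dictionary bookkeeping
plt.figure(figsize=(10,5))
sns.boxplot(data=df, x='Region', y='Operating Margin')

plt.title('Operating Margin by Region')

plt.show()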
  • Sales method Proportions
# total sales for each sales method
sales_by_method = df.groupby('Sales Method')['Total Sales'].sum()

# pie chart
plt.figure(figsize=(10,5))
plt.pie(sales_by_method, labels=sales_by_method.index, autopct='%1.1f%%', shadow=True)
plt.title('Sales Method Proportions')

plt.show()
Sales method Proportions
  • Operating profit by Product
# Grouping the data by product and calculate the total operating profit
operating_profit_by_product = df.groupby('Product')['Operating Profit'].sum()

# Bar plot
plt.figure(figsize=(10,5))
plt.bar(operating_profit_by_product.index, operating_profit_by_product)

plt.xlabel('Product')
plt.ylabel('Operating Profit')
plt.title('Operating Profit by Product')

plt.xticks(rotation=90)

plt.show()
Operating profit by product
  • Top 10 states in terms of sales
# Grouping data by state and keeping the ten with the highest sales
total_sales_by_state = df.groupby('State')['Total Sales'].sum().reset_index()

top_ten_states = total_sales_by_state.nlargest(10, 'Total Sales')

plt.figure(figsize=(10, 5))
sns.barplot(x='Total Sales', y='State', data=top_ten_states, orient='h')
plt.xlabel('Total Sales')
plt.ylabel('State')
plt.title('Top 10 States in Terms of Total Sales')
plt.show()
Top 10 states in terms of sales
  • Top selling products in the top 10 states with most sales
# Grouping data
total_sales_by_state = df.groupby('State')['Total Sales'].sum().reset_index()
top_10_states = total_sales_by_state.nlargest(10, 'Total Sales')

# Most frequently sold product for each of those states
# (value_counts picks the product with the most transactions, a proxy for "top selling")
top_selling_product_by_state = df.loc[df['State'].isin(top_10_states['State'])].groupby('State')['Product'].agg(lambda x: x.value_counts().index[0]).reset_index()
top_10_states_with_product = pd.merge(top_10_states, top_selling_product_by_state, on='State')

# Bar plot
plt.figure(figsize=(10, 5))
sns.barplot(x='Total Sales', y='State', hue='Product', data=top_10_states_with_product)
plt.xlabel('Total Sales')
plt.ylabel('State')
plt.title('Top 10 States in Terms of Sales with Top Selling Product')
plt.legend(title='Top Selling Product')
plt.show()
Most sold products in the top 10 states
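One caveat: value_counts ranks products by transaction count, so a product could lead on frequency while a different one leads on revenue. A minimal sketch, assuming revenue is the preferred criterion, that picks the highest-grossing product per state instead:

# Top product per state by revenue rather than by transaction count
sales_by_state_product = df.groupby(['State', 'Product'])['Total Sales'].sum()
top_by_revenue = sales_by_state_product.groupby('State').idxmax().apply(lambda t: t[1])
print(top_by_revenue.loc[top_10_states['State']])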

Conclusion

This project involved visualizing sales data using line plots, bar plots, box plots, and pie charts. The goal was to gain insights into sales performance across retailers, regions, and states, the top-selling products, and the trend of sales over time.

Overall, the visualizations created in this project with Seaborn and Matplotlib allowed for a comprehensive exploration of the sales data, enabling the identification of sales patterns, top-performing states and retailers, and the performance of individual products. These visualizations can assist stakeholders in making data-driven decisions and developing effective sales strategies.
For more information about the project, you can get the entire notebook and the dataset here on GitHub. It includes extra plots which couldn’t be added here because of time constraints.

For any inquiries, you can contact me via email here. I am open to opportunities. Thank you for reading; a clap and a follow are highly appreciated.
