Sales Data Visualization Techniques

Uploaded by

Mbogo Alex
Copyright © All Rights Reserved

BUSINESS INTELLIGENCE AND ANALYTICS

VISUALIZATION

BY

Mbogo Alex

Business Questions:

I will use dashboards, heat maps, fever charts and dial gauges to answer the following questions:

1. To analyze and visualize the overall sales trends over time.
2. How does the average quantity ordered vary across different product lines and months?
3. To visualize the average sales value using a dial gauge.

Selected Dataset:

Sample Sales Data is a dataset containing information about orders, sales, customers, shipping, and
more. Its primary aim was to facilitate segmentation, customer analytics, clustering, and retail
analytics. It was initially processed with Pentaho Data Integration (PDI, also known as Kettle), a
popular data integration and ETL (Extract, Transform, Load) tool. The dataset was originally created
by María Carina Roldán and was later modified to support sales simulation training.

The dataset consists of the following columns:

1. ORDERNUMBER: A unique identifier for each order.
2. QUANTITYORDERED: The quantity of the product ordered on each order line.
3. PRICEEACH: The unit price of each product.
4. ORDERLINENUMBER: A sequential number assigned to each line item within an order.
5. SALES: The total sales amount for each order line (calculated as QUANTITYORDERED
multiplied by PRICEEACH).
6. ORDERDATE: The date when the order was placed.
7. STATUS: The status of the order (e.g., Shipped, In Process, On Hold, Cancelled).
8. QTR_ID: The quarter of the year when the order was placed (e.g., 1 for Q1, 2 for Q2, etc.).
9. MONTH_ID: The month when the order was placed (e.g., 1 for January, 2 for February, etc.).
10. YEAR_ID: The year when the order was placed.
11. PRODUCTLINE: The product line/category to which the ordered product belongs.
12. MSRP: Manufacturer's Suggested Retail Price for the product.
13. PRODUCTCODE: A unique code identifying each product.
14. CUSTOMERNAME: The name of the customer who placed the order.
15. PHONE: The contact phone number of the customer.
16. ADDRESSLINE1: The first line of the customer's address.
17. ADDRESSLINE2: The second line of the customer's address.
18. CITY: The city where the customer is located.
19. STATE: The state where the customer is located.
20. POSTALCODE: The postal code of the customer's location.
21. COUNTRY: The country where the customer is located.
22. TERRITORY: The territorial region associated with the customer's location.
23. CONTACTLASTNAME: The last name of the customer contact.
24. CONTACTFIRSTNAME: The first name of the customer contact.
25. DEALSIZE: A categorical variable indicating the size of the deal (e.g., small, medium, large).
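
Several of the column descriptions above are derived relationships: SALES as QUANTITYORDERED multiplied by PRICEEACH, and QTR_ID, MONTH_ID and YEAR_ID as parts of ORDERDATE. The pandas sketch here checks whether those relationships hold in a given copy of the data. It is an illustration, not part of the original analysis: the function name `count_violations`, the 0.01 tolerance and the three-row sample are hypothetical, and it assumes `pd.to_datetime` can parse the ORDERDATE strings (pass an explicit `format=` if it cannot).

```python
import pandas as pd


def count_violations(df: pd.DataFrame, tol: float = 0.01) -> dict:
    """Count rows that break each derived-column relationship described above."""
    dates = pd.to_datetime(df["ORDERDATE"])
    expected_sales = df["QUANTITYORDERED"] * df["PRICEEACH"]
    expected_qtr = (df["MONTH_ID"] - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    return {
        # allow a small tolerance for floating-point rounding
        "SALES": int(((df["SALES"] - expected_sales).abs() > tol).sum()),
        "QTR_ID": int((df["QTR_ID"] != expected_qtr).sum()),
        "MONTH_ID": int((df["MONTH_ID"] != dates.dt.month).sum()),
        "YEAR_ID": int((df["YEAR_ID"] != dates.dt.year).sum()),
    }


# Hypothetical three-row sample; the last row's SALES value is deliberately inconsistent.
sample = pd.DataFrame({
    "ORDERDATE": ["2003-02-24", "2003-05-07", "2004-11-15"],
    "QUANTITYORDERED": [10, 20, 5],
    "PRICEEACH": [50.0, 25.5, 100.0],
    "SALES": [500.0, 510.0, 450.0],
    "QTR_ID": [1, 2, 4],
    "MONTH_ID": [2, 5, 11],
    "YEAR_ID": [2003, 2003, 2004],
})

violations = count_violations(sample)
print(violations)  # {'SALES': 1, 'QTR_ID': 0, 'MONTH_ID': 0, 'YEAR_ID': 0}
```

Run against the full file, a non-zero count simply flags rows worth inspecting before relying on a description such as "calculated as"; the sketch makes no assumption about what that count will be.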

Selected tools:

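Pandas – Provides the DataFrame data structure used to load, handle, and manipulate the dataset.
Matplotlib – A visualization library for creating charts.
Seaborn – A Python library, built on Matplotlib, for creating visually appealing statistical graphics.
Plotly – An interactive plotting library whose ‘plotly.graph_objects’ module was used for the dial gauge.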

How visualization was performed:

Dashboard:

1. Business question: To analyze and visualize the overall sales trends over time.
2. Visualization process: The data is grouped by the ‘ORDERDATE’ column and the sum of
sales for each date is calculated. The resulting data is plotted using the matplotlib library.
3. Results: The result is a line chart that depicts the sales trend over time. Stakeholders can
observe upward or downward trends, identify peak periods, and assess the overall sales
trajectory.
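
The grouping-and-plotting step described above can be sketched with pandas and Matplotlib as follows. This is a minimal sketch, not the author's code: the helper names `sales_over_time` and `plot_sales_trend` and the four-row sample are hypothetical. It parses ORDERDATE into datetimes before grouping, because grouping on raw text dates orders them alphabetically rather than chronologically (for example, "10/1/2003" would sort before "2/24/2003").

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def sales_over_time(df: pd.DataFrame) -> pd.Series:
    """Sum SALES for each order date, in chronological order."""
    dates = pd.to_datetime(df["ORDERDATE"])
    return df["SALES"].groupby(dates).sum().sort_index()


def plot_sales_trend(totals: pd.Series):
    """Draw the summed sales as a line chart and return the Matplotlib figure."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(totals.index, totals.values, marker="o")
    ax.set_title("Total sales over time")
    ax.set_xlabel("Order date")
    ax.set_ylabel("Sales")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return fig


# Hypothetical sample: two orders share 2003-02-24, and the rows are out of order.
sample = pd.DataFrame({
    "ORDERDATE": ["2003-02-24", "2003-01-10", "2003-02-24", "2003-03-05"],
    "SALES": [100.0, 250.0, 50.0, 300.0],
})

totals = sales_over_time(sample)
fig = plot_sales_trend(totals)
```

On the full dataset, daily totals can be noisy; `totals.resample("MS").sum()` would aggregate them into monthly buckets if a smoother trend line is wanted. `fig.savefig(...)` or `plt.show()` saves or displays the chart.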

Heat Maps:

Question: How does the average quantity ordered vary across different product lines and months?

Creating the Heatmap:

• I first created a pivot table that calculates the average quantity ordered (`QUANTITYORDERED`)
for each product line (`PRODUCTLINE`) and month (`MONTH_ID`).
• The resulting pivot table, `heatmap_data`, is the data source for the heatmap.
• The heatmap is created by passing the following parameters:
◦ z=heatmap_data.values
◦ x=heatmap_data.columns
◦ y=heatmap_data.index
◦ colorscale='Viridis'
Answer to the business question: The average quantity ordered varies across product lines and months,
as shown in the heatmap. On the Viridis color scale, darker (purple) cells indicate lower average
quantities ordered and lighter (yellow) cells indicate higher average quantities ordered.

Dial Gauge:

1. The business question being addressed: To visualize the average sales value using a dial
gauge.
2. Tool description and application: I used Python’s ‘plotly.graph_objects’ module to create the
visualization.
3. Visualization process: The gauge was constructed with the ‘go.Indicator’ class from
‘plotly.graph_objects’ and displayed inside a ‘go.Figure’.

4. Results: The average sales value indicated by the dial gauge was 3.55K.
