
EMPOWERING SMALL COMPANIES WITH

AUTOMATED SALES FORECASTING

AD3811-PROJECT WORK

Submitted by

NANDESH N 721921243076
SANJITH E U 721921243094
SUMESH K S 721921243116
ADARSH R G 721921243301

in partial fulfilment for the award of the degree

of

BACHELOR OF TECHNOLOGY
in

ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

DHANALAKSHMI SRINIVASAN COLLEGE OF ENGINEERING

(AUTONOMOUS)

COIMBATORE - 641105

ANNA UNIVERSITY :: CHENNAI 600 025

MAY 2025

ANNA UNIVERSITY : CHENNAI 600 025
BONAFIDE CERTIFICATE

This is to certify that the AD8811 - PROJECT WORK report “EMPOWERING SMALL
COMPANIES WITH AUTOMATED SALES FORECASTING” is the bonafide work of the
following students

SANJITH E U (721921243094)
ADARSH R G (721921243301)
NANDESH N (721921243076)
SUMESH K S (721921243116)

who carried out the project work under my supervision during the academic year 2024-2025.

SIGNATURE SIGNATURE

Mr.P.CHANDRASEKAR M.E, Ph.D Mrs.SRINJU.M, M.E

HEAD OF THE DEPARTMENT SUPERVISOR


Dept. of Artificial Intelligence Dept. of Artificial Intelligence

and Data Science, and Data Science

Dhanalakshmi Srinivasan Dhanalakshmi Srinivasan

College of Engineering, College of Engineering,

Coimbatore-641105 Coimbatore-641105

Submitted for the project viva-voce held on ………………………………

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

First and foremost, I would like to thank the Almighty for showering blessings throughout our life. I take the privilege to express my hearty thanks to my parents for their valuable support and effort in completing this project.

I take this chance to express my deep sense of gratitude to our Management, our beloved Principal Dr C. JEGADHEESAN, M.E (PhD), and our Dean Dr K. BAGHIRATHI for providing excellent infrastructure and support to pursue this work at our college.

I express my profound thanks to our beloved Head of the Department, Mr. P. CHANDRASEKAR, M.E (PhD), for his able administration and keen interest, which motivated us to complete this project.

I extend my thanks to Mrs. SRINJU. M, M.E, AP/AI&DS, for her valuable guidance at each and every stage of the project, which helped me a lot in its successful completion.

It gives me immense pleasure to express my heartfelt thanks to Dr. A. T. RAVI, M.E, PhD for her valuable suggestions and constant support as project coordinator in completing this project report.

I am very grateful to all the teaching and non-teaching staff and my friends who helped me complete the project.

TABLE OF CONTENTS

S.NO CONTENTS PAGE.NO

ABSTRACT VI
LIST OF FIGURES VII
LIST OF TABLES VIII
LIST OF ABBREVIATIONS IX

1 INTRODUCTION 1
1.1 Overview of the Project 1
1.2 Model Description 2
2 LITERATURE SURVEY 4
3 SYSTEM STUDY 7
3.1 Existing System 7
3.1.1 Disadvantages 7
3.2 Proposed System 7
3.2.1 Advantages 8
4 SYSTEM DESCRIPTION 9
4.1 Module Description 9
4.2 System Design 13
4.2.1 Input Design 13
4.2.2 Output Design 13
4.2.3 Code Design 13
4.2.4 Database Design 13
4.3 System Flow Diagram 14
4.3.1 Data Flow Diagram 15
4.4 System Testing 17
4.4.1 Unit Testing 17
4.4.2 Integration Testing 17
4.4.3 Functional Testing 17
4.4.4 System Test 18
4.4.5 Feasibility Study 18
5 SOFTWARE REQUIREMENT 19
5.1 Software Description 19
6 CONCLUSION 27
7 SCOPE FOR FUTURE ENHANCEMENT 28
8 REFERENCES 30
9 APPENDIX 31
a) Source Coding 31
b) Screenshot 45

ABSTRACT

Sales forecasting is a vital aspect of any business, helping organizations anticipate future revenue and plan operations accordingly. This project focuses on building an interactive and intelligent Sales Forecasting Web Application that leverages historical sales data and machine learning algorithms to predict future sales trends accurately. The primary goal is to enable users, especially small businesses, to upload their sales data and receive multi-year forecasts through an intuitive, user-friendly interface.

The system supports multiple algorithms — Linear Regression, XGBoost, and LSTM — allowing users to select the best-fit model for their dataset. These models analyze past patterns, seasonal trends, and variations to provide meaningful insights into future performance. The application features an in-browser dataset preview, real-time visualization of predictions, and a downloadable analytical report, offering transparency and ease of use.

Additionally, the tool is designed with scalability and usability in mind. It supports various CSV formats and adapts well across industries like retail and wholesale. The visualization components, powered by Chart.js, clearly distinguish between actual and predicted sales using different colors and styles, making trends easy to interpret.

The outcome of this project empowers decision-makers with actionable forecasts, helping in inventory planning, budget estimation, and resource allocation. By automating the forecasting process through this web application, businesses can reduce manual effort, improve planning accuracy, and support strategic growth with data-backed decisions.

LIST OF FIGURES

FIGURE NO FIGURE NAME PAGE NO


1 SYSTEM ARCHITECTURE 8
2 LINEAR REGRESSION 10
3 LONG SHORT-TERM MEMORY 11
4 DATA FLOW DIAGRAM LEVEL 0 14
5 DATA FLOW DIAGRAM LEVEL 1 15

LIST OF ABBREVIATIONS

S.NO ABBREVIATION FULL FORM OF ABBREVIATION


1 AI ARTIFICIAL INTELLIGENCE
2 R² COEFFICIENT OF DETERMINATION
3 LSTM LONG SHORT-TERM MEMORY
4 XGBOOST EXTREME GRADIENT BOOSTING
5 MAE MEAN ABSOLUTE ERROR
6 RMSE ROOT MEAN SQUARE ERROR

CHAPTER 1

INTRODUCTION

1.1 OVERVIEW OF THE PROJECT

Sales forecasting is defined as the process of estimating future sales volumes using
historical data, seasonal patterns, and external factors. Accurate forecasting is essential for
setting realistic targets, managing inventory efficiently, planning production, and optimizing
marketing efforts. It is a critical function that helps companies anticipate demand fluctuations
and make informed strategic decisions, ultimately mitigating risks and improving overall
business performance.
Small companies often face significant challenges when it comes to forecasting sales.
They typically have limited historical data, which is often inconsistent or incomplete, making it
difficult to identify clear trends. Additionally, these companies usually lack dedicated data
science teams, meaning they must rely on generic, off-the-shelf forecasting tools that may not
be tailored to their unique needs. The high cost and complexity of enterprise-grade solutions
further restrict their ability to implement advanced analytics, leading to a pressing need for a
cost-effective, user-friendly alternative.

Sales forecasting is a critical component of business
strategy, enabling organizations to predict future sales performance, optimize operations, and
make informed decisions. This project aims to develop an advanced sales forecasting system
using machine learning techniques to deliver accurate and actionable insights. By leveraging
historical sales data, external factors, and real-time inputs, the model will predict future sales
trends while identifying key influencing factors such as seasonality, promotions, and market
dynamics. The proposed system employs state-of-the-art algorithms, including Extreme
Gradient Boosting, Linear Regression, and Long Short-Term Memory (LSTM) networks, to
handle complex patterns in data. The solution will also incorporate anomaly detection to
identify irregularities, ensuring data reliability. A key focus of this project is enhancing the
interpretability of the model, enabling stakeholders to trust and understand the results.
Additionally, the project explores scalability across industries, ensuring adaptability to retail,
manufacturing, and e-commerce sectors. By integrating the forecasting model with a user-
friendly dashboard, businesses will gain access to real-time insights, enabling them to optimize
inventory management, supply chain operations, and workforce allocation.

1.2 MACHINE LEARNING

Machine Learning (ML) is a transformative field within the domain of Artificial Intelligence
(AI) that empowers computer systems to learn from data, identify hidden patterns, and make
intelligent decisions or predictions with minimal human intervention. Unlike traditional rule-
based programming, where logic must be explicitly coded, ML systems automatically
improve their performance through experience—by analysing past observations and
continuously refining their predictive capabilities.

In the context of sales forecasting, ML algorithms are trained on vast historical datasets that
include sales figures, promotional schedules, seasonal trends, customer behaviour, economic
indicators, and other relevant features. By learning from this multi-dimensional data, the
models can uncover intricate, non-linear relationships that are difficult to detect through
conventional statistical methods.

Machine learning models—such as Linear Regression, XGBoost, and Long Short-Term Memory (LSTM) neural networks—are particularly effective for time series forecasting due
to their ability to adapt to temporal changes and recognize trend shifts, anomalies, and
seasonality. These capabilities allow businesses to make proactive decisions, such as
optimizing inventory levels, adjusting marketing strategies, or planning workforce
requirements.

The strength of ML in this domain lies in its ability to generalize from the past to predict the
future with a high degree of accuracy. As new data becomes available, the models can be
retrained to reflect the latest market dynamics, ensuring scalable, flexible, and data-driven
decision-making.

Ultimately, machine learning is not just a technological tool, but a strategic enabler that
enhances business intelligence, drives operational efficiency, and fosters innovation—
particularly in the competitive landscape of modern commerce.

Advantages of ML in this project

1. Improved Forecast Accuracy:- ML models like Linear Regression, XGBoost, and LSTM
are capable of identifying complex patterns and trends in historical sales data, leading to
more accurate and reliable forecasts compared to traditional statistical methods.

2. Scalability Across Industries:- The forecasting models are not hardcoded for one
specific dataset—they can be adapted and scaled for different businesses in retail,
manufacturing, e-commerce, and more with minimal adjustments.

3. Efficient Anomaly Detection:- ML can identify unusual spikes or dips in sales data,
helping flag potential issues such as stockouts, demand surges, or reporting errors,
improving data integrity.

4. Seamless Integration with Visualization Tools:- The combination of ML and tools like
Chart.js allows users to view model outputs in an intuitive and interactive manner, aiding
in clear communication of insights to stakeholders.

5. Enhanced Decision-Making:- With real-time predictions and visual insights, businesses can make informed decisions regarding inventory planning, marketing campaigns, and resource allocation.

6. Reduction of Manual Effort:- Once trained, ML models automate the prediction process,
reducing the time and effort needed for manual forecasting or spreadsheet-based
analysis.
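The report does not specify how anomalies are detected, so the following is only an illustrative sketch of one lightweight possibility: flagging points whose z-score exceeds a threshold. The function name and threshold are assumptions, not the project's actual implementation.

```python
import pandas as pd

def flag_anomalies(sales: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Flag values more than `threshold` standard deviations from the mean."""
    z = (sales - sales.mean()) / sales.std()
    return z.abs() > threshold

# A sudden demand surge stands out against otherwise stable sales.
sales = pd.Series([100, 102, 98, 101, 99, 500, 103])
print(flag_anomalies(sales, threshold=2.0).tolist())
# → [False, False, False, False, False, True, False]
```

In practice, flagged rows could be surfaced to the user for review before model training, since they may be genuine demand surges rather than reporting errors.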

CHAPTER 2
LITERATURE SURVEY

Gärtner D et al. (2021)


Proposed an automated machine learning pipeline for SMEs using ARIMA and Prophet. This
reduces reliance on data scientists and speeds up demand forecasting deployment. Ideal for
structured business environments. [1]

Shaik Vadla M et al. (2024)


Used BERT for sentiment analysis on Amazon reviews. The model captures deep contextual
meaning, offering high accuracy. It is computationally expensive and not ideal for real-time
applications. [2]

Habil A et al. (2023)


Introduced AI-based recommendation systems for market prediction. These systems
personalize targeting strategies but lack responsiveness to dynamic market changes. [3]

Pereira J et al. (2023)


Applied regression and decision tree models to forecast consultancy pricing. The models are
interpretable and useful in practice but may not scale efficiently. [4]

Ahmad I et al. (2023)


Implemented Facebook Prophet for demand forecasting. It efficiently models holidays and
trends but performs poorly with irregular data. [5]

Ensafi N et al. (2022)


Forecasted seasonal product sales using machine learning. Effective for predictable patterns,
but unsuitable for irregular or off-season trends. [6]

Shaik M and Verma A (2022)


Combined ARIMA and machine learning for mobile phone sales prediction. The hybrid
model improves accuracy but is complex to fine-tune. [7]

Lee H et al. (2022)


Used XGBoost for retail sales forecasting, showing high accuracy and clarity. However, it
requires careful parameter tuning to avoid overfitting. [8]

Jena M et al. (2022)


Reviewed deep learning models used in forecasting across domains. Offers comprehensive
insight but lacks experimental or case-based validation. [9]

Wassan M et al. (2021)


Performed sentiment analysis of Amazon reviews with machine learning. Useful for gauging
customer feedback but not directly predictive. [10]

Alvarez G and Chang W (2021)
Applied ensemble methods for e-commerce sales forecasting. These models are robust but
harder to interpret compared to simpler models. [11]

Banerjee S et al. (2021)


Used regression and data mining for retail forecasting. Performs well with clean, large
datasets but is sensitive to noisy inputs. [12]

Sohrabpour A et al. (2021)


Developed an AI model for export sales prediction. Offers strong performance in export
sectors, though not generalizable to all markets. [13]

Thomas A and George L (2020)


Utilized RNNs for time-series stock and sales forecasting. Effective for sequential data but
suffers from vanishing gradient problems. [14]

Kuo Y and Lin C (2020)


Adopted LSTM networks for short-term retail forecasting. Captures temporal dependencies
well but requires extensive data and compute power. [15]

CHAPTER 3
SYSTEM STUDY

3.1 EXISTING SYSTEM

The existing system for sales forecasting used by most small companies is typically manual
or based on basic spreadsheet tools (like Excel). In some cases, businesses may try using free
or off-the-shelf software, but these tools often aren't optimized for their specific needs. The
existing system is inefficient, error-prone, and not scalable. It creates challenges in making
accurate business decisions, managing inventory, and planning marketing or production
activities. These limitations highlight the need for a smarter, automated forecasting tool
designed specifically for small businesses.

3.1.1 DISADVANTAGES

• Inaccuracy: limited, inconsistent historical data and a lack of automation lead to unreliable forecasts.
• Manual effort: human intervention is required for data cleaning, model selection, and interpretation.

3.2 PROPOSED SYSTEM

The proposed system is a web-based, self-service sales forecasting tool specifically designed
to help small businesses make accurate sales predictions without needing deep technical
knowledge or expensive software. The proposed system empowers small businesses with
advanced forecasting capabilities that are easy to use, affordable, and highly effective. It
helps them plan better, reduce costs, avoid overstocking/stockouts, and respond quickly to
market changes.

3.2.1 ADVANTAGES

• Accuracy
• Efficiency
• Cost-Effective

3.3 SYSTEM ARCHITECTURE

FIG: 3.3 System Architecture

CHAPTER 4

SYSTEM DESCRIPTION

4.1 MODULE DESCRIPTION

 DATASET COLLECTION
 PRE-PROCESSING
 DATA VISUALIZATION
 MODEL IMPLEMENTATION
 PREDICTION & WEB INTEGRATION

Module 1: Dataset Collection

This module enables the user to upload historical sales data in .csv format. The data must
contain at least two columns: ORDERDATE (timestamp) and SALES (numerical sales values).
Users interact with this module via a web form designed using HTML and Bootstrap.
Uploaded files are stored in a designated uploads/ directory and prepared for further
processing.

Module 2: Pre-processing

This module cleans and prepares the data for model training and forecasting:
 Converts ORDERDATE to a datetime object.
 Extracts the Year component to aggregate sales data annually.
 Groups data by year and sums sales for each period.
 Handles column validation and error-checking for missing headers.

The result is a transformed dataset that is ready for predictive modeling.
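The steps above can be sketched in pandas. The column names follow the ORDERDATE/SALES convention described in Module 1; the function name itself is illustrative.

```python
import pandas as pd

def preprocess(csv_source) -> pd.DataFrame:
    """Validate the upload and aggregate order-level sales into yearly totals."""
    df = pd.read_csv(csv_source)
    missing = {"ORDERDATE", "SALES"} - set(df.columns)   # header validation
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    df["ORDERDATE"] = pd.to_datetime(df["ORDERDATE"], errors="coerce")
    df = df.dropna(subset=["ORDERDATE", "SALES"])        # drop unparseable rows
    df["Year"] = df["ORDERDATE"].dt.year                 # extract the year component
    return df.groupby("Year", as_index=False)["SALES"].sum()
```

Using `errors="coerce"` turns malformed dates into NaT so the subsequent `dropna` removes them instead of crashing the pipeline.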

Module 3: Data Visualization

To enhance interpretability, this module generates a line chart that visualizes both historical
and predicted sales:
 A Matplotlib chart is rendered to display sales trends over time.
 The chart is saved as static/prediction_plot.png and embedded into the result page.

The visual output helps users quickly grasp sales fluctuations and forecasted values.
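A minimal sketch of how such a chart could be rendered with Matplotlib; the function name and arguments are illustrative, while the default output path mirrors the static/prediction_plot.png location mentioned above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for a server process
import matplotlib.pyplot as plt

def plot_forecast(years, actual, future_years, predicted, path="static/prediction_plot.png"):
    """Plot historical sales as a solid line and forecasts as a dashed line."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(years, actual, marker="o", label="Actual sales")
    ax.plot(future_years, predicted, marker="x", linestyle="--", label="Predicted sales")
    ax.set_xlabel("Year")
    ax.set_ylabel("Sales")
    ax.legend()
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
```

Distinct markers and line styles keep actual and predicted series distinguishable even in a monochrome export.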

Module 4: Model Implementation


This module focuses on building predictive models to forecast future sales. A multi-model
approach is adopted to balance simplicity, performance, and interpretability:

1) Linear Regression:

 Acts as a baseline model.


 Assumes linear relationships between independent variables and sales.
 Useful for quick insights and interpretability, although limited in handling non-
linearities and complex interactions.
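As a sketch of this baseline, the yearly totals from the preprocessing step can be fitted against the year index with scikit-learn and extrapolated a few years ahead. Function and variable names here are illustrative, not the project's actual code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def forecast_linear(years, sales, horizon=3):
    """Fit sales against the year and extrapolate `horizon` years ahead."""
    X = np.asarray(years, dtype=float).reshape(-1, 1)
    model = LinearRegression().fit(X, np.asarray(sales, dtype=float))
    future = np.arange(max(years) + 1, max(years) + 1 + horizon, dtype=float).reshape(-1, 1)
    return future.ravel().astype(int).tolist(), model.predict(future).tolist()

future_years, preds = forecast_linear([2019, 2020, 2021, 2022], [100, 110, 120, 130], horizon=2)
print(future_years)               # [2023, 2024]
print([round(p) for p in preds])  # the input is perfectly linear, so ≈ [140, 150]
```

This illustrates both the strength (instant, interpretable trend line) and the limitation noted above: the extrapolation is a straight line regardless of seasonality.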

2) XGBoost (Extreme Gradient Boosting):
 A powerful ensemble learning method known for its speed and accuracy in structured data tasks.
 Automatically handles missing values and variable importance evaluation.
 Suitable for capturing non-linear relationships and interactions between variables.

3) LSTM (Long Short-Term Memory Networks):


 A type of recurrent neural network (RNN) optimized for sequential data and time series
forecasting.
 Captures long-term dependencies and temporal patterns in sales data.

 Enhances predictions for scenarios involving recurring trends and seasonal effects.

Model Evaluation Metrics:

 MAE (Mean Absolute Error): Measures the average magnitude of errors without considering their direction.
 RMSE (Root Mean Squared Error): Penalizes larger errors more than MAE, highlighting
significant prediction deviations.
 R² Score (Coefficient of Determination): Quantifies the proportion of variance in the target
variable explained by the model.
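All three metrics are available directly in scikit-learn; the numbers below are a made-up example purely to show the calculation.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

actual = np.array([100.0, 120.0, 140.0, 160.0])
predicted = np.array([110.0, 118.0, 135.0, 165.0])

mae = mean_absolute_error(actual, predicted)           # average |error|
rmse = np.sqrt(mean_squared_error(actual, predicted))  # penalizes large errors
r2 = r2_score(actual, predicted)                       # variance explained

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
# → MAE=5.50  RMSE=6.20  R2=0.923
```

Note how the single 10-unit error pushes RMSE above MAE, which is exactly why RMSE is preferred when large misses are costly.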

Module 5: Prediction & Web Integration

To ensure accessibility and usability for non-technical stakeholders, a web-based interface is built using the Flask framework. This module allows real-time interaction and makes advanced forecasting tools available to small- and medium-scale retailers.

Key Features:

 File Upload: Users can input a CSV file containing relevant features.

 Model Selection: an auto-select option automatically chooses the model with the highest validation score.

 Prediction Output: Sales forecasts are displayed in a user-friendly interface.
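A minimal sketch of the upload endpoint in Flask. The route, form-field, and folder names are assumptions for illustration, not the project's actual code, though the uploads/ directory matches the one described in Module 1.

```python
import os
from flask import Flask, request

app = Flask(__name__)
UPLOAD_DIR = "uploads"  # assumed to match the uploads/ directory from Module 1

@app.route("/upload", methods=["POST"])
def upload():
    file = request.files.get("dataset")  # hypothetical form-field name
    if file is None or not file.filename.lower().endswith(".csv"):
        return "Please upload a .csv file", 400
    os.makedirs(UPLOAD_DIR, exist_ok=True)
    file.save(os.path.join(UPLOAD_DIR, file.filename))
    return f"Saved {file.filename}", 200
```

Rejecting non-CSV uploads at the route keeps invalid files out of the preprocessing pipeline entirely.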

4.2 SYSTEM DESIGN

4.2.1 INPUT DESIGN
Input design determines how data is entered into the system to generate accurate forecasts. In the
developed application, users interact with a web-based form developed using HTML and Bootstrap
through the Flask framework.

 Users upload input CSV files (test_updated.csv) containing future data for prediction.
 Inputs are validated on the client and server side before processing.
 Preprocessed inputs are structured into numerical formats using one-hot encoding and
standard scaling before being fed into models.

4.2.2 OUTPUT DESIGN
Output design determines how the system presents results to the user in a meaningful
and understandable way. In the developed application, forecast results are visualized and made
available for download through a responsive web interface.

 Forecasted sales are displayed in the form of an interactive line chart using Chart.js,
distinguishing between historical (actual) and predicted data.
 Predicted values are stored and exported in a CSV format (e.g., prediction_report.csv),
allowing users to download and analyse results offline.
 The web interface presents a clear visualization of yearly sales trends.
 Users are able to download reports of the forecasted sales with a single click, ensuring easy integration of results into business workflows and presentations.

4.2.3 CODE DESIGN

The code design outlines the modular structure of the application and how the various components interact to accomplish the forecasting task.

 The application is organised into separate modules for dataset upload, pre-processing, model training and selection, forecasting, and report generation.
 Flask route handlers connect the web interface to the backend logic, keeping the presentation layer separate from the computation.
 Shared utilities for date manipulation, file handling, and chart generation are kept in their own modules so that each component can be maintained independently.

4.3 SYSTEM FLOW DIAGRAM


4.3.1 DATA FLOW DIAGRAM

A Data Flow Diagram (DFD) is a graphical tool used to describe and analyse the flow of data
within a system. In the context of the Sales Forecasting Web Application, the DFD illustrates
how data moves through the various stages of the application—from user input to prediction
output. The system follows a modular approach where each component performs a specific
function, ensuring flexibility, scalability, and clarity.

1. User Input

The user initiates the process by uploading a CSV file containing historical sales data. This is
done through the web interface provided by the application. The file typically includes fields
like ORDERDATE, SALES, and possibly other relevant features.

2. Web Interface

The web interface, built using Flask for backend routing and HTML/CSS/Bootstrap for
frontend rendering, handles the file upload and form submission. It allows users to:

 Upload the dataset.

 Preview the dataset after upload.

 Select the prediction algorithm (Linear Regression, XGBoost, or LSTM).

 Trigger the prediction and visualize results.

3. Data Preprocessing

Once the data is uploaded, it is passed through a preprocessing module. This module:

 Validates and parses the input CSV.

 Converts the ORDERDATE to datetime and extracts Year, Month, Week, and Day as
new features.

 Removes null or invalid entries.

 Formats the data to match the model requirements. This stage is crucial for ensuring
data consistency and quality before applying machine learning techniques.

4. Model Selection

 Based on the evaluate_models.py module, the best model is selected automatically.
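The internals of evaluate_models.py are not shown in this chapter. One plausible shape for its auto-selection logic, sketched here with dependency-light stand-ins (a decision tree in place of XGBoost/LSTM), is to score every candidate on a held-out split and keep the best validation R².

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

def select_best_model(X, y):
    """Fit each candidate and return the one with the highest validation R²."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=42)
    candidates = {
        "linear_regression": LinearRegression(),
        "decision_tree": DecisionTreeRegressor(random_state=42),
    }
    scores = {name: r2_score(y_val, model.fit(X_tr, y_tr).predict(X_val))
              for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best]
```

Holding out a validation split (rather than scoring on training data) prevents a high-capacity model from winning simply by memorizing the history.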

5. Forecasting Engine

The selected model processes the pre-processed data to predict future sales.

6. Output Generation

The output of the prediction is:

 A CSV report file summarizing actual and predicted sales.

 A graphical chart rendered using Matplotlib and/or Chart.js to visually present the sales
trend over the years, distinguishing between actual and predicted values.

7. Result Display and Download

The final results are rendered on the result page:

 Users can view the interactive chart displaying forecasted sales.

 A Download Report button allows users to download the predictions as a CSV file for
offline analysis or documentation.

4.4 SYSTEM TESTING


The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies, and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests, and each test type addresses a specific testing requirement.

4.4.1 UNIT TESTING

Unit testing involves the design of test cases that validate that the internal program logic is functioning properly and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application and is done after the completion of an individual unit, before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.

4.4.2 INTEGRATION TESTING


Integration tests are designed to test integrated software components to determine whether they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.

4.4.3 FUNCTIONAL TESTING

Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.

4.4.4 FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements of the system is essential.

CHAPTER 5

SOFTWARE REQUIREMENT

5.1 SOFTWARE DESCRIPTION

This project utilizes a range of software tools and technologies for the development of a Sales Forecasting Tool. The software requirements span the programming language used, libraries required for data processing and machine learning, web development technologies, and integrated development environments (IDEs).

5.2 PYTHON PROGRAMMING LANGUAGE

Python serves as the core language for this project. It is an open-source, high-level,
interpreted programming language known for its simplicity, readability, and versatility. Python
is widely adopted in data science, artificial intelligence, and web development fields due to its
strong support for machine learning libraries and frameworks.

Python provides various built-in modules and supports third-party packages which significantly streamline the implementation of complex machine learning and deep learning models.

5.3 PYTHON LIBRARIES AND PACKAGES

 NumPy is used for performing numerical operations.
 Pandas helps in handling and preprocessing large datasets.
 Matplotlib, Seaborn, and Plotly are utilized for creating data visualizations.
 Scikit-learn is used to implement machine learning models like Linear Regression and to evaluate model performance using metrics such as MAE, RMSE, and R² Score.
 XGBoost is employed for gradient-boosting-based sales prediction due to its efficiency and accuracy.
 TensorFlow and Keras are used to develop Long Short-Term Memory (LSTM) models suitable for time-series forecasting.
 Flask is a lightweight web framework that enables the integration of the machine learning model into a user-friendly web interface.
 Pickle and Joblib are used for model serialization and loading during web deployment.
 Datetime, OS, and Glob modules are used for date manipulation and file-handling operations.
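A sketch of the serialization round trip with joblib; the toy model and file name are illustrative.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Train a toy model, persist it, and reload it as the web app would at startup.
model = LinearRegression().fit(np.array([[1.0], [2.0], [3.0]]), np.array([2.0, 4.0, 6.0]))

path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)

restored = joblib.load(path)
print(round(float(restored.predict(np.array([[4.0]]))[0]), 2))  # → 8.0
```

Persisting the fitted model this way means the Flask app can serve predictions without retraining on every request.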

5.4 Web Technologies


 HTML5 and CSS3 to structure and style the web pages.
 Bootstrap to create responsive and mobile-friendly layouts.
 JavaScript (optional) to add interactivity to forms and inputs.
 Jinja2, the templating engine used in Flask, allows dynamic content to be displayed
within HTML pages.

5.5 Development Environment


The main development environment used is Visual Studio Code, a widely used source-code editor with powerful extensions for Python and Flask. Additionally, Jupyter Notebook is
used during the exploratory data analysis and initial model development phases due to its
interactive nature. The project was developed using Python version 3.10.5, as observed in the
screenshot of the environment.

Optional tools such as Postman may be used to test API endpoints, and Git or GitHub can be
used for version control and collaborative development.

5.6 Development Tools
 Visual Studio Code: The integrated development environment (IDE) used to write, edit
and debug the Python and HTML code.
 Python 3.10.5 installed locally for code execution.
 Google Chrome / Mozilla Firefox: Used to access and test the web interface.

5.7 Operating System Compatibility


The application is platform-independent and can be executed on any system that supports
Python, including:

 Windows 10/11
 Linux distributions (Ubuntu 20.04 and later)
 macOS (latest versions)

5.8 Other Tools & Dependencies


 send_file (Flask utility): used for downloading the prediction result as a CSV file.

5.9 HARDWARE REQUIREMENTS


Minimum Requirements (For Development & Local Testing):

 Processor: Intel Core i3 or AMD Ryzen 3 (2.0 GHz or higher)


 RAM: 4 GB
 Storage: 256 GB HDD/SSD
 Graphics: Integrated Graphics (for basic visualization)
 Operating System: Windows 10 / Ubuntu 20.04 or later
 Network: Internet connectivity (for Python package installations and updates)

CHAPTER 6

CONCLUSION

This project aims to bridge the gap between small businesses and advanced data-driven
decision-making by providing an intuitive, web-based sales forecasting tool. By automating
complex tasks such as data preprocessing, model selection, and forecast visualization, the
solution empowers non-technical users to generate accurate predictions effortlessly. Leveraging
both classical and modern machine learning models like Linear Regression, LSTM, and
XGBoost, the tool offers reliable insights into future sales trends.

Through this approach, small companies can enhance inventory management, reduce
operational costs, and respond proactively to market changes—ultimately gaining a competitive
edge in an increasingly data-centric world. The scalable and modular design ensures the system
can evolve with business needs, supporting long-term growth and innovation.

CHAPTER 7
SCOPE FOR FUTURE ENHANCEMENT

Automated sales forecasting leverages data analytics, machine learning, and AI tools to predict future sales trends. For small businesses, this can be transformative. Below is a detailed scope covering key areas where automation in sales forecasting empowers small companies:

1. Data Collection and Integration: Automating the gathering of sales data from CRM, ERP, POS
systems, and e-commerce platforms. Integration with social media, marketing campaigns,
customer feedback, and market trends. Saves time and reduces errors from manual data entry.
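The multi-source consolidation described above can be sketched with pandas; the channel names, column layout, and sample figures below are assumptions purely for illustration:

```python
import pandas as pd
from io import StringIO

# Hypothetical exports from two sales channels (a POS system and an
# e-commerce platform); in practice these would come from files or APIs.
pos_csv = StringIO("date,sales\n2023-01-01,100\n2023-01-02,150\n")
ecom_csv = StringIO("date,sales\n2023-01-01,80\n2023-01-02,90\n")

pos = pd.read_csv(pos_csv, parse_dates=["date"]).assign(channel="pos")
ecom = pd.read_csv(ecom_csv, parse_dates=["date"]).assign(channel="ecommerce")

# Combine both sources into one tidy frame and total sales per day.
combined = pd.concat([pos, ecom], ignore_index=True)
daily_totals = combined.groupby("date")["sales"].sum()
print(daily_totals.tolist())  # [180, 240]
```

A real pipeline would add source-specific cleaning, but the concat-then-aggregate pattern stays the same.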

2. Trend Analysis & Seasonality Detection: Identifying recurring sales patterns (daily, weekly,
seasonal). Using historical data to detect peak and low-demand periods. Helps in inventory
planning, marketing timing, and workforce allocation.
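As a minimal sketch of seasonality detection, the snippet below generates synthetic daily sales with an assumed December uplift and recovers the peak month by averaging over calendar months:

```python
import pandas as pd
import numpy as np

# Two years of synthetic daily sales; the holiday-season uplift is an
# assumption for illustration only.
dates = pd.date_range("2022-01-01", "2023-12-31", freq="D")
base = np.full(len(dates), 100.0)
base[dates.month == 12] += 50.0  # assumed December peak
sales = pd.Series(base, index=dates)

# Average sales by calendar month exposes the recurring seasonal peak.
monthly_profile = sales.groupby(sales.index.month).mean()
peak_month = int(monthly_profile.idxmax())
print(peak_month)  # 12
```

On real data the same group-by-month profile highlights which periods deserve extra inventory and marketing attention.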

3. Predictive Modeling: Applying AI/ML models such as time-series forecasting (LSTM,
XGBoost), regression models, or neural networks to predict future sales. Scenario modeling for
best-case, worst-case, and expected outcomes. Enables smarter decision-making based on
data-driven insights.
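Scenario modeling can be layered on top of any point forecast. This sketch fits a linear trend to illustrative yearly totals and brackets the prediction with an assumed ±10% band; a production system might derive the margin from model residuals instead:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative yearly sales history.
years = np.array([[2020], [2021], [2022], [2023]])
sales = np.array([100.0, 110.0, 120.0, 130.0])

model = LinearRegression().fit(years, sales)
expected = model.predict([[2024]])[0]

# Simple scenario band around the point forecast (margin is assumed).
best_case = expected * 1.10
worst_case = expected * 0.90
print(round(expected, 1))  # 140.0
```

Presenting all three figures lets a small business plan stock and cash flow against a range rather than a single number.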

4. Customer Behavior Analysis: Segmenting customers and forecasting demand based on past
behavior, demographics, and preferences. Predicting customer churn or upsell opportunities.
Personalizes sales strategies and enhances customer retention.
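A lightweight way to start on churn prediction is a recency heuristic; the cutoff, reference date, and customer data below are all assumptions for illustration:

```python
import pandas as pd

# Hypothetical last-purchase dates per customer.
last_purchase = pd.Series(
    pd.to_datetime(["2024-01-05", "2023-06-01", "2024-02-20"]),
    index=["alice", "bob", "carol"],
)

# Flag customers inactive for more than an assumed 180 days as churn risks.
reference_date = pd.Timestamp("2024-03-01")
days_inactive = (reference_date - last_purchase).dt.days
churn_risk = days_inactive > 180
print(churn_risk[churn_risk].index.tolist())  # ['bob']
```

More sophisticated approaches would train a classifier on behavioral features, but recency flags like this are often the first automated retention signal a small company deploys.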

5. Inventory and Supply Chain Optimization: Aligning sales forecasts with inventory needs to
prevent overstocking or stockouts. Automating restocking and supply chain decisions based on
predicted demand. Saves costs and improves operational efficiency.
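Linking the forecast to restocking can be as simple as a classic reorder-point rule; every number below is an assumed input, with the daily demand coming from the sales forecast:

```python
# Reorder point = forecast demand during supplier lead time + safety stock.
forecast_daily_demand = 20   # units/day, taken from the sales forecast
lead_time_days = 7           # assumed supplier delivery delay
safety_stock = 30            # assumed buffer against forecast error

reorder_point = forecast_daily_demand * lead_time_days + safety_stock
print(reorder_point)  # 170

current_stock = 150
if current_stock <= reorder_point:
    print("Reorder now")
```

Running this check whenever stock levels update turns the forecast into an automatic restocking trigger.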

6. Performance Monitoring and Dashboards: Real-time dashboards showing key forecasting
metrics (actual vs. predicted sales). Alerts and visual analytics for quick decision-making.

CHAPTER 8
REFERENCES

[1] Gärtner, Lippert, & Konigorski, 2021, Automated Demand Forecasting in Small to Medium-
Sized Enterprises.

[2] Shaik Vadla et al., 2024, Enhancing Product Design through AI-Driven Sentiment Analysis
of Amazon Reviews Using BERT.

[3] Pereira et al., 2023, Application of Machine Learning Methods in Forecasting Sales Prices in
a Project Consultancy.

[4] Habil, El-Deeb, & El-Bassiouny, 2023, AI-Based Recommendation Systems: The Ultimate
Solution for Market Prediction and Targeting.

[5] Ensafi et al., 2022, Time-Series Forecasting of Seasonal Items Sales using Machine Learning
- A Comparative Analysis.

[6] Shaik & Verma, 2022, Predicting Present Day Mobile Phone Sales using Time Series-based
Hybrid Prediction Model.

[7] Ahmad, Khan, & Aslam, 2023, Demand Forecasting Using Prophet.

[8] Jena, Rout, & Mishra, 2022, A Survey of Deep Learning in Forecasting.

[9] Lee, Park, & Kim, 2022, Retail Sales Forecasting Using XGBoost.

[10] Sohrabpour et al., 2021, Export Sales Forecasting using Artificial Intelligence.

[11] Wassan et al., 2021, Amazon Product Sentiment Analysis using Machine Learning
Techniques.

[12] Alvarez & Chang, 2021, Predicting E-commerce Sales with Ensemble Methods.

[13] Banerjee, Shah, & Kulkarni, 2021, Data-Driven Forecasting in Retail.

[14] Thomas & George, 2020, Stock and Sales Forecasting Using RNNs.

[15] Kuo & Lin, 2020, Short-Term Forecasting of Retail Sales Using LSTM Networks.

APPENDIX 1:

CODING

index.html

<!DOCTYPE html>

<html lang="en">

<head>

<meta charset="UTF-8">

<title>Sales Forecasting Tool</title>

<link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css"
rel="stylesheet">

<link rel="stylesheet" href="{{ url_for('static', filename='styles.css') }}">

</head>

<body>

<nav class="navbar navbar-expand-lg navbar-dark bg-primary">

<div class="container-fluid">

<a class="navbar-brand" href="#">📊 Sales Forecasting</a>

<div class="d-flex">

<span class="navbar-text text-white">Welcome, Admin</span>

</div>

</div>

</nav>

<div class="container-fluid mt-4">


<div class="row">

<!-- Sidebar / Upload Section -->

<div class="col-md-4">

<div class="card shadow-sm mb-4">

<div class="card-header bg-secondary text-white">

Upload Data

</div>

<div class="card-body">

<form action="{{ url_for('predict') }}" method="post" enctype="multipart/form-data">

<div class="mb-3">

<label for="file" class="form-label">CSV File</label>

<input class="form-control" type="file" name="file" id="file" required>

</div>

<button type="submit" class="btn btn-primary w-100">Upload & Predict</button>

</form>

</div>

</div>

</div>

<!-- Main Content / Instructions -->

<div class="col-md-8">

<div class="card shadow-sm">

<div class="card-header bg-light">

Instructions

</div>

<div class="card-body">

<ul>

<li>Upload your historical sales data in CSV format.</li>

<li>The system will predict future yearly sales using trained models.</li>

<li>Results will be visualized and downloadable.</li>

</ul>

<div class="alert alert-info mt-4">

Sample files can be found in the <code>/uploads</code> folder.

</div>

</div>

</div>

</div>

</div>

</div>

</body>

</html>

result.html

<!DOCTYPE html>

<html lang="en">

<head>

<meta charset="UTF-8">

<title>Sales Forecast Result</title>

<link rel="stylesheet"
href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css">

<link rel="stylesheet" href="{{ url_for('static', filename='styles.css') }}">

<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>

</head>

<body>

<nav class="navbar navbar-expand-lg navbar-dark bg-primary">

<div class="container-fluid">

<span class="navbar-brand">📊 Sales Forecasting Dashboard</span>

</div>

</nav>

<div class="container my-5">

<div class="text-center mb-4">

<h2>📈 Predicted Sales Overview</h2>

<p class="lead">Here's your forecast based on historical sales data.</p>

</div>

<div class="chart-container">

<canvas id="salesChart"></canvas>

</div>

<div class="text-center">

<a href="{{ url_for('download_prediction') }}" class="download-btn">⬇️ Download Report</a>

</div>

</div>

<script>

const ctx = document.getElementById('salesChart').getContext('2d');

const salesChart = new Chart(ctx, {

type: 'line',

data: {

labels: {{ years | tojson }},

datasets: [{

label: 'Predicted Sales',

data: {{ predictions | tojson }},

borderColor: 'rgba(54, 162, 235, 1)',

backgroundColor: 'rgba(54, 162, 235, 0.2)',

borderWidth: 2,

fill: true,

tension: 0.3

}]

},

options: {

responsive: true,

maintainAspectRatio: false,

plugins: {

legend: {

display: true,

position: 'top'

}

},

scales: {

y: {

title: { display: true, text: 'Sales' }

},

x: {

title: { display: true, text: 'Year' }

}

}

}

});

</script>

</body>

</html>

app.py

from flask import Flask, render_template, request, send_file

import pandas as pd

import matplotlib

matplotlib.use('Agg')  # non-interactive backend for server-side plotting

import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression

import os

app = Flask(__name__)

UPLOAD_FOLDER = 'uploads'

PLOT_PATH = 'static/prediction_plot.png'

REPORT_PATH = 'static/prediction_report.csv'

os.makedirs(UPLOAD_FOLDER, exist_ok=True)  # ensure the upload directory exists

@app.route('/')

def home():

return render_template('index.html')

@app.route('/predict', methods=['POST'])

def predict():

try:

file = request.files['file']

if not file:

return "❌ No file uploaded."

# Save uploaded file

file_path = os.path.join(UPLOAD_FOLDER, file.filename)

file.save(file_path)

# Load and preprocess

df = pd.read_csv(file_path)

if 'ORDERDATE' not in df.columns or 'SALES' not in df.columns:

return "❌ CSV must contain 'ORDERDATE' and 'SALES' columns."

df['ORDERDATE'] = pd.to_datetime(df['ORDERDATE'])

df['Year'] = df['ORDERDATE'].dt.year

yearly_sales = df.groupby('Year')['SALES'].sum().reset_index()

# Train simple Linear Regression

X = yearly_sales[['Year']]

y = yearly_sales['SALES']

model = LinearRegression()

model.fit(X, y)

# Predict next year

next_year = yearly_sales['Year'].max() + 1

predicted_sales = model.predict(pd.DataFrame({'Year': [next_year]}))[0]

# Append prediction

predicted_df = pd.concat([

yearly_sales,

pd.DataFrame({'Year': [next_year], 'SALES': [predicted_sales]})

], ignore_index=True)

# Save report

predicted_df.to_csv(REPORT_PATH, index=False)

# Plot

plt.figure(figsize=(10, 5))

plt.plot(predicted_df['Year'], predicted_df['SALES'], marker='o', color='blue', label='Predicted Sales')

plt.title('Yearly Sales Forecast')

plt.xlabel('Year')

plt.ylabel('Sales')

plt.grid(True)

plt.legend()

plt.tight_layout()

plt.savefig(PLOT_PATH)

plt.close()

years = predicted_df['Year'].tolist()

predictions = predicted_df['SALES'].tolist()

return render_template('result.html',

years=years,

predictions=predictions)

except Exception as e:

return f"❌ Error: {str(e)}"

@app.route('/download')

def download_prediction():

return send_file(REPORT_PATH, as_attachment=True)

if __name__ == '__main__':

app.run(debug=True)

hello.py

import pandas as pd

import os

base_path = os.path.dirname(os.path.abspath(__file__))

data_path = os.path.join(base_path, "uploads")

def load_csv(file_path, name):

try:

df = pd.read_csv(file_path, encoding='ISO-8859-1')

print(f"✅ {name} loaded successfully. Preview:\n", df.head(), "\n")

return df

except FileNotFoundError:

print(f"❌ {name} not found at {file_path}.")

return pd.DataFrame()

except Exception as e:

print(f"❌ Error loading {name}: {e}")

return pd.DataFrame()

train_df = load_csv(os.path.join(data_path, "sales_data_sample.csv"), "Train Data (sales_data_sample.csv)")

test_df = load_csv(os.path.join(data_path, "sample_test.csv"), "Test Data (sample_test.csv)")

features_df = load_csv(os.path.join(base_path, "features.csv"), "Features Data")

stores_df = load_csv(os.path.join(base_path, "stores.csv"), "Stores Data")

data_preprocessing.py

import pandas as pd

def preprocess_data(input_path='uploads/sales_data_sample.csv', output_path='cleaned_train_data.csv'):

df = pd.read_csv(input_path, encoding='ISO-8859-1')

# Drop missing values

df.dropna(inplace=True)

# Parse date

df['ORDERDATE'] = pd.to_datetime(df['ORDERDATE'])

df['Year'] = df['ORDERDATE'].dt.year

df['Month'] = df['ORDERDATE'].dt.month

df['Week'] = df['ORDERDATE'].dt.isocalendar().week

df['Day'] = df['ORDERDATE'].dt.day

# Rename SALES column

df.rename(columns={'SALES': 'Weekly_Sales'}, inplace=True)

# Add placeholder columns for compatibility

df['Store'] = 1 # Dummy Store ID

df['Dept'] = 1 # Dummy Dept ID

df['Size'] = 50000 # Assumed store size

df['Temperature'] = 70 # Placeholder

df['Fuel_Price'] = 3.5

df['CPI'] = 220

df['Unemployment'] = 6.5

# Reorder columns for model compatibility

columns = ['Store', 'Dept', 'Size', 'Temperature', 'Fuel_Price', 'CPI', 'Unemployment',

'Year', 'Month', 'Week', 'Day', 'Weekly_Sales']

df = df[columns]

# Save outputs

df.to_csv(output_path, index=False)

print(f"✅ Cleaned data saved to {output_path}")

# Save yearly total sales for forecasting

yearly_sales = df.groupby('Year')['Weekly_Sales'].sum().reset_index()

yearly_sales.to_csv('features.csv', index=False)

print("✅ Yearly aggregated sales saved to features.csv")

if __name__ == '__main__':

preprocess_data()

train_linear_regression.py

import pandas as pd

import numpy as np

from pathlib import Path

from sklearn.model_selection import train_test_split, cross_val_score

from sklearn.linear_model import LinearRegression

from sklearn.preprocessing import StandardScaler

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

import joblib

import os

def load_dataset(file_path, required_columns=None):

"""Load a dataset with error handling and column validation."""

try:

df = pd.read_csv(file_path)

if required_columns:

missing_cols = set(required_columns) - set(df.columns)

if missing_cols:

raise ValueError(f"Missing required columns: {missing_cols}")

return df

except FileNotFoundError:

print(f"❌ Error: File not found - {file_path}")

raise

except Exception as e:

print(f"❌ Error loading {file_path}: {str(e)}")


raise

# === Set the data path ===

data_path = Path("./")

try:

print("📂 Loading dataset...")

data = load_dataset(

data_path / "cleaned_train_data.csv",

['Store', 'Dept', 'Size', 'Temperature', 'Fuel_Price', 'CPI', 'Unemployment',

'Year', 'Month', 'Week', 'Day', 'Weekly_Sales']

)

# Select features & target variable

feature_cols = ['Store', 'Dept', 'Size', 'Temperature', 'Fuel_Price',

'CPI', 'Unemployment', 'Year', 'Month', 'Week', 'Day']

X = data[feature_cols]

y = data['Weekly_Sales']

# Split data

print("✂️Splitting data...")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features

print("📏 Scaling features...")


scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)

X_test_scaled = scaler.transform(X_test)

# Train model

print("🧠 Training Linear Regression model...")

model = LinearRegression()

model.fit(X_train_scaled, y_train)

# Predict

y_pred = model.predict(X_test_scaled)

# Evaluate

mae = mean_absolute_error(y_test, y_pred)

mse = mean_squared_error(y_test, y_pred)

rmse = np.sqrt(mse)

r2 = r2_score(y_test, y_pred)

cv_scores = cross_val_score(model, X_train_scaled, y_train, cv=5, scoring='r2')

print(f"✅ Model Trained Successfully!")

print(f"📊 Mean Absolute Error: {mae:.2f}")

print(f"📊 Root Mean Squared Error: {rmse:.2f}")

print(f"📊 R² Score: {r2:.4f}")

print(f"📊 Cross-validation R² Score: {cv_scores.mean():.4f} (+/- {cv_scores.std() * 2:.4f})")

# Save model and scaler

print("💾 Saving model and scaler...")

model_dir = data_path / "models"

model_dir.mkdir(parents=True, exist_ok=True)

joblib.dump(model, model_dir / "linear_regression_model.joblib")

joblib.dump(scaler, model_dir / "scaler.joblib")

print(f"📁 Model saved as: {model_dir / 'linear_regression_model.joblib'}")

print(f"📁 Scaler saved as: {model_dir / 'scaler.joblib'}")

except Exception as e:

print(f"❌ An error occurred: {str(e)}")

raise

train_xgboost.py

import pandas as pd

import numpy as np

import xgboost as xgb

from pathlib import Path

from sklearn.model_selection import train_test_split

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

import joblib

import os

import pickle

def load_dataset(file_path, required_columns=None):

"""Load a dataset with error handling and column validation."""

try:

df = pd.read_csv(file_path)

if required_columns:

missing_cols = set(required_columns) - set(df.columns)

if missing_cols:

raise ValueError(f"Missing required columns: {missing_cols}")

return df

except FileNotFoundError:

print(f"❌ Error: File not found - {file_path}")

raise

except Exception as e:

print(f"❌ Error loading {file_path}: {str(e)}")

raise
# === Set the data path ===

data_path = Path("./")

try:

print("📂 Loading dataset...")

data = load_dataset(

data_path / "cleaned_train_data.csv",

['Store', 'Dept', 'Size', 'Temperature', 'Fuel_Price', 'CPI', 'Unemployment',

'Year', 'Month', 'Week', 'Day', 'Weekly_Sales']

)

# Select features & target variable

feature_cols = ['Store', 'Dept', 'Size', 'Temperature', 'Fuel_Price', 'CPI',

'Unemployment', 'Year', 'Month', 'Week', 'Day']

X = data[feature_cols]

y = data['Weekly_Sales']

# Split data

print("✂️Splitting data...")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model

print("🧠 Training XGBoost model...")

model = xgb.XGBRegressor(

n_estimators=200,

max_depth=6,

learning_rate=0.1,

subsample=0.8,

colsample_bytree=0.8,

random_state=42,

objective="reg:squarederror"

)

model.fit(X_train, y_train)

# Predict

y_pred = model.predict(X_test)

# Evaluate

mae = mean_absolute_error(y_test, y_pred)

rmse = np.sqrt(mean_squared_error(y_test, y_pred))

r2 = r2_score(y_test, y_pred)

print(f"✅ XGBoost Model Trained!")

print(f"📊 Mean Absolute Error: {mae:.2f}")

print(f"📊 Root Mean Squared Error: {rmse:.2f}")

print(f"📊 R² Score: {r2:.4f}")

# Save model

model_dir = data_path / "models"


model_dir.mkdir(parents=True, exist_ok=True)

joblib.dump(model, model_dir / "xgboost_model.joblib")

print(f"💾 XGBoost model saved to {model_dir / 'xgboost_model.joblib'}")

# Save evaluation results

eval_results = {"XGBoost": r2}

with open(model_dir / "evaluation_results.pkl", "wb") as f:

pickle.dump(eval_results, f)

print(f"📁 Evaluation results saved to {model_dir / 'evaluation_results.pkl'}")

except Exception as e:

print(f"❌ An error occurred: {str(e)}")

raise

train_lstm.py

import numpy as np

import pandas as pd

import tensorflow as tf

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import LSTM, Dense, Dropout

from tensorflow.keras.callbacks import EarlyStopping

from sklearn.preprocessing import StandardScaler

from sklearn.model_selection import train_test_split

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

from pathlib import Path

import pickle

import logging

# Configure logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def load_dataset(file_path, required_columns=None):

"""Load a dataset with error handling and column validation."""

try:

df = pd.read_csv(file_path)

if required_columns:

missing_cols = set(required_columns) - set(df.columns)

if missing_cols:

raise ValueError(f"Missing required columns: {missing_cols}")

return df

except FileNotFoundError:

logging.error(f"❌ File not found - {file_path}")

raise

except Exception as e:

logging.error(f"❌ Error loading {file_path}: {str(e)}")

raise

# === Set the data path ===

data_path = Path("./")

try:

logging.info("📂 Loading dataset...")

data = load_dataset(

data_path / "cleaned_train_data.csv",

['Store', 'Dept', 'Size', 'Temperature', 'Fuel_Price', 'CPI',

'Unemployment', 'Year', 'Month', 'Week', 'Day', 'Weekly_Sales']

)

# Prepare features and target

feature_cols = ['Store', 'Dept', 'Size', 'Temperature', 'Fuel_Price', 'CPI',

'Unemployment', 'Year', 'Month', 'Week', 'Day']

X = data[feature_cols].values

y = data['Weekly_Sales'].values

# Scale features
logging.info("🔄 Scaling features...")

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

# Reshape input for LSTM [samples, timesteps, features]

X_reshaped = X_scaled.reshape((X_scaled.shape[0], 1, X_scaled.shape[1]))

# Train/test split

logging.info("✂️Splitting data...")

X_train, X_test, y_train, y_test = train_test_split(X_reshaped, y, test_size=0.2, random_state=42)

# Build the LSTM model

logging.info("🔧 Building LSTM model...")

model = Sequential([

LSTM(64, input_shape=(1, X.shape[1]), return_sequences=True),

Dropout(0.2),

LSTM(32),

Dropout(0.2),

Dense(16, activation='relu'),

Dense(1)

])

model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Set early stopping


early_stopping = EarlyStopping(

monitor='val_loss',

patience=10,

restore_best_weights=True,

verbose=1

)

# Directory to save models

model_dir = data_path / "models"

model_dir.mkdir(parents=True, exist_ok=True)

# Train model

logging.info("🚀 Training LSTM model...")

history = model.fit(

X_train, y_train,

epochs=50,

batch_size=32,

validation_split=0.2,

callbacks=[early_stopping],

verbose=1

)

# Evaluate model

logging.info("📈 Evaluating model...")

y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)

rmse = np.sqrt(mean_squared_error(y_test, y_pred))

r2 = r2_score(y_test, y_pred)

print("\n=== Model Performance ===")

print(f"📊 Mean Absolute Error: {mae:.2f}")

print(f"📊 Root Mean Squared Error: {rmse:.2f}")

print(f"📊 R² Score: {r2:.4f}")

# Save model

model_path = model_dir / "lstm_model.h5"

logging.info(f"💾 Saving model to {model_path}")

model.save(model_path)

# Save evaluation results

eval_path = model_dir / "evaluation_results.pkl"

eval_results = {"LSTM": r2}

with open(eval_path, "wb") as f:

pickle.dump(eval_results, f)

logging.info(f"✅ Model saved to {model_path}")

logging.info(f"✅ Evaluation results saved to {eval_path}")

except Exception as e:

logging.error(f"❌ An error occurred: {str(e)}")


raise

evaluate_models.py

import pandas as pd

import numpy as np
import pickle

import os

import joblib

from pathlib import Path

from tensorflow.keras.models import load_model

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

import logging

# Setup logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def load_pickle(file_path):

"""Load a pickle file with error handling."""

try:

with open(file_path, "rb") as f:

return pickle.load(f)

except FileNotFoundError:

logging.error(f"❌ File not found - {file_path}")

raise

except Exception as e:

logging.error(f"❌ Error loading {file_path}: {str(e)}")

raise

def safe_mape(y_true, y_pred):

"""Safely compute MAPE, avoiding divide-by-zero."""


y_true = np.array(y_true)

y_pred = np.array(y_pred)

mask = y_true != 0

if not np.any(mask):

return np.inf

return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100

# === Paths ===

data_path = Path("./")

models_dir = data_path / "models"

models_dir.mkdir(exist_ok=True)

try:

logging.info("🔍 Looking for available models...")

available_models = {}

# Check for available models

lstm_path = models_dir / "lstm_model.h5"

xgb_path = models_dir / "xgboost_model.joblib"

lr_path = models_dir / "linear_regression_model.joblib"

if lstm_path.exists():

available_models["LSTM"] = lstm_path

if xgb_path.exists():

available_models["XGBoost"] = xgb_path

if lr_path.exists():

available_models["LinearRegression"] = lr_path

if not available_models:

raise FileNotFoundError("❌ No trained models found. Please train at least one model first.")

# Load evaluation results if they exist

eval_results = {}

eval_results_path = models_dir / "evaluation_results.pkl"

if eval_results_path.exists():

logging.info("📂 Loading existing evaluation results...")

eval_results = load_pickle(eval_results_path)

# Load test data

test_data_path = data_path / "cleaned_train_data.csv"

if not test_data_path.exists():

raise FileNotFoundError(f"❌ Test file not found at {test_data_path}. Run data preprocessing first.")

logging.info("📁 Loading test data...")

test_data = pd.read_csv(test_data_path)

# Prepare features

feature_cols = ['Store', 'Dept', 'Size', 'Temperature', 'Fuel_Price', 'CPI',

'Unemployment', 'Year', 'Month', 'Week', 'Day']

X_test = test_data[feature_cols]
y_true = test_data['Weekly_Sales']

# Evaluate all available models

for model_name, model_path in available_models.items():

logging.info(f"⚙️Evaluating {model_name} model...")

try:

if model_name == "LSTM":

model = load_model(model_path)

X_input = X_test.values.reshape((X_test.shape[0], 1, X_test.shape[1]))

y_pred = model.predict(X_input)

# Handle sequence outputs

if len(y_pred.shape) > 1 and y_pred.shape[1] > 1:

y_pred = y_pred[:, 0]

y_pred = y_pred.flatten()

else:

model = joblib.load(model_path)

y_pred = model.predict(X_test)

# Evaluation metrics

mae = mean_absolute_error(y_true, y_pred)

rmse = np.sqrt(mean_squared_error(y_true, y_pred))

mape = safe_mape(y_true, y_pred)

r2 = r2_score(y_true, y_pred)

print(f"\n=== 📊 {model_name} Model Performance ===")

print(f"MAE : {mae:.2f}")

print(f"RMSE : {rmse:.2f}")

print(f"MAPE : {mape:.2f}%")

print(f"R² : {r2:.4f}")

# Save results

eval_results[model_name] = r2

result_df = pd.DataFrame({

"True_Sales": y_true,

"Predicted_Sales": y_pred

})

result_file = data_path / f"{model_name.lower()}_predictions.csv"

result_df.to_csv(result_file, index=False)

joblib.dump({"mae": mae, "rmse": rmse, "mape": mape, "r2": r2},

models_dir / f"{model_name.lower()}_test_results.pkl")

logging.info(f"✅ Saved predictions to {result_file}")

except Exception as e:

logging.error(f"❌ Error while evaluating {model_name}: {str(e)}")

continue

# Save updated evaluation results

with open(eval_results_path, "wb") as f:

pickle.dump(eval_results, f)

# Report best model

if eval_results:

print("\n📊 Summary of R² scores:")

for model, score in eval_results.items():

print(f"{model}: {score:.4f}")

best_model = max(eval_results, key=eval_results.get)

print(f"\n🏆 Best Model: {best_model} (R² Score: {eval_results[best_model]:.4f})")

else:

print("\n❌ No model evaluation succeeded.")

except Exception as e:

logging.error(f"❌ An error occurred: {str(e)}")

raise

APPENDIX 2:
SCREENSHOT
