Empowering Small Companies With Automated Sales Forecasting
AD3811-PROJECT WORK
Submitted by
NANDESH N 721921243076
SANJITH E U 721921243094
SUMESH K S 721921243116
ADARSH R G 721921243301
BACHELOR OF TECHNOLOGY
In
(AUTONOMOUS)
COIMBATORE - 641105
MAY 2025
ANNA UNIVERSITY : CHENNAI 600 025
BONAFIDE CERTIFICATE
This is to certify that the AD8811 - PROJECT WORK report “EMPOWERING SMALL
COMPANIES WITH AUTOMATED SALES FORECASTING” is the bonafide work of the
following students
SANJITH E U (721921243094)
ADARSH R G (721921243301)
NANDESH N (721921243076)
SUMESH K S (721921243116)
who carried out the project work under my supervision during the academic year 2024-2025.
SIGNATURE SIGNATURE
Coimbatore-641105 Coimbatore-641105
First and foremost, I would like to thank the Almighty for showering His blessings throughout
our lives. I take the privilege to express my hearty thanks to my parents for their valuable
support and effort in completing this project.
I take this chance to express my deep sense of gratitude to our Management, our beloved
Principal Dr C. JEGADHEESAN, M.E (PhD) and our Dean Dr K. BAGHIRATHI for
providing excellent infrastructure and support to pursue work at our college.
I extend my thanks to Mrs. SRINJU. M, M.E AP/AI&DS for her valuable guidance at
each and every stage of the project, which helped me a lot in the successful completion of the
project.
I take immense pleasure in expressing my heartfelt thanks to Dr. A. T RAVI, M.E, PhD for
her valuable suggestions and constant support as project coordinator in completing this project
report.
I am very grateful to all the teaching and non-teaching staff and to my friends who
helped me to complete the project.
TABLE OF CONTENTS
ABSTRACT
LIST OF ABBREVIATION
1 INTRODUCTION
1.1 Overview of the Project
1.2 Model Description
2 LITERATURE SURVEY
3 SYSTEM STUDY
4 SYSTEM DESCRIPTION
4.2.1 Input Design
4.2.2 Output Design
4.2.3 Code Design
4.2.4 Database Design
4.3 System Flow Diagram
4.3.1 Data Flow Diagram
4.4 System Testing
4.4.1 Types of Tests: Unit Testing
4.4.2 Integration Testing
4.4.3 Functional Testing
4.4.4 System Test
4.4.5 Feasibility Study
5 SOFTWARE REQUIREMENT
5.1 Software Description
6 CONCLUSION
8 REFERENCE
9 APPENDIX
a) Source Coding
b) Screenshot
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATION
CHAPTER 1
INTRODUCTION
Sales forecasting is defined as the process of estimating future sales volumes using
historical data, seasonal patterns, and external factors. Accurate forecasting is essential for
setting realistic targets, managing inventory efficiently, planning production, and optimizing
marketing efforts. It is a critical function that helps companies anticipate demand fluctuations
and make informed strategic decisions, ultimately mitigating risks and improving overall
business performance.
Small companies often face significant challenges when forecasting sales. They typically
have limited historical data, which is often inconsistent or incomplete, making it difficult to
identify clear trends. These companies also usually lack dedicated data science teams, so they
must rely on generic, off-the-shelf forecasting tools that may not be tailored to their needs.
The high cost and complexity of enterprise-grade solutions further restrict their ability to
implement advanced analytics, creating a pressing need for a cost-effective, user-friendly
alternative.
This project aims to develop a sales forecasting system that uses machine learning
techniques to deliver accurate and actionable insights. By leveraging historical sales data,
external factors, and real-time inputs, the model predicts future sales trends while identifying
key influencing factors such as seasonality, promotions, and market dynamics. The proposed
system employs Extreme Gradient Boosting (XGBoost), Linear Regression, and Long Short-Term
Memory (LSTM) networks to handle complex patterns in the data. The solution also
incorporates anomaly detection to identify irregularities and support data reliability.
A key focus of this project is the interpretability of the model, so that stakeholders can
trust and understand the results. The project also explores scalability across industries,
covering the retail, manufacturing, and e-commerce sectors. By integrating the forecasting
model with a user-friendly dashboard, businesses gain access to real-time insights that help
them optimize inventory management, supply chain operations, and workforce allocation.
1.2 MACHINE LEARNING:
Machine Learning (ML) is a transformative field within the domain of Artificial Intelligence
(AI) that empowers computer systems to learn from data, identify hidden patterns, and make
intelligent decisions or predictions with minimal human intervention. Unlike traditional rule-
based programming, where logic must be explicitly coded, ML systems automatically
improve their performance through experience—by analysing past observations and
continuously refining their predictive capabilities.
In the context of sales forecasting, ML algorithms are trained on vast historical datasets that
include sales figures, promotional schedules, seasonal trends, customer behaviour, economic
indicators, and other relevant features. By learning from this multi-dimensional data, the
models can uncover intricate, non-linear relationships that are difficult to detect through
conventional statistical methods.
The strength of ML in this domain lies in its ability to generalize from the past to predict the
future with a high degree of accuracy. As new data becomes available, the models can be
retrained to reflect the latest market dynamics, ensuring scalable, flexible, and data-driven
decision-making.
Ultimately, machine learning is not just a technological tool, but a strategic enabler that
enhances business intelligence, drives operational efficiency, and fosters innovation—
particularly in the competitive landscape of modern commerce.
Advantages of ML in this project
1. Improved Forecast Accuracy:- ML models like Linear Regression, XGBoost, and LSTM
are capable of identifying complex patterns and trends in historical sales data, leading to
more accurate and reliable forecasts compared to traditional statistical methods.
2. Scalability Across Industries:- The forecasting models are not hardcoded for one
specific dataset—they can be adapted and scaled for different businesses in retail,
manufacturing, e-commerce, and more with minimal adjustments.
3. Efficient Anomaly Detection:- ML can identify unusual spikes or dips in sales data,
helping flag potential issues such as stockouts, demand surges, or reporting errors,
improving data integrity.
4. Seamless Integration with Visualization Tools:- The combination of ML and tools like
Chart.js allows users to view model outputs in an intuitive and interactive manner, aiding
in clear communication of insights to stakeholders.
6. Reduction of Manual Effort:- Once trained, ML models automate the prediction process,
reducing the time and effort needed for manual forecasting or spreadsheet-based
analysis.
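The anomaly detection mentioned in point 3 can be illustrated with a simple z-score check. This is only a minimal sketch: the report does not state which detection method the project uses, so the z-score rule, the threshold of 2.0, the function name flag_anomalies, and the sample figures are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def flag_anomalies(sales, threshold=2.0):
    """Return the indices of values whose z-score magnitude exceeds the threshold."""
    mu = mean(sales)
    sigma = pstdev(sales)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing to flag
    return [i for i, value in enumerate(sales)
            if abs(value - mu) / sigma > threshold]


# Hypothetical monthly sales with one obvious spike at index 5.
monthly_sales = [100, 102, 98, 101, 99, 250, 100, 97]
print(flag_anomalies(monthly_sales))
```

A flagged index only marks a value as unusual; whether it is a demand surge, a stockout, or a reporting error still needs to be checked against the business context.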
CHAPTER 2
LITERATURE SURVEY
Alvarez G and Chang W (2021)
Applied ensemble methods for e-commerce sales forecasting. These models are robust but
harder to interpret compared to simpler models. [12]
CHAPTER 3
SYSTEM STUDY
3.1 EXISTING SYSTEM
The existing system for sales forecasting used by most small companies is typically manual
or based on basic spreadsheet tools (like Excel). In some cases, businesses may try using free
or off-the-shelf software, but these tools often aren't optimized for their specific needs. The
existing system is inefficient, error-prone, and not scalable. It creates challenges in making
accurate business decisions, managing inventory, and planning marketing or production
activities. These limitations highlight the need for a smarter, automated forecasting tool
designed specifically for small businesses.
3.1.1 DISADVANTAGE
• Inaccuracy: forecasts are unreliable because of limited, inconsistent historical data and a
lack of automation.
• Manual effort: human intervention is required for data cleaning, model selection, and
interpretation.
3.2 PROPOSED SYSTEM
The proposed system is a web-based, self-service sales forecasting tool specifically designed
to help small businesses make accurate sales predictions without needing deep technical
knowledge or expensive software. It gives small businesses forecasting capabilities that are
easy to use and affordable. It helps them plan better, reduce costs, avoid overstocking and
stockouts, and respond quickly to market changes.
3.2.1 ADVANTAGE
• Accuracy
• Efficiency
• Cost-Effective
CHAPTER 4
SYSTEM DESCRIPTION
DATASET COLLECTION
PRE-PROCESSING
DATA VISUALIZATION
MODEL IMPLEMENTATION
PREDICTION & WEB INTEGRATION
Module 1: Dataset Collection
This module enables the user to upload historical sales data in .csv format. The data must
contain at least two columns: ORDERDATE (timestamp) and SALES (numerical sales values).
Users interact with this module via a web form designed using HTML and Bootstrap.
Uploaded files are stored in a designated uploads/ directory and prepared for further
processing.
Module 2: Pre-processing
This module cleans and prepares the data for model training and forecasting:
Converts ORDERDATE to a datetime object.
Extracts the Year component to aggregate sales data annually.
Groups data by year and sums sales for each period.
Handles column validation and error-checking for missing headers. The result is a
transformed dataset that is ready for predictive modeling.
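The pre-processing steps listed above can be sketched as follows. This is an illustrative outline rather than the project's exact code: the function name aggregate_yearly_sales, the error message, and the in-memory sample data are assumptions, while the ORDERDATE and SALES column names come from the module description.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ("ORDERDATE", "SALES")


def aggregate_yearly_sales(csv_source):
    """Validate headers, parse ORDERDATE and sum SALES per calendar year."""
    df = pd.read_csv(csv_source)
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required column(s): {', '.join(missing)}")
    df["ORDERDATE"] = pd.to_datetime(df["ORDERDATE"])
    df["Year"] = df["ORDERDATE"].dt.year
    return df.groupby("Year")["SALES"].sum().reset_index()


# Hypothetical sample standing in for an uploaded CSV file.
sample = io.StringIO(
    "ORDERDATE,SALES\n"
    "2003-02-24,100.0\n"
    "2003-05-07,50.0\n"
    "2004-07-01,200.0\n"
)
yearly = aggregate_yearly_sales(sample)
print(yearly)
```

Raising an error for missing headers before any parsing keeps the failure message specific, which matters for non-technical users uploading their own files.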
Module 3: Data Visualization
To enhance interpretability, this module generates a line chart that visualizes both historical
and predicted sales:
A Matplotlib chart is rendered to display sales trends over time.
The chart is saved as static/prediction_plot.png and embedded into the result page.
The visual output helps users quickly grasp sales fluctuations and forecasted values.
1) Linear Regression:
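As a minimal illustration of fitting a straight-line trend to yearly totals, the closed-form least-squares estimate for a single predictor can be sketched as below. The function name fit_line and the sample figures are hypothetical; the listing in Appendix 1 relies on scikit-learn's LinearRegression rather than this hand-written version.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


years = [2021, 2022, 2023]
sales = [100.0, 120.0, 140.0]  # made-up yearly totals
slope, intercept = fit_line(years, sales)
next_year_forecast = slope * 2024 + intercept
print(round(next_year_forecast, 2))
```

With only a handful of yearly points, a fit like this captures the overall trend but cannot represent seasonality, which is one reason the project also considers XGBoost and LSTM.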
2) XGBoost (Extreme Gradient Boosting):
A powerful ensemble learning method known for its speed and accuracy in structured data tasks.
Automatically handles missing values and variable importance evaluation.
Suitable for capturing non-linear relationships and interactions between variables.
Enhances predictions for scenarios involving recurring trends and seasonal effects.
Model Evaluation Metrics:
MAE (Mean Absolute Error): Measures the average magnitude of errors without considering their
direction.
RMSE (Root Mean Squared Error): Penalizes larger errors more than MAE, highlighting
significant prediction deviations.
R² Score (Coefficient of Determination): Quantifies the proportion of variance in the target
variable explained by the model.
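The three metrics can be computed directly from their definitions, as in the sketch below. The function name regression_metrics and the sample values are illustrative assumptions; in practice the same quantities are available from scikit-learn's metrics module. The sketch also assumes the true values are not all identical, since R² is undefined in that case.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R²) for two equally long sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Made-up actual and predicted sales values.
mae, rmse, r2 = regression_metrics([100, 200, 300], [110, 190, 330])
print(f"MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.3f}")
```

Because RMSE squares each error before averaging, the single error of 30 in this sample pulls RMSE above MAE, which is the penalizing effect described above.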
Key Features:
File Upload: Users can input a CSV file containing relevant features.
Model Selection: An auto-select option automatically chooses the model with the highest validation score.
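The auto-select behaviour can be sketched as a lookup of the best score. This is an assumption-laden illustration: it presumes a higher-is-better metric such as R², keyed by model name, and the function name select_best_model and the scores shown are made up.

```python
def select_best_model(validation_scores):
    """Return the model name with the highest validation score."""
    if not validation_scores:
        raise ValueError("No validation scores available")
    return max(validation_scores, key=validation_scores.get)


# Hypothetical validation scores (higher is better).
scores = {"LinearRegression": 0.81, "XGBoost": 0.90, "LSTM": 0.86}
best = select_best_model(scores)
print(best)
```

If an error metric such as RMSE were used instead, the selection would need min rather than max.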
4.2.1 INPUT DESIGN
Input design determines how data is entered into the system to generate accurate forecasts. In the
developed application, users interact with a web-based form developed using HTML and Bootstrap
through the Flask framework.
Users upload input CSV files (test_updated.csv) containing future data for prediction.
Inputs are validated on the client and server side before processing.
Preprocessed inputs are structured into numerical formats using one-hot encoding and
standard scaling before being fed into models.
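One-hot encoding and standard scaling can be illustrated with pandas as follows. The column names Store_Type and Temperature and their values are hypothetical. The scaling uses the population standard deviation (ddof=0), which is also what scikit-learn's StandardScaler uses.

```python
import pandas as pd

df = pd.DataFrame({
    "Store_Type": ["A", "B", "A"],      # hypothetical categorical column
    "Temperature": [60.0, 70.0, 80.0],  # hypothetical numeric column
})

# One-hot encode the categorical column into 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["Store_Type"], dtype=int)

# Standard scaling: subtract the mean, divide by the population standard deviation.
col = encoded["Temperature"]
encoded["Temperature"] = (col - col.mean()) / col.std(ddof=0)
print(encoded)
```

When a model is trained on scaled inputs, the mean and standard deviation learned from the training data should be reused for new inputs rather than recomputed.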
4.2.2 OUTPUT DESIGN
Output design determines how the system presents results to the user in a meaningful
and understandable way. In the developed application, forecast results are visualized and made
available for download through a responsive web interface.
Forecasted sales are displayed in the form of an interactive line chart using Chart.js,
distinguishing between historical (actual) and predicted data.
Predicted values are stored and exported in a CSV format (e.g., prediction_report.csv),
allowing users to download and analyse results offline.
The web interface presents a clear visualization of yearly sales trends.
Users are able to download reports of the forecasted sales with a single click, ensuring
easy integration of results into business workflows and presentations.
4.2.3 CODE DESIGN
Design Code means a code setting out the broad parameters with reference to which the
Developer will secure uniform standards of design quality, character of design, building
materials, density of development and site layout.
A Data Flow Diagram (DFD) is a graphical tool used to describe and analyse the flow of data
within a system. In the context of the Sales Forecasting Web Application, the DFD illustrates
how data moves through the various stages of the application—from user input to prediction
output. The system follows a modular approach where each component performs a specific
function, ensuring flexibility, scalability, and clarity.
1. User Input
The user initiates the process by uploading a CSV file containing historical sales data. This is
done through the web interface provided by the application. The file typically includes fields
like ORDERDATE, SALES, and possibly other relevant features.
2. Web Interface
The web interface, built using Flask for backend routing and HTML/CSS/Bootstrap for
frontend rendering, handles the file upload and form submission. It allows users to:
3. Data Preprocessing
Once the data is uploaded, it is passed through a preprocessing module. This module:
Converts the ORDERDATE to datetime and extracts Year, Month, Week, and Day as
new features.
Formats the data to match the model requirements. This stage is crucial for ensuring
data consistency and quality before applying machine learning techniques.
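The date feature extraction described in step 3 can be sketched with pandas as below. The two sample rows are made up; the derived column names follow the step description.

```python
import pandas as pd

# Made-up rows standing in for uploaded sales data.
df = pd.DataFrame({
    "ORDERDATE": ["2003-02-24", "2004-12-31"],
    "SALES": [2871.0, 1500.5],
})

df["ORDERDATE"] = pd.to_datetime(df["ORDERDATE"])
df["Year"] = df["ORDERDATE"].dt.year
df["Month"] = df["ORDERDATE"].dt.month
# isocalendar() returns ISO 8601 week numbers; cast to a plain integer column.
df["Week"] = df["ORDERDATE"].dt.isocalendar().week.astype(int)
df["Day"] = df["ORDERDATE"].dt.day
print(df[["Year", "Month", "Week", "Day"]])
```

ISO week numbers run from 1 to 53, and dates near the start or end of a year can belong to a week of the adjacent ISO year, so Week should be interpreted together with the date rather than with Year alone.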
4. Model Selection
5. Forecasting Engine
The selected model processes the pre-processed data to predict future sales.
6. Output Generation
A graphical chart rendered using Matplotlib and/or Chart.js to visually present the sales
trend over the years, distinguishing between actual and predicted values.
A Download Report button allows users to download the predictions as a CSV file for
offline analysis or documentation.
4.4.1 UNIT TESTING
Unit testing involves the design of test cases that validate that the internal program logic
is functioning properly and that program inputs produce valid outputs. All decision branches
and internal code flow should be validated. It is the testing of individual software units of the
application, and it is done after the completion of an individual unit and before integration.
This is structural testing that relies on knowledge of the unit's construction and is invasive.
Unit tests perform basic tests at the component level and test a specific business process,
application, and/or system configuration. Unit tests ensure that each unique path of a business
process performs accurately to the documented specifications and contains clearly defined
inputs and expected results.
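A unit test in this sense can be sketched with Python's built-in unittest module. The helper total_sales_by_year is a hypothetical unit written only for this illustration; it is not taken from the project's source code.

```python
import unittest


def total_sales_by_year(rows):
    """Sum (year, amount) pairs into a {year: total} dictionary."""
    totals = {}
    for year, amount in rows:
        totals[year] = totals.get(year, 0.0) + amount
    return totals


class TotalSalesByYearTest(unittest.TestCase):
    def test_sums_amounts_within_each_year(self):
        rows = [(2003, 100.0), (2003, 50.0), (2004, 200.0)]
        self.assertEqual(total_sales_by_year(rows), {2003: 150.0, 2004: 200.0})

    def test_empty_input_gives_empty_result(self):
        self.assertEqual(total_sales_by_year([]), {})


# Run the test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TotalSalesByYearTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test method fixes a clearly defined input and an expected result, which mirrors the requirement stated above.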
4.4.3 FUNCTIONAL TESTING
Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
4.4.5 FEASIBILITY STUDY
The feasibility of the project is analyzed in this phase, and a business proposal is put forth
with a very general plan for the project and some cost estimates. During system analysis, the
feasibility study of the proposed system is carried out. This is to ensure that the proposed
system is not a burden to the company. For feasibility analysis, some understanding of the
major requirements for the system is essential.
CHAPTER 5
SOFTWARE REQUIREMENT
This project utilizes a range of software tools and technologies for the development of
a Sales Forecasting Tool. The software requirements span the programming language
used, libraries required for data processing and machine learning, web development
technologies, and integrated development environments (IDEs).
Python serves as the core language for this project. It is an open-source, high-level,
interpreted programming language known for its simplicity, readability, and versatility. Python
is widely adopted in data science, artificial intelligence, and web development fields due to its
strong support for machine learning libraries and frameworks.
Python provides various built-in modules and supports third-party packages, which
significantly streamline the implementation of complex machine learning and deep learning
models.
5.3 PYTHON LIBRARIES AND PACKAGES
NumPy is used for performing numerical operations.
Pandas helps in handling and preprocessing large datasets.
Matplotlib, Seaborn, and Plotly are utilized for creating data visualizations.
Scikit-learn is used to implement machine learning models like Linear Regression and to
evaluate model performance using metrics such as MAE, RMSE, and R² Score.
XGBoost is employed for gradient boosting-based sales prediction due to its efficiency
and accuracy.
TensorFlow and Keras are used to develop Long Short-Term Memory (LSTM) models
suitable for time-series forecasting.
Flask is a lightweight web framework that enables the integration of the machine learning
model into a user-friendly web interface.
Pickle and Joblib are used for model serialization and loading during web deployment.
Datetime, OS, and Glob modules are used for date manipulation and file handling
operations.
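Model serialization with Pickle can be illustrated by a simple save-and-load round trip. In this sketch a plain dictionary stands in for a trained model object, and the file name model.pkl is hypothetical; Joblib's dump and load functions follow the same save-then-restore pattern.

```python
import os
import pickle
import tempfile

# A plain dictionary stands in for a trained model object (assumption).
model_stub = {"slope": 20.0, "intercept": -40320.0}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")  # hypothetical file name
    with open(model_path, "wb") as f:
        pickle.dump(model_stub, f)      # serialize to disk
    with open(model_path, "rb") as f:
        restored = pickle.load(f)       # load it back

print(restored == model_stub)
```

Pickle files can execute code when loaded, so a deployed application should only load model files it created itself.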
Optional tools such as Postman may be used to test API endpoints, and Git or GitHub can be
used for version control and collaborative development.
5.6 Development Tools
Visual Studio Code: The integrated development environment (IDE) used to write, edit
and debug the Python and HTML code.
Python 3.10.5 installed locally for code execution.
Google Chrome / Mozilla Firefox: Used to access and test the web interface.
Windows 10/11
Linux distributions (Ubuntu 20.04 and later)
macOS (latest versions)
CHAPTER 6
CONCLUSION
This project aims to bridge the gap between small businesses and advanced data-driven
decision-making by providing an intuitive, web-based sales forecasting tool. By automating
complex tasks such as data preprocessing, model selection, and forecast visualization, the
solution empowers non-technical users to generate accurate predictions effortlessly. Leveraging
both classical and modern machine learning models like Linear Regression, LSTM, and
XGBoost, the tool offers reliable insights into future sales trends.
Through this approach, small companies can enhance inventory management, reduce
operational costs, and respond proactively to market changes—ultimately gaining a competitive
edge in an increasingly data-centric world. The scalable and modular design ensures the system
can evolve with business needs, supporting long-term growth and innovation.
CHAPTER 7
SCOPE FOR FUTURE ENHANCEMENT
Automated sales forecasting leverages data analytics, machine learning, and AI tools to
predict future sales trends. For small businesses, this can be transformative. Below is a detailed
scope covering key areas where automation in sales forecasting empowers small companies:
1. Data collection and integration: Automating the gathering of sales data from CRM, ERP, POS
systems, and e-commerce platforms. Integration with social media, marketing campaigns,
customer feedback, and market trends. Saves time and reduces errors from manual data entry.
2. Trend Analysis & Seasonality Detection: Identifying recurring sales patterns (daily, weekly,
seasonal). Using historical data to detect peak and low-demand periods. Helps in inventory
planning, marketing timing, and workforce allocation.
4. Customer Behavior Analysis: Segmenting customers and forecasting demand based on past
behavior, demographics, and preferences. Predicting customer churn or upsell opportunities.
Personalizes sales strategies and enhances customer retention.
5. Inventory and Supply Chain Optimization: Aligning sales forecasts with inventory needs to
prevent overstocking or stockouts. Automating restocking and supply chain decisions based on
predicted demand. Saves costs and improves operational efficiency.
CHAPTER 8
REFERENCE
[1] Gärtner, Lippert, & Konigorski, 2021, Automated Demand Forecasting in Small to Medium-
Sized Enterprises.
[2] Shaik Vadla et al., 2024, Enhancing Product Design through AI-Driven Sentiment Analysis
of Amazon Reviews Using BERT.
[3] Pereira et al., 2023, Application of Machine Learning Methods in Forecasting Sales Prices in
a Project Consultancy.
[4] Habil, El-Deeb, & El-Bassiouny, 2023, AI-Based Recommendation Systems: The Ultimate
Solution for Market Prediction and Targeting.
[5] Ensafi et al., 2022, Time-Series Forecasting of Seasonal Items Sales using Machine Learning
- A Comparative Analysis.
[6] Shaik & Verma, 2022, Predicting Present Day Mobile Phone Sales using Time Series-based
Hybrid Prediction Model.
[7] Ahmad, Khan, & Aslam, 2023, Demand Forecasting Using Prophet.
[8] Jena, Rout, & Mishra, 2022, A Survey of Deep Learning in Forecasting.
[9] Lee, Park, & Kim, 2022, Retail Sales Forecasting Using XGBoost.
[10] Sohrabpour et al., 2021, Export Sales Forecasting using Artificial Intelligence.
[11] Wassan et al., 2021, Amazon Product Sentiment Analysis using Machine Learning
Techniques.
[12] Alvarez & Chang, 2021, Predicting E-commerce Sales with Ensemble Methods.
[14] Thomas & George, 2020, Stock and Sales Forecasting Using RNNs.
[15] Kuo & Lin, 2020, Short-Term Forecasting of Retail Sales Using LSTM Networks.
APPENDIX 1:
CODING
index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css"
rel="stylesheet">
</head>
<body>
<div class="container-fluid">
<div class="d-flex">
</div>
</div>
</nav>
<div class="col-md-4">
Upload Data
</div>
<div class="card-body">
<div class="mb-3">
</div>
</form>
</div>
</div>
</div>
<div class="col-md-8">
Instructions
</div>
<div class="card-body">
<ul>
<li>The system will predict future yearly sales using trained models.</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>
result.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<link rel="stylesheet"
href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css">
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
<div class="container-fluid">
</div>
</nav>
</div>
<div class="chart-container">
<canvas id="salesChart"></canvas>
</div>
<div class="text-center">
</div>
</div>
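<script>
    // Reconstructed (assumption): the chart setup, labels and data lines are missing
    // from the listing; years and predictions are the values passed in by app.py.
    const ctx = document.getElementById('salesChart').getContext('2d');
    new Chart(ctx, {
        type: 'line',
        data: {
            labels: {{ years | tojson }},
            datasets: [{
                label: 'Sales',
                data: {{ predictions | tojson }},
                borderWidth: 2,
                fill: true,
                tension: 0.3
            }]
        },
        options: {
            responsive: true,
            maintainAspectRatio: false,
            plugins: {
                legend: {
                    display: true,
                    position: 'top'
                }
            },
            scales: {
                y: {
                    title: {
                        display: true,
                        text: 'Sales'
                    }
                },
                x: {
                    title: {
                        display: true,
                        text: 'Year'
                    }
                }
            }
        }
    });
</script>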
</body>
</html>
app.py
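import os

import matplotlib
matplotlib.use('Agg')  # Reconstructed (assumption): non-GUI backend for server-side plots
import matplotlib.pyplot as plt
import pandas as pd
from flask import Flask, render_template, request, send_file
from sklearn.linear_model import LinearRegression
from werkzeug.utils import secure_filename

app = Flask(__name__)
UPLOAD_FOLDER = 'uploads'
PLOT_PATH = 'static/prediction_plot.png'
REPORT_PATH = 'static/prediction_report.csv'


@app.route('/')
def home():
    return render_template('index.html')


@app.route('/predict', methods=['POST'])
def predict():
    try:
        file = request.files['file']
        if not file:
            # Reconstructed (assumption): the original branch body is missing.
            return "❌ Error: No file uploaded"
        # Reconstructed (assumption): the original file_path definition is missing.
        os.makedirs(UPLOAD_FOLDER, exist_ok=True)
        file_path = os.path.join(UPLOAD_FOLDER, secure_filename(file.filename))
        file.save(file_path)

        # Load and preprocess
        df = pd.read_csv(file_path)
        df['ORDERDATE'] = pd.to_datetime(df['ORDERDATE'])
        df['Year'] = df['ORDERDATE'].dt.year
        yearly_sales = df.groupby('Year')['SALES'].sum().reset_index()

        # Fit a linear trend on the yearly totals
        X = yearly_sales[['Year']]
        y = yearly_sales['SALES']
        model = LinearRegression()
        model.fit(X, y)

        # Predict the year after the last one in the data; a DataFrame keeps the
        # feature name consistent with the data used for fitting.
        next_year = yearly_sales['Year'].max() + 1
        predicted_sales = model.predict(pd.DataFrame({'Year': [next_year]}))[0]

        # Append prediction
        predicted_df = pd.concat([
            yearly_sales,
            # Reconstructed (assumption): the appended row is missing in the listing.
            pd.DataFrame({'Year': [next_year], 'SALES': [predicted_sales]}),
        ], ignore_index=True)

        # Save report
        predicted_df.to_csv(REPORT_PATH, index=False)

        # Plot
        plt.figure(figsize=(10, 5))
        # Reconstructed (assumption): the plot calls are missing in the listing.
        plt.plot(yearly_sales['Year'], yearly_sales['SALES'], marker='o', label='Historical')
        plt.plot([next_year], [predicted_sales], marker='o', label='Predicted')
        plt.xlabel('Year')
        plt.ylabel('Sales')
        plt.grid(True)
        plt.legend()
        plt.tight_layout()
        plt.savefig(PLOT_PATH)
        plt.close()

        years = predicted_df['Year'].tolist()
        predictions = predicted_df['SALES'].tolist()
        return render_template('result.html',
                               years=years,
                               predictions=predictions)
    except Exception as e:
        return f"❌ Error: {str(e)}"


@app.route('/download')
def download_prediction():
    # Reconstructed (assumption): the function body is missing in the listing.
    return send_file(os.path.abspath(REPORT_PATH), as_attachment=True)


if __name__ == '__main__':
    app.run(debug=True)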
hello.py
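import os

import pandas as pd

base_path = os.path.dirname(os.path.abspath(__file__))


# Reconstructed (assumption): the function definition, the file path and the
# messages are missing from the listing; load_data is a placeholder name.
def load_data(file_name='sales_data_sample.csv'):
    file_path = os.path.join(base_path, file_name)
    try:
        df = pd.read_csv(file_path, encoding='ISO-8859-1')
        return df
    except FileNotFoundError:
        print(f"File not found: {file_path} (sales_data_sample.csv)")
        return pd.DataFrame()
    except Exception as e:
        print(f"Error loading data: {e}")
        return pd.DataFrame()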
data_preprocessing.py
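import pandas as pd


# Reconstructed (assumption): the function signature, the default paths and the
# column list are missing from the listing; the values below are placeholders.
def preprocess_data(input_path='sales_data_sample.csv',
                    output_path='cleaned_train_data.csv'):
    df = pd.read_csv(input_path, encoding='ISO-8859-1')
    df.dropna(inplace=True)

    # Parse date
    df['ORDERDATE'] = pd.to_datetime(df['ORDERDATE'])
    df['Year'] = df['ORDERDATE'].dt.year
    df['Month'] = df['ORDERDATE'].dt.month
    df['Week'] = df['ORDERDATE'].dt.isocalendar().week
    df['Day'] = df['ORDERDATE'].dt.day

    # Constant placeholder values for external factors
    df['Temperature'] = 70  # Placeholder
    df['Fuel_Price'] = 3.5
    df['CPI'] = 220
    df['Unemployment'] = 6.5

    # Reconstructed (assumption): Weekly_Sales is taken to be the SALES column.
    df = df.rename(columns={'SALES': 'Weekly_Sales'})
    columns = ['Year', 'Month', 'Week', 'Day', 'Temperature', 'Fuel_Price',
               'CPI', 'Unemployment', 'Weekly_Sales']
    df = df[columns]

    # Save outputs
    df.to_csv(output_path, index=False)
    yearly_sales = df.groupby('Year')['Weekly_Sales'].sum().reset_index()
    yearly_sales.to_csv('features.csv', index=False)


if __name__ == '__main__':
    preprocess_data()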
train_linear_regression.py
import pandas as pd
import numpy as np
import joblib
import os
try:
df = pd.read_csv(file_path)
if required_columns:
if missing_cols:
return df
except FileNotFoundError:
raise
except Exception as e:
data_path = Path("./")
try:
data = load_dataset(
data_path / "cleaned_train_data.csv",
X = data[feature_cols]
y = data['Weekly_Sales']
# Split data
print("✂️Splitting data...")
# Scale features
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train model
model = LinearRegression()
model.fit(X_train_scaled, y_train)
# Predict
y_pred = model.predict(X_test_scaled)
# Evaluate
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
# Save model and scaler
model_dir.mkdir(parents=True, exist_ok=True)
except Exception as e:
raise
train_xgboost.py
import pandas as pd
import numpy as np
import joblib
import os
import pickle
try:
df = pd.read_csv(file_path)
if required_columns:
if missing_cols:
return df
except FileNotFoundError:
raise
except Exception as e:
raise
# === Set the data path ===
data_path = Path("./")
try:
data = load_dataset(
data_path / "cleaned_train_data.csv",
X = data[feature_cols]
y = data['Weekly_Sales']
# Split data
print("✂️Splitting data...")
# Train model
model = xgb.XGBRegressor(
n_estimators=200,
max_depth=6,
learning_rate=0.1,
subsample=0.8,
colsample_bytree=0.8,
random_state=42,
objective="reg:squarederror"
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate
r2 = r2_score(y_test, y_pred)
# Save model
pickle.dump(eval_results, f)
except Exception as e:
raise
train_lstm.py
import numpy as np
import pandas as pd
import tensorflow as tf
import pickle
import logging
# Configure logging
try:
df = pd.read_csv(file_path)
if required_columns:
if missing_cols:
return df
except FileNotFoundError:
raise
except Exception as e:
raise
data_path = Path("./")
try:
data = load_dataset(
data_path / "cleaned_train_data.csv",
X = data[feature_cols].values
y = data['Weekly_Sales'].values
# Scale features
logging.info("🔄 Scaling features...")
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Train/test split
logging.info("✂️Splitting data...")
model = Sequential([
Dropout(0.2),
LSTM(32),
Dropout(0.2),
Dense(16, activation='relu'),
Dense(1)
])
monitor='val_loss',
patience=10,
restore_best_weights=True,
verbose=1
model_dir.mkdir(parents=True, exist_ok=True)
# Train model
history = model.fit(
X_train, y_train,
epochs=50,
batch_size=32,
validation_split=0.2,
callbacks=[early_stopping],
verbose=1
# Evaluate model
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# Save model
model.save(model_path)
pickle.dump(eval_results, f)
except Exception as e:
evaluate_models.py
import pandas as pd
import numpy as np
import pickle
import os
import joblib
import logging
# Setup logging
def load_pickle(file_path):
try:
return pickle.load(f)
except FileNotFoundError:
raise
except Exception as e:
raise
y_pred = np.array(y_pred)
mask = y_true != 0
if not np.any(mask):
return np.inf
data_path = Path("./")
models_dir.mkdir(exist_ok=True)
try:
available_models = {}
if lstm_path.exists():
available_models["LSTM"] = lstm_path
if xgb_path.exists():
available_models["XGBoost"] = xgb_path
if lr_path.exists():
available_models["LinearRegression"] = lr_path
if not available_models:
raise FileNotFoundError("❌ No trained models found. Please train at least one model first.")
eval_results = {}
if eval_results_path.exists():
eval_results = load_pickle(eval_results_path)
if not test_data_path.exists():
raise FileNotFoundError(f"❌ Test file not found at {test_data_path}. Run data preprocessing first.")
test_data = pd.read_csv(test_data_path)
# Prepare features
X_test = test_data[feature_cols]
y_true = test_data['Weekly_Sales']
try:
if model_name == "LSTM":
model = load_model(model_path)
y_pred = model.predict(X_input)
y_pred = y_pred[:, 0]
y_pred = y_pred.flatten()
else:
model = joblib.load(model_path)
y_pred = model.predict(X_test)
# Evaluation metrics
r2 = r2_score(y_true, y_pred)
print(f"\n=== 📊 {model_name} Model Performance ===")
print(f"MAE : {mae:.2f}")
print(f"RMSE : {rmse:.2f}")
print(f"MAPE : {mape:.2f}%")
print(f"R² : {r2:.4f}")
# Save results
eval_results[model_name] = r2
result_df = pd.DataFrame({
"True_Sales": y_true,
"Predicted_Sales": y_pred
})
result_df.to_csv(result_file, index=False)
except Exception as e:
continue
# Save updated evaluation results
pickle.dump(eval_results, f)
if eval_results:
print(f"{model}: {score:.4f}")
else:
except Exception as e:
raise
APPENDIX 2:
SCREENSHOT