A comparative online sales forecasting analysis: Data mining techniques
ORGANIZER: AMINULLAH BADR
4TH COURSE
DISCIPLINE: DAF
5/10/24
Introduction
Various forecasting methods have been developed
to predict the sales of different commodities,
such as electricity, metals, clothing, and
agricultural products.
Qualitative forecasting methods analyze
market surveys, determine price-influencing
factors, and make estimates and judgments
on prices according to market trends.
However, qualitative forecasting is
affected by subjective factors and cannot
provide an accurate numerical
description of price trends. Quantitative
forecasting methods use statistical
models to give a quantitative description
of price trends based on historical
statistical data.
Machine learning models have strong
generalization and mapping capabilities, and
several types have been devised, including the
long short-term memory (LSTM) neural
network, the extreme learning machine (ELM), and
the support vector machine. These models have
been used to predict commodity sales, but
their computational cost and complexity
limit their use.
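As a rough illustration of the extreme learning machine mentioned above, here is a minimal NumPy sketch of the basic idea: hidden-layer weights are drawn at random and left fixed, and only the output weights are solved by least squares. The class name `ELMRegressor`, the hidden size, the sigmoid activation, and the weight range are illustrative assumptions, not the configuration used in the study.

```python
import numpy as np


class ELMRegressor:
    """Minimal single-hidden-layer ELM for regression (illustrative only)."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the fixed random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Input weights and biases are random and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Output weights: least-squares solution via the pseudoinverse.
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta
```

Because the random input weights are never tuned, the fit depends on the draw. That dependence is what optimizers such as SSA, SFO, and SFOR are paired with ELM to address in the models compared later.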
Experimental data acquisition
This study analyzes sales data for various clothing products from
an online platform in China, covering May 2016 to April 2019.
Ten key features were identified as best
reflecting sales: search popularity, search heat, number of
visitors, pageviews, favorites, collection times, additional
buyers, additional purchases, customer group index, and
transaction index. The higher the values of X1 and X2 (search
times), the more likely the product is to be purchased. X5
and X6 (visitors to a single IP) and X7 and X8 (shopping cart
placement) also contribute to sales.
Experimental data mining
Data mining draws on the intersection and fusion of intelligent
algorithms, machine learning models, mathematical statistics, and other
fields. Common data mining tasks include cluster analysis, classification
and regression, correlation rule analysis, and time series analysis. This
research focuses on classification and regression, together with
correlation rule analysis, to study the degree of correlation between each
feature variable and sales and to predict the sales of different commodities.
The study employs the gray correlation model for input feature selection,
which has the advantages of low sensitivity to sample size and a low
requirement for regularity in the data. Correlation degrees were computed
between each feature and clothing sales, with X8 having
the highest value. The four features most
related to sales were selected as input features for the clothing sales
forecasting model.
The correlation degree between input features and output features directly
affects the model's predictive performance.
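The feature-selection step above can be sketched with the standard grey relational grade. This is a minimal version under stated assumptions: min-max normalization, a distinguishing coefficient `rho = 0.5`, and the hypothetical helper names `grey_relational_grades` and `select_top_features`. The study's exact normalization and parameters are not given here.

```python
import numpy as np


def grey_relational_grades(X, y, rho=0.5):
    """Grey relational grade of each column of X against the reference y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)

    def scale(a):
        # Min-max normalize each column; constant columns map to zero.
        lo, hi = a.min(axis=0), a.max(axis=0)
        span = np.where(hi - lo == 0, 1.0, hi - lo)
        return (a - lo) / span

    delta = np.abs(scale(X) - scale(y))  # pointwise distance to the reference
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:
        return np.ones(X.shape[1])  # every feature matches the reference
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=0)  # one grade per feature


def select_top_features(grades, k=4):
    """Indices of the k features with the highest grades, best first."""
    return [int(i) for i in np.argsort(grades)[::-1][:k]]
```

With ten candidate features, `select_top_features(grades, k=4)` would return the indices of the four inputs to pass to the forecasting model.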
Case 1: Pants sales forecast
The SFOR-ELM model was tested on predicting
online sales of clothing products, using
platform data from 2016 to 2019. It was
validated on the pants sales data set and
the dress sales data set, and its predictions were
compared with those of the ELM, LSTM, SSA-ELM,
and SFO-ELM models. The ELM model had the worst
evaluation results, with an RMSE exceeding
15% and an R² of 99.43%. Compared with the
four other models, SFOR-ELM achieved satisfactory
predictive results, reducing the MAPE by 1.66%,
1.4%, 0.79%, and 1.48% and the RMSE by 8.74%,
14.65%, 2.28%, and 4.8%, respectively.
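For reference, the three evaluation metrics named above can be computed as follows. This is a sketch using the textbook definitions. The source reports RMSE as a percentage without stating how it is normalized, so the plain (unnormalized) RMSE here is an assumption.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the sales data."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Lower MAPE and RMSE and a higher R² indicate a better forecast, which is the sense in which the models are ranked.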
Case 2: Dress sales forecast
The SFOR-ELM model was validated using dress
sales data from 2016 to 2019. The model
predicted dress sales for the current month
from the five preceding months. The results
showed a high fit between predicted and actual
dress sales, with only a small deviation between
the forecast and actual values. Among the
comparison models, LSTM was the most competitive,
with an R² of 99.22%. The SFOR-ELM model
improved on the MAPE, RMSE, and fit of the
comparison models.
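One reading of the five-month input window is a sliding-window setup in which each month's sales are predicted from the five months before it. The sketch below builds such samples from a monthly series. The helper name `make_lagged_samples` and the use of raw sales values as the only inputs are assumptions for illustration, since the study also uses selected platform features.

```python
import numpy as np


def make_lagged_samples(series, n_lags=5):
    """Turn a 1-D series into (X, y): n_lags past values -> next value."""
    series = np.asarray(series, dtype=float)
    if len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags")
    # Row i holds months i .. i+n_lags-1; its target is month i+n_lags.
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y
```

For the 36 monthly observations from May 2016 to April 2019, this would yield 31 samples of five inputs each.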
Conclusion
This research proposes an online sales forecasting approach
for clothing products using data mining techniques. The
SFOR algorithm, based on a random disturbance strategy,
exhibits high convergence accuracy and avoids local
extremum solutions. The SFOR-ELM model achieves
satisfactory predictive results, which can improve the
management efficiency and economic benefits of e-commerce
platforms. However, the SFOR algorithm needs further
development, and scalability remains a limitation.