1. INTRODUCTION

There is an abundant body of research on market demand prediction, and a number of traditional prediction methods have been proposed one after another and gradually refined. In general, current prediction methods fall into two categories: qualitative and quantitative [1-3]. Qualitative methods include the empirical judgment method, the expert judgment (Delphi) method, the subjective probability method, and others. Quantitative methods are more varied. They mainly comprise classic econometric models and, more recently, intelligent-algorithm methods such as ANN (Artificial Neural Network) prediction and SVM (Support Vector Machine) prediction models [4-5]. In 1985, Rumelhart et al. developed the EBP (Error Back Propagation, BP for short) algorithm, which solved the problem of computing the connection weights in multi-layer networks and greatly improved the usability of neural network models [6]. In practical applications, 80%-90% of artificial neural network models are variants based on the core idea of the BP neural network. Training a BP neural network is the process of repeatedly adjusting the connection weights between neurons on a sample set. The learning is supervised (trained with a teacher), and the sample set consists of vector pairs of the form (input vector, ideal output vector). All the vector pairs should be actual operating results of the system that the network is to simulate, and they can be collected from the real operating system [7].

2. BP NEURAL NETWORK MODEL

2.1. BP neural network structure

A BP neural network usually contains at least three layers, an input layer, a hidden layer, and an output layer, as shown in Figure 1. Each layer consists of a certain number of neurons, which are the smallest information-processing units.
The input layer receives information from outside and passes it through its neurons to the middle (hidden) layer, where the information is processed. The number of hidden layers can be increased or decreased as the task requires, but there is at least one. The processed information is then transferred to the output layer, and this whole transmission process is called forward propagation. When the error between the actual output and the expected output does not meet the requirement, the network performs error back propagation by gradient descent on the error. It adjusts the connection weights between neurons layer by layer, back to the input layer. This is how a neural network learns: information flows forward and errors flow backward repeatedly, until the error meets the requirement or the number of training cycles reaches the maximum [8-9].

2.2. BP neural network algorithm

The main learning steps of the BP neural network algorithm are as follows [10]:

(1) Network initialization. Let the numbers of nodes in the input, hidden, and output layers be n, l, and m. The weight from input node i to hidden node j is $w_{ij}$, and the weight from hidden node j to output node k is $w_{jk}$. The bias between the input and hidden layers is $a_j$, and the bias between the hidden and output layers is $b_k$. The learning rate is $\eta$, the error function is $E$, the target error is $\varepsilon$, and the maximum number of training cycles is $M$. The initial connection weights are set within (-1, 1). The excitation function $g(x)$ is the sigmoid function

$$g(x) = \frac{1}{1 + e^{-x}}$$

(2) Hidden layer output. For an input vector $x = (x_1, \dots, x_n)$, the output of hidden node j is

$$H_j = g\left(\sum_{i=1}^{n} w_{ij} x_i + a_j\right), \qquad j = 1, 2, \dots, l$$

(3) Output layer output. The output of output node k is

$$O_k = \sum_{j=1}^{l} H_j w_{jk} + b_k, \qquad k = 1, 2, \dots, m$$

(4) Error calculation. The error function is

$$E = \frac{1}{2}\sum_{k=1}^{m} (Y_k - O_k)^2$$

where $Y_k$ is the expected output and $O_k$ is the actual output. The error between them is

$$e_k = Y_k - O_k$$

The partial derivatives of $E$ with respect to the weights are

$$\frac{\partial E}{\partial w_{jk}} = -H_j e_k, \qquad \frac{\partial E}{\partial w_{ij}} = -H_j (1 - H_j)\, x_i \sum_{k=1}^{m} w_{jk} e_k$$

where $i = 1, 2, \dots, n$, $j = 1, 2, \dots, l$, $k = 1, 2, \dots, m$.
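Steps (1)-(4) can be illustrated with the minimal pure-Python sketch below. It is not the authors' MATLAB implementation. It assumes a linear output layer, $O_k = \sum_j H_j w_{jk} + b_k$, and uniform initialization in [-1, 1]. The helper names `init_network`, `forward`, `backprop`, and `train_step` are hypothetical. `train_step` applies one plain gradient-descent update, w ← w − η ∂E/∂w, to every weight and bias. This is the role played by the weight and bias updates of steps (5)-(6).

```python
import math
import random


def sigmoid(x):
    """Excitation function g(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n, l, m, seed=0):
    """Step (1): weights and biases drawn uniformly from [-1, 1]."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-1.0, 1.0)

    return {
        "w_ih": [[u() for _ in range(l)] for _ in range(n)],  # w_ij, n x l
        "w_ho": [[u() for _ in range(m)] for _ in range(l)],  # w_jk, l x m
        "a": [u() for _ in range(l)],                         # hidden biases a_j
        "b": [u() for _ in range(m)],                         # output biases b_k
    }


def forward(net, x):
    """Steps (2)-(3): hidden outputs H_j and linear network outputs O_k."""
    n, l, m = len(x), len(net["a"]), len(net["b"])
    H = [sigmoid(sum(net["w_ih"][i][j] * x[i] for i in range(n)) + net["a"][j])
         for j in range(l)]
    O = [sum(H[j] * net["w_ho"][j][k] for j in range(l)) + net["b"][k]
         for k in range(m)]
    return H, O


def backprop(net, x, y):
    """Step (4): error E = 1/2 * sum_k e_k^2 and its partial derivatives."""
    H, O = forward(net, x)
    e = [yk - ok for yk, ok in zip(y, O)]  # e_k = Y_k - O_k
    E = 0.5 * sum(ek * ek for ek in e)
    l, m = len(H), len(O)
    # delta_j = H_j (1 - H_j) * sum_k w_jk e_k: error propagated back to hidden node j
    delta = [H[j] * (1 - H[j]) * sum(net["w_ho"][j][k] * e[k] for k in range(m))
             for j in range(l)]
    grads = {
        "w_ho": [[-e[k] * H[j] for k in range(m)] for j in range(l)],  # dE/dw_jk
        "w_ih": [[-delta[j] * xi for j in range(l)] for xi in x],      # dE/dw_ij
        "a": [-d for d in delta],                                      # dE/da_j
        "b": [-ek for ek in e],                                        # dE/db_k
    }
    return E, grads


def train_step(net, x, y, eta=0.1):
    """Apply w <- w - eta * dE/dw to all weights and biases; return E before the update."""
    E, grads = backprop(net, x, y)
    for name in ("w_ih", "w_ho"):
        for row, grow in zip(net[name], grads[name]):
            for c in range(len(row)):
                row[c] -= eta * grow[c]
    for name in ("a", "b"):
        for c in range(len(net[name])):
            net[name][c] -= eta * grads[name][c]
    return E
```

A finite-difference check is a cheap way to confirm the partial derivatives before training. Perturb one entry of `net["w_ih"]` by ±1e-6 and difference the error returned by `backprop`. Repeated calls to `train_step` then drive E down until it falls below ε or M cycles are reached.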
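(5) Weight update. Let $H_j$ be the output of hidden node $j$, $x_i$ the $i$-th input, and $e_k = Y_k - O_k$ the error of output node $k$. The weights are updated by gradient descent with learning rate $\eta$:

$$w_{ij} \leftarrow w_{ij} + \eta\, H_j (1 - H_j)\, x_i \sum_{k=1}^{m} w_{jk} e_k, \qquad w_{jk} \leftarrow w_{jk} + \eta\, H_j e_k$$

(6) Bias update. The bias between the input and hidden layers is updated by

$$a_j \leftarrow a_j + \eta\, H_j (1 - H_j) \sum_{k=1}^{m} w_{jk} e_k$$

For the bias between the hidden and output layers, $O_k$ depends linearly on $b_k$ and the error function is $E = \frac{1}{2}\sum_{k=1}^{m} e_k^2$. The gradient is therefore

$$\frac{\partial E}{\partial b_k} = -e_k$$

so the update equation for this bias is

$$b_k \leftarrow b_k + \eta\, e_k$$

3. SAMPLE DATA SELECTION

3.1. Automobile sales data selection

The automobile sales data used in this paper come from the statistics section of the China Association of Automobile Manufacturers (http://www.caam.org.cn/). The 36 monthly sales figures of the SAIC Volkswagen Langyi model from Jan. 2018 to Dec. 2020 were selected for model training and subsequent testing, as shown in Table 1. Because of the COVID-19 pandemic, sales in Feb. 2020 and Mar. 2020 dropped suddenly. The actual sales for these two months were therefore not used. Instead, the expected sales were estimated with the quadratic exponential smoothing method, giving 34858 and 35678 vehicles respectively, and these replacement values were used for neural network training [11].

Table 1. Langyi model's monthly automobile sales, 2018-2020 (unit: vehicle).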
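The replacement values for Feb. and Mar. 2020 come from quadratic (double) exponential smoothing. The paper does not give the smoothing constant α or the initial values. The sketch below is therefore a generic version of Brown's double exponential smoothing, with α as a free parameter and both smoothed series initialized to the first observation (both assumptions). It is not expected to reproduce 34858 and 35678 exactly. The function name `brown_double_smoothing` is hypothetical.

```python
def brown_double_smoothing(series, alpha, horizon=1):
    """Forecast `horizon` steps past the end of `series` with Brown's double
    (quadratic) exponential smoothing.

    Both smoothed series start at the first observation (an assumption).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not series:
        raise ValueError("series must be non-empty")
    s1 = s2 = float(series[0])
    for y in series:
        s1 = alpha * y + (1 - alpha) * s1    # first smoothing S1_t
        s2 = alpha * s1 + (1 - alpha) * s2   # second smoothing S2_t
    level = 2 * s1 - s2                      # a_t = 2 S1_t - S2_t
    trend = alpha / (1 - alpha) * (s1 - s2)  # b_t = alpha / (1 - alpha) * (S1_t - S2_t)
    return level + trend * horizon           # forecast a_t + b_t * T
```

For example, a value for Feb. 2020 would be forecast from the months up to Jan. 2020 with `horizon=1`, and one for Mar. 2020 with `horizon=2`. On a series with a steady linear trend, the method converges to that trend, which is why it suits a smoothly trending sales series.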
3.2. Keyword selection

According to Baidu's survey and research report on netizens, consumers who intend to buy an automobile use search engines to look for related product information. The most commonly used search keyword is the name of the model. Consumers also focus mainly on brand, manufacturer, price, performance, and similar information [12]. This paper takes into account China's automobile consumption environment, usage environment, and consumers' recognition of automobile products. On that basis, it expands the keywords within the above keyword categories using the group discussion method and Baidu Index keyword recommendations. Part of the selected initial keywords is shown in Table 2.

Table 2. Part of the selected initial keywords.
Historical search data are collected for each keyword in the initial keyword thesaurus. Keywords with missing data or not included in the thesaurus are eliminated. The cross-correlation function of SPSS is then applied to each keyword's Baidu Index series and the Langyi monthly sales series, for correlation testing and time-difference (lead–lag) correlation analysis. To improve data quality and reduce model complexity, keywords whose correlation coefficient is less than 0.5 in absolute value are eliminated. Thirteen keywords are retained for subsequent model training, as shown in Table 3.

Table 3. Keywords' correlation coefficients and leading orders.
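The lead–lag screening can be sketched as follows. This is a hypothetical pure-Python stand-in for the SPSS cross-correlation function. For each lag, it computes the Pearson correlation between the keyword series and the shifted sales series over their overlapping months. It keeps the lag with the largest |r| and retains keywords with |r| ≥ 0.5. SPSS may normalize the cross-correlation slightly differently, so values can differ a little. The ±6-month lag range and all function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(keyword, sales, lag):
    """Correlate keyword[t] with sales[t + lag]; lag > 0 means the keyword leads sales."""
    if lag >= 0:
        return pearson(keyword[:len(keyword) - lag], sales[lag:])
    return pearson(keyword[-lag:], sales[:len(sales) + lag])


def best_lag(keyword, sales, max_lag=6):
    """Return (lag, r) with the largest |r| over lags -max_lag..max_lag."""
    results = [(lag, lagged_correlation(keyword, sales, lag))
               for lag in range(-max_lag, max_lag + 1)]
    return max(results, key=lambda item: abs(item[1]))


def select_keywords(index_series, sales, threshold=0.5, max_lag=6):
    """Keep keywords whose strongest lagged |r| reaches the threshold."""
    selected = {}
    for name, series in index_series.items():
        lag, r = best_lag(series, sales, max_lag)
        if abs(r) >= threshold:
            selected[name] = (lag, r)
    return selected
```

Screening on |r| rather than r keeps negatively correlated keywords, which still carry predictive information about sales.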
The results of the correlation and time-difference analyses confirm the initial expectation that most keywords lead the sales series, while only some lag behind it. In addition, the search data of some keywords are positively correlated with monthly automobile sales, and those of others are negatively correlated. When search volume for a positively correlated keyword increases, subsequent sales increase correspondingly. An increase in search volume for a negatively correlated keyword, by contrast, is followed by a decrease in sales.

4. MODEL STRUCTURE AND PREDICTION RESULT ANALYSIS

4.1. BP neural network structure

The BP neural network has three layers, with 13 input-layer nodes and 1 output-layer node. The empirical formula f = 1.5mn is used, where f is the number of hidden-layer nodes and n and m are the numbers of input and output neurons. Based on it, the number of hidden-layer nodes is preliminarily set to 5. Using the trial-and-error method, the network is retrained with different numbers of hidden-layer nodes, and the results show that 10 hidden-layer nodes perform best. The node transfer functions are the tangent sigmoid transfer function tansig, the logarithmic sigmoid transfer function logsig, and the linear transfer function purelin. Training uses the BP algorithm with momentum and a dynamically adaptive learning rate. The training function is trainlm (Levenberg–Marquardt backpropagation in MATLAB). The BP neural network is shown in Figure 2.

4.2. Prediction result analysis

The monthly search data of each of the 13 selected keywords are shifted and aligned with the monthly sales data according to each keyword's time difference. Sales data from Jan. 2018 to Sept. 2020 are aligned with 33 months of keyword search data, and both are input into MATLAB as training data. The preset error goal is reached at the 5th training epoch, as shown in Figure 3.
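The shifting step can be sketched as below. The sketch assumes the following:

- The keyword and sales series start in the same month.
- Each keyword's lead (as listed in Table 3) is a whole number of months, positive when the keyword leads sales and negative when it lags.

Each training pair takes the month-t sales as the target and keyword k's value from month t − lead_k as the input. The names `align_by_lead` and `forecast_input` are hypothetical.

```python
def _lagged_values(keywords, leads, names, t):
    """Return [keyword_k[t - lead_k] for k in names], or None if any index is out of range."""
    values = []
    for name in names:
        i = t - leads[name]
        if not 0 <= i < len(keywords[name]):
            return None
        values.append(keywords[name][i])
    return values


def align_by_lead(keywords, leads, sales):
    """Pair each month's sales with keyword data shifted back by each keyword's lead."""
    names = sorted(keywords)
    pairs = []
    for t, target in enumerate(sales):
        x = _lagged_values(keywords, leads, names, t)
        if x is not None:  # skip months whose shifted inputs fall outside the data
            pairs.append((x, target))
    return names, pairs


def forecast_input(keywords, leads, t):
    """Input vector for a future month t, or None if some keyword's data is not yet available."""
    return _lagged_values(keywords, leads, sorted(keywords), t)
```

`forecast_input` returns `None` as soon as any keyword's shifted index falls beyond the collected data. When all series end in the same month, this is why the forecasting horizon is limited by the smallest lead.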
The overall training results are shown in Figure 4. The curve shows the actual sales, the square markers show the values fitted by the neural network, and the network training error is small. The trained network is then fed the internet search data of the 13 keywords for Oct. 2020 to Dec. 2020. The predicted sales for these three months are shown in Table 4. The error analysis shows that the mean absolute percentage error of the three-month sales prediction is 5.6%. Because the prediction targets a specific vehicle model, the research has practical application value. Note that the timeliness of the prediction model depends on how many periods the keywords lead actual sales. Since the minimum leading period of the keywords is 1, the model can predict sales one month ahead.

Table 4. Prediction result and error analysis.
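The error figures can be computed as follows, assuming the standard definitions: APE_t = |Y_t − Ŷ_t| / Y_t × 100%, with MAPE the mean of APE_t over the predicted months. The function names are illustrative, and no values from Table 4 are reproduced here.

```python
def ape(actual, predicted):
    """Absolute percent error of one prediction, in percent."""
    return abs(actual - predicted) / abs(actual) * 100.0


def mape(actuals, predictions):
    """Mean absolute percent error over paired actual/predicted values, in percent."""
    if len(actuals) != len(predictions) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(ape(a, p) for a, p in zip(actuals, predictions)) / len(actuals)
```

For instance, `mape(actual_sales, predicted_sales)` over the three predicted months yields a single percentage, and comparing it across models gives the accuracy comparison used in this section.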
The absolute percent error (APE) measures the relative error of each prediction. The mean absolute percent error (MAPE) is its average over the prediction period. Together they reflect the actual size of the prediction errors. MAPE indicates the model's prediction accuracy: the smaller the MAPE, the higher the accuracy. The definitions are

$$\mathrm{APE}_t = \frac{|Y_t - \hat{Y}_t|}{Y_t} \times 100\%, \qquad \mathrm{MAPE} = \frac{1}{N}\sum_{t=1}^{N} \mathrm{APE}_t$$

where $Y_t$ is the actual sales in month $t$, $\hat{Y}_t$ is the predicted sales, and $N$ is the number of predicted months (here $N = 3$).

4.3. Model improvement based on principal component analysis

The 13 selected keywords are linearly or nonlinearly related to one another, and the resulting redundant information interferes with the model's training and prediction. Principal component analysis is therefore used to combine the collinear indices. It transforms multiple indices into a few comprehensive indices that carry more information and have stronger explanatory power, which simplifies the model and improves prediction accuracy. The factors affecting automobile sales are computed in SPSS and analyzed by principal component analysis. Analysis of the original internet search data yields two principal components with eigenvalues greater than 1. Under SPSS's default settings these two components are retained. Their cumulative variance contribution exceeds 73%, so they can explain the information contained in the original data. The remaining components, which contain little information, are discarded.

Building on the BP neural network prediction model above, this paper thus applies principal component analysis to the keyword search indices that affect automobile sales, to eliminate the collinearity among them. With the two principal components as the new network input, the number of input-layer nodes becomes 2. Based on the empirical formula and comparative analysis, the number of hidden-layer nodes is changed to 2, and the other structural and key parameters are kept unchanged. The improved neural network is shown in Figure 5.
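The component-retention rule can be sketched as follows. Under stated assumptions, it mirrors SPSS's default principal component extraction:

1. The variables are standardized.
2. The correlation matrix is eigendecomposed. A cyclic Jacobi method is swapped in here so that the sketch needs only the standard library.
3. Components with an eigenvalue greater than 1 are kept.
4. The cumulative variance contribution is the sum of the kept eigenvalues divided by the number of variables.

Component scores are returned as plain projections of the standardized data, whereas SPSS's saved scores may be scaled differently. All function names are hypothetical.

```python
import math


def jacobi_eigen(A, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors (as columns of V) of a symmetric matrix A."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def standardize(columns):
    """Z-score each variable (sample standard deviation, n - 1)."""
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        result.append([(v - mean) / sd for v in col])
    return result


def pca_kaiser(columns):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1.

    `columns` holds one list of N observations per variable (e.g. per keyword).
    Returns (kept eigenvalues, cumulative variance share, N x kept score matrix).
    """
    Z = standardize(columns)
    p, n = len(Z), len(Z[0])
    R = [[sum(Z[a][t] * Z[b][t] for t in range(n)) / (n - 1) for b in range(p)]
         for a in range(p)]
    eigvals, V = jacobi_eigen(R)
    order = sorted(range(p), key=lambda i: eigvals[i], reverse=True)
    kept = [i for i in order if eigvals[i] > 1.0]
    share = sum(eigvals[i] for i in kept) / p  # the trace of R equals p
    scores = [[sum(Z[a][t] * V[a][i] for a in range(p)) for i in kept]
              for t in range(n)]
    return [eigvals[i] for i in kept], share, scores
```

The eigenvalue-greater-than-1 rule keeps only components that explain more variance than a single standardized keyword would, which is what reduces the 13 inputs to a small number of network inputs.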
The model is then retrained on the new input data. The results show that with the lower data dimension, the model structure is simpler and training is more efficient. The data fitting effect is shown in Figure 6, and the predictions of the improved model are compared with the actual values in Table 5. As the table shows, after principal component analysis of the keyword internet search data, the MAPE decreases by 0.5% and the prediction accuracy increases.

Table 5. Prediction results of the BP neural network with principal component analysis.
5. CONCLUSION

This paper establishes a web search keyword thesaurus based on the characteristics of automobile products, which can effectively help select web search data for prediction and analysis. Correlation analysis and time-difference analysis between keyword search volume and actual sales were carried out. They confirm that part of the web search data is strongly correlated with, and leads, actual automobile sales, which demonstrates the value of web search data. The paper trains a BP neural network-based automobile sales model on 33 months of keyword search data and sales data. The trained model is then used to predict the following three months' sales. The MAPE of these predictions is 5.6%. After principal component analysis of the keyword search data, the MAPE decreases by 0.5% and the prediction accuracy improves. The good model fit and prediction accuracy confirm the effectiveness and rationality of the model.

ACKNOWLEDGMENTS

This research was funded by the Scientific Research Project of Wuhan Business University in 2020, "Optimization Design of Operation mode of Spare Parts Reuse and Material Recovery of Retired New Energy Vehicles" (2020KY005).

REFERENCES

[1] Wu, J. and Deng, Y. H., "Intercity information diffusion and price discovery in housing markets: Evidence from google searches," The Journal of Real Estate Finance and Economics, 50(3), 289–306 (2015). https://doi.org/10.1007/s11146-014-9493-9
[2] Goel, S., Hofman, J. M., Lahaie, S., et al., "Predicting consumer behavior with web search," Proceedings of the National Academy of Sciences of the United States of America, 107(41), 17486–17490 (2010).

[3] Sun, W., Teng, X. S. and Ma, N., "Fault diagnosis of relay protection device based on fuzzy Bp neutral network model," in Proc. of 2011 13th IEEE Joint Inter. Computer Science and Information Technology Conf. (JICSIT 2011), 02 (2011).

[4] Cha, M. H., Lu, Z. H., Zhai, J. W. and Zhang, F. S., "Using double-suppressed BP neutral network model to predict water quality in Laoha River," Journal of Water Resources and Water Engineering, 29(2), 56–61 (2018).

[5] Yu, Z., Qin, L., Chen, Y. J. and Parmar, M., "Stock price forecasting based on LLE-BP neural network model," Physica A: Statistical Mechanics and Its Applications, 553, 124197 (2020). https://doi.org/10.1016/j.physa.2020.124197

[6] Li, B., Zhang, Y. F., Zhang, S. H. and Li, W. Y., "Prediction of grain yield in Henan province based on grey BP neural network model," Discrete Dynamics in Nature and Society, 2021 (2021).

[7] Sun, X. and Lei, Y., "Research on financial early warning of mining listed companies based on BP neural network model," Resources Policy, 73, 102223 (2021). https://doi.org/10.1016/j.resourpol.2021.102223

[8] Cavalcante, E. S., Vasconcelos, L., de Farias Neto, G. W., Ramos, W. and Brito, R., "Automotive painting process: Minimizing energy consumption by using adjusted convective heat transfer coefficients," Progress in Organic Coatings, 140, 105479 (2020). https://doi.org/10.1016/j.porgcoat.2019.105479

[9] Chen, C., Liu, Y., Sun, X. F., Cairano-Gilfedder, C. D. and Titmus, S., "An integrated deep learning-based approach for automobile maintenance prediction with GIS data," Reliability Engineering and System Safety, 216 (2021).

[10] Liang, Y., Jia, Y., Li, J., Chen, M., Hu, Y., Shi, Y. and Ma, F., "Online shop daily sale prediction using adaptive network-based fuzzy inference system," in 12th IEEE Inter. Cong. on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) (2020).

[11] Meyer, A., Glock, K. and Radaschewski, F., "Planning profitable tours for field sales forces: A unified view on sales analytics and mathematical optimization," Omega, 105, 102518 (2021). https://doi.org/10.1016/j.omega.2021.102518

[12] Kato, T., "Demand prediction in the automobile industry independent of big data," Annals of Data Science, 2 (2020).