Proceedings Article | 23 August 2022
KEYWORDS: Data modeling, Performance modeling, Machine learning, Feature selection, Data storage, Statistical modeling, Binary data, Artificial neural networks, Statistical analysis, Mathematics
In recent years, machine learning and data mining methods have been applied to stock price forecasting. However, because stock price data are time-dependent, noisy, and multicollinear, many traditional machine learning methods cannot accurately predict the direction of stock price movement. This paper applies the Least Absolute Shrinkage and Selection Operator (LASSO) penalized logistic regression model and the ridge penalized regression model to the stock price prediction problem and evaluates their performance. First, we select stock data for CMA, EQR, and IRM from 2016-07-21 to 2020-07-13. We then use the TTR package to calculate 17 technical indicators. Next, we divide the data set into a training set and a test set, train the models on the training set, and evaluate their performance on the test set. Finally, we introduce Support Vector Machine (SVM), Random Forest (RF), and Artificial Neural Network (ANN) models for comparison and assess predictive performance using metrics such as sensitivity, specificity, and the ROC curve. We find that LASSO penalized logistic regression has the best prediction performance, reaching an accuracy of 0.81 on the EQR data set, while the ridge model reaches 0.79; both are much higher than RF, SVM, and ANN.
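The workflow described in the abstract (fit a penalized logistic model on a training set, then score it on a held-out test set by accuracy, sensitivity, and specificity) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper works in R with the TTR package, whereas the code below is plain Python, uses synthetic features as a stand-in for standardized technical indicators, and fits the model with a simple proximal gradient loop. All function names, the penalty strength `lam`, the learning rate, and the 80/20 chronological split are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|; it sets small coefficients exactly to zero,
    # which is how the L1 (LASSO) penalty performs feature selection.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_penalized_logistic(X, y, lam, penalty="lasso", lr=0.5, n_iter=500):
    # Minimizes mean log-loss + lam * sum|w_j| (lasso) or + lam * sum w_j^2 (ridge).
    # The intercept b is left unpenalized.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        if penalty == "lasso":
            # Gradient step on the log-loss, then soft-thresholding.
            w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        else:
            # Ridge: the penalty is smooth, so its gradient 2 * lam * w_j is added directly.
            w = [wj - lr * (gj + 2.0 * lam * wj) for wj, gj in zip(w, grad_w)]
    return w, b


def predict(X, w, b, threshold=0.5):
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= threshold else 0
            for xi in X]


def evaluate(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
    }


# Synthetic stand-in data (assumption): 5 standardized "indicators", of which only
# the first two drive the up/down label. Real inputs would be technical indicators.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [1 if 2.0 * x[0] - 1.5 * x[1] + random.gauss(0, 0.5) > 0 else 0 for x in X]

# Chronological split: earlier rows train, later rows test (no shuffling).
split = 160
X_train, y_train, X_test, y_test = X[:split], y[:split], X[split:], y[split:]

w_lasso, b_lasso = fit_penalized_logistic(X_train, y_train, lam=0.05, penalty="lasso")
w_ridge, b_ridge = fit_penalized_logistic(X_train, y_train, lam=0.05, penalty="ridge")

metrics_lasso = evaluate(y_test, predict(X_test, w_lasso, b_lasso))
metrics_ridge = evaluate(y_test, predict(X_test, w_ridge, b_ridge))
print("lasso weights:", w_lasso, metrics_lasso)
print("ridge weights:", w_ridge, metrics_ridge)
```

The split is kept in time order rather than shuffled because the abstract stresses the time series nature of the data; whether the paper splits this way is not stated. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.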