KEYWORDS: Data modeling, Education and training, Machine learning, Nomenclature, Random forests, Performance modeling, Mining, Semantics, Matrices, Detection and tracking algorithms
Predicting the resource consumption and completion status of jobs helps improve the scheduling performance of a system. Many studies have shown that including the job name can effectively improve prediction accuracy. By mining the structural semantic information of job names, this paper introduces new features that capture job-naming habits, including job name length, the number of job name elements, and edit distance. It also analyzes each substructure of the job name and adds categorical features obtained by clustering. The new features better characterize the similarity between jobs and provide strong support for model prediction. A model trained on the data set with the new features achieves significantly higher prediction accuracy than a model that uses only the job name.
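The three job-name features named above can be illustrated with a minimal sketch. The abstract does not specify how names are split into elements or which name the edit distance is measured against, so the separator set (`_`, `-`, `.`, whitespace), the comparison against a previous job's name, and all function names below are assumptions made for illustration only.

```python
import re


def split_elements(job_name):
    """Split a job name into elements.

    Assumption: elements are separated by '_', '-', '.' or whitespace.
    """
    return [part for part in re.split(r"[_\-.\s]+", job_name) if part]


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def job_name_features(job_name, previous_name):
    """Build the three features; `previous_name` is a hypothetical reference name,
    for example the name of the same user's preceding job."""
    return {
        "name_length": len(job_name),
        "num_elements": len(split_elements(job_name)),
        "edit_distance": edit_distance(job_name, previous_name),
    }
```

For example, `job_name_features("vasp_relax_02", "vasp_relax_01")` yields a name length of 13, three elements, and an edit distance of 1, which reflects how close two consecutive submissions with similar names are.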
To evaluate resource usage in scheduler queues, help users select appropriate queues for fast computation, and improve the throughput and utilization of a high-performance computing platform, it is necessary to analyze the historical job data in each queue and make timely, effective predictions of the number of nodes occupied by jobs. This paper proposes a queue node occupancy prediction method based on an improved Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) network. The historical data of high-performance clusters are clustered and grouped to obtain balanced samples. In the improved CNN-LSTM network, the pooling layer is replaced with a customized attention mechanism layer that determines the weights of the different channels. Samples are selected and extracted, and L2 regularization is used to prevent overfitting during training, yielding more accurate predictions of node resource occupancy. Tests on the historical operation data of a supercomputer show that the improved CNN-LSTM hybrid network model is more accurate than the CNN-LSTM hybrid network model, the LSTM model, the MLP model, and the random forest model.
Backfill scheduling is a common scheduling strategy in high-performance computing systems that allows low-priority jobs to run ahead of their turn to make better use of available resources. Job running time is an important parameter that affects the performance of the backfill scheduling algorithm. However, to avoid having jobs killed for running out of time, users often request a running time several times longer than the actual running time, which wastes resources. To improve resource utilization, a new job running time prediction algorithm is proposed that combines classification and ensemble learning methods. The algorithm first classifies the historical job set by application type, then uses the Jaccard coefficient to calculate the similarity between jobs and classifies them further. A separate ensemble model is constructed for the jobs of each application type. A new job is assigned to a class, and that class's ensemble model is used to predict its running time. The algorithm was tested on historical job data from the National Supercomputing Center Kunshan, the Hefei Advanced Computing Center, and the "Wuzhen Light" supercomputing center, and compared with the GA-sim and IRPA algorithms. The experimental results show that, compared with the IRPA algorithm, the algorithm improves the mean absolute error by 60% on average across the three data sets. Compared with the GA-sim algorithm, it improves the average prediction accuracy by 20% on average across the three data sets. Based on an in-depth analysis of the experimental results, an amplification method is given to address the underestimation of long and short jobs.
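The Jaccard-based grouping step described above can be sketched as follows. The abstract does not say which job attributes are compared or how similar jobs are merged into classes, so representing each job as a set of attribute tokens, the greedy first-match grouping, and the `threshold` parameter are assumptions for illustration, not the paper's procedure.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections of tokens."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def group_by_similarity(jobs, threshold=0.5):
    """Greedy grouping (an assumed strategy): each job joins the first group
    whose first member is at least `threshold` similar, else starts a new group."""
    groups = []
    for job in jobs:
        for group in groups:
            if jaccard(job, group[0]) >= threshold:
                group.append(job)
                break
        else:
            groups.append([job])
    return groups
```

With hypothetical jobs `{"u1", "q1", "n4"}`, `{"u1", "q1", "n8"}`, and `{"u2", "q2", "n1"}`, the first two share two of four distinct tokens (coefficient 0.5) and land in the same group, while the third forms its own group. A per-group model could then be trained on each group's historical running times.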