For a more thorough evaluation, we also add the following recent trackers with their corresponding results (success rate, precision, and FPS) to the comparison: STCT (0.640, 0.780, 2.5),36 RTT (0.588, 0.827, 3 to 4),37 and DLRT (0.512, 0.694, 3).38 Among these trackers, the proposed MSNT (0.564, 0.753, 13.2) achieves better performance than DLRT but is inferior to STCT and RTT. Nevertheless, our tracker has a faster processing speed than these trackers and achieves comparable performance as RTT in success rate and as STCT in precision. However, our tracker still has room to improve compared with the best tracker STCT. STCT regards CNN as an ensemble of base learners and trains the convolutional layers with random binary masks. These techniques reduce the correlation between the learned features and prevent overfitting effectively, although these lead to higher computation cost to some degree. Like the random binary masks in STCT, the similar trick, “Dropout,”39 may be used in our tracker to further avoid overfitting.