Video-text retrieval has become a crucial task with the exponential growth of video content. Recent methods that leverage the pretrained CLIP model for video-text retrieval have demonstrated remarkable performance, surpassing many approaches trained on large-scale video-text datasets. However, these works overlook the performance-efficiency trade-off in the pursuit of better performance. Additionally, because many existing works fully fine-tune the pretrained backbone, a completely new model is typically required for each task. Therefore, to yield a compact and transferable model, we propose LoCLIP, a framework that transfers knowledge from CLIP in a parameter-efficient manner. Inspired by LoRA, we incorporate only a small set of trainable low-rank matrices per task, allowing adaptation to new tasks by simply replacing these matrices. In this way, we can acquire task-specific knowledge without compromising the prior knowledge stored in the pretrained backbone. To demonstrate the effectiveness of LoCLIP, we conduct extensive experiments and achieve performance comparable to state-of-the-art CLIP-based video-text retrieval methods while updating only a small number of parameters.
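The low-rank adaptation idea described above can be illustrated with a minimal sketch. This is not the LoCLIP implementation: the class name `LoRALinear`, the `alpha / rank` scaling, the initialization, and the task name are assumptions chosen for illustration, following the general LoRA recipe of a frozen weight plus a swappable low-rank update.

```python
import random


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Hypothetical linear layer: frozen weight W plus a per-task update B @ A."""

    def __init__(self, W, rank, alpha=1.0):
        self.W = W            # pretrained weight (d_out x d_in), never modified
        self.rank = rank      # rank of the low-rank update
        self.alpha = alpha    # scaling hyperparameter (assumed, as in LoRA)
        self.adapters = {}    # task name -> (A, B)
        self.active = None    # currently selected task, if any

    def add_task(self, name, seed=0):
        """Create a new adapter: A is small random, B is zero (update starts at 0)."""
        rng = random.Random(seed)
        d_out, d_in = len(self.W), len(self.W[0])
        A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(self.rank)]
        B = [[0.0] * self.rank for _ in range(d_out)]
        self.adapters[name] = (A, B)

    def set_task(self, name):
        """Switch tasks by selecting a different pair of low-rank matrices."""
        self.active = name

    def trainable_params(self, name):
        """Number of parameters in one task's adapter."""
        A, B = self.adapters[name]
        return len(A) * len(A[0]) + len(B) * len(B[0])

    def forward(self, x):
        y = matvec(self.W, x)                     # frozen backbone path
        if self.active is not None:
            A, B = self.adapters[self.active]
            delta = matvec(B, matvec(A, x))       # low-rank path: B (A x)
            scale = self.alpha / self.rank
            y = [yi + scale * di for yi, di in zip(y, delta)]
        return y


# Example: an 8x8 frozen weight with a rank-1 adapter.
W = [[float(i == j) for j in range(8)] for i in range(8)]
layer = LoRALinear(W, rank=1)
layer.add_task("retrieval")
layer.set_task("retrieval")
x = [1.0] * 8
y = layer.forward(x)
```

In this sketch the adapter holds 16 values against 64 in the frozen weight, and switching tasks only changes which `(A, B)` pair is active; `W` itself is never written to.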
KEYWORDS: Time series analysis, Data analysis, Data mining, Data processing, Machine learning, Autoregressive models, Convolution, Data modeling, Neural networks, Visual process modeling, Performance modeling, Neurons, Feature extraction, Structural design
Predicting retail sales is an active research topic that can help firms achieve on-demand procurement, which reduces the extra costs caused by inventory shortages or surpluses. Traditional methods usually regard the sales of a given product as an independent time series and then solve the task with a curve-fitting model. However, the sales volume is not a single value but an aggregate over many types of products. There are correlations between the sales of different product types: an increase in sales of one type may lead to changes in sales of another. Without capturing these correlations, it is difficult to obtain satisfactory performance. To solve this problem, we propose a new multi-task method based on multivariate time series learning. Considering that there are multiple product types and each type has different trend characteristics, we first cluster the time series of all product types and then model each cluster as a multivariate prediction task. We then design a new dilated convolution network to fuse the features of related products with the trend characteristics of each task. Moreover, we develop a trend structural entropy network to capture the fluctuation features of each task. A new self-enhancement mechanism is proposed to finely capture the correlations among tasks. Through multi-task learning, the model can effectively improve prediction accuracy by using the complementary information of closely related time series. Experimental results on two real datasets show the effectiveness of our approach.
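To make the dilated-convolution step more concrete, here is a minimal sketch of a single dilated 1-D convolution over a multivariate series. The abstract does not specify the network, so the causal (past-only) form, the zero padding, and the function names `dilated_causal_conv` and `receptive_field` are assumptions for illustration only.

```python
def dilated_causal_conv(series, kernels, dilation):
    """Apply one dilated causal convolution.

    series:   list of T time steps, each a list of C channel values
              (for example, sales of C related product types).
    kernels:  list of K taps, each a list of C weights; tap k reads the
              input k * dilation steps in the past.
    Returns a list of T scalars; steps before the start count as zero.
    """
    out = []
    for t in range(len(series)):
        acc = 0.0
        for k, weights in enumerate(kernels):
            s = t - k * dilation          # index of the time step this tap reads
            if s >= 0:                    # implicit zero padding on the left
                acc += sum(w * x for w, x in zip(weights, series[s]))
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Time steps visible to the last layer of a stack of dilated convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Two product types over four time steps.
sales = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
# Tap 0 reads product 0 now; tap 1 reads product 1 two steps earlier.
taps = [[1.0, 0.0], [0.0, 1.0]]
mixed = dilated_causal_conv(sales, taps, dilation=2)
```

Each output mixes channels from different lags, which is one way features of related products can be fused. Stacking layers with growing dilations widens the history each output sees without adding taps: `receptive_field(2, [1, 2, 4])` gives 8 time steps.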