[1]雷智文,黄玲. 面向数字资源的自动标签模型[J].哈尔滨理工大学学报,2020,25(03):144-150.[doi:10.15938/j.jhust.2020.03.022]
 LEIZhi wen,HUANG Ling. An Automatic Tagging System Focused on Digital Resources[J].哈尔滨理工大学学报,2020,25(03):144-150.[doi:10.15938/j.jhust.2020.03.022]





 An Automatic Tagging System Focused on Digital Resources
 LEI Zhiwen 1,2, HUANG Ling 1
(1. School of Automation, Harbin University of Science and Technology, Harbin 150080, China;
2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
 Keywords: automatic tagging; latent Dirichlet allocation; Word2Vec

 Abstract: Digital resources often carry too few tags, and additional tags are difficult to obtain. To address this, we propose a new automatic tagging method that effectively extends the tags of a public cultural resource data set we collected and of other public data sets. The method draws on neural network theory and generative learning theory. Latent Dirichlet allocation (LDA) and Word2Vec are used to process the resources and the initial tags, respectively, producing a representation vector for each resource and for each initial tag. These two kinds of vectors then serve as the input of a deep structured semantic model, which forms the automatic tagging model for digital resources. In the experiments, the tag extension results of this method are, overall, better than those of the other comparison methods considered in the paper on precision, mean reciprocal rank (MRR), mean average precision (MAP) and other metrics. The method can relieve the shortage of resource tags in some cases and improve the utilization of resources.
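The ranking and evaluation steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the resource vector and the tag vectors have already been mapped into a shared space (in the paper that is the role of the deep structured semantic model, which is not reproduced here), and it ranks candidate tags by cosine similarity. The toy vectors, tag names and function names are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_tags(resource_vec, tag_vecs):
    """Return tag names sorted by descending similarity to the resource."""
    return sorted(tag_vecs,
                  key=lambda t: cosine(resource_vec, tag_vecs[t]),
                  reverse=True)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked tags that are relevant."""
    return sum(1 for t in ranked[:k] if t in relevant) / k


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant tag, or 0.0 if none is found."""
    for i, t in enumerate(ranked, start=1):
        if t in relevant:
            return 1.0 / i
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision values taken at each relevant tag's rank."""
    hits, total = 0, 0.0
    for i, t in enumerate(ranked, start=1):
        if t in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


# Hypothetical toy data: one resource and three candidate tags,
# all assumed to live in the same 2-dimensional space.
resource = [1.0, 0.0]
tags = {"museum": [1.0, 0.0], "opera": [1.0, 1.0], "sports": [0.0, 1.0]}
relevant = {"museum", "sports"}  # assumed ground-truth tags

ranked = rank_tags(resource, tags)
print(ranked)
print(precision_at_k(ranked, relevant, 2))
print(reciprocal_rank(ranked, relevant))
print(average_precision(ranked, relevant))
```

MRR and MAP are the means of `reciprocal_rank` and `average_precision` over all evaluated resources; the sketch shows the per-resource values only.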







 Received: 2018-11-23
Funding: National Key Technology Research and Development Program of China (2015BAK25B00)
Last Update: 2020-10-14