[1] LEI Zhiwen, HUANG Ling. An Automatic Tagging System Focused on Digital Resources[J]. Journal of Harbin University of Science and Technology, 2020, 25(03): 144-150. [doi:10.15938/j.jhust.2020.03.022]

 An Automatic Tagging System Focused on Digital Resources

Journal of Harbin University of Science and Technology [ISSN: 1007-2683 / CN: 23-1404/N]

Volume:
25
Issue:
2020, No. 03
Pages:
144-150
Section:
Computer and Control Engineering
Publication date:
2020-06-25

Article Info

Title:
 An Automatic Tagging System Focused on Digital Resources
Article ID:
1007-2683(2020)03-0144-07
Author(s):
 LEI Zhiwen (1, 2), HUANG Ling (1)
 1. School of Automation, Harbin University of Science and Technology, Harbin 150080, China
 2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Keywords:
 tag extension (automatic tagging); latent Dirichlet allocation; Word2Vec
CLC number:
TP181
DOI:
10.15938/j.jhust.2020.03.022
Document code:
A
Abstract:
 Digital resources often carry too few tags, and additional tags are hard to obtain. To address this, the paper proposes a new automatic tagging method that can effectively extend tags for a collected dataset of public cultural resources and for other public datasets. The method draws on neural network theory and generative learning theory. Latent Dirichlet allocation (LDA) is applied to the resources and Word2Vec to the initial tags, producing a representation vector for each. These two kinds of vectors are then used as the input of a deep structured semantic model, which yields the automatic tagging model for digital resources. In the reported results, the tag extension performance of the method is, overall, better than that of the other comparison methods discussed in the paper on precision, mean reciprocal rank (MRR), mean average precision (MAP), and other metrics. The method can alleviate the shortage of resource tags in some cases and thereby improve the utilization of resources.
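The abstract names precision, MRR, and MAP as evaluation metrics but does not give their formulas on this page. The following is a minimal sketch using the standard information-retrieval definitions, which are assumed here rather than taken from the paper; the function names and the toy tag data are hypothetical and serve only as illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k predicted tags that are in the relevant set."""
    if k <= 0:
        return 0.0
    return sum(1 for tag in ranked[:k] if tag in relevant) / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant tag, or 0.0 if none is found."""
    for position, tag in enumerate(ranked, start=1):
        if tag in relevant:
            return 1.0 / position
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values taken at each relevant hit."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for position, tag in enumerate(ranked, start=1):
        if tag in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean; 0.0 for an empty sequence."""
    return sum(values) / len(values) if values else 0.0


# Hypothetical toy data: ranked tag predictions for two resources
# and the ground-truth tags of each resource.
predictions = [["museum", "opera", "painting"], ["folk", "dance", "music"]]
truth = [{"opera", "painting"}, {"folk"}]

mean_p_at_2 = mean([precision_at_k(p, t, 2) for p, t in zip(predictions, truth)])
mrr = mean([reciprocal_rank(p, t) for p, t in zip(predictions, truth)])
map_score = mean([average_precision(p, t) for p, t in zip(predictions, truth)])
```

MRR and MAP are simply the per-resource reciprocal rank and average precision averaged over all resources, so a method that places correct tags earlier in its ranking scores higher on both.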






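Memo:
 Received: 2018-11-23
Funding: National Key Technology R&D Program of China (2015BAK25B00)
Author biography:
LEI Zhiwen (b. 1994), male, master's student
Corresponding author:
HUANG Ling (b. 1975), female, professor, master's supervisor, Email: huangling@hrbust.edu.cn
Last Update: 2020-10-14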