[1]张春祥,赵凌云,高雪瑶. 结合词形词性和译文的汉语词义消歧[J].哈尔滨理工大学学报,2020,25(03):131-136.[doi:1015938/jjhust202003020]
 ZHANG Chun xiang,ZHAO Ling yun,GAO Xue yao. Chinese Word Sense Disambiguation Based on Wordtranslation and Partofspeech[J].哈尔滨理工大学学报,2020,25(03):131-136.[doi:1015938/jjhust202003020]
点击复制

Chinese Word Sense Disambiguation Combining Word Form, Part of Speech, and Translation

Journal of Harbin University of Science and Technology [ISSN: 1007-2683 / CN: 23-1404/N]

Volume: 25
Issue: 2020, No. 03
Pages: 131-136
Section: Computer and Control Engineering
Publication date: 2020-06-25

Article Information

Title: Chinese Word Sense Disambiguation Based on Word-translation and Part-of-speech
Article ID: 1007-2683(2020)03-0131-06
Author(s):
ZHANG Chunxiang (张春祥) 1, ZHAO Lingyun (赵凌云) 2, GAO Xueyao (高雪瑶) 2
1. School of Software and Microelectronics, Harbin University of Science and Technology, Harbin 150080, China
2. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
Keywords: vocabulary ambiguity; convolution neural network; lexical unit; disambiguation feature; word sense disambiguation
CLC number: TP391.2
DOI: 10.15938/j.jhust.2020.03.020
Document code: A
Abstract:
To address lexical ambiguity in Chinese, a convolutional neural network (CNN) is used to determine the true sense of an ambiguous word from the word form, part of speech, and translation of its left and right adjacent words. A disambiguation window containing the two adjacent lexical units is selected around the ambiguous word, and the word form, part of speech, and translation of each unit are extracted as disambiguation features. A word sense disambiguation (WSD) classifier is then built on these features with the CNN. The CNN parameters are optimized on the training corpus of SemEval-2007: Task#5 and the semantically annotated corpus of Harbin Institute of Technology, and the classifier is evaluated on the test corpus of SemEval-2007: Task#5. Experimental results show that, compared with a Bayes model and a BP neural network, the proposed method improves average disambiguation accuracy by 14.94% and 6.9%, respectively.
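
As a rough illustration of the feature extraction described in the abstract, the Python sketch below collects the word form, part of speech, and translation of the left and right neighbours of an ambiguous word. The `LexicalUnit` type, the `<PAD>` placeholder, the POS tags, and the example sentence are assumptions made for illustration and do not come from the paper; the window is assumed to hold one unit on each side.

```python
from typing import List, NamedTuple


class LexicalUnit(NamedTuple):
    word: str         # surface word form
    pos: str          # part-of-speech tag
    translation: str  # translation of the word


# Hypothetical placeholder used when the target sits at a sentence boundary.
PAD = LexicalUnit("<PAD>", "<PAD>", "<PAD>")


def extract_features(sentence: List[LexicalUnit], target: int) -> List[str]:
    """Return word, POS, and translation of the left, then the right, neighbour."""
    if not 0 <= target < len(sentence):
        raise IndexError("target index out of range")
    left = sentence[target - 1] if target > 0 else PAD
    right = sentence[target + 1] if target + 1 < len(sentence) else PAD
    return [left.word, left.pos, left.translation,
            right.word, right.pos, right.translation]


# Hypothetical segmented, tagged, and translated sentence; index 1 is the ambiguous word.
sentence = [
    LexicalUnit("他", "r", "he"),
    LexicalUnit("打", "v", "make"),
    LexicalUnit("电话", "n", "telephone"),
]
features = extract_features(sentence, 1)
print(features)
```

The six strings would still need to be encoded numerically before being passed to a CNN; the abstract does not specify that encoding or the network layout, so neither is sketched here.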



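Memo:

Received: 2018-09-09
Funding: National Natural Science Foundation of China (61502124, 60903082); China Postdoctoral Science Foundation (2014M560249); Natural Science Foundation of Heilongjiang Province (F2015041, F201420); Fundamental Research Funds for the Universities of Heilongjiang Province (LGYC2018JC014)
About the authors:
ZHANG Chunxiang (b. 1974), male, Ph.D., professor;
ZHAO Lingyun (b. 1991), female, master's student
Corresponding author:
GAO Xueyao (b. 1979), female, Ph.D., professor, master's supervisor, Email: xueyao_gao@163.com
Last Update: 2020-10-16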