[1] WEN Xueyan, ZHAO Liying, XU Kesheng, et al. Application of Improved MDSMOTE and FCSVM in Imbalanced Data Set Classification[J]. Journal of Harbin University of Science and Technology, 2018, (04): 87-94. [doi:10.15938/j.jhust.2018.04.016]

Application of Improved MDSMOTE and FCSVM in Imbalanced Data Set Classification

Journal of Harbin University of Science and Technology [ISSN: 1007-2683 / CN: 23-1404/N]

Volume:
Issue:
No. 04, 2018
Pages:
87-94
Section:
Computer and Control Engineering
Publication date:
2018-08-25

Article Info

Title:
 Application of Improved MDSMOTE and FCSVM in Imbalanced Data Set Classification
Author(s):
WEN Xueyan 1, ZHAO Liying 1, XU Kesheng 2, LU Guang 1
1 School of Information and Computer Engineering, Northeast Forestry University, Harbin, Heilongjiang 150040, China;
2 Harbin Forestry Machinery Research Institute, State Forestry Administration, Harbin, Heilongjiang 150086, China
Keywords:
imbalanced data sets; support vector machines; SMOTE algorithm; text categorization
CLC number:
TP311
DOI:
10.15938/j.jhust.2018.04.016
Document code:
A
Abstract:
Online shopping review data sets are often extremely imbalanced. To improve classification accuracy on such data, improvements are made at both the sample level and the algorithm level. The MDSMOTE algorithm does not take misclassified samples into account when generating some of its new samples, so a correction step for misclassified samples is added to the existing MDSMOTE algorithm to improve sample quality. The traditional FSVM cannot resolve the bias of the separating hyperplane toward the minority class when classifying imbalanced data sets, so positive and negative penalty coefficients and a fuzzy factor are added to the FSVM to improve the recognition rate on imbalanced data. The improved algorithm is applied to the classification of a JingDong online shopping review data set. Its F-measure is 9.13% higher on average than that of the other algorithms compared, which indicates the feasibility and effectiveness of the method and its practical value.
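The abstract builds on two standard ingredients: SMOTE-style oversampling of the minority class and evaluation by F-measure. The sketch below illustrates only those generic ideas in plain Python. It is not the authors' improved MDSMOTE (it contains no correction of misclassified samples) and it does not implement FCSVM. The function names `smote_like_oversample` and `f_measure`, the parameter `k`, and the toy points are assumptions made for illustration.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by plain SMOTE interpolation.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (o - b) for b, o in zip(base, other))
        )
    return synthetic


def f_measure(tp, fp, fn):
    """F-measure (harmonic mean of precision and recall) for one class."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical 2-D minority samples, e.g. feature vectors of rare reviews
    minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote_like_oversample(minority_points, n_new=3, k=2):
        print(point)
    print(f_measure(tp=8, fp=2, fn=2))
```

Because every synthetic point is a convex combination of two existing minority samples, it always stays inside the region spanned by the minority class. The different positive and negative penalty coefficients described in the abstract correspond, in generic SVM libraries, to per-class misclassification costs; that part is not shown here.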

Memo:
Funding: National Key Research and Development Program of China (2016YFD0702105); Natural Science Foundation of Heilongjiang Province (F201201)
Last Update: 2018-10-25