[1] SONG Zhi-chao, KANG Jian, SUN Guang-lu, et al. The Comparison of Three Measures in Feature Selection[J]. Journal of Harbin University of Science and Technology, 2018, (01): 111-116. [doi:10.15938/j.jhust.2018.01.020]

The Comparison of Three Measures in Feature Selection

Journal of Harbin University of Science and Technology [ISSN: 1007-2683 / CN: 23-1404/N]

Volume:
Issue:
2018, No. 01
Pages:
111-116
Column:
Materials Science and Engineering
Publication date:
2018-02-25

Article Information/Info

Title:
The Comparison of Three Measures in Feature Selection
Article ID:
1007-2683(2018)01-0111-06
Author(s):
SONG Zhi-chao (宋智超)1,2, KANG Jian (康健)3, SUN Guang-lu (孙广路)1,2, HE Yong-jun (何勇军)1,2
(1. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China;
2. Research Center of Information Security and Intelligent Technology, Harbin University of Science and Technology, Harbin 150080, China;
3. Beijing Institute of Astronautical Systems Engineering, Beijing 100076, China)
Keywords:
feature selection; linear correlation coefficient; symmetrical uncertainty; mutual information; fast correlation-based filter
CLC number:
TM391.1
DOI:
10.15938/j.jhust.2018.01.020
Document code:
A
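Abstract (Chinese version):
In different types of data, features and classes, as well as pairs of features, exhibit both linear and nonlinear correlations. Feature selection methods based on different measures select markedly different features on different types of datasets. To study this, the paper takes three commonly used linear or nonlinear measures (the linear correlation coefficient, symmetrical uncertainty, and mutual information), applies each within the fast correlation-based filter feature selection method, and experimentally compares their feature selection performance on gene microarray and image data. The results show that, within the fast correlation-based filter, the linear correlation coefficient tends to select feature sets with better classification accuracy on gene datasets, mutual information selects feature sets that classify better on image datasets, and symmetrical uncertainty gives relatively stable classification performance on both types of data.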
Abstract:
Both linear and nonlinear correlations may exist between features (feature-to-feature) and between features and classes (feature-to-class) in a dataset. This paper studies how the selected feature subset differs when different measures are used with the same feature selection method on different kinds of datasets. Three representative linear or nonlinear measures are chosen: the linear correlation coefficient, symmetrical uncertainty, and mutual information. Each is combined with the fast correlation-based filter (FCBF) feature selection method, and the selected feature subsets are compared on 8 gene microarray and image datasets. Experimental results indicate that the feature subsets selected by FCBF with the linear correlation coefficient obtain better classification accuracy on gene microarray datasets than on image datasets, while FCBF with mutual information or symmetrical uncertainty tends to obtain better results on image datasets. Moreover, FCBF with symmetrical uncertainty is more robust across all datasets.
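
To make the comparison in the abstract concrete, the following is a minimal Python sketch of the three measures and of a simplified FCBF-style filter that accepts any of them. It is not the authors' implementation. It uses the standard definitions I(X;Y) = H(X) + H(Y) - H(X,Y) and SU(X,Y) = 2 I(X;Y) / (H(X) + H(Y)), and it assumes the features are already discrete, since the abstract does not say how continuous values are handled. The function names and the `threshold` parameter are illustrative choices.

```python
import numpy as np

def entropy(values):
    """Shannon entropy in bits of a discrete 1-D array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y); assumes discrete inputs."""
    joint = np.stack([np.asarray(x), np.asarray(y)], axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    h_xy = float(-np.sum(p * np.log2(p)))
    return entropy(x) + entropy(y) - h_xy

def symmetrical_uncertainty(x, y):
    """SU(X,Y) = 2 I(X;Y) / (H(X) + H(Y)), a value in [0, 1]."""
    denom = entropy(x) + entropy(y)
    return 0.0 if denom == 0 else 2.0 * mutual_information(x, y) / denom

def linear_correlation(x, y):
    """Absolute Pearson correlation; 0 if either input is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.std() == 0 or y.std() == 0:
        return 0.0
    return float(abs(np.corrcoef(x, y)[0, 1]))

def fcbf(X, y, measure, threshold=0.0):
    """Simplified FCBF sketch (illustrative, not the paper's code).

    1. Score each feature by measure(feature, class) and keep those
       scoring above `threshold`, ordered from most to least relevant.
    2. Walk the ranking; drop a feature if it is at least as correlated
       with an already kept feature as it is with the class.
    """
    relevance = np.array([measure(X[:, j], y) for j in range(X.shape[1])])
    ranked = [j for j in np.argsort(-relevance, kind="stable")
              if relevance[j] > threshold]
    selected = []
    for j in ranked:
        redundant = any(measure(X[:, k], X[:, j]) >= relevance[j]
                        for k in selected)
        if not redundant:
            selected.append(int(j))
    return selected
```

Calling `fcbf(X, y, symmetrical_uncertainty)`, `fcbf(X, y, mutual_information)`, or `fcbf(X, y, linear_correlation)` on the same data returns the column indices kept under each measure, which is the kind of side-by-side comparison the abstract describes.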


Last Update: 2018-05-24