[1] YAO Deng-ju, ZHAN Xiao-juan, ZHANG Xiao-jing. A Weighted K-means Gene Clustering Algorithm[J]. Journal of Harbin University of Science and Technology, 2017, (02): 112-116. [doi:10.15938/j.jhust.2017.02.021]

A Weighted K-means Gene Clustering Algorithm

Journal of Harbin University of Science and Technology [ISSN: 1007-2683 / CN: 23-1404/N]

Volume:
Issue:
2017, No. 02
Pages:
112-116
Section:
Computer and Control Engineering
Publication date:
2017-04-25

Article Info

Title:
A Weighted K-means Gene Clustering Algorithm
Article ID:
1007-2683(2017)02-0112-05
Authors:
YAO Deng-ju, ZHAN Xiao-juan, ZHANG Xiao-jing
(1. School of Software, Harbin University of Science and Technology, Harbin 150040, China; 2. College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin 150050, China)
Keywords:
microarray expression data; clustering analysis; random forest; K-means
DOI:
10.15938/j.jhust.2017.02.021
Document code:
A
Abstract:
To address the complex gene-gene correlations in microarray expression datasets, a new weighted K-means gene clustering algorithm based on random forest variable importance scores is proposed. First, a random forest classifier is trained on the microarray expression data, with samples as objects and genes as features, and a variable importance score is computed for each gene. Then, weighted K-means clustering is performed with genes as objects, samples as features, and each gene's variable importance score as its weight. Experiments were carried out on three microarray expression datasets: Leukemia, Breast, and DLBCL. The results show that, compared with the original K-means clustering algorithm, the proposed weighted K-means algorithm achieves a ratio of between-cluster distance to total distance that is on average 17.7 percentage points higher, indicating better within-cluster homogeneity and between-cluster separation.
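The abstract gives only an outline of the method, so the following is a minimal Python sketch of one plausible reading of the clustering step and of the evaluation ratio, not the authors' implementation. Assumptions: genes are rows (objects) and samples are columns (features); each gene's variable importance score is taken as given (computing it with a random forest is not shown) and acts as an object weight that scales the gene's pull on its cluster centroid; and the "ratio of between-cluster distance to total distance" is taken to be the between-cluster sum of squares over the total sum of squares, the quantity R's `kmeans()` reports as `between_SS / total_SS`. The names `weighted_kmeans`, `between_total_ratio`, and the helpers are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Profile = Sequence[float]


def sq_dist(a: Profile, b: Profile) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(points: Sequence[Profile],
                 weights: Optional[Sequence[float]] = None) -> List[float]:
    """(Weighted) mean profile; plain mean if weights are absent or sum to 0."""
    if weights is None or sum(weights) == 0:
        weights = [1.0] * len(points)
    total = sum(weights)
    return [sum(w * p[j] for p, w in zip(points, weights)) / total
            for j in range(len(points[0]))]


def weighted_kmeans(genes: Sequence[Profile], weights: Sequence[float], k: int,
                    n_iter: int = 100,
                    seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Cluster genes (rows; columns are samples) into k groups.

    Assumption: each gene's importance score weights its contribution
    to its cluster centroid. Returns (labels, centroids).
    """
    rng = random.Random(seed)
    centroids = [list(genes[i]) for i in rng.sample(range(len(genes)), k)]
    labels: List[int] = [-1] * len(genes)
    for _ in range(n_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(g, centroids[c]))
                      for g in genes]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: centroid = importance-weighted mean of member genes.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_profile([genes[i] for i in members],
                                            [weights[i] for i in members])
    return labels, centroids


def between_total_ratio(genes: Sequence[Profile], labels: Sequence[int]) -> float:
    """Between-cluster SS / total SS, computed on the unweighted data."""
    grand = mean_profile(genes)
    total_ss = sum(sq_dist(g, grand) for g in genes)
    if total_ss == 0:
        return 0.0  # all profiles identical: no spread to explain
    between_ss = 0.0
    for c in set(labels):
        members = [g for g, lab in zip(genes, labels) if lab == c]
        between_ss += len(members) * sq_dist(mean_profile(members), grand)
    return between_ss / total_ss


if __name__ == "__main__":
    # Toy data: 4 genes x 2 samples; weights stand in for importance scores.
    genes = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    weights = [0.5, 0.1, 0.3, 0.1]
    labels, centroids = weighted_kmeans(genes, weights, k=2)
    print(labels, centroids, between_total_ratio(genes, labels))
```

Under this reading, the weights leave the assignment step unchanged and act only through the centroid update, so centroids drift toward high-importance genes. The ratio is computed without weights so that a weighted and an unweighted partition of the same data can be compared on one scale.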

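Memo

Received: 2016-12-08
Foundation item: 2014 General Science and Technology Research Project of the Heilongjiang Provincial Department of Education (12541124)
About the authors: ZHAN Xiao-juan (1978-), female, M.S., lecturer; ZHANG Xiao-jing (1981-), female, M.S., associate professor.
Corresponding author: YAO Deng-ju (1980-), male, Ph.D., associate professor, E-mail: ydkvictory@163.com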
Last Update: 2017-06-13