Data Reduction Based on DSM
JIANG Hao (江昊), YAN Pu-liu (晏蒲柳)
School of Electronic Information, Wuhan University, Wuhan, Hubei 430079, China

Abstract: This paper starts from the differences and similarities between the attributes of objects and analyzes the properties of the elements md_ij and ms_ij of the DSM (difference similitude matrix). From this analysis it defines the significance and the merge degree (uniformity) of attributes and gives a method for computing a corrected subset of the optimal attribute reduct. On this basis, it proposes a DSM-based knowledge (data) reduction method. The method generates a small number of rules while preserving the consistency of classifications, and it uses only a subset of the condition attributes. The method was applied to databases from the UCI Machine Learning Repository, and its results were compared with those of rough set theory reduction. The comparison shows that the method is reasonable and effective. It achieves a higher reduction rate of instances, and leave-one-out validation shows that its rules are more accurate than those obtained with rough set theory.

Keywords: data reduction; DSM (difference similitude matrix); rough set theory; UCI database
CLC number: TP311.13
Funding: National Natural Science Foundation of China (90204008)
Dates: 2003-03-01; 2021-04-01
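The abstract does not spell out how the DSM entries md_ij and ms_ij are built. The following is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes the common difference/similitude construction. For a pair of objects with different decisions, md_ij holds the condition attributes on which they differ. For a pair with the same decision, ms_ij holds the attributes on which they agree. The function names, the toy table, and the use of raw entry counts as a stand-in for attribute significance are hypothetical illustrations. The consistency check uses the standard discernibility idea: an attribute subset keeps the classification consistent only if it hits every non-empty md entry.

```python
from collections import Counter
from itertools import combinations


def build_dsm(rows, decisions):
    """Build hypothetical DSM entries for every object pair (i, j) with i < j.

    Assumed definitions (not taken from the paper):
      md[(i, j)]: attributes whose values differ, for pairs with different decisions
      ms[(i, j)]: attributes whose values agree, for pairs with the same decision
    """
    md, ms = {}, {}
    n_attrs = len(rows[0])
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            md[(i, j)] = frozenset(a for a in range(n_attrs) if rows[i][a] != rows[j][a])
        else:
            ms[(i, j)] = frozenset(a for a in range(n_attrs) if rows[i][a] == rows[j][a])
    return md, ms


def attribute_counts(entries):
    """Count how many matrix entries contain each attribute (an illustrative significance proxy)."""
    counts = Counter()
    for attrs in entries.values():
        counts.update(attrs)
    return counts


def preserves_discernibility(subset, md):
    """True if the attribute subset intersects every non-empty difference entry."""
    subset = frozenset(subset)
    return all(not attrs or bool(attrs & subset) for attrs in md.values())


# Hypothetical toy decision table: three condition attributes, decision labels 'y'/'n'.
rows = [(0, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
decisions = ["y", "y", "n", "n"]

md, ms = build_dsm(rows, decisions)
print("md counts:", attribute_counts(md))
print("ms counts:", attribute_counts(ms))
print("{0} keeps consistency:", preserves_discernibility({0}, md))
```

In this toy table, attribute 0 appears in every difference entry, so the subset {0} alone separates all differently labeled pairs. The pair of attributes {1, 2} does not: objects 1 and 3 differ only on attribute 0.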