CUI XIAOJUN, XIAO HONGYU, DING LIXIN. Distance-Based Adaptive Record Matching for Web Databases [J]. 2012, 58(1): 89-94. DOI: 10.14188/j.1671-8836.2012.01.018.
One of the important steps in Deep Web information integration is identifying duplicate records across multiple Web databases. Because of features such as query dependency, the lack of training samples, and online processing requirements, most state-of-the-art record matching methods are not applicable to the Web database scenario. Based on an analysis of the existing methods, an adaptive distance-based record matching method is proposed that dynamically adjusts attribute weights. In the iterative process of calculating record similarity, the weight of each attribute is dynamically recalculated by increasing the weights of the attributes with larger similarity in the matching record set and increasing the weights of the attributes with smaller similarity in the non-matching record set. The proposed method requires neither training data nor human effort, and the experimental results show that it works well for the Web database scenario.
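The iterative weight adjustment described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract gives no formulas, so the update rule (mean similarity in the matching set plus one minus mean similarity in the non-matching set, then normalized), the fixed `threshold`, the stopping test, and the name `adaptive_match` are all assumptions made for illustration. Each comparison vector holds per-attribute similarities in [0, 1] for one record pair.

```python
def adaptive_match(vectors, threshold=0.7, max_iter=20, tol=1e-6):
    """Hypothetical sketch: classify record pairs while re-weighting attributes.

    vectors: list of comparison vectors, one per record pair, each a list of
    per-attribute similarities in [0, 1]. The update rule is an assumption.
    """
    n_attrs = len(vectors[0])
    weights = [1.0 / n_attrs] * n_attrs  # start from a uniform weight vector

    def classify(w):
        # A pair is a match when its weighted similarity reaches the threshold.
        return [sum(wk * sk for wk, sk in zip(w, v)) >= threshold for v in vectors]

    for _ in range(max_iter):
        labels = classify(weights)
        matched = [v for v, m in zip(vectors, labels) if m]
        unmatched = [v for v, m in zip(vectors, labels) if not m]
        if not matched or not unmatched:
            break  # nothing to contrast, keep the current weights

        raw = []
        for k in range(n_attrs):
            mean_match = sum(v[k] for v in matched) / len(matched)
            mean_non = sum(v[k] for v in unmatched) / len(unmatched)
            # Assumed rule: reward high similarity among matches and
            # low similarity among non-matches.
            raw.append(mean_match + (1.0 - mean_non))

        total = sum(raw)
        if total == 0:
            break
        new_weights = [r / total for r in raw]
        delta = max(abs(a - b) for a, b in zip(weights, new_weights))
        weights = new_weights
        if delta < tol:
            break  # weights have stabilized

    return weights, classify(weights)


# Toy data: the third attribute is uninformative (0.5 for every pair).
pairs = [
    [0.90, 0.90, 0.5],
    [0.95, 0.85, 0.5],
    [0.10, 0.20, 0.5],
    [0.20, 0.10, 0.5],
]
weights, labels = adaptive_match(pairs, threshold=0.6)
```

In this toy input the third attribute has the same similarity for every pair, so under the assumed rule it ends up with the smallest weight, while the two discriminative attributes gain weight. No labeled training pairs are involved; the only inputs are the comparison vectors and the assumed threshold.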
Keywords
Web databases; record matching; entity identification; comparison vector; weight vector