NOVEL EVALUATION INDEX OF CROSS-SCALE DISCRETIZATION UNCERTAINTY BASED ON LOCAL STANDARD SCORE
Optimal discretization of continuous-valued attributes is inherently a problem of uncertainty. Discretization uncertainty propagates and accumulates throughout the data mining process, and it directly affects the usability and operability of the mining results. To address the limitations of existing discretization evaluation indices in descriptive accuracy and computational efficiency, this work proposes a discretization uncertainty index defined at the level of individuals. The method uses the local standard score as a general similarity measure within and between intervals, and evaluates discretization reliability according to the relative position of individuals in each interval. Experiments show that the new evaluation index is consistent with commonly used metrics. While preserving the validity of the discretization evaluation, the proposed method achieves higher descriptive accuracy and computational efficiency than existing approaches, and it is better suited to massive data processing and to the detection of special distributions.
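As a rough illustration of the idea of a local standard score, the sketch below computes, for each value, its z-score relative to the mean and standard deviation of the interval it falls into. This is only an assumed reading of the term: the abstract does not give the index formula, and the function name `local_standard_scores`, the `cut_points` parameter, and the use of the population standard deviation are all hypothetical choices made for this example, not the authors' definition.

```python
import numpy as np

def local_standard_scores(values, cut_points):
    """Hypothetical helper: z-score of each value within its own interval."""
    values = np.asarray(values, dtype=float)
    # np.digitize returns, for each value, the index of the interval
    # delimited by the (ascending) cut points.
    labels = np.digitize(values, cut_points)
    scores = np.zeros_like(values)
    for label in np.unique(labels):
        mask = labels == label
        members = values[mask]
        std = members.std()  # population standard deviation (assumption)
        # Intervals with zero spread keep a score of 0 to avoid division by zero.
        if std > 0:
            scores[mask] = (members - members.mean()) / std
    return labels, scores

# Toy data: two well-separated groups split by a single cut point.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels, scores = local_standard_scores(values, cut_points=[5.0])
print(labels)
print(np.round(scores, 3))
```

In this reading, a score near zero means an individual sits close to the center of its interval, while a large absolute score marks an individual near the interval's edge. How such scores are aggregated into a single uncertainty index is not specified in the abstract and is left out here.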