Document Type

Report

Date

11-2-2009

Embargo Period

9-1-2010

Keywords

Discretization, Clustering, Binning, Supervised Learning

Language

English

Disciplines

Computer and Systems Architecture | Computer Engineering

Description/Abstract

We address the problem of discretizing continuous variables for machine learning classification algorithms. Existing discretization procedures do not exploit the interdependence between variables. Our proposed method uses clustering to exploit this interdependence. Numerical results show that this improves classification performance in almost all cases. Even when an existing algorithm can operate successfully on continuous variables, better performance is obtained if the variables are first discretized. An additional advantage of discretization is that it reduces the overall time complexity.
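The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of using clustering to discretize variables jointly rather than one at a time. It is not the report's method. The scheme shown (plain k-means over all variables, then per-variable cut points at the midpoints between neighboring cluster centers) and every name in it (`kmeans`, `cut_points`, `discretize`, the toy `data`) are assumptions made for illustration.

```python
# Illustrative sketch only: NOT the report's algorithm. It clusters the rows
# jointly (plain k-means) and turns the cluster centers into per-variable bins.
from bisect import bisect_right


def kmeans(rows, k, iters=50):
    """Plain Lloyd's k-means, initialized with the first k distinct rows."""
    centers = []
    for r in rows:
        if list(r) not in centers:
            centers.append(list(r))
        if len(centers) == k:
            break
    for _ in range(iters):
        # Assign every row to its nearest center (squared Euclidean distance).
        groups = [[] for _ in centers]
        for r in rows:
            dists = [sum((a - b) ** 2 for a, b in zip(r, c)) for c in centers]
            groups[dists.index(min(dists))].append(r)
        # Move each center to the mean of its group; keep it if the group is empty.
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def cut_points(centers, var):
    """Midpoints between neighboring center coordinates along one variable."""
    coords = sorted({c[var] for c in centers})
    return [(a + b) / 2 for a, b in zip(coords, coords[1:])]


def discretize(rows, k):
    """Replace every continuous value with the index of the bin it falls in."""
    centers = kmeans(rows, k)
    n_vars = len(rows[0])
    cuts = [cut_points(centers, v) for v in range(n_vars)]
    return [[bisect_right(cuts[v], r[v]) for v in range(n_vars)] for r in rows]


# Hypothetical toy data: two variables that vary together in two groups.
data = [(1.0, 10.0), (1.2, 10.5), (0.8, 9.5), (5.0, 2.0), (5.2, 2.5), (4.8, 1.5)]
bins = discretize(data, k=2)
```

Because the clusters are formed from both variables at once, the bin boundaries of each variable depend on how it varies together with the other, which is the kind of interdependence a purely per-variable binning scheme ignores. The resulting integer codes could then be passed to any classifier that expects discrete inputs.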

Additional Information

SYR-EECS-2009-03

Source

local input

 
 
