Electrical Engineering and Computer Science - All Scholarship

CLOUDS: A Decision Tree Classifier for Large Datasets

Document Type

Working Paper

Date

1998

Keywords

classification, decision trees, data mining, large datasets, sampling, estimation, Gini index, CLOUDS

Language

English

Disciplines

Computer Sciences

Description/Abstract

Classification for very large datasets has many practical applications in data mining. Techniques such as discretization and dataset sampling can be used to scale up decision tree classifiers to large datasets. Unfortunately, both of these techniques can cause a significant loss in accuracy. We present a novel decision tree classifier called CLOUDS, which samples the splitting points for numeric attributes followed by an estimation step to narrow the search space of the best split. CLOUDS reduces computation and I/O complexity substantially compared to state-of-the-art classifiers, while maintaining the quality of the generated trees in terms of accuracy and tree size. We provide experimental results with a number of real and synthetic datasets.
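The first idea in the abstract is to evaluate the split criterion (the Gini index, per the keywords) only at a sample of candidate splitting points, not at every distinct value of a numeric attribute. The Python sketch below illustrates that idea under stated assumptions. Candidates are taken as the upper boundaries of q roughly equal-frequency (quantile) intervals. The names `gini`, `split_gini`, `sampled_split_points`, `best_sampled_split`, and the parameter `q` are hypothetical and are not taken from the paper. The sketch also omits the CLOUDS estimation step, which narrows the search between sampled points, so it should not be read as the authors' algorithm.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def gini(counts: Counter) -> float:
    """Gini impurity 1 - sum(p_c^2) of a class-count table."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(values: Sequence[float], labels: Sequence[str], threshold: float) -> float:
    """Size-weighted Gini index of the binary split value <= threshold."""
    left = Counter(lab for v, lab in zip(values, labels) if v <= threshold)
    right = Counter(lab for v, lab in zip(values, labels) if v > threshold)
    n = len(labels)
    n_left = sum(left.values())
    n_right = sum(right.values())
    return (n_left / n) * gini(left) + (n_right / n) * gini(right)


def sampled_split_points(values: Sequence[float], q: int) -> List[float]:
    """Hypothetical sampling rule: the last value of each of the first q-1
    equal-frequency intervals of the sorted attribute values."""
    s = sorted(values)
    n = len(s)
    return sorted({s[max(0, (i * n) // q - 1)] for i in range(1, q)})


def best_sampled_split(values: Sequence[float], labels: Sequence[str],
                       q: int = 10) -> Tuple[float, float]:
    """Evaluate the Gini index only at the sampled points; return (threshold, gini)."""
    candidates = sampled_split_points(values, q)
    if not candidates:
        raise ValueError("need q >= 2 and a non-empty attribute")
    best = min(candidates, key=lambda t: split_gini(values, labels, t))
    return best, split_gini(values, labels, best)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(sampled_split_points(xs, 4))     # candidate thresholds [2, 4, 6]
    print(best_sampled_split(xs, ys, q=4))  # (4, 0.0): a pure split
```

With q much smaller than the number of distinct values, the cost of evaluating the split criterion drops accordingly. The true best split can, however, fall between two sampled points. Recovering that accuracy is the purpose of the estimation step the abstract describes, and this sketch does not model it.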

Recommended Citation

Alsabti, Khaled; Ranka, Sanjay; and Singh, Vineet, "CLOUDS: A Decision Tree Classifier for Large Datasets" (1998). Electrical Engineering and Computer Science - All Scholarship. 41.
https://surface.syr.edu/eecs/41

Included in

Computer Sciences Commons
