Use of Distribution Algorithms for the Construction of a Classification and Regression Tree

Author: MSc. Adem Meta
Position: Cuyahoga Community College, USA
Pages: 45-51
Vol. 4, No. 2, July 2018
ISSN 2410-3918
Access online at www.iipccl.org
Academic Journal of Business, Administration, Law and Social Sciences
IIPCCL Publishing, Graz-Austria
Abstract
One of the most important processes in the construction of classification and regression trees is the distribution (splitting) of a given dataset. There are numerous algorithms for predicting continuous or categorical variables from a set of continuous predictors and/or categorical factor effects. In this paper I address the problem of learning various types of algorithms used to obtain optimal decision trees from a database. In particular, we study online machine learning algorithms for learning classification and regression trees, linear model trees, option trees for regression, multi-target model trees, and ensembles of model trees from given data. A decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node has two or more branches; a leaf node represents a classification or decision. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data. The core algorithm for building decision trees, a greedy search through the space of possible branches, uses Entropy and Information Gain to construct a decision tree. In this paper, through a concrete example, I will explicitly examine the use of four algorithms (the Gini Index, Chi-Square, Entropy, and Variance Reduction) for deciding at which node the distribution of a database will occur. When the database is small, the calculation is of course much simpler than in the case of a large database.
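As a minimal sketch of the Entropy and Information Gain measures named above (the formulas are standard; the class counts used in the demonstration are illustrative, not taken from the paper's example):

```python
import math

def entropy(counts):
    """Shannon entropy of a class-count distribution, in bits."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total)
                for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """Entropy reduction achieved by splitting the parent node."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child)
                   for child in child_counts_list)
    return entropy(parent_counts) - weighted

# A node holding 18 players and 18 non-players has maximal entropy:
print(entropy([18, 18]))                               # 1.0
# A perfectly pure two-way split recovers the whole bit:
print(information_gain([18, 18], [[18, 0], [0, 18]]))  # 1.0
```

The greedy tree-building algorithm described in the abstract evaluates such a gain for every candidate split and chooses the largest one.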
Keywords: distribution algorithms, regression tree, classification.
Introduction
The basic classification and regression algorithms are considered among the best and most widely used learning methods. Methods based on the classification tree provide predictive models with very good precision, stability, and ease of interpretation. They represent non-linear relationships quite well and are suitable for solving any classification or regression problem. Decision trees use multiple algorithms to decide when to split a node into two or more sub-nodes. The creation of sub-nodes increases the homogeneity of the resulting subsets; thus, the purity of the node increases with respect to the target variable. The decision tree considers splits of a node on all available variables and then selects the split that produces the most homogeneous sub-nodes.
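The split-selection step just described can be sketched as follows, assuming Gini impurity as the homogeneity measure (the candidate splits shown are hypothetical, not the paper's data):

```python
def gini(counts):
    """Gini impurity of a class-count distribution."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_impurity(child_counts_list):
    """Size-weighted Gini impurity of the sub-nodes produced by a split."""
    total = sum(sum(child) for child in child_counts_list)
    return sum(sum(child) / total * gini(child)
               for child in child_counts_list)

def best_split(candidate_splits):
    """Pick the split whose sub-nodes are most homogeneous (lowest impurity)."""
    return min(candidate_splits, key=lambda kv: weighted_impurity(kv[1]))

# Hypothetical candidate splits of a node holding [players, non-players]:
splits = {
    "Gender": [[12, 6], [6, 12]],
    "Class":  [[10, 8], [8, 10]],
}
name, _ = best_split(splits.items())
print(name)  # Gender, since its sub-nodes are purer
```

The same selection loop works with any of the four measures discussed in this paper; only the impurity function changes.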
The choice of algorithm is also based on the type of the response variable. Let's look at the four most used algorithms in the decision tree using the following example:
Let's take a class of 36 students with three variables: Gender (male/female), Class (XI/XII), and Height (from 160 cm up to 180 cm, grouped into the intervals (160, 170) and (170, 180)), 18 of whom play basketball in their leisure time. We want to create a model to predict who will play basketball.
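To illustrate how the Chi-Square measure would score a candidate split of these 36 students, here is a sketch using the standard sum of (actual - expected)^2 / expected over the sub-node cells; the per-gender counts below are hypothetical, since the paper's full contingency table is not given in this excerpt:

```python
def chi_square(child_counts_list):
    """Chi-square statistic for a split: sum over sub-node cells of
    (actual - expected)^2 / expected, where 'expected' assumes each
    sub-node preserves the parent node's class ratio."""
    totals = [sum(child) for child in child_counts_list]
    grand = sum(totals)
    class_totals = [sum(child[k] for child in child_counts_list)
                    for k in range(len(child_counts_list[0]))]
    chi2 = 0.0
    for child, n in zip(child_counts_list, totals):
        for k, actual in enumerate(child):
            expected = n * class_totals[k] / grand
            chi2 += (actual - expected) ** 2 / expected
    return chi2

# Hypothetical Gender split of the 36 students (18 play basketball overall):
# boys: 12 play / 6 don't; girls: 6 play / 12 don't
print(chi_square([[12, 6], [6, 12]]))  # 4.0
```

A larger statistic indicates a split whose sub-nodes deviate more from the parent's class distribution, i.e. a more informative split.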
