Academic Journal of Business, Administration, Law and Social Sciences
IIPCCL Publishing, Graz-Austria
Vol. 4 No. 3, November 2018, pp. 70-84
ISSN 2410-3918
Access online at www.iipccl.org

An overview for regression tree

PhD (C.) Adem Meta
University “Ismail Qemali” Vlore, Albania

Abstract

Classification and regression tree (CART) is a non-parametric methodology. CART divides a population into meaningful subgroups, which allows groups of interest to be identified. It is a classification method that uses large datasets to construct decision trees. Depending on the information available about the dataset, either a classification tree or a regression tree can be constructed. The first part of the paper describes the fundamental principles of tree construction, different splitting algorithms, and pruning procedures. The second part of the paper answers the question of why we should, or should not, use the CART method; the advantages and weaknesses of the method are discussed and tested in detail. In the last part, CART is applied to real data using the statistical software R, and some graphical and plotting tools are presented. A regression tree is a classification tree with a continuous dependent variable, in which the independent variables take continuous or discrete values and the prediction error is computed from the squared differences between the observed and predicted values. At the beginning of this paper, some basic principles for constructing a classification/regression tree are presented; after that, the splitting procedure is described in detail, with theoretical generalizations and a detailed account of the splitting algorithms, illustrated with concrete examples in which different algorithms are applied to a dataset. At the end, regression trees are analysed in detail, with appropriate theoretical generalizations and detailed information on how different algorithms are used to prune the overgrown tree and reach the final tree.

Keywords: regression tree, overview, classification.
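
As a preview of the application described above, the following short R sketch grows a regression tree with the rpart package, inspects the cross-validated error, prunes the overgrown tree, and plots the final one; the built-in car.test.frame data and the chosen complexity parameter are illustrative assumptions, not the data or settings analysed later in the paper.

# Illustrative only: grow, prune and plot a regression tree in R with rpart.
library(rpart)

fit <- rpart(Mileage ~ Weight + HP,            # continuous outcome -> regression tree
             data = car.test.frame, method = "anova")

printcp(fit)                                   # cross-validated error for each subtree size
pruned <- prune(fit, cp = 0.05)                # cut the overgrown tree back at a chosen cp

plot(pruned, uniform = TRUE, margin = 0.1)     # draw the final tree
text(pruned, use.n = TRUE)                     # label the nodes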

Introduction

The main idea behind tree methods is to recursively partition the data into smaller and smaller strata in order to improve the fit as much as possible. They partition the sample space into a set of rectangles and fit a model in each one. The sample space is originally split into two regions. The optimal split is found over all variables at all possible split points, and for each of the two regions created this process is repeated.
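
To make the split search concrete, the short R sketch below, which is illustrative rather than taken from the paper (the names sse and best_split are the sketch's own), scans every candidate split point of a single numeric predictor and keeps the one that minimizes the total within-region sum of squared errors; in a full regression tree this search is run over all variables and then repeated recursively inside each of the two regions created.

# Illustrative sketch: exhaustive search for the best split point of one
# numeric predictor x for a continuous outcome y (regression setting).
sse <- function(y) sum((y - mean(y))^2)        # within-region squared error

best_split <- function(x, y) {
  xs   <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2    # midpoints between observed values
  errs <- sapply(cuts, function(s) sse(y[x <= s]) + sse(y[x > s]))
  list(cut = cuts[which.min(errs)], error = min(errs))
}

# Toy usage: the true change point is at x = 0.4
set.seed(1)
x <- runif(50)
y <- ifelse(x < 0.4, 1, 3) + rnorm(50, sd = 0.2)
best_split(x, y)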

The major components of the CART methodology are the selection and stopping rules. The selection rule determines which stratification to perform at every stage, and the stopping rule determines the final strata that are formed. Once the strata have been created, the impurity of each stratum is measured. The heterogeneity of the outcome categories within a stratum is referred to as “node impurity”. Classification trees are employed when the outcome is categorical, and regression trees are employed when the outcome is continuous. Classification trees can handle most forms of categorical variables, including indicator, ordinal and non-ordinal variables, and are not limited to the analysis of categorical outcomes with two categories. There are three commonly used measures of node impurity in classification trees: misclassification error, the Gini index, and cross-entropy.
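
As a small numerical illustration, a sketch of these three measures is given below; it assumes only the class counts observed in a node, and the function name node_impurity is the sketch's own rather than code from the paper.

node_impurity <- function(counts) {
  p <- counts / sum(counts)        # class proportions in the node
  p <- p[p > 0]                    # drop empty classes so 0 * log(0) never appears
  c(misclassification = 1 - max(p),
    gini              = 1 - sum(p^2),
    cross_entropy     = -sum(p * log(p)))
}

# Example: a node holding 40 observations of class A and 10 of class B
node_impurity(c(A = 40, B = 10))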
