Sensitivity of missing values in classification tree for large sample

Hasan, Norsida; Adam, Mohd Bakri; Mustapha, Norwati; Abu Bakar, Mohd Rizam
May 2012
AIP Conference Proceedings;5/22/2012, Vol. 1450 Issue 1, p374
Academic Journal
Missing values, whether in predictor or in response variables, are a very common problem in statistics and data mining. Cases with missing values are often ignored, which results in loss of information and possible bias. The objective of our research was to investigate the sensitivity of the classification tree model to missing data in large samples. Data were obtained from one of the high-level educational institutions in Malaysia. Students' background data were randomly eliminated, and a classification tree was used to predict students' degree classification. The results showed that, for large samples, the structure of the classification tree was sensitive to missing values, especially for samples containing more than ten percent missing values.
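The random-elimination step and the information loss from ignoring incomplete cases can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: values are removed independently per cell (missing completely at random), the data are synthetic stand-ins rather than the students' records, and the function names `eliminate_at_random` and `complete_cases` are hypothetical. It does not reproduce the classification tree analysis itself.

```python
import random


def eliminate_at_random(rows, rate, rng):
    """Return a copy of rows where each cell is set to None with probability `rate`.

    Assumption: cells are removed independently of each other and of their values.
    """
    return [[None if rng.random() < rate else value for value in row] for row in rows]


def complete_cases(rows):
    """Keep only the rows that have no missing (None) cells."""
    return [row for row in rows if all(value is not None for value in row)]


rng = random.Random(0)

# Synthetic stand-in data: 10,000 records with 5 categorical background attributes.
rows = [[rng.randint(0, 3) for _ in range(5)] for _ in range(10_000)]

for rate in (0.05, 0.10, 0.20):
    masked = eliminate_at_random(rows, rate, rng)
    kept = len(complete_cases(masked))
    print(f"missing rate {rate:.0%}: {kept} of {len(rows)} complete cases remain")
```

With five attributes and independent removal, the expected share of complete cases is (1 - rate)^5, so even a modest per-cell missing rate discards a large fraction of records when incomplete cases are ignored.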


