General distance formula estimation of population total for unequal probability sampling designs with auxiliary variables

Bibliographic Details
Main Author: Ibrahim, Ibrahim Elabid
Format: Thesis
Language: English
Published: 2021
Subjects: QA Mathematics
Online Access:http://eprints.utm.my/id/eprint/101825/1/IbrahimElabidIbrahimPhDFS2021.pdf.pdf
id my-utm-ep.101825
record_format uketd_dc
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA Mathematics
description Sampling is a technique for obtaining statistical information about a finite population by selecting a representative sample from it using an appropriate sampling design. The required information is then measured on the sampled units, and inferences are made about unknown population parameters such as means, totals and proportions. This study focuses on estimating the unknown population total of a single target variable using one or more auxiliary variables correlated with it. It also examines two classical estimators, the ratio estimator and the linear regression estimator, which serve as alternatives to the Horvitz-Thompson estimator of a population total when a single auxiliary variable is available. The two estimators were compared theoretically and empirically with respect to the sample size and the correlation coefficient between the target and auxiliary variables. The empirical study, based on a secondary data set, shows that for small and medium sample sizes the linear regression estimator is more efficient than the ratio estimator when the two variables are positively correlated. For large sample sizes, there is no significant difference between the two estimators. In addition, the variance of both estimators decreases as the sample size increases. In contrast, when the correlation is negative, any increase in the sample size leads to a significant decrease in the estimated variance of the linear regression estimator, whereas the variance of the ratio estimator decreases as the sample size is increased considerably.
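The three estimators compared above can be written down compactly. The sketch below is illustrative only and is not code from the thesis: it assumes that the design weights d_k = 1/π_k of the sampled units are available, that the population size N and the population total X of the auxiliary variable are known, and it uses one common design-weighted form of the linear regression estimator (intercept plus slope, fitted by weighted least squares). The function names and the toy data are hypothetical.

```python
from typing import Sequence


def ht_total(d: Sequence[float], v: Sequence[float]) -> float:
    """Horvitz-Thompson estimate of a total: sum of d_k * v_k over the sample."""
    return sum(dk * vk for dk, vk in zip(d, v))


def ratio_total(d: Sequence[float], y: Sequence[float], x: Sequence[float],
                x_total: float) -> float:
    """Ratio estimator: (HT total of y / HT total of x) * known population total of x."""
    return ht_total(d, y) / ht_total(d, x) * x_total


def regression_total(d: Sequence[float], y: Sequence[float], x: Sequence[float],
                     n_pop: int, x_total: float) -> float:
    """Design-weighted linear regression estimator with an intercept (assumed form).

    Returns N * (ybar_w + b * (Xbar - xbar_w)), where ybar_w and xbar_w are
    design-weighted (Hajek) sample means and b is the weighted least-squares slope.
    Requires the population size N and the population total of x to be known.
    """
    d_sum = sum(d)
    x_bar = ht_total(d, x) / d_sum
    y_bar = ht_total(d, y) / d_sum
    s_xy = sum(dk * (xk - x_bar) * (yk - y_bar) for dk, xk, yk in zip(d, x, y))
    s_xx = sum(dk * (xk - x_bar) ** 2 for dk, xk in zip(d, x))
    b = s_xy / s_xx
    return n_pop * (y_bar + b * (x_total / n_pop - x_bar))


if __name__ == "__main__":
    # Hypothetical population: x = 1..6 and y = 3 + 2x, so X = 21 and the true Y = 60.
    # Sampled units x = 2, 4, 5 with illustrative inclusion probabilities 2/3, 1/2, 2/5,
    # i.e. design weights d_k = 1 / pi_k = 1.5, 2.0, 2.5.
    d = [1.5, 2.0, 2.5]
    x = [2.0, 4.0, 5.0]
    y = [7.0, 11.0, 13.0]
    print(ht_total(d, y))                     # 65.0
    print(ratio_total(d, y, x, 21.0))         # approx. 58.09
    print(regression_total(d, y, x, 6, 21.0)) # approx. 60 (up to rounding)
```

In this toy case y is an exact linear function of x with a non-zero intercept, so the regression estimator recovers the true total, while the ratio estimator, which implicitly assumes a line through the origin, does not.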
The simulation study showed that when the variable of interest is strongly negatively correlated with the auxiliary variable, the linear regression estimator estimates the unknown population total more efficiently than the ratio estimator, irrespective of the sample size. When the correlation coefficient is positive and lies in the range [0.75, 1], both estimators give better estimates of the population total than the conventional estimators, although the linear regression estimator is slightly more efficient than the ratio estimator. The central idea of estimation by minimum distance measures is to quantify the degree of closeness between two objects, such as the sample data and a parametric distribution that depends on an unknown parameter. This research proposes a general distance formula, based on the concept of the power divergence function, as an alternative to the distance used by Deville and Särndal to measure the closeness between the calibrated (new) weights and the classical design weights of the Horvitz-Thompson estimator. Deriving the proposed formula involved adding a further constraint to the calibration equations, linking the sum of the sample calibrated weights to the sum of the classical sample design weights. The proposed formula generates a variety of distance measures, each of which yields a set of new weights through the inverse function it induces; these weights are then used to construct new estimators of the unknown population total. Finally, problems with the calibrated weights produced by some distance measures, such as unrealistic or extreme weights, are examined; such weights lead to inaccurate estimates when they are used in place of the design weights.
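The calibration idea described above can be illustrated with a small sketch. It is not the thesis's general distance formula, whose exact form is not given in this abstract. Instead it uses the well-known Cressie-Read power-divergence family, G_α(w, d) = d[(w/d)^(α+1) - 1 - (α+1)(w/d - 1)] / (α(α+1)), whose inverse function F(u) = (1 + αu)^(1/α) gives Deville and Särndal's linear (chi-square) method at α = 1 and the raking method in the limit α → 0. The extra constraint is assumed, for illustration only, to be Σ w_k = Σ d_k; the single auxiliary variable, the function names `inverse_g`, `inverse_g_prime` and `calibrate`, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence


def inverse_g(u: float, alpha: float) -> float:
    """F(u) = (1 + alpha*u)**(1/alpha): the ratio w_k / d_k implied by the distance.

    alpha = 1 is the linear (chi-square) method; alpha = 0 is the limiting
    exponential (raking) case.
    """
    if alpha == 0.0:
        return math.exp(u)
    base = 1.0 + alpha * u
    if alpha == 1.0:
        return base  # linear case: defined for every u, so weights may turn negative
    if base <= 0.0:
        raise ValueError("weight ratio undefined: 1 + alpha*u <= 0")
    return base ** (1.0 / alpha)


def inverse_g_prime(u: float, alpha: float) -> float:
    """Derivative F'(u) = (1 + alpha*u)**(1/alpha - 1), used in the Newton Jacobian."""
    if alpha == 0.0:
        return math.exp(u)
    if alpha == 1.0:
        return 1.0
    return (1.0 + alpha * u) ** (1.0 / alpha - 1.0)


def calibrate(d: Sequence[float], x: Sequence[float], x_total: float,
              alpha: float, tol: float = 1e-10, max_iter: int = 50) -> List[float]:
    """Calibrated weights w_k = d_k * F(lam0 + lam1 * x_k), found by Newton's method.

    Constraints (ASSUMPTION: the first one stands in for the extra constraint):
        sum_k w_k       = sum_k d_k
        sum_k w_k * x_k = x_total
    """
    targets = (sum(d), x_total)
    lam0, lam1 = 0.0, 0.0
    for _ in range(max_iter):
        u = [lam0 + lam1 * xk for xk in x]
        f = [inverse_g(uk, alpha) for uk in u]
        fp = [inverse_g_prime(uk, alpha) for uk in u]
        h0 = sum(dk * fk for dk, fk in zip(d, f)) - targets[0]
        h1 = sum(dk * fk * xk for dk, fk, xk in zip(d, f, x)) - targets[1]
        if max(abs(h0), abs(h1)) < tol:
            return [dk * fk for dk, fk in zip(d, f)]
        # Jacobian J = sum_k d_k F'(u_k) z_k z_k^T with z_k = (1, x_k)
        j00 = sum(dk * gk for dk, gk in zip(d, fp))
        j01 = sum(dk * gk * xk for dk, gk, xk in zip(d, fp, x))
        j11 = sum(dk * gk * xk * xk for dk, gk, xk in zip(d, fp, x))
        det = j00 * j11 - j01 * j01
        # Newton step lam <- lam - J^{-1} h, using the closed-form 2x2 inverse
        lam0 -= (j11 * h0 - j01 * h1) / det
        lam1 -= (j00 * h1 - j01 * h0) / det
    raise RuntimeError("calibration did not converge")


if __name__ == "__main__":
    d = [1.5, 2.0, 2.5]    # hypothetical design weights
    x = [2.0, 4.0, 5.0]    # auxiliary values of the sampled units
    y = [7.0, 11.0, 13.0]  # target values, here y = 3 + 2x
    for alpha in (1.0, 0.0):
        w = calibrate(d, x, 15.0, alpha)  # hypothetical known total X = 15
        est = sum(wk * yk for wk, yk in zip(w, y))
        print(alpha, [round(wk, 3) for wk in w], round(est, 6))
```

With X = 15, far below the Horvitz-Thompson estimate of 23.5, the linear method returns a negative weight for the third unit, while the exponential case keeps every weight positive. This is the kind of unrealistic weight the abstract refers to. Both estimates equal 48, because y in this toy example is an exact linear function of the calibration variables (1, x).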
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Ibrahim, Ibrahim Elabid
title General distance formula estimation of population total for unequal probability sampling designs with auxiliary variables
granting_institution Universiti Teknologi Malaysia
granting_department Faculty of Science
publishDate 2021
url http://eprints.utm.my/id/eprint/101825/1/IbrahimElabidIbrahimPhDFS2021.pdf.pdf