Design and Inference in Finite Population Sampling

On Saturday, June 5, 2021 12:16:43 PM

Published: 05.06.2021

The role of the sample selection mechanism in a model-based approach to finite population inference is examined. When the data analyst has only partial information on the sample design then a design which is ignorable when known fully may become informative. Conditions under which partially known designs can be ignored are established and examined for some standard designs. The results are illustrated by an example used by Scott
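The difference between an ignorable and an informative design can be illustrated with a short simulation. This is a hypothetical sketch, not the example by Scott mentioned above. It uses a fixed population with $y$ drawn from Uniform(10, 90), an ignorable design (simple random sampling without replacement), and an informative design whose selection probabilities are proportional to $y$. An analysis that ignores the informative design and reports the naive sample mean is biased upward.

```python
import random
from statistics import fmean


def compare_designs(n_pop=10_000, n=500, seed=1):
    """Naive sample means under an ignorable and an informative design.

    Hypothetical population: y_i ~ Uniform(10, 90), held fixed once drawn.
    """
    rng = random.Random(seed)
    population = [rng.uniform(10, 90) for _ in range(n_pop)]
    true_mean = fmean(population)

    # Ignorable design: simple random sampling without replacement.
    srs = rng.sample(population, n)

    # Informative design: draws (with replacement) with probability proportional
    # to y, so selection depends on the very outcome being estimated.
    informative = rng.choices(population, weights=population, k=n)

    return true_mean, fmean(srs), fmean(informative)


true_mean, srs_mean, informative_mean = compare_designs()
print(f"true {true_mean:.1f}  SRS {srs_mean:.1f}  informative {informative_mean:.1f}")
```

Under these assumptions the informative-design mean should settle near $E(y^2)/E(y) \approx 60.7$ rather than the population mean of about 50.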


There are two general views in causal analysis of experimental data: the super population view that the units are an independent sample from some hypothetical infinite population, and the finite population view that the potential outcomes of the experimental units are fixed and the randomness comes solely from the treatment assignment.

These two views differ conceptually and mathematically, resulting in different sampling variances of the usual difference-in-means estimator of the average causal effect. Practically, however, these two views result in identical variance estimators. By recalling a variance decomposition and exploiting a completeness-type argument, we establish a connection between these two views in completely randomized experiments.

This alternative formulation could serve as a template for bridging finite and super population causal inference in other scenarios. Neyman [1, 2] defined causal effects in terms of potential outcomes, and proposed an inferential framework viewing all potential outcomes of a finite population as fixed and the treatment assignment as the only source of randomness.

This finite population view allows for easy interpretation, free of any hypothetical data-generating process for the outcomes, and is used in a variety of contexts. This approach is considered desirable in particular because it does not assume that the data are a representative sample of some larger, usually infinite, population. Alternative approaches, also using the potential outcomes framework, assume that the potential outcomes are independent and identically distributed draws from a hypothetical infinite population.

Mathematical derivations under this approach are generally simpler, but the approach itself can be criticized because of its typically untenable sampling assumption. Furthermore, this approach appears to ignore the treatment assignment mechanism. That being said, it is well known that the final variance formulas from either approach tend to be quite similar. For the difference in means, the infinite population variance estimate is a conservative (overly large) estimate of the finite population variance.
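This conservativeness can be seen exactly on a toy example. The sketch below uses a hypothetical four-unit population of fixed potential outcomes, enumerates every completely randomized assignment with $n_1 = 2$, and compares the average of the plug-in estimator $s_1^2/n_1 + s_0^2/n_0$ with the true randomization variance of the difference in means. Under these assumptions the excess equals the finite population variance of the unit-level effects divided by $n$. It is therefore never negative and vanishes only when the effect is constant across units. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def _mean(xs):
    return sum(xs, Fraction(0)) / len(xs)


def _svar(xs):
    """Sample variance with the n - 1 divisor, in exact arithmetic."""
    m = _mean(xs)
    return sum(((x - m) ** 2 for x in xs), Fraction(0)) / (len(xs) - 1)


def neyman_estimator_check(y1, y0, n1):
    """Enumerate all assignments; return (E[s1^2/n1 + s0^2/n0 | S], var(tau_hat | S))."""
    y1 = [Fraction(v) for v in y1]
    y0 = [Fraction(v) for v in y0]
    n = len(y1)
    n0 = n - n1
    estimates, var_hats = [], []
    for treated in combinations(range(n), n1):
        t = set(treated)
        obs1 = [y1[i] for i in range(n) if i in t]
        obs0 = [y0[i] for i in range(n) if i not in t]
        estimates.append(_mean(obs1) - _mean(obs0))
        var_hats.append(_svar(obs1) / n1 + _svar(obs0) / n0)
    center = _mean(estimates)
    true_var = sum(((e - center) ** 2 for e in estimates), Fraction(0)) / len(estimates)
    return _mean(var_hats), true_var


expected_hat, true_var = neyman_estimator_check([3, 5, 7, 9], [1, 4, 4, 7], 2)
print(expected_hat, true_var, expected_hat - true_var)  # 19/3 37/6 1/6
```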

Because deriving infinite population variance expressions tends to be more mathematically straightforward than deriving finite population ones, we might naturally wonder whether infinite population expressions can be used as conservative forms of finite population expressions more generally. In this work we show that we can in fact adopt an infinite population model as an assumption of convenience and derive formulas from this perspective. The resulting formulas can thus be viewed as focused on the treatment assignment mechanism rather than on a hypothetical sampling mechanism.

Mathematically, this result comes from a variance decomposition and a completeness-style argument characterizing the connection and the difference between these two views. The variance decomposition we use has previously appeared in Imai [7], Imbens and Rubin [18], and Balzer et al. The completeness-style argument, which we believe is novel in this domain, then sharpens the variance decomposition by moving from a statement about an overall average to one that holds for any specific sample.

Our overall goal is simple: we wish to demonstrate that variance formulas derived under an infinite population sampling model yield inference that is correct for the analogous sample-specific treatment effects, regardless of whether any sampling mechanism exists, although this inference may be conservative in that the standard errors may be overly large. Assume that random variables $Y(1)$, $Y(0)$ represent the pair of potential outcomes of an infinite super population, from which we take an independent and identically distributed (IID) finite population of size $n$: $S = \{(Y_i(1), Y_i(0))\}_{i=1}^n$.

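We first discuss completely randomized experiments, and comment on other experiments in Section 4. At the finite population level, i.e., for the $n$ units in $S$, the individual causal effects are $\tau_i = Y_i(1) - Y_i(0)$ and the average causal effect is $\tau_S = n^{-1}\sum_{i=1}^n \tau_i$. With $\bar Y(z) = n^{-1}\sum_{i=1}^n Y_i(z)$ for $z = 0, 1$, the corresponding finite population variances of the potential outcomes and individual causal effects are

$$S_z^2 = \frac{1}{n-1}\sum_{i=1}^n \{Y_i(z) - \bar Y(z)\}^2 \quad (z = 0, 1), \qquad S_\tau^2 = \frac{1}{n-1}\sum_{i=1}^n (\tau_i - \tau_S)^2.$$

In classical causal inference [1], the potential outcomes of these $n$ experimental units, $S$, are treated as fixed numbers. Equivalently, we can consider such causal inference to be conducted conditional on $S$.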
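As a concrete sketch of these definitions, the helper below computes $\tau_S$, $S_1^2$, $S_0^2$ and $S_\tau^2$ for a small, entirely hypothetical population of fixed potential outcomes. The function name and dictionary keys are illustrative choices. Python's `statistics.variance` is used because it applies the $n-1$ divisor.

```python
from statistics import fmean, variance


def finite_population_quantities(y1, y0):
    """Finite population estimands for fixed potential outcomes S.

    y1[i], y0[i] are Y_i(1), Y_i(0); variances use the n - 1 divisor.
    """
    if len(y1) != len(y0) or len(y1) < 2:
        raise ValueError("need paired potential outcomes for at least 2 units")
    tau = [a - b for a, b in zip(y1, y0)]  # individual causal effects tau_i
    return {
        "tau_S": fmean(tau),
        "S1_sq": variance(y1),
        "S0_sq": variance(y0),
        "Stau_sq": variance(tau),
    }


# Hypothetical four-unit finite population.
q = finite_population_quantities([3, 5, 7, 9], [1, 4, 4, 7])
print(q)
```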

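Our primary statistics are the averages of the observed outcomes and the difference-in-means estimator. With treatment indicators $Z_i$, $n_1 = \sum_{i=1}^n Z_i$ treated units and $n_0 = n - n_1$ control units,

$$\bar Y_1^{\mathrm{obs}} = \frac{1}{n_1}\sum_{i=1}^n Z_i Y_i(1), \qquad \bar Y_0^{\mathrm{obs}} = \frac{1}{n_0}\sum_{i=1}^n (1 - Z_i) Y_i(0), \qquad \hat\tau = \bar Y_1^{\mathrm{obs}} - \bar Y_0^{\mathrm{obs}},$$

with $s_1^2$ and $s_0^2$ denoting the sample variances of the observed outcomes in the two arms. We summarize the infinite population, finite population and sample quantities in Table 1. The three levels of quantities in Table 1 are connected via independent sampling and complete randomization.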
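A single completely randomized experiment on fixed potential outcomes can be sketched as follows. The function name and toy outcomes are hypothetical. `random.Random.sample` draws the treated set so that every subset of size $n_1$ is equally likely, which is the complete randomization assumed in the text.

```python
import random
from statistics import fmean, variance


def difference_in_means(y1, y0, n1, rng):
    """One completely randomized experiment on fixed potential outcomes.

    Draws Z uniformly among assignments with exactly n1 treated units, reveals
    Y_i(1) for treated and Y_i(0) for control units, and returns
    (tau_hat, s1_sq, s0_sq). Requires 2 <= n1 <= n - 2 so both s^2 exist.
    """
    n = len(y1)
    if not 2 <= n1 <= n - 2:
        raise ValueError("each arm needs at least two units")
    treated = set(rng.sample(range(n), n1))
    obs1 = [y1[i] for i in range(n) if i in treated]
    obs0 = [y0[i] for i in range(n) if i not in treated]
    return fmean(obs1) - fmean(obs0), variance(obs1), variance(obs0)


rng = random.Random(2021)
print(difference_in_means([3, 5, 7, 9, 11, 13], [1, 4, 4, 7, 8, 9], 3, rng))
```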

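Neyman [1], without reference to any infinite population and using the assignment mechanism as the only source of randomness, represented the assignment mechanism via an urn model and found

$$\operatorname{var}(\hat\tau \mid S) = \frac{S_1^2}{n_1} + \frac{S_0^2}{n_0} - \frac{S_\tau^2}{n}, \qquad (1)$$

where $S_1^2$, $S_0^2$ and $S_\tau^2$ are the finite population variances of $Y_i(1)$, $Y_i(0)$ and $\tau_i$.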
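Because Neyman's variance formula is an exact finite population statement, it can be checked by brute force on a tiny population. The check enumerates all $\binom{n}{n_1}$ assignments, computes $\hat\tau$ for each, and takes the variance over this uniform distribution. The sketch below does this in exact rational arithmetic with `fractions.Fraction`. The population values and helper names are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations


def _mean(xs):
    return sum(xs, Fraction(0)) / len(xs)


def _var(xs):
    """Sample variance with the n - 1 divisor, in exact arithmetic."""
    m = _mean(xs)
    return sum(((x - m) ** 2 for x in xs), Fraction(0)) / (len(xs) - 1)


def exact_randomization_variance(y1, y0, n1):
    """var(tau_hat | S) computed over all C(n, n1) equally likely assignments."""
    y1 = [Fraction(v) for v in y1]
    y0 = [Fraction(v) for v in y0]
    n = len(y1)
    estimates = []
    for treated in combinations(range(n), n1):
        t = set(treated)
        est = _mean([y1[i] for i in t]) - _mean([y0[i] for i in range(n) if i not in t])
        estimates.append(est)
    center = _mean(estimates)
    return sum(((e - center) ** 2 for e in estimates), Fraction(0)) / len(estimates)


def neyman_variance(y1, y0, n1):
    """Neyman's formula: S1^2/n1 + S0^2/n0 - Stau^2/n."""
    y1 = [Fraction(v) for v in y1]
    y0 = [Fraction(v) for v in y0]
    n = len(y1)
    tau = [a - b for a, b in zip(y1, y0)]
    return _var(y1) / n1 + _var(y0) / (n - n1) - _var(tau) / n


y1, y0 = [3, 5, 7, 9], [1, 4, 4, 7]
print(exact_randomization_variance(y1, y0, 2), neyman_variance(y1, y0, 2))  # 37/6 37/6
```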

We next derive this result by assuming a hypothetical sampling mechanism from some assumed infinite super-population model of convenience. This alternative derivation of the above result, which can be extended to other assignment mechanisms, shows how we can interpret formulae based on super-population derivations as conservative formulae for finite-sample inference.

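Conditional on $S$, randomization of the treatment $Z$ is the only source of randomness, and the $n_1$ treated outcomes form a simple random sample without replacement from $\{Y_i(1)\}_{i=1}^n$. Therefore, classical survey sampling theory [28] for the treated-arm sample mean $\bar Y_1^{\mathrm{obs}}$ and sample variance $s_1^2$ gives

$$E(\bar Y_1^{\mathrm{obs}} \mid S) = \bar Y(1), \qquad \operatorname{var}(\bar Y_1^{\mathrm{obs}} \mid S) = \left(\frac{1}{n_1} - \frac{1}{n}\right) S_1^2, \qquad E(s_1^2 \mid S) = S_1^2,$$

where $\bar Y(1) = n^{-1}\sum_{i=1}^n Y_i(1)$, and analogously for the control arm. Because two sources of randomness are in play, we write conditional expectations and conditional variances explicitly.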
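These three survey sampling facts can likewise be verified by enumeration for a toy set of treated potential outcomes. The sketch below (hypothetical helper name and values) lists every simple random sample without replacement of size $m$. It returns the exact mean and variance of the sample mean, together with the mean of the sample variance.

```python
from fractions import Fraction
from itertools import combinations


def srs_moments(values, m):
    """Exact moments under simple random sampling without replacement.

    Returns (E[sample mean], var[sample mean], E[sample variance]) over all
    C(N, m) equally likely samples of size m (m >= 2).
    """
    vals = [Fraction(v) for v in values]
    means, s2s = [], []
    for idx in combinations(range(len(vals)), m):
        sample = [vals[i] for i in idx]
        xbar = sum(sample, Fraction(0)) / m
        means.append(xbar)
        s2s.append(sum(((x - xbar) ** 2 for x in sample), Fraction(0)) / (m - 1))
    k = len(means)
    e_mean = sum(means, Fraction(0)) / k
    var_mean = sum(((x - e_mean) ** 2 for x in means), Fraction(0)) / k
    return e_mean, var_mean, sum(s2s, Fraction(0)) / k


# Treated outcomes Y_i(1) = 3, 5, 7, 9 with n1 = 2 of n = 4 units treated:
# expect Ybar(1) = 6, (1/2 - 1/4) * S1^2 = 5/3, and S1^2 = 20/3.
print(srs_moments([3, 5, 7, 9], 2))
```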

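If we do not condition on $S$, then, by the independence induced by the assignment mechanism, the outcomes under treatment are IID draws of $Y(1)$, the outcomes under control are IID draws of $Y(0)$, and these two samples are independent of each other. Consequently,

$$\operatorname{var}(\hat\tau) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_0^2}{n_0}, \qquad \sigma_z^2 = \operatorname{var}\{Y(z)\}.$$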

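This is the classic infinite population variance formula for the two-sample difference-in-means statistic, with $\sigma_z^2 = \operatorname{var}\{Y(z)\}$. We could use it to obtain standard errors by plugging in $s_1^2$ and $s_0^2$ for the two variances. Because $E(\hat\tau \mid S) = \tau_S$ is the average of $n$ IID draws of $Y(1) - Y(0)$, whose variance we denote $\sigma_\tau^2$, the variance decomposition formula implies

$$\operatorname{var}(\hat\tau) = E\{\operatorname{var}(\hat\tau \mid S)\} + \operatorname{var}\{E(\hat\tau \mid S)\} = E\{\operatorname{var}(\hat\tau \mid S)\} + \frac{\sigma_\tau^2}{n}.$$

Combining this with $\operatorname{var}(\hat\tau) = \sigma_1^2/n_1 + \sigma_0^2/n_0$, and using $E(S_z^2) = \sigma_z^2$ and $E(S_\tau^2) = \sigma_\tau^2$, gives

$$E\{\operatorname{var}(\hat\tau \mid S)\} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_0^2}{n_0} - \frac{\sigma_\tau^2}{n} = E\left(\frac{S_1^2}{n_1} + \frac{S_0^2}{n_0} - \frac{S_\tau^2}{n}\right). \qquad (8)$$

Compare this to the classic variance expression (1), which is the same statement without the expectation.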

Here we have that, on average, our classic variance expression holds. Because this is true for any infinite population, as it is purely a consequence of the IID sampling mechanism and complete randomization, we can close the gap between eqs. (1) and (8). Informally speaking, both $\operatorname{var}(\hat\tau \mid S)$ and $S_1^2/n_1 + S_0^2/n_0 - S_\tau^2/n$ are symmetric functions of the $n$ units in $S$, and eq. (8) says their difference has mean zero under every super population. A completeness-type argument then forces the difference to vanish for every fixed $S$, which is eq. (1). Equation (8) relies on the assumption that the hypothetical infinite population exists, but eq. (1) does not. The completeness-style argument therefore allows us to make our sampling assumption only for convenience, in order to prove eq. (1).
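The unconditional formula and the averaged identity in eq. (8) can be checked by simulation under an assumed super population. Everything about the data-generating model below is a hypothetical choice made only for illustration: $Y(0) \sim N(0, 1)$ and $Y(1) = Y(0) + 1 + \varepsilon$ with $\varepsilon \sim N(0, 0.5^2)$, so $\sigma_1^2 = 1.25$, $\sigma_0^2 = 1$ and $\sigma_\tau^2 = 0.25$. Each replication draws a fresh IID finite population $S$, records Neyman's conditional variance for that $S$, and runs one completely randomized assignment.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def _svar(xs):
    """Sample variance with the n - 1 divisor."""
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def super_population_check(n=20, n1=10, reps=20_000, seed=7):
    """Monte Carlo check of var(tau_hat) = E{var(tau_hat | S)} + var(tau_S).

    Hypothetical super population: Y(0) ~ N(0, 1) and
    Y(1) = Y(0) + 1 + N(0, 0.5^2).
    """
    rng = random.Random(seed)
    n0 = n - n1
    tau_hats, cond_vars = [], []
    for _ in range(reps):
        # Draw an IID finite population S of size n.
        y0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y1 = [a + 1.0 + rng.gauss(0.0, 0.5) for a in y0]
        tau = [a - b for a, b in zip(y1, y0)]
        # Neyman's conditional variance for this particular S.
        cond_vars.append(_svar(y1) / n1 + _svar(y0) / n0 - _svar(tau) / n)
        # One completely randomized assignment within S.
        treated = set(rng.sample(range(n), n1))
        tau_hats.append(
            _mean([y1[i] for i in treated])
            - _mean([y0[i] for i in range(n) if i not in treated])
        )
    m = _mean(tau_hats)
    total_var = sum((t - m) ** 2 for t in tau_hats) / reps
    return total_var, _mean(cond_vars)


total_var, mean_cond_var = super_population_check()
print(round(total_var, 4), round(mean_cond_var, 4))
```

With $n = 20$ and $n_1 = n_0 = 10$, the theory predicts about $0.225$ for the first number and $0.225 - 0.25/20 = 0.2125$ for the second. The gap of $\sigma_\tau^2/n$ is the $\operatorname{var}(\tau_S)$ term of the decomposition.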

While the final result is, of course, not new, we offer it as it gives an alternative derivation that does not rely on asymptotics such as a growing super population or a focus on the properties of the treatment assignment mechanism.

We go in the other direction: we use the variance decomposition of eq. This decomposition approach also holds for other types of experiments. First, for a stratified experiment, each stratum is essentially a completely randomized experiment.

Apply the result to each stratum, and then average over all strata to obtain results for a stratified experiment. Second, because a matched-pair experiment is a special case of a stratified experiment with two units within each stratum, we can derive the Neyman-type variance cf.
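Combining strata as described above can be sketched as a weighted average. This is an illustration under stated assumptions, not a formula quoted from the text. The helper name is hypothetical, strata are weighted by their share $n_k/n$ of units, and assignments are assumed independent across strata, so the stratum-level variances combine with squared weights.

```python
def stratified_combination(strata):
    """Combine per-stratum results from a stratified experiment.

    strata: iterable of (tau_hat_k, var_k, n_k), where var_k is a variance
    (or conservative variance estimate) from the completely randomized
    analysis within stratum k and n_k is the stratum size.
    Assumes the estimand weights strata by size, w_k = n_k / n, and that
    assignments are independent across strata.
    """
    strata = list(strata)
    n = sum(n_k for _, _, n_k in strata)
    estimate = sum((n_k / n) * tau_k for tau_k, _, n_k in strata)
    variance = sum((n_k / n) ** 2 * var_k for _, var_k, n_k in strata)
    return estimate, variance


# Hypothetical two strata of sizes 40 and 60.
print(stratified_combination([(2.0, 0.5, 40), (1.0, 0.2, 60)]))
```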

Third, a cluster-randomized experiment is a completely randomized experiment on the clusters. If the causal parameters can be expressed as cluster-level outcomes, then the result can be straightforwardly applied cf. Fourth, for general experimental designs, the variance decomposition in eq.
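For the cluster-randomized case, the reduction to a completely randomized experiment on clusters can be sketched by collapsing each cluster to a single outcome and reusing the two-sample formulas. The choice of cluster means as the cluster-level outcome, the helper name and the toy data are all assumptions for illustration. Other cluster-level summaries, such as totals, define different estimands.

```python
from statistics import fmean, variance


def cluster_difference_in_means(clusters, treated_ids):
    """Treat each cluster as one unit of a completely randomized experiment.

    clusters: dict mapping cluster id -> list of observed unit outcomes.
    treated_ids: set of cluster ids assigned to treatment.
    Uses cluster means as the cluster-level outcome (one possible choice) and
    returns (estimate, s1^2/m1 + s0^2/m0) with m1, m0 the numbers of clusters.
    """
    cluster_means = {c: fmean(ys) for c, ys in clusters.items()}
    treated = [v for c, v in cluster_means.items() if c in treated_ids]
    control = [v for c, v in cluster_means.items() if c not in treated_ids]
    if len(treated) < 2 or len(control) < 2:
        raise ValueError("each arm needs at least two clusters")
    estimate = fmean(treated) - fmean(control)
    var_hat = variance(treated) / len(treated) + variance(control) / len(control)
    return estimate, var_hat


clusters = {"a": [1, 3], "b": [3, 5], "c": [0, 2], "d": [2, 2]}
print(cluster_difference_in_means(clusters, {"a", "b"}))
```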

However, Aronow et al. This demonstrates that we can indeed make better inference conditional on the sample we have. On the other hand, in this work, we showed that assuming an infinite population, while not necessarily giving the tightest variance expressions, nonetheless gives valid conservative variance expressions from a finite-population perspective.

We offer this approach as a possible method of proof that could ease derivations for more complex designs. More broadly, it is a step towards establishing that infinite population derivations for randomized experiments can be generally thought of as pertaining to their finite population analogs. Also see Lin [16] and Samii and Aronow [33], who provide alternative discussions of super population regression-based variance estimators under the finite population framework. This connection becomes more apparent when only the ranks of the outcomes are used to construct the test statistics, as discussed extensively by Lehmann [35].

We thank Dr. Peter Aronow (the Associate Editor) and three anonymous reviewers for helpful comments.

References

Neyman J. On the application of probability theory to agricultural experiments. Section 9 (translated). Reprinted ed. Stat Sci.
Statistical problems in agricultural experimentation (with discussion). J Roy Stat Soc.
Kempthorne O. The design and analysis of experiments. New York: John Wiley and Sons.
Hinkelmann K, Kempthorne O. Design and analysis of experiments, volume 1: introduction to experimental design, 2nd ed.
Copas J. Biometrika.
Rosenbaum PR. Observational studies, 2nd ed. New York: Springer.
Imai K. Variance identification and efficiency analysis in randomized experiments under the matched-pair design. Stat Med.
Freedman DA. On regression adjustments in experiments with several treatments. Ann Appl Stat.
Randomization does not justify logistic regression. Stat Sci.
Design of observational studies.
A class of unbiased estimators of the average treatment effect in randomized experiments. J Causal Inference.
Sharp bounds on the variance in randomized experiments. Ann Stat.
Finite population causal standard errors.


My research in survey sampling focuses on model-based Bayesian methods for complex survey designs that are robust to misspecification, and on comparing the resulting inferences to classical methods based on the randomization distribution. Methods for survey nonresponse are discussed in the section on missing data research. Little, R. Journal of Official Statistics, 28(3). Statistical Science, 26(2).

The final technical session of the workshop covered analysis techniques for small population and small sample research. Rick H. Hoyle (Duke University) described design and analysis considerations in research with small populations.


Finite Population Sampling; Comparisons with Design-Based Regression Estimation; Exercises; Robustness and Design-Based Inference


Rao, P. Design and inference in finite population sampling.

J Epidemiol Community Health.


The problem of handling non-ignorable non-response has typically been addressed under the design-based approach using the well-known sub-sampling technique introduced by Hansen and Hurwitz (Journal of the American Statistical Association, Vol. 41). Alternatively, the model-based paradigm emphasizes utilizing the underlying model relationship between the outcome variable and one or more covariate(s) whose population values are known prior to the survey. This article uses the model relationship between the study variable and covariate(s) to handle non-ignorable non-response and to obtain an unbiased estimator of the population total under the sub-sampling technique. The main idea is to combine the estimates obtained from the first-call sample and the second-call sub-sample using separate model relationships.
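For reference, the classical design-based Hansen and Hurwitz sub-sampling estimator that this article builds on can be sketched as below. This is the textbook design-based estimator, not the model-based combination proposed in the article. The function and argument names are hypothetical, and the sketch assumes every sub-sampled nonrespondent responds on the second call.

```python
from statistics import fmean


def hansen_hurwitz_mean(first_call, second_call, n_nonrespondents):
    """Classical Hansen-Hurwitz estimator of a population mean.

    first_call: outcomes of the n1 units that responded on the first call.
    second_call: outcomes from a random subsample of the n2 first-call
        nonrespondents, all assumed to respond on the second call.
    n_nonrespondents: n2, the number of first-call nonrespondents.
    The estimate is (n1 * mean(first_call) + n2 * mean(second_call)) / (n1 + n2).
    """
    n1 = len(first_call)
    n = n1 + n_nonrespondents
    return (n1 * fmean(first_call) + n_nonrespondents * fmean(second_call)) / n


# Hypothetical survey: 3 first-call respondents, 5 nonrespondents, 2 followed up.
estimate = hansen_hurwitz_mean([10, 12, 14], [20, 22], 5)
print(estimate)  # 17.625; multiply by N to estimate the population total
```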


Most statistical theory is premised on an underlying infinite population. By contrast, survey sampling theory and practice are built on a foundation of sampling from a finite population. This basic difference has myriad ramifications, and it highlights why survey sampling is often regarded as a separate branch of statistical thinking. On a philosophical level, the theory brings statistical theory to a human, and thus necessarily finite, level. Before describing the basic notion of finite population sampling, it is instructive to explore the analogies and differences with sampling from infinite populations. These analogies were first described in Jerzy Neyman's seminal articles in the s and are discussed in basic sampling theory textbooks such as William Cochran's in the s.
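One concrete consequence of sampling from a finite population is the finite population correction. As a standard textbook illustration (not taken from the text above), the design variance of the sample mean under simple random sampling without replacement is $(1 - n/N)\,S^2/n$, compared with $\sigma^2/n$ for an infinite population. The helper name below is hypothetical.

```python
def srs_variance_of_mean(s2, n, N=None):
    """Design variance of the sample mean under simple random sampling.

    With a finite population of size N (sampling without replacement) and
    population variance s2 (divisor N - 1), this is (1 - n/N) * s2 / n.
    With N=None it returns the infinite-population formula s2 / n, reading s2
    as the super population variance sigma^2.
    """
    if N is None:
        return s2 / n
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return (1 - n / N) * s2 / n


print(srs_variance_of_mean(100, 50))        # 2.0
print(srs_variance_of_mean(100, 50, 200))   # 1.5
print(srs_variance_of_mean(100, 200, 200))  # 0.0: a census has no sampling error
```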


