Is the p-value a point estimate?

Given that one can calculate confidence intervals for p-values, and given that the opposite of interval estimation is point estimation: is the p-value a point estimate?

— 00schneider

I don't think you can calculate confidence intervals for a p-value; it is a statistic calculated from the data, not a parameter describing the data-generating process. Of course, you can still ask what a statistic estimates.

— Scortchi - Reinstate Monica

@Scortchi: but if I were to apply e.g. bootstrapping to compute a distribution of p-values and then were to construct a 95% percentile interval of this bootstrapped distribution, then if it's not a confidence interval for the p-value -- what is it?

— amoeba says Reinstate Monica

@amoeba: a confidence interval is about an unknown parameter, while your bootstrap interval is an approximation of a 95% region for a statistic.

— Xi'an
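The bootstrap procedure discussed in the two comments above can be sketched as follows. This is only an illustration under assumptions that are not in the thread: a two-sided z-test of mean zero with known standard deviation 1, simulated data, and hypothetical helper names (`z_test_p_value`, `bootstrap_p_value_interval`). The interval it returns describes the resampling spread of a statistic (the p-value), which is the distinction Xi'an draws; it is not an interval for a parameter.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided z-test p-value for the mean; sigma is assumed known."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def bootstrap_p_value_interval(sample, n_boot=2000, level=0.95, seed=0):
    """Percentile interval of p-values recomputed on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement, recompute the p-value each time, then sort.
    p_values = sorted(
        z_test_p_value(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    lo_idx = round((1 - level) / 2 * n_boot)
    hi_idx = round((1 + level) / 2 * n_boot) - 1
    return p_values[lo_idx], p_values[hi_idx]


# Simulated data (an assumption for illustration): true mean 0.3, sd 1.
data_rng = random.Random(1)
data = [data_rng.gauss(0.3, 1.0) for _ in range(50)]

p_obs = z_test_p_value(data)
lower, upper = bootstrap_p_value_interval(data)
print(f"observed p-value: {p_obs:.4f}")
print(f"95% bootstrap percentile interval of the p-value: [{lower:.4f}, {upper:.4f}]")
```

The percentile interval is typically very wide, which reflects how strongly a p-value varies from sample to sample.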

@Scortchi: I have seen software that prints CIs for p-values. In this case, the approximate p-values were calculated by permutation tests, so if the CI was too wide (i.e. p-value $\in [0, 0.05]$ and p-value $\in [0.05, 1]$), you would use more permutations before making inference.

— Cliff AB
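A minimal sketch of the situation Cliff AB describes, under assumptions not stated in the thread: a two-sample permutation test on the absolute difference in means, simulated data, and a Wilson score interval for the Monte Carlo error. The function names are hypothetical. The interval here is for the exact permutation p-value of this particular sample, which the random permutations only approximate.

```python
import random
from statistics import NormalDist, fmean


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Monte Carlo estimate of the permutation p-value (two-sided, difference in means)."""
    rng = random.Random(seed)
    observed = abs(fmean(x) - fmean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled observations
        diff = abs(fmean(pooled[:len(x)]) - fmean(pooled[len(x):]))
        if diff >= observed:
            hits += 1
    return hits / n_perm


def wilson_interval(p_hat, n, level=0.95):
    """Wilson score interval for a binomial proportion estimated from n draws."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * (p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) ** 0.5 / denom
    return max(0.0, center - half), min(1.0, center + half)


# Simulated data (an assumption for illustration).
data_rng = random.Random(42)
x = [data_rng.gauss(0.0, 1.0) for _ in range(20)]
y = [data_rng.gauss(0.8, 1.0) for _ in range(20)]

n_perm = 2000
alpha = 0.05
p_hat = permutation_p_value(x, y, n_perm=n_perm)
ci_low, ci_high = wilson_interval(p_hat, n_perm)

# If the interval straddles alpha, the Monte Carlo error is too large to decide.
needs_more_permutations = ci_low < alpha < ci_high
print(f"estimated p = {p_hat:.4f}, 95% interval [{ci_low:.4f}, {ci_high:.4f}]")
print("run more permutations" if needs_more_permutations else "decision is stable")
```

Increasing `n_perm` shrinks this interval toward the exact permutation p-value of the observed sample; it says nothing about how the p-value would change with a new sample.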

@Cliff That's not a confidence interval for the p-value qua property of a distribution: that's a confidence interval for a stochastic estimator of the p-value of a test for a particular sample. Although they sound similar, and both are intervals, they are completely different things.

— whuber

Answers:

Point estimates and confidence intervals are for parameters that describe the distribution, e.g. mean or standard deviation.

But unlike other sample statistics, such as the sample mean and the sample standard deviation, the p-value is not a useful estimator of an interesting distribution parameter. Look at the answer by @whuber for technical details.

The p-value for a test statistic gives the probability of observing a deviation from the expected value of the test statistic at least as large as the one observed in the sample, calculated under the assumption that the null hypothesis is true. If you have the entire distribution, it is either consistent with the null hypothesis or it is not. This can be described by an indicator variable (again, see the answer by @whuber).

But the p-value cannot be used as a useful estimator of the indicator variable because it is not consistent: if the null hypothesis is true, the p-value does not converge as the sample size increases. This is a pretty complicated alternate way of stating that a statistical test can either reject or fail to reject the null, but never confirm it.
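The non-convergence claim can be checked with a small simulation. This is a sketch under assumptions not in the answer: normal data with known standard deviation 1, a two-sided z-test of mean zero, and a fixed alternative with true mean 0.5. The helper names are hypothetical.

```python
import random
from statistics import NormalDist, fmean, stdev


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided z-test p-value for the mean; sigma is assumed known."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate_p_values(true_mean, n, reps=400, seed=0):
    """p-values from `reps` independent samples of size n with the given true mean."""
    rng = random.Random(seed)
    return [
        z_test_p_value([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(reps)
    ]


for n in (10, 100, 1000):
    null_p = simulate_p_values(0.0, n)  # null hypothesis true
    alt_p = simulate_p_values(0.5, n)   # null hypothesis false
    print(
        f"n={n:5d}  null: mean={fmean(null_p):.3f} sd={stdev(null_p):.3f}  "
        f"alternative: mean={fmean(alt_p):.3f} sd={stdev(alt_p):.3f}"
    )
```

Under the null, the spread of the p-values stays about the same however large the sample is, whereas under the alternative the p-values pile up near zero as the sample size grows.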

— Erik

Most of the better accounts of statistical tests (Lehmann, Kiefer, etc.) do not refer to "populations" at all, but instead frame the situation in terms of estimating parameters of distributions. This does not require the randomness to be due solely to sampling, and thereby allows the theory to apply more broadly to situations where the randomness is part of a model.

— whuber

But you have explicitly contradicted that with the statement there "are no probabilities associated with the population at all." Please note, too, that all estimators are "explicitly defined on sample level." It is therefore difficult to determine what distinction you are trying to make in this post.

— whuber

Of course! But a distribution is not a population.

— whuber

(-1) I agree with both @Tim's common-sensical answer & whuber's recondite answer, but am struggling to make any sense of this one. (1) "But the p-value is not a population parameter since it is explicitly defined on sample level": this is doubtless worth pointing out, but the "but" makes it seem like you're saying that a p-value can't be an estimate of anything because it's a sample statistic, as if the sample mean couldn't be an estimate of anything because it's a sample statistic. ...

— Scortchi - Reinstate Monica

(2) "This is because there are no probabilities associated with the population at all, it is regarded as fixed but unknown": (a) The p-value isn't calculated from the sample because "there are no probabilities [...]"; (b) as @whuber's pointed out, sampling from a finite population is a special case; (c) in any case it just doesn't follow from what you've said that the p-value doesn't estimate anything about the population.

— Scortchi - Reinstate Monica

Yes, it could be (and has been) argued that a p-value is a point estimate.

In order to identify whatever property of a distribution a p-value might estimate, we would have to assume it is asymptotically unbiased. But, asymptotically, the mean p-value for the null hypothesis is $1/2$ (ideally; for some tests it might be some other nonzero number) and for any other hypothesis it is $0$. Thus, the p-value could be considered an estimator of one-half the indicator function for the null hypothesis.
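One way to spell out the step above, as a sketch under added assumptions: the test statistic has a continuous distribution under the null, so that the p-value $P$ is exactly uniform there, and the test is consistent against every fixed alternative. The symbols $P$, $H_0$ and $\mathbb{I}_{H_0}$ are notation introduced here for illustration.

$$
\text{under } H_0:\quad P \sim U(0,1) \;\Longrightarrow\; \mathbb{E}[P] = \int_0^1 p \, dp = \tfrac{1}{2}, \qquad \operatorname{Var}(P) = \tfrac{1}{12};
$$

$$
\text{under a fixed alternative}:\quad P \xrightarrow{\;\text{prob.}\;} 0 \ \text{ and } \ 0 \le P \le 1 \;\Longrightarrow\; \mathbb{E}[P] \to 0 .
$$

Together these give $\mathbb{E}[P] \to \tfrac{1}{2}\,\mathbb{I}_{H_0}(F)$, where $F$ is the underlying distribution. The variance under the null stays at $1/12$ regardless of sample size, which is why the p-value is not a consistent estimator in that case.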

Admittedly it takes some creativity to view a p-value in this way. We could do a little better by viewing the estimator in question as the decision we make by means of the p-value: is the underlying distribution a member of the null hypothesis or of the alternate hypothesis? Let's call this set of possible decisions $D$. Jack Kiefer writes:

We suppose that there is an experiment whose outcome the statistician can observe. This outcome is described by a random variable or random vector $X$ ... . The probability law of $X$ is unknown to the statistician, but it is known that the distribution function $F$ of $X$ is a member of a specified class $\Omega$ of distribution functions. ...

A statistical problem is said to be a problem of point estimation if $D$ is the collection of possible values of some real or vector-valued property of $F$ which depends on $F$ in a reasonably smooth way.

In this case, because $D$ is discrete, "reasonably smooth" is not a restriction at all. Kiefer's terminology reflects this by referring to statistical procedures with discrete decision spaces as "tests" instead of "point estimators."

Although it is interesting to explore the limits (and limitations) of such definitions, as this question invites us to do, perhaps we should not insist too strongly that a p-value is a point estimator, because this distinction between estimators and tests is both useful and conventional.

In a comment to this question, Christian Robert brought attention to a 1992 paper where he and co-authors took exactly this point of view and analyzed the admissibility of the p-value as an estimator of the indicator function. See the link in the references below. The paper begins,

Approaches to hypothesis testing have usually treated the problem of testing as one of decision-making rather than estimation. More precisely, a formal hypothesis test will result in a conclusion as to whether a hypothesis is true, and not provide a measure of evidence to associate with that conclusion. In this paper we consider hypothesis testing as an estimation problem within a decision-theoretic framework ... .

[Emphasis added.]

References

Jiunn Tzon Hwang, George Casella, Christian Robert, Martin T. Wells, and Roger H. Farrell, Estimation of Accuracy in Testing. Ann. Statist. Volume 20, Number 1 (1992), 490-509. Open access.

Jack Carl Kiefer, Introduction to Statistical Inference. Springer-Verlag, 1987.

— whuber

Hmm. I am not sure if this view is helpful. For one, in this sense the p-value is not a good estimator, since it is not consistent if the null hypothesis is true. And in some cases (you mention that) it has a sample-size dependent bias as well. It might be technically true, but any random number could be a (terrible) estimator for any parameter as well.

— Erik

The question does not ask whether the p-value is a good estimator, @Erik. As an estimator, it has obvious deficiencies. For instance, its asymptotic variance for the null hypothesis is nonzero. Please note that the bias of almost every biased estimator depends on sample size. Although you are correct that an independent random number could be viewed as an estimator, it would be an estimator of something different: it would estimate its own mean (by definition). Thus your objections appear not to have any relevance to the question at hand.

— whuber

I don't think we differ on any of those points, @Erik, except perhaps the "unhelpful" part. As Nick Cox points out in a comment elsewhere in this thread, it is nevertheless interesting to contemplate the sense in which a p-value could be considered an estimator and what, exactly, it could possibly be estimating. That can help us understand a little better just what a p-value is (and is not). Many would view that as a helpful exercise.

— whuber

In a 1992 paper, we study the $p$-value as an estimator of the indicator function $\mathbb{I}_{\Theta_0}(\theta)$ and demonstrate that it can be an admissible estimator for one-sided hypotheses and cannot be admissible for two-sided hypotheses.

— Xi'an

@Xi'an I see we're only 23 years behind you... . Thank you for the reference!

— whuber

$p$-values are not used for estimating any parameter of interest, but for hypothesis testing. For example, you could be interested in estimating the population mean $\mu$ based on the sample you have, or in an interval estimate of it, but in a hypothesis testing scenario you would rather compare the sample mean $\overline x$ with the population mean $\mu$ to see if they differ. In fact, in a hypothesis testing scenario you are not interested in the particular values of $p$, but rather in whether they fall below some threshold (e.g. $p < 0.05$). With $p$-values you are not that much interested in their point values; rather, you want to know whether your data provide enough evidence against the null hypothesis. In a hypothesis testing scenario, you would not compare different $p$-values to each other, but rather use each of them to make a separate decision about your hypotheses. You don't really want to know anything about the null hypothesis beyond whether you can reject it or not. This makes their values inseparable from the decision context, and so they differ from point estimates, because with point estimates we are interested in the values per se.

— Tim

Your initial statement correctly echoes how things are often explained, but nevertheless it does not go deep enough. A basic fact here is sampling variation, the variability from sample to sample. Take a different sample, and your P-value will be different. It takes a little ingenuity to see precisely what it is estimating, and it is not (as far as I know) conventional to explain it as estimating a parameter, but that point of view makes perfect sense. See @whuber's interesting answer. (The entire territory is littered with muddy paraphrases based on the need to simplify for teaching.)

— Nick Cox

How terms are used is interesting and important (and a personal preoccupation, by the way). The question remains what a P-value is. This too is pointed out [inevitable pun here] elsewhere in this thread. It's a helpful convention to regard parameters as those unknowns which appear in a model specification, but there are other unknowns too.

— Nick Cox

@Tim, I think this claim (from your last comment) is almost always not true, at least in biology. People are very much interested in the value of p-values, marking $p<0.05$, $p<0.01$, $p<0.001$ with one, two, or three stars on the figures, writing about something being "highly significant", etc. The usual recommendation is also to report exact p-values, e.g. $p=0.003$, and not $p<0.05$. Only very rarely do people adhere to the strict Neyman-Pearson framework, choose $\alpha$ in advance and report all p-values as $p<\alpha$.

— amoeba says Reinstate Monica

This question intersects with many others, most of which are highly controversial. One is the idealisation that the purpose of a test is to make a yes-or-no decision, which does not match all problems. Another key fact is that for decades the use of threshold levels was a matter of necessity: people relied on printed tables, and exact P-values were out of reach before computers were in common use.

— Nick Cox

@00schneider: If you do ever see an interval given for p-values, it's very unlikely to be a confidence interval for the population parameter defined by whuber. Tim's point is that there's no need to consider them as estimating anything at all, interesting though it may be to do so.

— Scortchi - Reinstate Monica