Interpretation of exp(B) in multinomial logistic regression

This is a beginner's question, but how do you interpret an exp(B) result of 6.012 in a multinomial logistic regression model?

1) Is it 6.012 - 1.0 = 5.012 = 5012% increase in risk?

or

2) 6.012 / (1 + 6.012) = 0.857 = 85.7% increase in risk?

In case both alternatives are incorrect, can someone point out the correct way?

I have searched many resources on the Internet and keep arriving at these two alternatives, and I am not completely sure which one is correct.

— user6911

It will take us a while to get there, but in summary: a one-unit change in the variable corresponding to B multiplies the relative risk of the outcome (compared with the base outcome) by 6.012.

One could express this as a "5012%" increase in the relative risk, but that is a confusing and potentially misleading way to put it, because it suggests we should think of changes additively, when in fact the multinomial logistic model encourages us to think multiplicatively. The modifier "relative" is essential: a change in a variable simultaneously changes the predicted probabilities of all the outcomes, not just the one in question, so we have to compare probabilities (by means of ratios, not differences).

The rest of this answer develops the terminology and intuition needed to interpret these statements correctly.

Background

Let's begin with ordinary logistic regression before moving on to the multinomial case.

For the (binary) dependent variable $Y$ and independent variables $X_i$, the model is

$\Pr[Y=1] = \frac{\exp(\beta_1 X_1 + \cdots + \beta_m X_m)}{1+\exp(\beta_1 X_1 + \cdots + \beta_m X_m)};$

equivalently, assuming $0 \ne \Pr[Y=1] \ne 1$,

$\log(\rho(X_1, \cdots, X_m)) = \log\frac{\Pr[Y=1]}{\Pr[Y=0]} = \beta_1 X_1 + \cdots + \beta_m X_m.$

(This simply defines $\rho$, the odds as a function of the $X_i$.)

Without any loss of generality, index the $X_i$ so that $X_m$ is the variable and $\beta_m$ is the "B" in the question (so that $\exp(\beta_m)=6.012$). Fixing the values of $X_i, 1\le i\lt m$, and varying $X_m$ by a small amount $\delta$ yields

$\log(\rho(\cdots, X_m+\delta)) - \log(\rho(\cdots, X_m)) = \beta_m \delta.$

Thus, $\beta_m$ is the marginal change in the log odds with respect to $X_m$.

To recover $\exp(\beta_m)$, evidently we must set $\delta=1$ and exponentiate the left-hand side:

$\eqalign{ \exp(\beta_m) &= \exp(\beta_m \times 1) \\ & = \exp( \log(\rho(\cdots, X_m+1)) - \log(\rho(\cdots, X_m))) \\ & = \frac{\rho(\cdots, X_m+1)}{\rho(\cdots, X_m)}. }$

This exhibits $\exp(\beta_m)$ as the odds ratio for a one-unit increase in $X_m$. To develop an intuition for what this might mean, tabulate some values for a range of starting odds, rounding heavily to make the patterns stand out:

Starting odds  Ending odds  Starting Pr[Y=1]  Ending Pr[Y=1]
0.0001         0.0006       0.0001            0.0006
0.001          0.006        0.001             0.006
0.01           0.06         0.01              0.057
0.1            0.6          0.091             0.38
1.             6.           0.5               0.9
10.            60.          0.91              1.
100.           600.         0.99              1.

For very small odds, which correspond to very small probabilities, the effect of a one-unit increase in $X_m$ is to multiply the odds or the probability by approximately 6.012. The odds are always multiplied by 6.012, but the multiplicative effect on the probability shrinks as the odds (and the probability) grow, and essentially vanishes once the odds exceed 10 (the probability exceeds 0.9).
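The table can be regenerated with a few lines of Python. This is only an illustrative sketch, not part of the original answer: the helper name `prob_from_odds` and the extra "Pr ratio" column (end probability divided by start probability) are hypothetical additions, included to show the odds always being multiplied by 6.012 while the factor applied to the probability shrinks toward 1.

```python
# Reproduce the table: the odds are multiplied by exp(beta_m) = 6.012,
# and we watch what happens to Pr[Y=1] = odds / (1 + odds).
FACTOR = 6.012  # exp(beta_m) from the question


def prob_from_odds(odds):
    """Convert odds = p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)


print(f"{'start odds':>10} {'end odds':>10} {'start Pr':>9} {'end Pr':>9} {'Pr ratio':>9}")
for start in (0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0):
    end = start * FACTOR  # the odds ratio is always 6.012
    p0, p1 = prob_from_odds(start), prob_from_odds(end)
    # The last column is the factor by which the *probability* changed.
    print(f"{start:>10g} {end:>10.4g} {p0:>9.4f} {p1:>9.4f} {p1 / p0:>9.3f}")
```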

(Figure: ratio change in probability)

As an additive change, there is not much difference between a probability of 0.0001 and 0.0006 (only 0.05 percentage points), nor is there much difference between 0.99 and 1. (only 1 percentage point). The largest additive effect occurs when the odds equal $1/\sqrt{6.012} \sim 0.408$, where the probability changes from 29% to 71%: a change of +42 percentage points.
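Why $1/\sqrt{6.012}$: on the log-odds scale $t$, the additive change $\sigma(t+\beta_m)-\sigma(t)$, with $\sigma(t)=e^t/(1+e^t)$, is largest where the starting and ending log odds are symmetric about zero, $t=-\beta_m/2$, that is, at odds $e^{-\beta_m/2}=1/\sqrt{\exp(\beta_m)}$. The Python sketch below checks this numerically with a crude grid search. The function name `additive_change` and the grid bounds are illustrative assumptions.

```python
import math

FACTOR = 6.012  # exp(beta_m)


def additive_change(start_odds, factor=FACTOR):
    """Change in Pr[Y=1] (on the 0-1 scale) when the odds are multiplied by `factor`."""
    end_odds = start_odds * factor
    return end_odds / (1.0 + end_odds) - start_odds / (1.0 + start_odds)


# Crude grid search over starting odds from 1e-4 to 1e4, evenly spaced on a log scale.
grid = [10 ** (k / 1000.0) for k in range(-4000, 4001)]
best = max(grid, key=additive_change)

print(round(best, 3), round(1 / math.sqrt(FACTOR), 3))  # grid optimum vs. 1/sqrt(6.012)
print(round(additive_change(best), 3))                   # roughly 0.42, i.e. +42 percentage points
```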

(Figure: additive change in probability)

We see, then, that if we express "risk" as an odds ratio, $\beta_m$ = "B" has a simple interpretation: the odds ratio equals $\exp(\beta_m)$ for a unit increase in $X_m$. When we express risk in some other way, such as a change in probabilities, the interpretation requires care to specify the starting probability.
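As a quick numerical check of the derivation of $\exp(\beta_m)$ as an odds ratio, here is a minimal Python sketch. The coefficient values and helper names (`log_odds`, `odds`, `odds_ratio_unit_increase`) are hypothetical, chosen only so that $\exp(\beta_m)=6.012$. Wherever the other $X_i$ are held fixed, raising $X_m$ by one unit multiplies the odds by the same factor.

```python
import math


def log_odds(x, beta):
    """Linear predictor beta_1*X_1 + ... + beta_m*X_m, i.e. log(Pr[Y=1] / Pr[Y=0])."""
    return sum(b * xi for b, xi in zip(beta, x))


def odds(x, beta):
    """rho(X_1, ..., X_m) = Pr[Y=1] / Pr[Y=0] = exp(linear predictor)."""
    return math.exp(log_odds(x, beta))


def odds_ratio_unit_increase(x, beta):
    """rho(..., X_m + 1) / rho(..., X_m): raise only the last variable by one unit."""
    x_plus = list(x[:-1]) + [x[-1] + 1.0]
    return odds(x_plus, beta) / odds(x, beta)


# Hypothetical coefficients; the last one is chosen so that exp(beta_m) = 6.012.
beta = [0.3, -1.2, math.log(6.012)]

# Wherever the other variables are held fixed, the odds ratio comes out the same.
for x in ([1.0, 0.0, 0.0], [1.0, 2.5, -3.0], [1.0, -4.0, 10.0]):
    print(round(odds_ratio_unit_increase(x, beta), 6))  # 6.012 for every x
```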

Multinomial logistic regression

(This was added as a later edit.)

Having recognized the value of using log odds to express chances, let's move on to the multinomial case. Now the dependent variable $Y$ can equal one of $k \ge 2$ categories, indexed by $i=1, 2, \ldots, k$. The relative chance that it is in category $i$ is

$\Pr[Y_i] \sim \exp\left(\beta_1^{(i)} X_1 + \cdots + \beta_m^{(i)} X_m\right)$

with parameters $\beta_j^{(i)}$ to be determined, writing $Y_i$ for $\Pr[Y=\text{category }i]$. As an abbreviation, let's write the right-hand expression as $p_i(X,\beta)$ or, where $X$ and $\beta$ are clear from context, simply $p_i$. Normalizing to make all these relative chances sum to unity gives

$\Pr[Y_i] =\frac{p_i(X,\beta)}{p_1(X,\beta) + \cdots + p_k(X,\beta)}.$

(There is an ambiguity in the parameters: there are too many of them. Conventionally, one chooses a "base" category for comparison and forces all its coefficients to be zero. However, although this is necessary to report unique estimates of the betas, it is not needed to interpret the coefficients. To maintain the symmetry (that is, to avoid any artificial distinction among the categories), let's not enforce any such constraint unless we have to.)
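The normalization and the parameter ambiguity can be illustrated with a small Python sketch. All coefficient and covariate values here are hypothetical, and `multinomial_probs` is an illustrative helper, not any library's API. Adding the same vector to every category's coefficients multiplies every $p_i$ by the same factor, which cancels in the normalization.

```python
import math


def multinomial_probs(x, B):
    """Pr[Y = category i] for each category, given values x of X_1..X_m.

    B holds one row of coefficients per category: row i is beta_1^(i), ..., beta_m^(i).
    Each p_i = exp(beta^(i) . x) is normalized by p_1 + ... + p_k.
    """
    etas = [sum(b * xj for b, xj in zip(row, x)) for row in B]
    top = max(etas)  # subtracting the max before exp avoids overflow; it cancels in the ratio
    p = [math.exp(e - top) for e in etas]
    total = sum(p)
    return [pi / total for pi in p]


x = [1.0, 0.7, -2.0]                  # hypothetical values of X_1, X_2, X_3
B = [[0.0, 0.0, 0.0],                 # category 1 used as the base: coefficients set to zero
     [0.4, math.log(6.012), -0.3],    # hypothetical coefficients for category 2
     [-1.0, 0.5, 0.2]]                # hypothetical coefficients for category 3

# Adding the same vector c to every category's coefficients leaves all probabilities unchanged,
# which is why one category's coefficients can be pinned at zero without changing the model.
c = [2.0, -0.5, 1.0]
B_shifted = [[b + cj for b, cj in zip(row, c)] for row in B]

print([round(p, 4) for p in multinomial_probs(x, B)])
print([round(p, 4) for p in multinomial_probs(x, B_shifted)])  # same probabilities (up to rounding)
```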

One way to interpret this model is to ask for the marginal rate of change of the log odds for any category (say, category $i$) with respect to any one of the independent variables (say, $X_j$). That is, when we change $X_j$ by a little, that induces a change in the log odds of $Y_i$. We are interested in the constant of proportionality relating these two changes. The Chain Rule of calculus, together with a little algebra, tells us this rate of change is

$\frac{\partial\ \text{log odds}(Y_i)}{\partial\ X_j} = \beta_j^{(i)} - \frac{\beta_j^{(1)}p_1 + \cdots + \beta_j^{(i-1)}p_{i-1} + \beta_j^{(i+1)}p_{i+1} +\cdots + \beta_j^{(k)}p_k}{p_1 + \cdots + p_{i-1} + p_{i+1} + \cdots + p_k}.$

This has a relatively simple interpretation as the coefficient $\beta_j^{(i)}$ of $X_j$ in the formula for the chance that $Y$ is in category $i$, minus an "adjustment." The adjustment is the probability-weighted average of the coefficients of $X_j$ in all the other categories. The weights are computed using probabilities associated with the current values of the independent variables $X$. Thus, the marginal change in log odds is not necessarily constant: it depends on the probabilities of all the other categories, not just the probability of the category in question (category $i$).
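The rate-of-change formula can be checked against a numerical derivative. The Python sketch below is illustrative only: the coefficients and covariate values are hypothetical, the helper names are made up for this example, and categories and variables are indexed from 0 rather than 1. It compares $\beta_j^{(i)}$ minus the probability-weighted average of the other categories' coefficients with a central finite difference of the log odds.

```python
import math


def probs(x, B):
    """Normalized probabilities p_i / (p_1 + ... + p_k), one coefficient row per category."""
    etas = [sum(b * xj for b, xj in zip(row, x)) for row in B]
    top = max(etas)
    w = [math.exp(e - top) for e in etas]
    s = sum(w)
    return [wi / s for wi in w]


def log_odds(x, B, i):
    """Log odds of category i: log(Pr[Y_i] / (1 - Pr[Y_i]))."""
    p = probs(x, B)
    return math.log(p[i] / (1.0 - p[i]))


def formula_rate(x, B, i, j):
    """beta_j^(i) minus the probability-weighted average of beta_j over the other categories."""
    p = probs(x, B)
    others = [c for c in range(len(B)) if c != i]
    weighted = sum(B[c][j] * p[c] for c in others) / sum(p[c] for c in others)
    return B[i][j] - weighted


def numeric_rate(x, B, i, j, h=1e-6):
    """Central finite-difference estimate of d(log odds of category i) / dX_j."""
    up, down = list(x), list(x)
    up[j] += h
    down[j] -= h
    return (log_odds(up, B, i) - log_odds(down, B, i)) / (2 * h)


B = [[0.0, 0.0], [0.5, 1.2], [-0.7, 0.4]]  # hypothetical: k = 3 categories, m = 2 variables
x = [1.0, 0.3]                             # hypothetical values of X_1, X_2
for i in range(len(B)):
    # Columns: category index, formula value, finite-difference value (these should agree).
    print(i, round(formula_rate(x, B, i, 1), 6), round(numeric_rate(x, B, i, 1), 6))
```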

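When there are just two categories ($k=2$), this should reduce to ordinary logistic regression. Indeed, with $i=2$ the only other category is category 1, so the probability-weighted average is simply $\beta_j^{(1)}$ and the formula gives $\beta_j^{(2)} - \beta_j^{(1)}$. Taking category 1 as the base case reduces this to $\beta_j^{(2)}$, because we force $\beta_j^{(1)}=0$. Thus the new interpretation generalizes the old.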
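A small Python sketch of this reduction, using hypothetical coefficients and illustrative helper names: a two-category multinomial model gives the same probability as a binary logistic regression whose coefficients are the difference $\beta^{(2)} - \beta^{(1)}$. With category 1 as the base, that difference is just $\beta^{(2)}$.

```python
import math


def binary_logistic_prob(x, beta):
    """Ordinary logistic regression: Pr[Y=1] = exp(eta) / (1 + exp(eta)), eta = beta . x."""
    eta = sum(b * xj for b, xj in zip(beta, x))
    return math.exp(eta) / (1.0 + math.exp(eta))


def two_category_prob(x, beta1, beta2):
    """Multinomial model with k = 2: Pr[Y = category 2] = p_2 / (p_1 + p_2)."""
    p1 = math.exp(sum(b * xj for b, xj in zip(beta1, x)))
    p2 = math.exp(sum(b * xj for b, xj in zip(beta2, x)))
    return p2 / (p1 + p2)


x = [1.0, 0.8]                      # hypothetical values of X_1, X_2
beta1 = [0.6, -0.2]                 # hypothetical coefficients for category 1
beta2 = [-0.4, math.log(6.012)]     # hypothetical coefficients for category 2
diff = [b2 - b1 for b1, b2 in zip(beta1, beta2)]

# Only the difference beta^(2) - beta^(1) matters ...
print(two_category_prob(x, beta1, beta2), binary_logistic_prob(x, diff))
# ... and with category 1 as the base (beta^(1) = 0) the difference is just beta^(2).
print(two_category_prob(x, [0.0, 0.0], beta2), binary_logistic_prob(x, beta2))
```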

To interpret $\beta_j^{(i)}$ directly, then, we will isolate it on one side of the formula for the marginal rate of change, leading to:

The coefficient of $X_j$ for category $i$ equals the marginal change in the log odds of category $i$ with respect to the variable $X_j$, plus the probability-weighted average of the coefficients of $X_j$ for all the other categories $i' \ne i$.

Another interpretation, albeit a little less direct, is afforded by (temporarily) setting category $i$ as the base case, thereby making $\beta_j^{(i)}=0$ for all the independent variables $X_j$ :

The marginal rate of change in the log odds of the base case with respect to $X_j$ is the negative of the probability-weighted average of the coefficients of $X_j$ for all the other categories.

Actually using these interpretations typically requires extracting the betas and the probabilities from software output and performing the calculations as shown.
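A minimal Python sketch of that calculation, assuming you have already copied the coefficient table and the fitted probabilities (at the covariate values of interest) out of your software. The numbers below are hypothetical, the function name is illustrative, and indices are 0-based.

```python
def marginal_log_odds_rates(B, fitted, j):
    """Marginal rate of change in the log odds of every category with respect to X_j.

    B:      one row of coefficients per category, as read off the software output
            (the base category's row is all zeros).
    fitted: fitted probabilities Pr[Y = category i] at the covariate values of interest.
    j:      0-based column index of the independent variable X_j in B.
    """
    k = len(B)
    rates = []
    for i in range(k):
        others = [c for c in range(k) if c != i]
        # Probability-weighted average of X_j's coefficients in the other categories.
        avg = sum(B[c][j] * fitted[c] for c in others) / sum(fitted[c] for c in others)
        rates.append(B[i][j] - avg)
    return rates


# Hypothetical output: three categories, the last one is the base (coefficients zero).
B = [[0.2, 1.8],
     [-0.5, 0.3],
     [0.0, 0.0]]
fitted = [0.25, 0.15, 0.60]   # hypothetical fitted probabilities at one fixed setting of X

rates = marginal_log_odds_rates(B, fitted, j=1)
print([round(r, 4) for r in rates])   # [1.74, -0.2294, -1.2375]
```

In this made-up example the second category has a positive coefficient (0.3) for the variable, yet its log odds fall as that variable increases at these fitted probabilities. The sign of a coefficient alone does not settle the direction of change. The last entry is the base case, where the rate is just the negative weighted average of the other coefficients.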

Finally, for the exponentiated coefficients, note that the ratio of probabilities among two outcomes (sometimes called the "relative risk" of $i$ compared to $i'$ ) is

$\frac{Y_{i}}{Y_{i'}} = \frac{p_{i}(X,\beta)}{p_{i'}(X,\beta)}.$

Let's increase $X_j$ by one unit to $X_j+1$ . This multiplies $p_{i}$ by $\exp(\beta_j^{(i)})$ and $p_{i'}$ by $\exp(\beta_j^{(i')})$ , whence the relative risk is multiplied by $\exp(\beta_j^{(i)}) / \exp(\beta_j^{(i')})$ = $\exp(\beta_j^{(i)}-\beta_j^{(i')})$ . Taking category $i'$ to be the base case reduces this to $\exp(\beta_j^{(i)})$ , leading us to say,

The exponentiated coefficient $\exp(\beta_j^{(i)})$ is the amount by which the relative risk $\Pr[Y = \text{category }i]/\Pr[Y = \text{base category}]$ is multiplied when variable $X_j$ is increased by one unit.
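A short Python sketch of this statement, with hypothetical coefficients and 0-based category indices (index 0 is the base). Raising the second variable by one unit multiplies the relative risk against the base by the exponentiated coefficient. Against a non-base category, the factor is the exponentiated difference of the two coefficients.

```python
import math


def relative_risk(x, B, i, i_ref):
    """Pr[Y = i] / Pr[Y = i_ref] = p_i / p_i_ref; the normalizing sum cancels out."""
    eta = [sum(b * xj for b, xj in zip(row, x)) for row in B]
    return math.exp(eta[i] - eta[i_ref])


B = [[0.0, 0.0],                  # base category (index 0): coefficients fixed at zero
     [0.1, math.log(6.012)],      # hypothetical: exp(coefficient of X_2) = 6.012
     [-0.3, 0.9]]                 # hypothetical third category
x = [1.0, 2.0]                    # hypothetical values of X_1, X_2
x_up = [1.0, 3.0]                 # X_2 increased by one unit

# Against the base category, the relative risk is multiplied by exp(coefficient) = 6.012.
base_factor = relative_risk(x_up, B, 1, 0) / relative_risk(x, B, 1, 0)
print(round(base_factor, 6))      # 6.012

# Against a non-base category, the factor is exp(difference of the two coefficients).
other_factor = relative_risk(x_up, B, 1, 2) / relative_risk(x, B, 1, 2)
print(round(other_factor, 6), round(math.exp(math.log(6.012) - 0.9), 6))
```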

— whuber

Great explanations, but the OP explicitly asked for the multinomial model. I may be reading more into the question than the OP intended, and the explanation for the binary case may be adequate, but I would love to see this answer cover the general multinomial case too. Even though the parametrization is similar, the "log-odds" are in general with respect to an (arbitrary) reference category, and they are not really log-odds; a unit change in $X_i$ results in a combined change of these "log-odds", and an increasing "log-odds" does not imply an increasing probability.

— NRH

@NRH That's an excellent point. I had somehow read "multivariate" instead of "multinomial." If I get a chance to return to this I will try to flesh out those details. Fortunately the same mode of analysis is effective in finding the correct interpretation.

— whuber

@NRH Done. I welcome your suggestions (or anyone else's) about how to make the interpretation clearer, or for alternative interpretations.

— whuber

Thanks for writing this down. The complete answer is a very good reference.

— NRH

Try considering this bit of explanation in addition to what @whuber has already written so well. If exp(B) = 6, then the odds ratio associated with an increase of 1 on the predictor in question is 6. In a multinomial context, by "odds ratio" we mean the ratio of these two quantities: a) the odds (not probability, but rather p/[1-p]) of a case taking the value of the dependent variable indicated in the output table in question, and b) the odds of a case taking the reference value of the dependent variable.

You seem to be looking to quantify the probability--rather than odds-- of a case being in one or the other category. To do this you would need to know what probabilities the case "started with" -- i.e., before we assumed the increase of 1 on the predictor in question. Ratios of probabilities will vary case by case, while the ratio of odds connected with an increase of 1 on the predictor stays the same.

— rolando2

"If exp(B) = 6, then the odds ratio associated with an increase of 1 on the predictor in question is 6": if I read @whuber's answer correctly, it says that the odds ratio will be multiplied by 6 with an increase of 1 on the predictor. That is, the new odds ratio will not be 6. Or am I interpreting things incorrectly?

— rbm

Where you say "the new odds ratio will not be 6" I would say "the new odds will not be 6...but the ratio of the new to the old odds will be 6."

— rolando2

Yes, I agree with that! But I just thought that "the odds ratio associated with an increase of 1 on the predictor in question is 6" does not really say that. But maybe I am just misinterpreting it then. Thanks for the clarification!

— rbm

I was also looking for the same answer, but the ones above were not satisfying for me. They seemed too complex for what it really is. So I will give my interpretation; please correct me if I am wrong.

Do however read to the end, since it is important.

First of all, the values B and Exp(B) are the ones you are looking for. If B is negative, your Exp(B) will be lower than 1, which means the odds decrease. If B is positive, Exp(B) will be higher than 1, meaning the odds increase, since you are multiplying the odds by the factor Exp(B).

Unfortunately, you are not there yet, because in a multinomial regression your dependent variable has multiple categories. Let's call these categories D1, D2 and D3, of which the last is the reference category. And let's assume your first independent variable is sex (males vs. females).

Let's say the output for D1 -> males is exp(B) = 1.21. This means that, for males, the odds of being in category D1 rather than D3 (the reference category) are higher by a factor of 1.21 compared with females (the reference category of sex).

So you are always comparing against the reference category of the dependent variable, and also against the reference category of the (categorical) independent variables. This is not true if you have a covariate (a numeric predictor rather than a categorical factor). In that case it would mean: a one-unit increase in X increases the odds of being in category D1 rather than D3 by a factor of 1.21.

For those with an ordinal dependent variable:

If you have an ordinal dependent variable and did not run an ordinal regression (because the proportional-odds assumption did not hold, for instance), keep in mind that your highest category is the reference category. Your results as above are valid to report, but keep in mind that an increase in odds then in fact means an increase in the odds of being in the lower category rather than the higher one! That applies only if you have an ordinal dependent variable.

If you want to know the increase as a percentage, take a hypothetical odds value, say 100, and multiply it by 1.21, which gives 121. Compared with 100, how much did it change in percentage terms?
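The same arithmetic as a tiny Python sketch (the function name is illustrative, and the factor 0.80 is a hypothetical value added to show a decrease). The percentage change implied by multiplying the odds by Exp(B) is (Exp(B) - 1) x 100, whatever starting odds you pick.

```python
def percent_change(factor):
    """Percentage change implied by multiplying a quantity (here, the odds) by `factor`."""
    return (factor - 1.0) * 100.0


print(round(percent_change(1.21), 2))   # 21.0  (hypothetical odds of 100 become 121)
print(round(percent_change(0.80), 2))   # -20.0 (an Exp(B) below 1 means the odds decrease)
```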

— Fico

Say that exp(b) in an mlogit is 1.04. If you multiply a number by 1.04, it increases by 4%. That is the relative-risk ratio for being in category a instead of b. I suspect that part of the confusion here might have to do with "by 4%" (multiplicative meaning) versus "by 4 percentage points" (additive meaning). The % interpretation is correct if we are talking about a percentage change, not a percentage-point change. (The latter would not make sense anyhow, as relative risks aren't expressed as percentages.)
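To make the percent vs. percentage-point distinction concrete, here is a Python sketch under a simplifying assumption that is not in the original answer: a and b are the only two categories, so $\Pr[a] = rr/(1+rr)$ with $rr = \Pr[a]/\Pr[b]$. The starting relative risks are hypothetical. Multiplying by 1.04 is always a 4% change in the relative risk, but the change in $\Pr[a]$, measured in percentage points, depends on where you start.

```python
def prob_a(rr):
    """Pr[a] when a and b are the only two categories and rr = Pr[a] / Pr[b]."""
    return rr / (1.0 + rr)


FACTOR = 1.04   # exp(b) from the example above

for rr in (0.1, 1.0, 5.0):                       # hypothetical starting relative risks
    rr_new = rr * FACTOR
    pct = (rr_new / rr - 1.0) * 100.0            # percent change in the relative risk: always 4%
    pts = (prob_a(rr_new) - prob_a(rr)) * 100.0  # change in Pr[a] in percentage points: varies
    print(f"start rr = {rr:>3}: relative risk {pct:+.1f}%, Pr[a] {pts:+.2f} percentage points")
```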

— natalia