If you take a Bayesian approach and treat the parameters describing the distribution of $X$ as a random variable/vector, then the observations are indeed not independent, but they are conditionally independent given knowledge of $\theta$; hence $P(X_n \mid X_{n-1}, \ldots, X_1, \theta) = P(X_n \mid \theta)$ would hold.
In a classical statistical approach, $\theta$ is not a random variable. Calculations are done as if we knew what $\theta$ is. In some sense, you're always conditioning on $\theta$ (even if you don't know its value).
When you wrote, "... provide information about the distribution structure, and as a result about $X_n$", you were implicitly adopting a Bayesian approach, but not doing it precisely. You wrote a property of IID samples the way a frequentist would write it, but the corresponding statement in a Bayesian setup would involve conditioning on $\theta$.
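To make the contrast concrete, the frequentist's IID statement and the Bayesian's marginal predictive look like the following, where $p(\theta \mid X_1, \ldots, X_{n-1})$ denotes the posterior density (notation I'm introducing here for illustration):

$$P(X_n \mid X_{n-1}, \ldots, X_1) = P(X_n) \quad \text{(frequentist IID)},$$

$$P(X_n \mid X_{n-1}, \ldots, X_1) = \int P(X_n \mid \theta)\, p(\theta \mid X_1, \ldots, X_{n-1})\, d\theta \quad \text{(Bayesian)},$$

and the Bayesian right-hand side generally differs from the marginal $P(X_n)$ because the posterior differs from the prior.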
Bayesian vs. Classical statisticians
Let $x_i$ be the result of flipping a lopsided, unfair coin. We don't know the probability that the coin lands heads.
- To the classical statistician, the frequentist, $P(x_i = H)$ is some parameter; let's call it $\theta$. Observe that $\theta$ here is a scalar, like the number $1/3$. We may not know what the number is, but it's some number! It is not random!
- To the Bayesian statistician, $\theta$ itself is a random variable! This is extremely different!
The key idea here is that the Bayesian statistician extends the tools of probability to situations where the classical statistician doesn't. To the frequentist, $\theta$ isn't a random variable because it has only one possible value! Multiple outcomes are not possible! In the Bayesian's imagination, though, multiple values of $\theta$ are possible, and the Bayesian is willing to model that uncertainty (in her own mind) using the tools of probability.
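To make the two viewpoints concrete, here is a minimal simulation sketch of the Bayesian's generative story (Python, the uniform $\mathrm{Beta}(1,1)$ prior, and the sample size are my illustrative choices; nothing above commits to them):

```python
import numpy as np

rng = np.random.default_rng(0)

# The Bayesian's generative story for the coin. The Beta(1, 1) prior
# (uniform on [0, 1]) and n = 10 flips are illustrative assumptions.
theta = rng.beta(1, 1)               # latent bias: a random variable to the Bayesian
flips = rng.binomial(1, theta, 10)   # flips are IID *conditional on* theta

# To the frequentist, only the second line is random: theta is an
# unknown constant, not a draw from a distribution.
print(theta, flips)
```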
Where is this going?
Let's say we flip the coin $n$ times. One flip does not affect the outcome of another. The classical statistician would call these independent flips (and indeed they are). We'll have:
$$P(x_n = H \mid x_{n-1}, x_{n-2}, \ldots, x_1) = P(x_n = H) = \theta,$$
where $\theta$ is some unknown parameter. (Remember, we don't know what it is, but it's not a random variable! It's some number.)
A Bayesian deep into subjective probability would say that what matters is the probability from her perspective! If she sees 10 heads in a row, an 11th head is more likely, because 10 heads in a row leads one to believe the coin is lopsided in favor of heads.
$$P(x_{11} = H \mid x_{10} = H, x_9 = H, \ldots, x_1 = H) > P(x_1 = H).$$
What has happened here? What is different?! She is updating her beliefs about a latent random variable $\theta$! If $\theta$ is treated as a random variable, the flips aren't independent anymore. But the flips are conditionally independent given the value of $\theta$:
$$P(x_{11} = H \mid x_{10} = H, x_9 = H, \ldots, x_1 = H, \theta) = P(x_1 = H \mid \theta) = \theta.$$
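To put numbers on the earlier inequality, suppose (an illustrative assumption on my part) the Bayesian starts from a uniform $\mathrm{Beta}(1, 1)$ prior on $\theta$. After 10 heads the posterior is $\mathrm{Beta}(11, 1)$, and the predictive probability of an 11th head is its mean:

$$P(x_{11} = H \mid x_{10} = H, \ldots, x_1 = H) = \frac{11}{12} > \frac{1}{2} = P(x_1 = H).$$

This is Laplace's rule of succession; other non-degenerate priors lead to the same qualitative conclusion.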
Conditioning on $\theta$ in a sense connects how the Bayesian and the classical statistician model the problem. Or to put it another way, the frequentist and the Bayesian statistician will agree if the Bayesian conditions on $\theta$.
Further notes
I've tried my best to give a short intro here, but what I've done is at best superficial, and the concepts are in some sense quite deep. If you want to dive into the philosophy of probability, Savage's 1954 book The Foundations of Statistics is a classic. Google for "Bayesian vs. frequentist" and a ton of stuff will come up.
Another way to think about IID draws is de Finetti's theorem and the notion of exchangeability. In a Bayesian framework, exchangeability of an infinite sequence is equivalent to the draws being IID conditional on some latent random variable (in this case, the lopsidedness of the coin).
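For the coin-flip case (coding heads as $x_i = 1$), de Finetti's theorem says that an infinite exchangeable sequence must have the mixture representation

$$P(x_1, \ldots, x_n) = \int_0^1 \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i} \, d\pi(\theta)$$

for some distribution $\pi$ on $[0, 1]$, which is exactly the conditionally-IID-given-$\theta$ structure above.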