It is indeed something. To find out, we need to examine what we know about correlation itself.
The correlation matrix of a vector-valued random variable $X=(X_1,X_2,\ldots,X_p)$ is the variance-covariance matrix, or simply "variance," of the standardized version of $X$. That is, each $X_i$ is replaced by its recentered, rescaled version,
$$\frac{X_i - E[X_i]}{\sqrt{\operatorname{Var}(X_i)}}.\tag{1}$$
The covariance of $X_i$ and $X_j$ is the expectation of the product of their centered versions. That is, writing $X_i' = X_i - E[X_i]$ and $X_j' = X_j - E[X_j]$, we have
$$\operatorname{Cov}(X_i, X_j) = E[X_i' X_j'].\tag{2}$$
The variance of $X$, which I will write $\operatorname{Var}(X)$, is not a single number. It is the array of values
$$\operatorname{Var}(X)_{ij} = \operatorname{Cov}(X_i, X_j).\tag{3}$$
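To make these definitions concrete, here is a small NumPy sketch (the simulated data, sample size, and dimension are purely illustrative) that estimates the covariance matrix as the sample average of products of centered variables and checks that the correlation matrix is the covariance matrix of the standardized variables:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))   # three correlated columns

# Sample version of Cov(X_i, X_j) = E[X'_i X'_j]: average the products of centered columns.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / len(X)
assert np.allclose(cov, np.cov(X, rowvar=False, bias=True))

# The correlation matrix is the covariance matrix of the standardized variables, as in (1).
Z = Xc / X.std(axis=0)
assert np.allclose(Z.T @ Z / len(Z), np.corrcoef(X, rowvar=False))
```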
The way to think of the covariance for the intended generalization is to consider it a tensor. That means it is an entire collection of quantities $v_{ij}$, indexed by $i$ and $j$ ranging from $1$ through $p$, whose values change in a particularly simple, predictable way when $X$ undergoes a linear transformation. Specifically, let $Y=(Y_1,Y_2,\ldots,Y_q)$ be another vector-valued random variable defined by
$$Y_i = \sum_{j=1}^p a_i^j X_j.$$
The constants $a_i^j$ ($i$ and $j$ are indexes; $j$ is not a power) form a $q \times p$ array $A = (a_i^j)$, with $j = 1, \ldots, p$ and $i = 1, \ldots, q$. The linearity of expectation implies
$$\operatorname{Var}(Y)_{ij} = \sum_{k,l} a_i^k a_j^l \operatorname{Var}(X)_{kl}.\tag{4}$$
In matrix notation,
$$\operatorname{Var}(Y) = A\,\operatorname{Var}(X)\,A'.$$
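As a numerical sanity check (a sketch with made-up arrays, not part of the argument), both the component form $(4)$ and the matrix form can be verified directly:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 3, 2
X = rng.normal(size=(5000, p)) @ rng.normal(size=(p, p))
A = rng.normal(size=(q, p))          # the q x p array (a_i^j)
Y = X @ A.T                          # realizes Y_i = sum_j a_i^j X_j row by row

covX = np.cov(X, rowvar=False, bias=True)
covY = np.cov(Y, rowvar=False, bias=True)

# Component form (4): Var(Y)_{ij} = sum_{k,l} a_i^k a_j^l Var(X)_{kl}
assert np.allclose(covY, np.einsum('ik,jl,kl->ij', A, A, covX))
# Matrix form: Var(Y) = A Var(X) A'
assert np.allclose(covY, A @ covX @ A.T)
```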
All the components of $\operatorname{Var}(X)$ actually are univariate variances, due to the Polarization Identity
$$4\operatorname{Cov}(X_i, X_j) = \operatorname{Var}(X_i + X_j) - \operatorname{Var}(X_i - X_j).$$
This tells us that if you understand variances of univariate random variables, you already understand covariances of bivariate variables: they are "just" linear combinations of variances.
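Here is a brief numerical illustration of the Polarization Identity for covariances (the skewed, simulated data are arbitrary; the sample statistics satisfy the identity exactly because it is algebraic):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.exponential(size=(10000, 2)) @ rng.normal(size=(2, 2))
Xi, Xj = X[:, 0], X[:, 1]

lhs = 4 * np.cov(Xi, Xj, bias=True)[0, 1]
rhs = np.var(Xi + Xj) - np.var(Xi - Xj)    # np.var is the (biased) population variance
assert np.allclose(lhs, rhs)
```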
The expression in the question is perfectly analogous: the variables $X_i$ have been standardized as in $(1)$. We can understand what it represents by considering what it means for any variable, standardized or not. We would replace each $X_i$ by its centered version, as in $(2)$, and form quantities having three indexes,
$$\mu_3(X)_{ijk} = E[X_i' X_j' X_k'].$$
These are the central (multivariate) moments of degree $3$. As in $(4)$, they form a tensor: when $Y = AX$, then
$$\mu_3(Y)_{ijk} = \sum_{l,m,n} a_i^l a_j^m a_k^n\, \mu_3(X)_{lmn}.$$
The indexes in this triple sum range over all combinations of integers from $1$ through $p$.
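A sketch of how one might estimate these tensors and confirm the transformation law with NumPy (the sizes, the skewed simulated data, and the array $A$ are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 20000, 3, 2
X = rng.exponential(size=(n, p)) @ rng.normal(size=(p, p))   # skewed, correlated columns
Xc = X - X.mean(axis=0)

# mu3(X)_{ijk} = E[X'_i X'_j X'_k], estimated by a sample average; shape (p, p, p).
mu3_X = np.einsum('ti,tj,tk->ijk', Xc, Xc, Xc) / n

# The transformation law under Y = AX, realized row-wise as Y = X A'.
A = rng.normal(size=(q, p))
Yc = X @ A.T - (X @ A.T).mean(axis=0)
mu3_Y = np.einsum('ti,tj,tk->ijk', Yc, Yc, Yc) / n
assert np.allclose(mu3_Y, np.einsum('il,jm,kn,lmn->ijk', A, A, A, mu3_X))
```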
The analog of the Polarization Identity is
$$24\,\mu_3(X)_{ijk} = \mu_3(X_i+X_j+X_k) - \mu_3(X_i-X_j+X_k) - \mu_3(X_i+X_j-X_k) + \mu_3(X_i-X_j-X_k).$$
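Again this can be checked numerically; the following sketch (with arbitrary skewed data) compares the left- and right-hand sides using sample central third moments, which satisfy the identity exactly because it is algebraic:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.exponential(size=(10000, 3)) @ rng.normal(size=(3, 3))
Xc = X - X.mean(axis=0)
Xi, Xj, Xk = X[:, 0], X[:, 1], X[:, 2]

def mu3(v):
    """Univariate central third moment E[(V - E[V])^3], as a sample average."""
    return np.mean((v - v.mean()) ** 3)

lhs = 24 * np.mean(Xc[:, 0] * Xc[:, 1] * Xc[:, 2])           # 24 * mu3(X)_{ijk}
rhs = (mu3(Xi + Xj + Xk) - mu3(Xi - Xj + Xk)
       - mu3(Xi + Xj - Xk) + mu3(Xi - Xj - Xk))
assert np.allclose(lhs, rhs)
```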
On the right-hand side of this identity, $\mu_3$ refers to the (univariate) central third moment: the expected value of the cube of the centered variable. When the variables are standardized, this moment is usually called the skewness. Accordingly, we may think of $\mu_3(X)$ as being the multivariate skewness of $X$. It is a tensor of rank three (that is, with three indices) whose values are linear combinations of the skewnesses of various sums and differences of the $X_i$. If we were to seek interpretations, then, we would think of these components as measuring in $p$ dimensions whatever the skewness is measuring in one dimension. In many cases,
- The first moments measure the location of a distribution;
- The second moments (the variance-covariance matrix) measure its spread;
- The standardized second moments (the correlations) indicate how the spread varies in $p$-dimensional space; and
- The standardized third and fourth moments are taken to measure the shape of a distribution relative to its spread.
To elaborate on what a multidimensional "shape" might mean, observe that we can understand PCA as a mechanism to reduce any multivariate distribution to a standard version located at the origin and with equal spreads in all directions. After PCA has been performed, then, $\mu_3$ would provide the simplest indicators of the multidimensional shape of the distribution. These ideas apply equally well to data as to random variables, because data can always be analyzed in terms of their empirical distribution.
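A minimal sketch of that idea, assuming the data sit in an $(n, p)$ array: whiten with PCA (recenter, rotate to the principal axes, rescale every direction to unit variance) and then compute the third-moment tensor of the whitened data as a descriptor of its shape. All names and sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.exponential(size=(20000, 3)) @ rng.normal(size=(3, 3))

# "PCA reduction": recenter, rotate to principal axes, and rescale so every
# direction has unit variance.  The result has mean 0 and identity covariance.
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False, bias=True))
Z = Xc @ evecs / np.sqrt(evals)
assert np.allclose(np.cov(Z, rowvar=False, bias=True), np.eye(3))

# With location and spread standardized away, the third-moment tensor of Z is a
# simple multidimensional shape descriptor (the multivariate skewness).
mu3_Z = np.einsum('ti,tj,tk->ijk', Z, Z, Z) / len(Z)
print(np.round(mu3_Z, 2))
```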
Reference
Alan Stuart & J. Keith Ord, Kendall's Advanced Theory of Statistics Fifth Edition, Volume 1: Distribution Theory; Chapter 3, Moments and Cumulants. Oxford University Press (1987).
Appendix: Proof of the Polarization Identity
Let $x_1, \ldots, x_n$ be algebraic variables. There are $2^n$ ways to add and subtract all $n$ of them. When we raise each of these sums-and-differences to the $n^{\text{th}}$ power, pick a suitable sign for each of those results, and add them up, we will get a multiple of $x_1 x_2 \cdots x_n$.
More formally, let $S = \{1, -1\}^n$ be the set of all $n$-tuples of $\pm 1$, so that any element $s \in S$ is a vector $s = (s_1, s_2, \ldots, s_n)$ whose coefficients are all $\pm 1$. The claim is
$$2^n n!\, x_1 x_2 \cdots x_n = \sum_{s \in S} s_1 s_2 \cdots s_n\, (s_1 x_1 + s_2 x_2 + \cdots + s_n x_n)^n.\tag{5}$$
Indeed, the Multinomial Theorem states that the coefficient of the monomial $x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$ (where the $i_j$ are nonnegative integers summing to $n$) in the expansion of any term on the right-hand side is
$$\binom{n}{i_1, i_2, \ldots, i_n} s_1^{i_1} s_2^{i_2} \cdots s_n^{i_n}.$$
In the sum $(5)$, the coefficients involving $x_1^{i_1}$ appear in pairs: one member of each pair involves the case $s_1 = 1$, with coefficient proportional to $s_1$ times $s_1^{i_1}$, equal to $1$, and the other involves the case $s_1 = -1$, with coefficient proportional to $-1$ times $(-1)^{i_1}$, equal to $(-1)^{i_1+1}$. They cancel in the sum whenever $i_1 + 1$ is odd. The same argument applies to $i_2, \ldots, i_n$. Consequently, the only monomials that occur with nonzero coefficients must have odd powers of all the $x_i$. The only such monomial is $x_1 x_2 \cdots x_n$. It appears with coefficient $\binom{n}{1,1,\ldots,1} = n!$ in all $2^n$ terms of the sum. Consequently its coefficient is $2^n n!$, QED.
We need to take only half of each pair associated with $x_1$: that is, we can restrict the right-hand side of $(5)$ to the terms with $s_1 = 1$ and halve the coefficient on the left-hand side to $2^{n-1} n!$. That gives precisely the two versions of the Polarization Identity quoted in this answer for the cases $n = 2$ and $n = 3$: $2^{2-1}2! = 4$ and $2^{3-1}3! = 24$.
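For anyone who wants a machine check of this argument, here is a small symbolic sketch (using SymPy; it is not part of the proof) that expands the halved right-hand side of $(5)$ for $n = 2$ and $n = 3$ and confirms the coefficients $4$ and $24$:

```python
import math
from itertools import product

import sympy as sp

def check_polarization(n):
    """Verify 2**(n-1) * n! * x_1*...*x_n equals the sum, over sign patterns s
    with s_1 = 1, of s_1*...*s_n * (s_1*x_1 + ... + s_n*x_n)**n."""
    xs = sp.symbols(f'x1:{n + 1}')
    rhs = sp.Integer(0)
    for s in product([1, -1], repeat=n):
        if s[0] != 1:                      # keep only half of each pair
            continue
        rhs += math.prod(s) * sum(si * xi for si, xi in zip(s, xs)) ** n
    lhs = 2 ** (n - 1) * math.factorial(n) * sp.Mul(*xs)
    return sp.expand(rhs - lhs) == 0

assert check_polarization(2)   # the 4 Cov(X_i, X_j) version
assert check_polarization(3)   # the 24 mu_3(X)_{ijk} version
```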
Of course the Polarization Identity for algebraic variables immediately implies it for random variables: let each $x_i$ be a random variable $X_i$. Take expectations of both sides. The result follows by linearity of expectation.