High school students may see the PMCC and Spearman correlation formulae years before they have the algebra skills to manipulate sigma notation, though they may well know the method of finite differences for deducing the polynomial equation for a sequence. So I have tried to write a "high school proof" for the equivalence: finding the denominator using finite differences, and minimising the algebraic manipulation of sums in the numerator. Depending on the students the proof is presented to, you may prefer this approach to the numerator, but combine it with a more conventional method for the denominator.
Denominator, √∑i(xi−ˉx)2∑i(yi−ˉy)2∑i(xi−x¯)2∑i(yi−y¯)2−−−−−−−−−−−−−−−−−−−√
With no ties, the data are the ranks {1,2,…,n}{1,2,…,n} in some order, so it is easy to show ˉx=n+12x¯=n+12. We can reorder the sum Sxx=∑ni=1(xi−ˉx)2=∑nk=1(k−n+12)2Sxx=∑ni=1(xi−x¯)2=∑nk=1(k−n+12)2, though with lower grade students I'd likely write this sum out explicitly rather than in sigma notation. The sum of a quadratic in kk will be cubic in nn, a fact that students familiar with the finite difference method may grasp intuitively: differencing a cubic produces a quadratic, so summing a quadratic produces a cubic. Determining the coefficients of the cubic f(n)f(n) is straightforward if students are comfortable manipulating ΣΣ notation and know (and remember!) the formulae for ∑nk=1k∑nk=1k and ∑nk=1k2∑nk=1k2. But they can also be deduced using finite differences, as follows.
When n=1n=1, the data set is just {1}{1}, ˉx=1x¯=1, so f(1)=(1−1)2=0f(1)=(1−1)2=0.
For n=2n=2, the data are {1,2}{1,2}, ˉx=1.5x¯=1.5, so f(2)=(1−1.5)2+(2−1.5)2=0.5f(2)=(1−1.5)2+(2−1.5)2=0.5.
For n=3n=3, the data are {1,2,3}{1,2,3}, ˉx=2x¯=2, so f(3)=(1−2)2+(2−2)2+(3−2)2=2f(3)=(1−2)2+(2−2)2+(3−2)2=2.
These computations are fairly brief, and help reinforce what the notation ∑ni=1(xi−ˉx)2∑ni=1(xi−x¯)2 means, and in short order we produce the finite difference table.
We can obtain the coefficients of f(n)f(n) by cranking out the finite difference method as outlined in the links above. For instance, the constant third differences indicate our polynomial is indeed cubic, with leading coefficient 0.53!=1120.53!=112. There are a few tricks to minimise drudgery: a well-known one is to use the common differences to extend the sequence back to n=0n=0, as knowing f(0)f(0) immediately gives away the constant coefficient. Another is to try extending the sequence to see if f(n)f(n) is zero for an integer nn - e.g. if the sequence had been positive but decreasing, it would be worth extending rightwards to see if we could "catch a root", as this makes factorisation easier later. In our case, the function seems to hover around low values when nn is small, so let's extend even further leftwards.
Aha! It turns out we have caught all three roots: f(−1)=f(0)=f(1)=0f(−1)=f(0)=f(1)=0. So the polynomial has factors of (n+1)(n+1), nn, and (n−1)(n−1). Since it was cubic it must be of the form:
f(n)=an(n+1)(n−1)
f(n)=an(n+1)(n−1)
We can see that aa must be the coefficient of n3n3 which we already determined to be 112112. Alternatively, since f(2)=0.5f(2)=0.5 we have a(2)(3)(1)=0.5a(2)(3)(1)=0.5 which leads to the same conclusion. Expanding the difference of two squares gives:
Sxx=n(n2−1)12
Sxx=n(n2−1)12
Since the same argument applies to SyySyy, the denominator is √SxxSyy=√S2xx=SxxSxxSyy−−−−−−√=S2xx−−−√=Sxx and we are done. Ignoring my exposition, this method is surprisingly short. If one can spot that the polynomial is cubic, it is necessary only to calculate SxxSxx for the cases n∈{1,2,3,4}n∈{1,2,3,4} to establish the third difference is 0.5. Root-hunters need only extend the sequence leftwards to n=0n=0 and n=−1n=−1, by when all three roots are found. It took me a couple of minutes to find SxxSxx this way.
Numerator, ∑i(xi−ˉx)(yi−ˉy)∑i(xi−x¯)(yi−y¯)
I note the identity (b−a)2≡b2−2ab+a2(b−a)2≡b2−2ab+a2 which can be rearranged to:
ab≡12(a2+b2−(b−a)2)
ab≡12(a2+b2−(b−a)2)
If we let a=xi−ˉx=xi−n+12a=xi−x¯=xi−n+12 and b=yi−ˉy=yi−n+12b=yi−y¯=yi−n+12 we have the useful result that b−a=yi−xi=dib−a=yi−xi=di because the means, being identical, cancel out. That was my intuition for writing the identity in the first place; I wanted to switch from working with the product of the moments to the square of their differences. We now have:
(xi−ˉx)(yi−ˉy)=12((xi−ˉx)2+(yi−ˉy)2−d2i)
(xi−x¯)(yi−y¯)=12((xi−x¯)2+(yi−y¯)2−d2i)
Hopefully even students unsure how to manipulate ΣΣ notation can see how summing over the data set yields:
Sxy=12(Sxx+Syy−n∑i=1d2i)
Sxy=12(Sxx+Syy−∑i=1nd2i)
We have already established, by reordering the sums, that Syy=SxxSyy=Sxx, leaving us with:
Sxy=Sxx−12n∑i=1d2i
The formula for Spearman's correlation coefficient is within our grasp!
rS=Sxy√SxxSyy=Sxx−12∑id2iSxx=1−∑id2i2Sxx
Substituting the earlier result that Sxx=112n(n2−1) will finish the job.
rS=1−∑id2i212n(n2−1)=1−6∑id2in(n2−1)