Imagine we have a population and $Y$ is a variable measured on each individual of that population. Then $P(Y \in (y, y+\Delta y))$ is the proportion of individuals whose value of $Y$ falls in the range $(y, y+\Delta y)$. You can think of this range as a "bin" of width $\Delta y$, and we are counting how many individuals fall inside that bin.
Now let us re-express those same individuals in terms of another variable, $X$. Since $Y$ and $X$ are related by $Y = X^2$, write $y = x^2$ and choose $\Delta x > 0$ so that $(|x| + \Delta x)^2 = y + \Delta y$. Then the event $Y \in (y, y+\Delta y)$ is the same as the event $X^2 \in (x^2, (|x|+\Delta x)^2)$, which in turn is the same as the event $X \in (|x|, |x|+\Delta x)$ or $X \in (-|x|-\Delta x, -|x|)$. Thus, the individuals in the bin $(y, y+\Delta y)$ are exactly the individuals in the bins $(|x|, |x|+\Delta x)$ and $(-|x|-\Delta x, -|x|)$. In other words, those bins must contain the same proportion of individuals:
$$P(Y \in (y, y+\Delta y)) = P(X \in (|x|, |x|+\Delta x)) + P(X \in (-|x|-\Delta x, -|x|))$$
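A quick simulation can make the bin argument concrete. The sketch below assumes, purely for illustration, that $X$ is standard normal; the function name `bin_proportions` and the choice $y = 1$, $\Delta y = 0.1$ are hypothetical. It takes $\Delta x$ so that $(|x|+\Delta x)^2 = y+\Delta y$ and counts the same simulated individuals once through $Y$ and once through $X$.

```python
import math
import random

def bin_proportions(y, dy, n=100_000, seed=0):
    """Estimate P(Y in bin) and P(X in the matching bins) from the same n draws."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # assumption: X ~ N(0, 1)
    ys = [x * x for x in xs]                       # each individual's Y = X^2
    ax = math.sqrt(y)                              # |x| with x^2 = y
    dx = math.sqrt(y + dy) - ax                    # so that (|x| + dx)^2 = y + dy
    p_y = sum(y < v < y + dy for v in ys) / n
    p_x = sum((ax < v < ax + dx) or (-ax - dx < v < -ax) for v in xs) / n
    return p_y, p_x

p_y, p_x = bin_proportions(1.0, 0.1)
print(f"P(Y in bin) = {p_y:.5f}, P(X in bins) = {p_x:.5f}")
```

Because every simulated $Y$ is computed from its own $X$, both counts pick out the same individuals, so the two proportions should coincide (up to floating-point ties exactly at a bin edge).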
Ok, now let's get to the density. First, we need to define what a probability density is. As the name suggests, it is the proportion of individuals per unit length, that is, per unit of bin "size": we take the share of individuals in the bin and divide it by the width of the bin. We have established that the proportions of individuals are the same, but the widths of the bins have changed, so the densities must be different. But different by how much?
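As we said, the probability density is the proportion of individuals in the bin divided by the width of the bin, taken in the limit of a very small bin. Thus the density of $Y$ is $f_Y(y) := \lim_{\Delta y \to 0} \frac{P(Y \in (y, y+\Delta y))}{\Delta y}$. Analogously, the density of $X$ is $f_X(x) := \lim_{\Delta x \to 0} \frac{P(X \in (x, x+\Delta x))}{\Delta x}$.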
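This definition can be mimicked numerically by counting samples in a bin and dividing by the bin width. The sketch assumes $X \sim N(0, 1)$ only so that there is a known density to compare against; `empirical_density` and `normal_pdf` are illustrative helper names, and the point $0.5$ and the widths are arbitrary choices.

```python
import math
import random

def empirical_density(samples, t, width):
    """Share of samples falling in (t, t + width), divided by the bin width."""
    share = sum(t < s < t + width for s in samples) / len(samples)
    return share / width

def normal_pdf(t):
    """Standard normal density, used only as a known reference."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]  # assumption: X ~ N(0, 1)

# Density estimates at t = 0.5 for progressively narrower bins.
estimates = {w: empirical_density(xs, 0.5, w) for w in (0.5, 0.1, 0.02)}
for w, est in estimates.items():
    print(f"width={w}: estimate={est:.4f}, f_X(0.5)={normal_pdf(0.5):.4f}")
```

Narrower bins track the density at the bin's left edge more closely, but with a finite sample they contain fewer points, so the estimate also gets noisier.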
Using our previous result that the bins contain the same proportion of individuals, we then have, for small bins,
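$$
\begin{aligned}
f_Y(y) &\approx \frac{P(Y \in (y, y+\Delta y))}{\Delta y} \\
&= \frac{P(X \in (|x|, |x|+\Delta x)) + P(X \in (-|x|-\Delta x, -|x|))}{\Delta y} \\
&\approx \frac{f_X(|x|)\,\Delta x + f_X(-|x|)\,\Delta x}{\Delta y} \\
&= \frac{\Delta x}{\Delta y}\bigl(f_X(|x|) + f_X(-|x|)\bigr) \\
&= \frac{\Delta x}{\Delta y}\bigl(f_X(\sqrt{y}) + f_X(-\sqrt{y})\bigr),
\end{aligned}
$$
where the approximations become exact as the bins shrink.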
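To see the last lines of this derivation at work with finite bins, the sketch below evaluates $\frac{\Delta x}{\Delta y}\bigl(f_X(|x|) + f_X(-|x|)\bigr)$ directly, taking $\Delta x$ so that $(|x|+\Delta x)^2 = y + \Delta y$, and compares it with a simulated bin count. It assumes a hypothetical $X \sim \text{Uniform}(-1, 2)$, chosen because the two terms then behave differently: at $y = 0.25$ both $\pm 0.5$ lie in the support, while at $y = 2.25$ only $+1.5$ does. All function names are illustrative.

```python
import math
import random

def uniform_pdf(t, lo=-1.0, hi=2.0):
    """Density of a hypothetical X ~ Uniform(lo, hi)."""
    return 1.0 / (hi - lo) if lo < t < hi else 0.0

def f_y_from_bins(f_x, y, dy):
    """(dx/dy) * (f_X(|x|) + f_X(-|x|)) with dx chosen so (|x| + dx)^2 = y + dy."""
    ax = math.sqrt(y)
    dx = math.sqrt(y + dy) - ax
    return dx / dy * (f_x(ax) + f_x(-ax))

rng = random.Random(2)
ys = [rng.uniform(-1.0, 2.0) ** 2 for _ in range(200_000)]  # Y = X^2

def simulated_f_y(y, dy):
    """Share of simulated Y values in (y, y + dy), divided by dy."""
    return sum(y < v < y + dy for v in ys) / (len(ys) * dy)

for y in (0.25, 2.25):
    print(y, f_y_from_bins(uniform_pdf, y, 0.01), simulated_f_y(y, 0.01))
```

For a uniform $X$ the density is constant across each bin, so the step $P(X \in \text{bin}) \approx f_X \cdot \Delta x$ is exact here; for other distributions it only holds as the bins shrink.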
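That is, the density of $Y$ is the sum $f_X(\sqrt{y}) + f_X(-\sqrt{y})$ rescaled by the factor $\frac{\Delta x}{\Delta y}$, which measures how much the bin is stretched or squeezed by the transformation. In our case, since $y = x^2$, we have $y + \Delta y = (|x| + \Delta x)^2 = x^2 + 2|x|\Delta x + \Delta x^2$. If $\Delta x$ is small enough we can ignore $\Delta x^2$, which gives $\Delta y \approx 2|x|\Delta x$ and therefore $\frac{\Delta x}{\Delta y} \to \frac{1}{2|x|} = \frac{1}{2\sqrt{y}}$. That is why the factor $\frac{1}{2\sqrt{y}}$ shows up in the transformation:
$$f_Y(y) = \frac{f_X(\sqrt{y}) + f_X(-\sqrt{y})}{2\sqrt{y}}, \qquad y > 0.$$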
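Two small numerical checks of this last step, offered as a sketch: the exact ratio $\Delta x / \Delta y = \Delta x / (2|x|\Delta x + \Delta x^2)$ approaching $\frac{1}{2\sqrt{y}}$ as $\Delta x$ shrinks, and the resulting formula applied to a standard normal $X$ (an example choice, not part of the argument above), where $Y = X^2$ is chi-squared with one degree of freedom and has density $e^{-y/2}/\sqrt{2\pi y}$. The helper names are illustrative.

```python
import math

def dx_over_dy(y, dx):
    """Exact ratio dx/dy when (|x| + dx)^2 = y + dy, i.e. dy = 2|x|dx + dx^2."""
    ax = math.sqrt(y)
    dy = (ax + dx) ** 2 - y
    return dx / dy

def density_of_square(f_x, y):
    """f_Y(y) = (f_X(sqrt(y)) + f_X(-sqrt(y))) / (2 sqrt(y)) for y > 0."""
    r = math.sqrt(y)
    return (f_x(r) + f_x(-r)) / (2 * r)

def normal_pdf(t):
    """Standard normal density, used here only as an example f_X."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

# At y = 4 the ratio should approach 1 / (2 * sqrt(4)) = 0.25 as dx shrinks.
for dx in (0.1, 0.01, 0.001):
    print(dx, dx_over_dy(4.0, dx))

# For X ~ N(0, 1), the formula should match the chi-squared(1) density.
y = 1.5
print(density_of_square(normal_pdf, y), math.exp(-y / 2) / math.sqrt(2 * math.pi * y))
```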