I'll add a third method, just for variety: building up the kernel from a sequence of general steps known to create pd kernels. Let $X$ denote the domain of the kernels below and $\varphi$ the feature maps.
Scalings:
If $\kappa$ is a pd kernel, so is $\gamma \kappa$ for any constant $\gamma > 0$.
Proof: if $\varphi$ is the feature map for $\kappa$, then $\sqrt{\gamma} \, \varphi$ is a valid feature map for $\gamma \kappa$.
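These properties are all easy to sanity-check numerically by looking at the eigenvalues of a Gram matrix. Here's a minimal NumPy sketch for this one, where the base kernel, the data, and the value of $\gamma$ are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                 # 50 arbitrary points in R^3

# Gram matrix of a kernel we already know is pd (Gaussian RBF, for illustration)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq)

gamma = 2.7                                  # any gamma > 0
print(np.linalg.eigvalsh(gamma * K).min())   # nonnegative, up to rounding error
```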
Sums:
If $\kappa_1$ and $\kappa_2$ are pd kernels, so is $\kappa_1 + \kappa_2$.
Proof: Concatenate the feature maps $\varphi_1$ and $\varphi_2$ to get $x \mapsto \begin{bmatrix} \varphi_1(x) \\ \varphi_2(x) \end{bmatrix}$.
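The same kind of check works here, using two explicit finite-dimensional feature maps so the concatenation in the proof can be verified directly (the particular maps are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

phi1 = X                    # feature map for kappa1(x, y) = <x, y>
phi2 = X ** 2               # feature map for kappa2(x, y) = <x^2, y^2>, coordinatewise square

K1 = phi1 @ phi1.T
K2 = phi2 @ phi2.T

phi = np.hstack([phi1, phi2])               # the concatenated feature map
print(np.allclose(phi @ phi.T, K1 + K2))    # True: its Gram matrix is kappa1 + kappa2
```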
Limits:
If $\kappa_1, \kappa_2, \dots$ are pd kernels, and $\kappa(x, y) := \lim_{n \to \infty} \kappa_n(x, y)$ exists for all $x, y$, then $\kappa$ is pd.
Proof: For each $m, n \ge 1$ and every $\{(x_i, c_i)\}_{i=1}^m \subseteq X \times \mathbb{R}$, we have that $\sum_{i,j=1}^m c_i \, \kappa_n(x_i, x_j) \, c_j \ge 0$. Taking the limit as $n \to \infty$ gives the same property for $\kappa$.
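Here's a sketch of that limit argument in NumPy, using the partial sums of the exponential series as the sequence $\kappa_n$ (each partial sum is pd by the "powers", "scalings", and "sums" properties; see below). The quadratic form from the proof stays nonnegative for every $n$, and so does its limit:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
c = rng.normal(size=20)

G = X @ X.T                                  # base pd kernel <x, y>

# kappa_n(x, y) = sum_{k=0}^n <x, y>^k / k!, a pd kernel for each n
for n in [0, 1, 2, 5, 10, 40]:
    K_n = sum(G ** k / factorial(k) for k in range(n + 1))
    print(n, c @ K_n @ c >= -1e-8)           # each quadratic form is nonnegative

print(np.allclose(K_n, np.exp(G)))           # ... and K_n converges to exp(<x, y>)
```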
Products:
If $\kappa_1$ and $\kappa_2$ are pd kernels, so is $g(x, y) = \kappa_1(x, y) \, \kappa_2(x, y)$.
Proof: It follows immediately from the Schur product theorem, but Schölkopf and Smola (2002) give the following nice, elementary proof.
Let
$$(V_1, \dots, V_m) \sim \mathcal N\big(0, [\kappa_1(x_i, x_j)]_{ij}\big), \qquad (W_1, \dots, W_m) \sim \mathcal N\big(0, [\kappa_2(x_i, x_j)]_{ij}\big)$$
be independent.
Thus, using independence and the fact that the means are zero,
$$\operatorname{Cov}(V_i W_i, V_j W_j) = \operatorname{Cov}(V_i, V_j) \operatorname{Cov}(W_i, W_j) = \kappa_1(x_i, x_j) \, \kappa_2(x_i, x_j).$$
Covariance matrices must be psd, so considering the covariance matrix of $(V_1 W_1, \dots, V_m W_m)$ proves it.
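Numerically, this says the elementwise (Hadamard) product of two Gram matrices is psd; a quick check with two standard kernels (the choices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))

K1 = X @ X.T                                 # linear kernel
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K2 = np.exp(-sq)                             # Gaussian RBF kernel

K = K1 * K2                                  # elementwise (Hadamard) product
print(np.linalg.eigvalsh(K).min())           # nonnegative, up to rounding error
```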
Powers:
If $\kappa$ is a pd kernel, so is $\kappa^n(x, y) := \kappa(x, y)^n$ for any positive integer $n$.
Proof: immediate from the "products" property.
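This is what makes, e.g., the homogeneous polynomial kernel valid; a one-line check (the degree and data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))

K = (X @ X.T) ** 3                           # kappa(x, y) = <x, y>^3: "products" applied twice
print(np.linalg.eigvalsh(K).min() >= -1e-8)  # True: psd up to rounding error
```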
Exponents:
If $\kappa$ is a pd kernel, so is $e^\kappa(x, y) := \exp(\kappa(x, y))$.
Proof: We have
$$e^{\kappa(x, y)} = \lim_{N \to \infty} \sum_{n=0}^N \frac{1}{n!} \kappa(x, y)^n;$$
use the "powers", "scalings", "sums", and "limits" properties.
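And the corresponding check, exponentiating a Gram matrix entrywise (the 0.1 scaling is an arbitrary choice to keep the entries tame):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))

K = 0.1 * (X @ X.T)                          # pd by "scalings" of the linear kernel
expK = np.exp(K)                             # entrywise exp, i.e. the kernel exp(kappa(x, y))
print(np.linalg.eigvalsh(expK).min())        # nonnegative, up to rounding error
```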
Functions:
If $\kappa$ is a pd kernel and $f : X \to \mathbb{R}$, then $g(x, y) := f(x) \, \kappa(x, y) \, f(y)$ is as well.
Proof: Use the feature map $x \mapsto f(x) \, \varphi(x)$.
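As a final sketch showing how these steps compose: take $\kappa(x, y) = \exp(x^\top y / \sigma^2)$ (pd by "scalings" and "exponents") and $f(x) = \exp(-\lVert x \rVert^2 / (2\sigma^2))$; the "functions" step then yields exactly the Gaussian kernel. The data and the value of $\sigma$ below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
sigma = 1.3

K = np.exp(X @ X.T / sigma**2)                        # pd by "scalings" + "exponents"
f = np.exp(-(X ** 2).sum(1) / (2 * sigma**2))         # f(x) = exp(-||x||^2 / (2 sigma^2))

G = f[:, None] * K * f[None, :]                       # g(x, y) = f(x) kappa(x, y) f(y)

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
print(np.allclose(G, np.exp(-sq / (2 * sigma**2))))   # True: g is the Gaussian kernel
print(np.linalg.eigvalsh(G).min() >= -1e-8)           # and its Gram matrix is psd
```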