To begin, a criticism
I picked up the Haskell Data Analysis Cookbook. The book presents examples of comparing data using the Pearson coefficient and using cosine similarity.
pearson xs ys = (n * sxy - sx * sy) / sqrt ((n * sxx - sx * sx) * (n * syy - sy * sy))
  where
    n = fromIntegral (length xs)
    sx = sum xs
    sy = sum ys
    sxx = sum $ zipWith (*) xs xs  -- sum of squares of xs
    syy = sum $ zipWith (*) ys ys  -- sum of squares of ys
    sxy = sum $ zipWith (*) xs ys  -- sum of pairwise products
cosine xs ys = dot xs ys / (len xs * len ys)
  where
    dot a b = sum $ zipWith (*) a b
    len a = sqrt $ dot a a
Although both snippets calculate the ‘similarity’ between two vectors and, as we shall see, share a lot of structure, this is not at all apparent at a glance.
We can fix that however…
Definition of an Inner Product
An inner product is conceptually a way to see how long a vector is after projecting it along another (inside some space).
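Formally, an inner product on a vector space $V$ over a field $F$ (here $\mathbb{R}$ or $\mathbb{C}$) is a binary operator $\langle \cdot, \cdot \rangle : V \times V \to F$ satisfying the following properties
Linearity
$\langle \alpha u + v, w \rangle = \alpha \langle u, w \rangle + \langle v, w \rangle$ for $u, v, w \in V$ and $\alpha \in F$
We are saying that sums in the left argument can be pulled outside. We can also pull out constant factors.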
(Conjugate) Symmetry
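$\langle u, v \rangle = \langle v, u \rangle$ or in the complex case, $\langle u, v \rangle = \overline{\langle v, u \rangle}$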
In the real case, we’re saying everything is symmetric – it doesn’t matter which way you do it. In the complex case we have to reflect things by taking the conjugate.
Positive Definiteness
$\langle v, v \rangle \geq 0$ with equality iff $v = 0$
Here we’re saying projecting a vector onto itself always results in a non-negative length. Secondly, the only way we can end up with a result of zero is if the vector itself is the zero vector.
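The three properties can be spot-checked numerically for the ordinary dot product on real vectors. This is only a sketch: the helper names (`add`, `scale`, `approxEq` and the `...Holds` predicates) are made up for illustration, and passing on a few sample vectors is evidence, not a proof.

```haskell
-- Spot-check the real inner product axioms for the ordinary dot product.
-- All helper names here are illustrative, not from any library.

dot :: [Double] -> [Double] -> Double
dot xs ys = sum (zipWith (*) xs ys)

add :: [Double] -> [Double] -> [Double]
add = zipWith (+)

scale :: Double -> [Double] -> [Double]
scale a = map (a *)

-- Floating-point comparison with a small tolerance.
approxEq :: Double -> Double -> Bool
approxEq a b = abs (a - b) < 1e-9

-- Linearity in the left argument: <a*u + v, w> == a*<u, w> + <v, w>
linearityHolds :: Double -> [Double] -> [Double] -> [Double] -> Bool
linearityHolds a u v w =
  approxEq (dot (add (scale a u) v) w) (a * dot u w + dot v w)

-- Symmetry (real case): <u, v> == <v, u>
symmetryHolds :: [Double] -> [Double] -> Bool
symmetryHolds u v = approxEq (dot u v) (dot v u)

-- Non-negativity half of positive definiteness: <v, v> >= 0
nonNegativeHolds :: [Double] -> Bool
nonNegativeHolds v = dot v v >= 0
```

In GHCi, something like `linearityHolds 2.5 [1,2,3] [4,5,6] [7,8,9]` should come back `True`.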
From Inner Product to a notion of ‘length’
Intuitively the length of a vector must be
- positive or zero (a negative length makes little sense), with a length of zero corresponding to the zero vector
- linear (if we scale the vector threefold, the length should also increase threefold)
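Given that, we might be tempted to set $\|v\| = \langle v, v \rangle$, but then upon scaling by a real $\alpha$ we get $\langle \alpha v, \alpha v \rangle = \alpha^2 \langle v, v \rangle$ – we’re not scaling linearly.
Instead, defining $\|v\| = \sqrt{\langle v, v \rangle}$, everything is good: $\|\alpha v\| = |\alpha| \, \|v\|$.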
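The scaling argument can be tried out directly. Here is a minimal sketch using the plain dot product as the inner product; `selfProduct`, `norm` and `scale` are names chosen for this example only.

```haskell
-- Two candidate notions of length built from the dot product.
dot :: [Double] -> [Double] -> Double
dot xs ys = sum (zipWith (*) xs ys)

-- Candidate 1: the raw inner product of a vector with itself.
-- Scaling the vector by a multiplies this by a^2.
selfProduct :: [Double] -> Double
selfProduct v = dot v v

-- Candidate 2: the square root of candidate 1.
-- Scaling the vector by a multiplies this by |a|.
norm :: [Double] -> Double
norm v = sqrt (dot v v)

scale :: Double -> [Double] -> [Double]
scale a = map (a *)
```

For `v = [3, 4]`, tripling the vector takes `selfProduct` from 25 to 225 (ninefold), while `norm` goes from 5 to 15 (threefold), which is the behaviour we wanted.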
Similarity
Now, in the abstract, how similar are two vectors?
How about we first stop caring about how long they are, and want them just to point in the same direction. We can project one along the other and see how much it changes in length (shrinks).
Projecting is kind of like seeing what its component is in that direction – i.e. considering 2-dimensional vectors in the plane, projecting a vector onto a unit vector in the $x$ direction will tell you the $x$ component of that vector.
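Let’s call two vectors $u$ and $v$.
Firstly let’s scale them to be both of unit length, $\hat{u} = \frac{u}{\|u\|}$ and $\hat{v} = \frac{v}{\|v\|}$, where $\|u\| = \sqrt{\langle u, u \rangle}$.
Now, project one onto the other (remember we’re not caring about order because of symmetry): $\langle \hat{u}, \hat{v} \rangle = \left\langle \frac{u}{\|u\|}, \frac{v}{\|v\|} \right\rangle$.
Using linearity we can pull some stuff out (and also assuming everything’s happily a real vector – not caring about taking conjugates)… $\langle \hat{u}, \hat{v} \rangle = \frac{\langle u, v \rangle}{\|u\| \, \|v\|}$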
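The two routes to the similarity (normalise first and then project, or project first and divide by the lengths) can be compared in code. This is a sketch under the assumption of real vectors represented as lists of `Double`; the type synonyms and function names are illustrative.

```haskell
type Vec = [Double]
type InnerProduct = Vec -> Vec -> Double

dot :: InnerProduct
dot xs ys = sum (zipWith (*) xs ys)

-- Length induced by a given inner product.
norm :: InnerProduct -> Vec -> Double
norm ip v = sqrt (ip v v)

-- Done literally: rescale both vectors to unit length, then project.
similarityViaUnit :: InnerProduct -> Vec -> Vec -> Double
similarityViaUnit ip u v = ip (unit u) (unit v)
  where unit w = map (/ norm ip w) w

-- After pulling the scale factors out using linearity.
similarityDirect :: InnerProduct -> Vec -> Vec -> Double
similarityDirect ip u v = ip u v / (norm ip u * norm ip v)
```

Both functions should agree up to floating-point rounding. Neither is meaningful when one of the vectors has zero length, since that divides by zero.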
Making Everything Concrete
Euclidean Inner Product
The dot product we know and love: $\langle x, y \rangle = \sum_i x_i y_i$.
Plugging that into the similarity formula, we end up with the cosine similarity we started with!
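As a quick sanity check of the name, here is a small worked example with two vectors picked purely for illustration, $x = (1, 0)$ and $y = (1, 1)$, which sit at 45° to each other in the plane:

```latex
\begin{align*}
\langle x, y \rangle &= 1 \cdot 1 + 0 \cdot 1 = 1 \\
\|x\| &= \sqrt{1^2 + 0^2} = 1, \qquad \|y\| = \sqrt{1^2 + 1^2} = \sqrt{2} \\
\frac{\langle x, y \rangle}{\|x\| \, \|y\|} &= \frac{1}{\sqrt{2}} = \cos 45^\circ
\end{align*}
```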
Covariance Inner Product
The covariance between two vectors is defined as $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$ where we’re abusing the notion of expectation somewhat. This in fact works if $X$ and $Y$ are arbitrary $L^2$ random variables… but for the very concrete case of finite vectors we could consider $E[X] = \frac{1}{n} \sum_{i=1}^{n} x_i$.
We’ve said in our space, to project a first vector onto a second we see how covariant the first is with the second – if they move together or not.
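Plugging this inner product into the similarity formula, we instead get the Pearson coefficient!
In fact, given $\mathrm{Var}(X) = \mathrm{Cov}(X, X)$, in this space we have $\|X\| = \sqrt{\mathrm{Cov}(X, X)} = \sigma_X$,
i.e. $\frac{\langle X, Y \rangle}{\|X\| \, \|Y\|} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$.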
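To see why this matches the sums-based `pearson` function from the start of the post, here is a sketch of the algebra. It assumes the finite-vector expectation $E[X] = \frac{1}{n} \sum_i x_i$ and uses the shorthand $s_x = \sum_i x_i$, $s_{xy} = \sum_i x_i y_i$ and so on, mirroring the `sx`, `sxy` names in that code:

```latex
\begin{align*}
\mathrm{Cov}(X, Y) &= E[XY] - E[X]\,E[Y]
  = \frac{s_{xy}}{n} - \frac{s_x s_y}{n^2}
  = \frac{n\, s_{xy} - s_x s_y}{n^2} \\
\frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Cov}(X, X)\,\mathrm{Cov}(Y, Y)}}
  &= \frac{(n\, s_{xy} - s_x s_y) / n^2}
          {\sqrt{(n\, s_{xx} - s_x^2)(n\, s_{yy} - s_y^2)} \,/\, n^2}
  = \frac{n\, s_{xy} - s_x s_y}
          {\sqrt{(n\, s_{xx} - s_x^2)(n\, s_{yy} - s_y^2)}}
\end{align*}
```

The factors of $n^2$ cancel, leaving exactly the expression computed by `pearson`.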
Improving the code
Now that we know this structure exists, I posit the following as being better
similarity ip xs ys = ip xs ys / (len xs * len ys)
  where len v = sqrt (ip v v)

-- the inner products
dot xs ys = sum $ zipWith (*) xs ys

covariance xs ys = exy - (ex * ey)
  where
    e vs = sum vs / fromIntegral (length vs)
    exy = e $ zipWith (*) xs ys
    ex = e xs
    ey = e ys

-- the similarity functions
cosineSimilarity = similarity dot
pearsonSimilarity = similarity covariance
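One way to gain some confidence in the refactoring is to compare the generic version against the sums-based formula on some data. The sketch below restates both so that it stands alone; `pearsonSums`, `sampleXs` and `sampleYs` are names and values invented for this check, and the type signatures are pinned to `Double` for simplicity.

```haskell
-- Generic similarity, parameterised by the inner product.
similarity :: ([Double] -> [Double] -> Double) -> [Double] -> [Double] -> Double
similarity ip xs ys = ip xs ys / (len xs * len ys)
  where len v = sqrt (ip v v)

dot :: [Double] -> [Double] -> Double
dot xs ys = sum (zipWith (*) xs ys)

covariance :: [Double] -> [Double] -> Double
covariance xs ys = e (zipWith (*) xs ys) - e xs * e ys
  where e vs = sum vs / fromIntegral (length vs)

-- The sums-based formula, restated for comparison.
pearsonSums :: [Double] -> [Double] -> Double
pearsonSums xs ys =
  (n * sxy - sx * sy) / sqrt ((n * sxx - sx * sx) * (n * syy - sy * sy))
  where
    n   = fromIntegral (length xs)
    sx  = sum xs
    sy  = sum ys
    sxx = dot xs xs
    syy = dot ys ys
    sxy = dot xs ys

-- Hypothetical sample data, chosen only for illustration.
sampleXs, sampleYs :: [Double]
sampleXs = [1, 2, 3, 4, 5]
sampleYs = [2, 4, 5, 4, 5]
```

In GHCi, `similarity covariance sampleXs sampleYs` and `pearsonSums sampleXs sampleYs` should agree up to rounding (roughly 0.77 for this data). Both assume lists of equal length with non-zero variance.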
Things I’m yet to think about
…though maybe the answers are apparent.
We have a whole load of inner products available to us. What does it mean to use those inner products?
E.g. on $L^2$ – the inner product $\langle f, g \rangle = \int f(x) \overline{g(x)} \, dx$ producing the Fourier transform. I’m not sure the resulting similarity is anything particularly special though…
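As one small data point on that question, here is a sketch of plugging a different inner product into the same `similarity` function: a weighted dot product, which is an inner product as long as every weight is strictly positive. The name `weightedDot` and the weights used are assumptions for illustration only.

```haskell
-- Generic similarity, restated so the example stands alone.
similarity :: ([Double] -> [Double] -> Double) -> [Double] -> [Double] -> Double
similarity ip xs ys = ip xs ys / (len xs * len ys)
  where len v = sqrt (ip v v)

-- Weighted dot product: sum of w_i * x_i * y_i.
-- Assumes every weight is strictly positive; otherwise positive
-- definiteness fails and this is no longer an inner product.
weightedDot :: [Double] -> [Double] -> [Double] -> Double
weightedDot ws xs ys = sum (zipWith3 (\w x y -> w * x * y) ws xs ys)

-- A similarity that treats the first coordinate as four times as important.
-- The weights [4, 1] are arbitrary.
firstHeavySimilarity :: [Double] -> [Double] -> Double
firstHeavySimilarity = similarity (weightedDot [4, 1])
```

With all weights equal to 1 this reduces to cosine similarity. With the weights above, `[1, 0]` and `[1, 1]` score higher than under plain cosine similarity, because their agreement in the first coordinate counts for more.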