KL Divergence — in layman’s terms

Ajit Rajasekharan
8 min read · Mar 23, 2019

If we are asked to look at the three animals below and say which one is more of a cat than a dog, most of us would agree that

  • the first one is “all cat and no dog”
  • the second one is “more cat than dog”
  • the third is “more dog than cat”

Images of the animals are from Marvin’s review of The Illustrated Encyclopedia of Cat Breeds

If we want a neural net based model to do the same thing (a task we have gotten good at in the last few years, particularly with neural net based models), then:

  1. we need to first generate some training data ourselves, labeling each picture with probability assignments like the values shown above (e.g. 90% [.9] cat; 10% [.1] dog)
  2. then have the model predict these values for each image in our training set, and let the model keep improving its predictions based on how far off they are from the values humans assigned to them (a minimal sketch of this follows the list).
  3. once the model does this successfully for a large number of pictures, it is likely to make predictions, even for images of cats and dogs it has never seen before, that also agree with our estimates.
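To make steps 1 and 2 concrete, here is a minimal sketch in Python/NumPy. The file names and all the numbers are made up for illustration, and a real model would of course compute its predictions from the image pixels rather than have them hard-coded:

```python
import numpy as np

# Step 1: hypothetical human-labeled training data. Each image gets
# probability assignments for (cat, dog), like the values above.
training_labels = {
    "all_cat.jpg":    np.array([1.0, 0.0]),  # all cat and no dog
    "mostly_cat.jpg": np.array([0.9, 0.1]),  # more cat than dog
    "mostly_dog.jpg": np.array([0.3, 0.7]),  # more dog than cat
}

# Step 2: the model produces its own (cat, dog) probabilities for each
# image; training nudges these toward the human-assigned values.
model_predictions = {
    "all_cat.jpg":    np.array([0.80, 0.20]),
    "mostly_cat.jpg": np.array([0.60, 0.40]),
    "mostly_dog.jpg": np.array([0.50, 0.50]),
}
```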

Focusing specifically on the last part of the second step above: “how does the model calculate how far off it is with its prediction of the percentages of cat and dog for a given image?”
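The measure this article builds up to is the KL divergence between the human-assigned distribution p and the model’s predicted distribution q. As a preview, here is a minimal self-contained sketch; the distributions are the made-up ones from above, and the small epsilon clipping is just a practical guard against taking the log of zero:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i).
    Zero when the prediction q exactly matches the target p,
    and it grows the further apart the two distributions are."""
    p = np.clip(p, eps, 1.0)  # avoid log(0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.9, 0.1])  # human label: 90% cat, 10% dog
print(kl_divergence(p, np.array([0.85, 0.15])))  # close prediction -> ~0.011
print(kl_divergence(p, np.array([0.40, 0.60])))  # far-off prediction -> ~0.551
```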
