Information Theory

In a machine learning context, every input vector X and output vector Y can be viewed as being drawn from a probability distribution. Information theory is a mathematical framework that lets us compare these distributions and ask questions such as: are these input vectors similar? Does this feature carry any information at all?
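As a minimal sketch of "comparing distributions", one common information-theoretic tool (not named in the text above, so treat it as an assumed example) is the Kullback-Leibler divergence between two discrete distributions. Here `p` and `q` are hypothetical example distributions:

```python
import math

def kl_divergence(p, q):
    """KL divergence D(p || q) in bits between two discrete distributions.

    Terms where p_i == 0 contribute nothing by convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: a fair coin vs. a heavily biased coin.
p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, p))  # identical distributions give 0.0
print(kl_divergence(p, q))  # a positive value: the distributions differ
```

A divergence of zero means the two distributions are identical; larger values mean they are easier to tell apart, which is one way to make "are these input vectors similar?" precise.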

Entropy is a measure of the unpredictability of a state, or equivalently its average information content. So if the…