Analysis of Entropy in Hidden States of RNNs
Does the entropy decrease in step with the training loss?
What does it mean when the entropy decreases?
- In the hidden states: the dynamics are becoming more predictable (a sign of overfitting? of vanishing gradients?); a measurement sketch follows this list.
- In general: lower entropy means the entries of the tensor are becoming more and more similar to one another.
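The notes don't say how the entropy is estimated. A minimal sketch, assuming it is estimated from a histogram of the hidden-state values (so that very similar entries fall into few bins and yield low entropy, matching the intuition above); the function name and bin count are choices of this sketch, not from the notes:

```python
import torch

def hidden_state_entropy(h: torch.Tensor, bins: int = 64) -> torch.Tensor:
    """Shannon entropy (in nats) of the value distribution of a hidden-state
    tensor, estimated with a histogram. If all entries cluster into a few
    bins (i.e., the values are very similar), the entropy is low.
    """
    counts = torch.histc(h.detach().float(), bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]                     # drop empty bins so log() is defined
    return -(p * p.log()).sum()
```

A histogram estimator is only one option; a softmax-based or kernel-density estimate would rank hidden states differently, so the choice should be fixed before comparing runs.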
We don't want the entropy to drop too far during training.
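To probe the question at the top (does entropy track the training loss?), a hypothetical monitoring loop could log both per step, reusing `hidden_state_entropy` from the sketch above. The toy model, random data, and hyperparameters here are placeholders, not from the notes:

```python
import torch

rnn = torch.nn.RNN(input_size=16, hidden_size=32, batch_first=True)
head = torch.nn.Linear(32, 1)
opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = torch.nn.MSELoss()

# Placeholder data: 100 random batches of shape (batch=8, seq=10, features=16)
loader = [(torch.randn(8, 10, 16), torch.randn(8, 1)) for _ in range(100)]

for step, (x, y) in enumerate(loader):
    out, h_n = rnn(x)               # h_n: (1, batch, 32), final hidden state
    loss = loss_fn(head(h_n[-1]), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Log the hidden-state entropy next to the loss to compare the two curves
    ent = hidden_state_entropy(h_n[-1])
    print(f"step {step:3d}  loss {loss.item():.4f}  entropy {ent.item():.4f}")
```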
TODO: What happens if we overfit the model (smaller hidden size)?