Test-Time Training++ (TTT++)
Can TTT always mitigate distribution shift? This paper proposes an improved version of TTT.
TTT adapts a neural network to new data distributions at test time using unlabeled samples, by training a shared encoder on two tasks (see the sketch after this list):
- Main task (classification)
- Auxiliary self-supervised (SSL) task (rotation prediction in the original TTT)
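A minimal PyTorch sketch of the two-task setup, with a toy encoder standing in for the real backbone (all names and sizes here are illustrative, not the paper's code):

```python
import torch
import torch.nn as nn

class TTTModel(nn.Module):
    def __init__(self, feat_dim=512, num_classes=10):
        super().__init__()
        # Stand-in encoder; the paper uses a ResNet backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.main_head = nn.Linear(feat_dim, num_classes)  # supervised classification
        self.ssl_head = nn.Linear(feat_dim, 4)             # e.g. 4 rotation classes

    def forward(self, x):
        z = self.encoder(x)  # shared features
        return self.main_head(z), self.ssl_head(z)

# At test time labels are unavailable, so only the SSL loss (and, in TTT++,
# the alignment/contrastive terms) updates the shared encoder.
```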
When does it fail?
- When the auxiliary task is uninformative about the main task
- When the auxiliary task overfits --> updating the shared encoder on it may worsen the main task
Solution: online feature alignment (domain adaptation via a divergence measure on feature statistics):
- Offline feature summarization after training (per-channel mean and variance of source features)
- Channel-wise normalization (as in batch norm) using the stats just calculated
- Test-time regularization minimizing the distance between test and training feature statistics
- Online dynamic queue of test features, so target statistics are estimated over more than one small batch (see the sketch below)
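A hedged sketch of the alignment step: the paper aligns first- and second-order statistics of source and target features; for brevity this uses per-channel mean and variance rather than full covariance, and the queue size and function names are assumptions:

```python
import torch

class FeatureQueue:
    """Fixed-size FIFO buffer of recent test features (size is an assumption)."""
    def __init__(self, dim, size=1024):
        self.buf = torch.zeros(size, dim)
        self.ptr, self.full, self.size = 0, False, size

    def push(self, feats):
        n = feats.size(0)
        idx = (self.ptr + torch.arange(n)) % self.size
        self.buf[idx] = feats.detach()  # stored features carry no gradient
        self.full = self.full or (self.ptr + n) >= self.size
        self.ptr = (self.ptr + n) % self.size

def alignment_loss(batch_feats, queue, src_mean, src_var):
    """Match first- and second-order channel statistics of the test features
    (current batch + queue) to the offline source statistics."""
    qdata = queue.buf if queue.full else queue.buf[: queue.ptr]
    feats = torch.cat([batch_feats, qdata], dim=0)
    t_mean = feats.mean(0)
    t_var = feats.var(0, unbiased=False)
    return ((t_mean - src_mean) ** 2).sum() + ((t_var - src_var) ** 2).sum()
```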
- TTT-A: alignment of features (first- + second-order statistics)
- TTT-C: contrastive learning added as the SSL task on the target domain
TTT++ = TTT-A + TTT-C (a sketch of the combined objective follows)
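A sketch of how the two terms might combine, reusing `alignment_loss` from above; the contrastive term is a standard SimCLR-style NT-Xent over two augmented views, and the weighting `lam` is a hypothetical hyperparameter, not a value from the paper:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss over two augmented views of a batch."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)  # (2N, D)
    sim = z @ z.t() / temperature                # pairwise cosine similarities
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float('-inf'))
    # The positive for row i is its other view: i+N for the first half, i-N after.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

def ttt_pp_loss(feats1, feats2, queue, src_mean, src_var, lam=1.0):
    """Hypothetical combined objective: TTT-A alignment + TTT-C contrastive."""
    return (alignment_loss(feats1, queue, src_mean, src_var)
            + lam * nt_xent(feats1, feats2))
```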
Limits: relies on offline feature summarization (source statistics must be stored), and is evaluated only with a ResNet backbone.