Test-Time Training++

Paper | Code

Can TTT always mitigate distribution shift? This paper studies when TTT fails and proposes an improved version of it, TTT++.

TTT adapts a neural network to new data distributions at test time using unlabeled samples, by training two tasks that share a feature encoder (see the sketch after this list):

  • Main task (classification)
  • Auxiliary task (self-supervised learning, e.g. rotation prediction)
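Below is a minimal PyTorch sketch of the test-time update, assuming the encoder was trained jointly with a classifier head and a rotation-prediction head (the self-supervised task used in the original TTT). The names rotate_batch, encoder, ssl_head, and test_time_adapt are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def rotate_batch(x):
    """Build the 4-way rotation task: rotated copies of x plus rotation labels.
    Assumes square images of shape (N, C, H, W)."""
    xs = torch.cat([torch.rot90(x, k, dims=(2, 3)) for k in range(4)], dim=0)
    ys = torch.arange(4).repeat_interleave(x.size(0))
    return xs, ys

def test_time_adapt(encoder, ssl_head, x_test, steps=1, lr=1e-3):
    """Update the shared encoder on an unlabeled test batch via the SSL task."""
    opt = torch.optim.SGD(
        list(encoder.parameters()) + list(ssl_head.parameters()), lr=lr
    )
    for _ in range(steps):
        xs, ys = rotate_batch(x_test)  # labels come from the augmentation itself
        loss = F.cross_entropy(ssl_head(encoder(xs)), ys)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder  # afterwards, predict with the main head on encoder(x_test)
```

The key point: the gradient step uses only the self-supervised loss, so no test labels are needed.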

When does it fail?

  • When the auxiliary task is not informative about the main task
  • When the auxiliary task overfits, so optimizing it can make the main task worse

Solution: online feature alignment (domain adaptation with a divergence measure). After training, summarize the features offline (compute their mean and standard deviation); apply channel-wise batch norm using the statistics just computed; then, at test time, regularize by minimizing the distance between the test and training feature statistics (sketched after the list below).

  • Online dynamic queue: pool recent test features so the statistics are estimated over more samples than a single small batch
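A hedged sketch of the alignment machinery, assuming the encoder outputs flattened feature vectors; summarize_source, alignment_loss, and FeatureQueue are illustrative names, with covariance standing in for the second-order statistic:

```python
import torch

@torch.no_grad()
def summarize_source(encoder, loader):
    """Offline: mean and covariance of source-domain features."""
    feats = torch.cat([encoder(x) for x, _ in loader], dim=0)  # (N, D)
    return feats.mean(dim=0), torch.cov(feats.T)               # (D,), (D, D)

def alignment_loss(test_feats, mu_s, cov_s):
    """Distance between test-time and training feature statistics
    (first- and second-order), minimized as the test-time regularizer."""
    mu_t = test_feats.mean(dim=0)
    cov_t = torch.cov(test_feats.T)
    return ((mu_t - mu_s) ** 2).sum() + ((cov_t - cov_s) ** 2).sum()

class FeatureQueue:
    """FIFO of recent test features, so statistics are estimated over the
    queue rather than a single small batch (the online dynamic queue)."""
    def __init__(self, size):
        self.size = size
        self.feats = None

    def push(self, batch_feats):
        new = batch_feats.detach()  # past batches carry no gradients
        self.feats = new if self.feats is None else torch.cat([self.feats, new])
        self.feats = self.feats[-self.size:]  # drop the oldest entries
        return self.feats
```

In use, the loss would be computed over the concatenation of the (detached) queue and the live batch, so gradients flow through the current samples while the statistics stay stable.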

TTT-A: alignment of features (first- and second-order statistics)
TTT-C: contrastive learning addition (SSL on the target domain)

TTT++: TTT-A + TTT-C
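For illustration, one way the two parts could combine at test time, reusing alignment_loss from the sketch above; nt_xent is a standard SimCLR-style loss, and the weight lam is an assumption, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """SimCLR-style contrastive loss on two augmented views (the TTT-C part)."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2n, d), unit norm
    sim = z @ z.T / tau                                 # scaled cosine similarity
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    n = z1.size(0)
    pos = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])  # paired view index
    return F.cross_entropy(sim, pos)

def ttt_pp_loss(test_feats, z1, z2, mu_s, cov_s, lam=1.0):
    """Combined objective: feature alignment (TTT-A) + contrastive SSL (TTT-C)."""
    return alignment_loss(test_feats, mu_s, cov_s) + lam * nt_xent(z1, z2)
```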

Limits: relies on offline feature summarization of the source data; experiments use a ResNet backbone.