$$W_T = (1 - 2\mu\lambda)^{n}\, W_{T-n} \;-\; \mu \sum_{j=1}^{n} (1 - 2\mu\lambda)^{j-1}\, G_{T-j}$$
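The unrolled form can be checked numerically. The sketch below assumes the formula comes from repeating the single-step update W ← (1 − 2μλ) W − μ G, which is the one-step case (n = 1) of the expression above. The function names, the 4×4 shape, and the random matrices standing in for gradients are all illustrative assumptions; no real loss is being minimized.

```python
import numpy as np

def decayed_step(W, G, mu, lam):
    """One update: W <- (1 - 2*mu*lam) * W - mu * G (assumed single-step rule)."""
    return (1.0 - 2.0 * mu * lam) * W - mu * G

def unrolled(W_start, grads, mu, lam):
    """Closed form after n = len(grads) steps.

    grads is ordered oldest to newest, so grads[-j] plays the role of G_{T-j}.
    """
    n = len(grads)
    decay = 1.0 - 2.0 * mu * lam
    W = decay ** n * W_start
    for j in range(1, n + 1):
        W = W - mu * decay ** (j - 1) * grads[-j]
    return W

rng = np.random.default_rng(0)
mu, lam, n = 0.05, 0.01, 20            # mu and lam match the values shown in the controls
W0 = rng.standard_normal((4, 4))       # stand-in for W_{T-n}
grads = [rng.standard_normal((4, 4)) for _ in range(n)]  # stand-ins for G_{T-n} .. G_{T-1}

# Apply the single-step rule n times.
W_iter = W0.copy()
for G in grads:
    W_iter = decayed_step(W_iter, G, mu, lam)

# Evaluate the unrolled expression directly and compare.
W_closed = unrolled(W0, grads, mu, lam)
max_abs_diff = float(np.max(np.abs(W_iter - W_closed)))

# Weight placed on G_{T-j} for j = 1..n: (1 - 2*mu*lam)^(j-1).
decay_weights = (1.0 - 2.0 * mu * lam) ** np.arange(n)
```

With μ = 0.05 and λ = 0.01 the per-step factor is 1 − 2μλ = 0.999, so the weight on older gradients shrinks slowly; `decay_weights` holds those factors, newest gradient first.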
Controls: B = 3, μ = 0.05, λ = 0.01
[Interactive figure. Panels: weight matrix W_T; singular values; memory window (recent gradient updates, opacity = decay weight). Readouts: step, effective rank, memory window.]