Goal: Develop an NMF variant that exploits temporal ordering in the columns of $V$ to improve factorization quality for longitudinal data. Derive update rules, prove convergence, and benchmark against Lee & Seung (2001) and existing NMF solvers.


Objective 1: Understand Standard NMF

Learn the baseline algorithm and its convergence theory well enough to extend it.

Core reading:

  1. Lee & Seung (1999) — Learning the parts of objects by non-negative matrix factorization, Nature. The paper that started it all; short (three pages). Explains why non-negativity forces a parts-based decomposition.

  2. Lee & Seung (2001) — Algorithms for Non-negative Matrix Factorization, NeurIPS. The most important paper in the reading list: multiplicative update rules, the auxiliary-function convergence proof, and the diagonal rescaling interpretation. Work through every line of the proof — the auxiliary-function technique is exactly what gets extended for regularized variants.

  3. Supplementary proof guide (2025) — fills in gaps in Lee & Seung’s original proof.

Known limitations:

  • Lin (2007) — Lee & Seung’s proof shows the objective decreases monotonically, but NOT that it converges to a stationary point (the iterates could cycle). Proposes projected gradient as a fix.

  • Koulakis et al. (2020) — a modified MU that provably avoids saddle points and converges to second-order stationary points. State of the art for NMF convergence theory.
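To make the baseline concrete, here is a minimal sketch of Lee & Seung's multiplicative updates for the Frobenius objective (the small `eps` guarding against division by zero is an implementation convenience, not part of the original derivation):

```python
import numpy as np

def nmf_mu(V, r, iters=200, eps=1e-9, seed=0):
    """Lee & Seung (2001) multiplicative updates for min ||V - WH||_F^2, W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, p = V.shape
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, p)) + 0.1
    for _ in range(iters):
        # Each update is a diagonally rescaled gradient step, so nonnegativity
        # is preserved automatically and the objective never increases.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Note the caveat from Lin (2007) above: these updates are guaranteed not to increase the objective, but that alone does not imply convergence to a stationary point.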


Objective 2: Learn the Optimization Foundations

Constrained optimization:

  • Boyd & Vandenberghe, Ch. 9 — core optimization foundations.
  • Tibshirani’s proximal gradient lecture notes — proximal methods and FISTA.

Matrix calculus:

  • Matrix Cookbook — reference for the gradients needed in the update-rule derivations.

Majorization-minimization:

  • Hunter & Lange (2004) — the general MM framework; the auxiliary-function proof is an instance of MM.


Objective 3: Survey Regularized NMF Variants

Understand what’s been done so the contribution is clearly novel.

Sparse NMF:

  • Hoyer (2004) — L1/L2 sparseness constraints on $W$ and/or $H$. Uses projected gradient descent instead of multiplicative updates.
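Hoyer's L1/L2 sparseness measure interpolates between 0 (a flat vector) and 1 (a single nonzero entry); a small sketch of the measure itself:

```python
import numpy as np

def hoyer_sparseness(x):
    # Hoyer (2004): (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1)
    # Returns 0 for a constant vector, 1 for a 1-sparse vector.
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)
```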

Graph-regularized NMF:

  • Cai et al. (2011) — adds $\lambda \cdot \text{tr}(H L H^T)$ where $L$ is a graph Laplacian. The direct template for the temporal variant — the temporal penalty is the special case where the graph is a chain (path graph over ordered columns).

General regularized MU:

  • Regularized NMF (2024) — a general recipe for deriving multiplicative updates when penalty terms are added to the objective.

Objective 4: Identify the Gap — Temporal Regularization

Two existing approaches, both with clear limitations:

  • TRMF — Yu, Rao & Dhillon (2016) — temporal regularization for matrix factorization, but allows negative values. Designed for forecasting in high-dimensional time series. Uses autoregressive regularization, not smoothness.

  • Chiovetto et al. (2016) — NMF with temporally constrained coefficients. Closest prior work, but uses a hard averaging constraint (each coefficient must equal the average of its neighbors). Not tunable, no convergence proof.

The gap: no one has combined temporally regularized non-negative matrix factorization with a tunable smoothness penalty and provable convergence. TRMF drops non-negativity; Chiovetto et al.’s hard constraint is neither tunable nor backed by a convergence proof; graph-regularized NMF handles arbitrary graphs but does not exploit the temporal chain specifically.


Objective 5: Formulate TR-NMF

The proposed objective function:

$$\min_{W,H \geq 0} \frac{1}{2}\lVert V - WH \rVert_F^2 + \frac{\lambda}{2} \lVert H D^T \rVert_F^2$$

where $D$ is the $(p-1) \times p$ first-difference matrix. The penalty expands as:

$$\lVert H D^T \rVert_F^2 = \sum_i \sum_{j=1}^{p-1}(H_{i,j+1} - H_{i,j})^2$$

This penalizes jagged temporal patterns in $H$. $\lambda = 0$ recovers standard NMF. $\lambda \to \infty$ forces perfectly flat $H$ rows. The matrix $D^T D$ is the discrete Laplacian on a path graph — the temporal chain specialization of Cai et al.’s general graph Laplacian.
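A quick numerical check of the two identities above — the penalty as a sum of squared successive differences, and $D^T D$ as the path-graph Laplacian in Cai et al.'s $\text{tr}(H L H^T)$ form:

```python
import numpy as np

p = 5
D = np.diff(np.eye(p), axis=0)   # (p-1) x p first-difference matrix
L = D.T @ D                      # tridiagonal path-graph Laplacian: diag [1,2,2,2,1], off-diag -1

H = np.random.default_rng(0).random((3, p))
penalty = np.sum((H @ D.T) ** 2)

# ||H D^T||_F^2 equals the sum of squared successive differences along each row...
assert np.isclose(penalty, np.sum(np.diff(H, axis=1) ** 2))
# ...and also tr(H (D^T D) H^T), the graph penalty specialized to the chain graph
assert np.isclose(penalty, np.trace(H @ L @ H.T))
```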

Key tasks:

  • Derive modified multiplicative update rules (split $\Phi = D^T D$ into positive and negative parts, following Cai et al.’s pattern)
  • Prove convergence via auxiliary function construction
  • Characterize the regularization path as $\lambda$ varies
  • Develop cross-validation strategy for $\lambda$ selection
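With the split $\Phi = D^T D = \Phi^+ - \Phi^-$ ($\Phi^+$ the diagonal part, $\Phi^-$ the magnitude of the off-diagonals), the modified $H$ update might look as follows. This is a sketch under the assumption that the Cai et al.-style update carries over; `trnmf_mu` and its defaults are illustrative, not settled design choices, and the convergence proof for this exact rule is precisely what Objective 5 has to deliver.

```python
import numpy as np

def trnmf_mu(V, r, lam=0.1, iters=300, eps=1e-9, seed=0):
    """Sketch of TR-NMF multiplicative updates (Cai et al.-style penalty split)."""
    rng = np.random.default_rng(seed)
    m, p = V.shape
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, p)) + 0.1
    D = np.diff(np.eye(p), axis=0)      # (p-1) x p first-difference matrix
    Phi = D.T @ D                       # path-graph Laplacian
    Phi_pos = np.diag(np.diag(Phi))     # positive part (diagonal, nonnegative)
    Phi_neg = Phi_pos - Phi             # negative part (off-diagonals, nonnegative)
    for _ in range(iters):
        # Gradient of the penalty w.r.t. H is lam * H @ Phi; splitting it keeps
        # numerator and denominator nonnegative, so H stays nonnegative.
        H *= (W.T @ V + lam * H @ Phi_neg) / (W.T @ W @ H + lam * H @ Phi_pos + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```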

Objective 6: Benchmark and Write

Benchmark against:

Method              Loss                   Optimizer
Lee & Seung MU      Frobenius              Multiplicative updates
HALS                Frobenius              Hierarchical ALS
TR-NMF              Frobenius + temporal   Modified MU
Projected gradient  Frobenius              Projected gradient descent

Metrics: reconstruction error, factor recovery accuracy (on synthetic data with known ground truth), stability across random starts (cophenetic correlation), convergence speed.
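Factor recovery on synthetic data has to account for NMF's permutation and scaling ambiguity. One simple sketch matches estimated columns to true columns by cosine similarity over all permutations (fine for small rank; `factor_recovery` is an illustrative helper, not a standard API):

```python
import itertools
import numpy as np

def factor_recovery(W_true, W_est):
    """Mean cosine similarity between matched columns, maximized over permutations."""
    def unit_cols(A):
        return A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-12)
    S = unit_cols(W_true).T @ unit_cols(W_est)   # r x r cosine-similarity matrix
    r = S.shape[0]
    # Brute-force over permutations; for larger r, the Hungarian algorithm applies.
    return max(np.mean([S[i, perm[i]] for i in range(r)])
               for perm in itertools.permutations(range(r)))
```

Cosine similarity is invariant to positive rescaling of columns, so a perfect recovery scores 1 even when $W$ and $H$ trade scale factors.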

Target venues: NeurIPS / ICML, SIAM Journal on Matrix Analysis, Annals of Applied Statistics, JMLR.


Reading Order

Order  Paper                             Objective                         Time
1      Lee & Seung (1999)                Intuition for NMF                 1 hour
2      Lee & Seung (2001)                Update rules + convergence proof  1 day
3      Supplementary proof guide (2025)  Fill proof gaps                   2 hours
4      Hunter & Lange (2004)             General MM framework              3 hours
5      Boyd & Vandenberghe Ch. 9         Optimization foundations          1 day
6      Tibshirani proximal gradient      Proximal methods, FISTA           3 hours
7      Lin (2007)                        Known gaps in convergence         3 hours
8      Cai et al. (2011)                 Template for the proof            1 day
9      Hoyer (2004)                      Alternative: projected gradient   3 hours
10     TRMF (Yu et al. 2016)             Temporal MF (not NMF)             3 hours
11     Chiovetto et al. (2016)           Temporal NMF (hard constraint)    2 hours
12     Koulakis et al. (2020)            Second-order convergence          1 day

Additional References

Resource                        Description
Matrix Cookbook                 Matrix calculus reference
Regularized NMF (2024)          General recipe for regularized MU
mSOM 2nd-order NMF (2023)       Second-order majorant algorithm
NNDSVD initialization survey    Initialization strategies
Boutsidis & Gallopoulos (2008)  NNDSVD original paper
Brunet et al. (2004)            Cophenetic correlation for rank selection
spOT-NMF (2025)                 Optimal transport NMF