Goal: Develop an NMF variant that exploits temporal ordering in the columns of $V$ to improve factorization quality for longitudinal data. Derive update rules, prove convergence, and benchmark against Lee & Seung (2001) and existing NMF solvers.


Objective 1: Understand Standard NMF

Learn the baseline algorithm and its convergence theory well enough to extend it.

Core reading:

  1. Lee & Seung (1999) — Learning the parts of objects by non-negative matrix factorization, Nature. The paper that started it all; short (three pages). Explains why non-negativity forces a parts-based decomposition.

  2. Lee & Seung (2001) — Algorithms for Non-negative Matrix Factorization, NeurIPS. The most important paper in the reading list: multiplicative update rules, the auxiliary-function convergence proof, and the diagonal rescaling interpretation. Work through every line of the proof — the auxiliary-function technique is exactly what gets extended for regularized variants.

  3. Supplementary proof guide (2025) — fills in gaps in Lee & Seung’s original proof.

Known limitations:

  • Lin (2007) — Lee & Seung’s proof shows the objective decreases monotonically, but NOT that it converges to a stationary point (the iterates could cycle). Proposes projected gradient as a fix.

  • Koulakis et al. (2020) — a modified MU that provably avoids saddle points and converges to second-order stationary points. State of the art for NMF convergence theory.
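To make the baseline concrete, here is a minimal sketch of Lee & Seung's multiplicative updates for the Frobenius objective (the small `eps` guarding against division by zero is an implementation convenience, not part of the original derivation):

```python
import numpy as np

def nmf_mu(V, r, iters=200, eps=1e-9, seed=0):
    """Lee & Seung (2001) multiplicative updates for min ||V - WH||_F^2, W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, p = V.shape
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, p)) + 0.1
    for _ in range(iters):
        # Each update is a diagonally rescaled gradient step, so nonnegativity
        # is preserved automatically and the objective never increases.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Note the caveat from Lin (2007) above: these updates are guaranteed not to increase the objective, but that alone does not imply convergence to a stationary point.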


Objective 2: Learn the Optimization Foundations

Constrained optimization:

  • Boyd & Vandenberghe, Ch. 9 — core optimization foundations.
  • Tibshirani’s proximal gradient lecture notes — proximal methods and FISTA.

Matrix calculus:

  • Matrix Cookbook — reference for the gradients needed in the update-rule derivations.

Majorization-minimization:

  • Hunter & Lange (2004) — the general MM framework; the auxiliary-function proof is an instance of MM.


Objective 3: Survey Regularized NMF Variants

Understand what’s been done so the contribution is clearly novel.

Sparse NMF:

  • Hoyer (2004) — L1/L2 sparseness constraints on $W$ and/or $H$. Uses projected gradient descent instead of multiplicative updates.
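Hoyer's L1/L2 sparseness measure interpolates between 0 (a flat vector) and 1 (a single nonzero entry); a small sketch of the measure itself:

```python
import numpy as np

def hoyer_sparseness(x):
    # Hoyer (2004): (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1)
    # Returns 0 for a constant vector, 1 for a 1-sparse vector.
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)
```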

Graph-regularized NMF:

  • Cai et al. (2011) — adds $\lambda \cdot \text{tr}(H L H^T)$ where $L$ is a graph Laplacian. The direct template for the temporal variant — the temporal penalty is the special case where the graph is a chain (path graph over ordered columns).

General regularized MU:

  • Regularized NMF (2024) — a general recipe for deriving multiplicative updates when penalty terms are added to the objective.

Objective 4: Identify the Gap — Temporal Regularization

Two existing approaches, both with clear limitations:

  • TRMF — Yu, Rao & Dhillon (2016) — temporal regularization for matrix factorization, but allows negative values. Designed for forecasting in high-dimensional time series. Uses autoregressive regularization, not smoothness.

  • Chiovetto et al. (2016) — NMF with temporally constrained coefficients. Closest prior work, but uses a hard averaging constraint (each coefficient must equal the average of its neighbors). Not tunable, no convergence proof.

The gap: no one has combined temporally regularized non-negative matrix factorization with a tunable smoothness penalty and provable convergence. TRMF drops non-negativity; Chiovetto et al.’s hard constraint is neither tunable nor backed by a convergence proof; graph-regularized NMF handles arbitrary graphs but does not exploit the temporal chain specifically.


Objective 5: Formulate TR-NMF

The proposed objective function:

$$\min_{W,H \geq 0} \frac{1}{2}\lVert V - WH \rVert_F^2 + \frac{\lambda}{2} \lVert H D^T \rVert_F^2$$

where $D$ is the $(p-1) \times p$ first-difference matrix. The penalty expands as:

$$\lVert H D^T \rVert_F^2 = \sum_i \sum_{j=1}^{p-1}(H_{i,j+1} - H_{i,j})^2$$

This penalizes jagged temporal patterns in $H$. $\lambda = 0$ recovers standard NMF. $\lambda \to \infty$ forces perfectly flat $H$ rows. The matrix $D^T D$ is the discrete Laplacian on a path graph — the temporal chain specialization of Cai et al.’s general graph Laplacian.
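A quick numerical check of the two identities above — the penalty as a sum of squared successive differences, and $D^T D$ as the path-graph Laplacian in Cai et al.'s $\text{tr}(H L H^T)$ form:

```python
import numpy as np

p = 5
D = np.diff(np.eye(p), axis=0)   # (p-1) x p first-difference matrix
L = D.T @ D                      # tridiagonal path-graph Laplacian: diag [1,2,2,2,1], off-diag -1

H = np.random.default_rng(0).random((3, p))
penalty = np.sum((H @ D.T) ** 2)

# ||H D^T||_F^2 equals the sum of squared successive differences along each row...
assert np.isclose(penalty, np.sum(np.diff(H, axis=1) ** 2))
# ...and also tr(H (D^T D) H^T), the graph penalty specialized to the chain graph
assert np.isclose(penalty, np.trace(H @ L @ H.T))
```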

Key tasks:

  • Derive modified multiplicative update rules (split $\Phi = D^T D$ into positive and negative parts, following Cai et al.’s pattern)
  • Prove convergence via auxiliary function construction
  • Characterize the regularization path as $\lambda$ varies
  • Develop cross-validation strategy for $\lambda$ selection
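With the split $\Phi = D^T D = \Phi^+ - \Phi^-$ ($\Phi^+$ the diagonal part, $\Phi^-$ the magnitude of the off-diagonals), the modified $H$ update might look as follows. This is a sketch under the assumption that the Cai et al.-style update carries over; `trnmf_mu` and its defaults are illustrative, not settled design choices, and the convergence proof for this exact rule is precisely what Objective 5 has to deliver.

```python
import numpy as np

def trnmf_mu(V, r, lam=0.1, iters=300, eps=1e-9, seed=0):
    """Sketch of TR-NMF multiplicative updates (Cai et al.-style penalty split)."""
    rng = np.random.default_rng(seed)
    m, p = V.shape
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, p)) + 0.1
    D = np.diff(np.eye(p), axis=0)      # (p-1) x p first-difference matrix
    Phi = D.T @ D                       # path-graph Laplacian
    Phi_pos = np.diag(np.diag(Phi))     # positive part (diagonal, nonnegative)
    Phi_neg = Phi_pos - Phi             # negative part (off-diagonals, nonnegative)
    for _ in range(iters):
        # Gradient of the penalty w.r.t. H is lam * H @ Phi; splitting it keeps
        # numerator and denominator nonnegative, so H stays nonnegative.
        H *= (W.T @ V + lam * H @ Phi_neg) / (W.T @ W @ H + lam * H @ Phi_pos + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```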

Objective 6: Benchmark and Write

Benchmark against:

Method              Loss                   Optimizer
Lee & Seung MU      Frobenius              Multiplicative updates
HALS                Frobenius              Hierarchical ALS
TR-NMF              Frobenius + temporal   Modified MU
Projected gradient  Frobenius              Projected gradient descent

Metrics: reconstruction error, factor recovery accuracy (on synthetic data with known ground truth), stability across random starts (cophenetic correlation), convergence speed.
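Factor recovery on synthetic data has to account for NMF's permutation and scaling ambiguity. One simple sketch matches estimated columns to true columns by cosine similarity over all permutations (fine for small rank; `factor_recovery` is an illustrative helper, not a standard API):

```python
import itertools
import numpy as np

def factor_recovery(W_true, W_est):
    """Mean cosine similarity between matched columns, maximized over permutations."""
    def unit_cols(A):
        return A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-12)
    S = unit_cols(W_true).T @ unit_cols(W_est)   # r x r cosine-similarity matrix
    r = S.shape[0]
    # Brute-force over permutations; for larger r, the Hungarian algorithm applies.
    return max(np.mean([S[i, perm[i]] for i in range(r)])
               for perm in itertools.permutations(range(r)))
```

Cosine similarity is invariant to positive rescaling of columns, so a perfect recovery scores 1 even when $W$ and $H$ trade scale factors.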

Target venues: NeurIPS / ICML, SIAM Journal on Matrix Analysis, Annals of Applied Statistics, JMLR.


Reading Order

Order  Paper                             Objective                         Time
1      Lee & Seung (1999)                Intuition for NMF                 1 hour
2      Lee & Seung (2001)                Update rules + convergence proof  1 day
3      Supplementary proof guide (2025)  Fill proof gaps                   2 hours
4      Hunter & Lange (2004)             General MM framework              3 hours
5      Boyd & Vandenberghe Ch. 9         Optimization foundations          1 day
6      Tibshirani proximal gradient      Proximal methods, FISTA           3 hours
7      Lin (2007)                        Known gaps in convergence         3 hours
8      Cai et al. (2011)                 Template for the proof            1 day
9      Hoyer (2004)                      Alternative: projected gradient   3 hours
10     TRMF (Yu et al. 2016)             Temporal MF (not NMF)             3 hours
11     Chiovetto et al. (2016)           Temporal NMF (hard constraint)    2 hours
12     Koulakis et al. (2020)            Second-order convergence          1 day

Additional References

Resource                        Description
Matrix Cookbook                 Matrix calculus reference
Regularized NMF (2024)          General recipe for regularized MU
mSOM 2nd-order NMF (2023)       Second-order majorant algorithm
NNDSVD initialization survey    Initialization strategies
Boutsidis & Gallopoulos (2008)  NNDSVD original paper
Brunet et al. (2004)            Cophenetic correlation for rank selection
spOT-NMF (2025)                 Optimal transport NMF