Wei Tan's team at IBM T. J. Watson Research Center developed cuMF, a CUDA-based matrix factorization library that optimizes the Alternating Least Squares (ALS) method to solve large-scale matrix factorization problems using NVIDIA GPUs. cuMF achieves excellent scalability and performance by applying techniques such as memory access optimization, data parallelism, and a topology-aware parallel reduc
