Papers


(In order of completion)

A Bregman-Kaczmarz method for nonlinear systems of equations, arXiv:2303.08549, 2023.

Preprint


Handbook of Convergence Theorems for (Stochastic) Gradient Methods, arXiv:2301.11235, 2023.

Preprint


Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies, ICLR 2023.

Preprint Proceedings


SP2: A Second Order Stochastic Polyak Method, ICLR 2023.

Preprint Proceedings


Cutting Some Slack for SGD with Adaptive Polyak Stepsizes, 2022.

Preprint


Stochastic Polyak Stepsize with a Moving Target, 2021.

Preprint


A general sample complexity analysis of vanilla policy gradient, AISTATS 2022.

Preprint Proceedings

SAN: Stochastic Average Newton Algorithm for Minimizing Finite Sums, AISTATS 2022.

Preprint Code Proceedings

RidgeSketch: A Fast sketching based solver for large scale ridge regression, SIAM Journal on Matrix Analysis and Applications, Vol. 43, Iss. 3, 2022.

Preprint Journal Code

Variance-Reduced Methods for Machine Learning, Proceedings of the IEEE, vol. 108, no. 11, pp. 1968-1983, Nov. 2020.

Preprint Journal

Sketched Newton-Raphson, SIAM Journal on Optimization, Vol. 32, Iss. 3, 2022.

Preprint Journal

Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization, 2020.

Preprint

SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation, AISTATS 2021.

Preprint Proceedings

Factorial Powers for Stochastic Optimization, Asian Conference on Machine Learning, 2021.

Preprint Proceedings

Almost sure convergence rates for Stochastic Gradient Descent and Stochastic Heavy Ball, COLT 2021.

Preprint Proceedings

Fast Linear Convergence of Randomized BFGS, 2020.

Preprint

Adaptive Sketch-and-Project Methods for Solving Linear Systems, SIAM Journal on Matrix Analysis and Applications, 2019.

Preprint Journal

Towards closing the gap between the theory and practice of SVRG, NeurIPS 2019.

Preprint Code Poster Proceedings

RSN: Randomized Subspace Newton, NeurIPS 2019.

Preprint Poster Proceedings

Optimal mini-batch and step sizes for SAGA, ICML 2019.

Preprint Code Proceedings Poster

SGD: general analysis and improved rates, (extended oral presentation) ICML 2019.

Preprint Proceedings

Improving SAGA via a probabilistic interpolation with gradient descent, 2018.

Preprint

Stochastic quasi-gradient methods: variance reduction via Jacobian sketching, Mathematical Programming 2020.

Preprint Code Journal

Accelerated stochastic matrix inversion: general theory and speeding up BFGS rules for faster second-order optimization, NIPS, 2018.

Preprint Code Proceedings Poster

Greedy stochastic algorithms for entropy-regularized optimal transport problems, AISTATS, 2018.

Preprint Proceedings Poster

Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods, AISTATS (oral presentation), 2018.

Preprint Code Proceedings Slides Poster

Randomized quasi-Newton updates are linearly convergent matrix inversion algorithms, SIAM Journal on Matrix Analysis and Applications, 2017.

Preprint Code Journal Slides

Linearly Convergent Randomized Iterative Methods for Computing the Pseudoinverse, 2016.

Preprint Code Slides

Sketch and Project: Randomized Iterative Methods for Linear Systems and Inverting Matrices, PhD Dissertation, School of Mathematics, The University of Edinburgh, 2016.

Preprint Code Slides

Stochastic Block BFGS: Squeezing More Curvature out of Data, ICML, 2016.

Preprint Code Proceedings Slides Poster

Stochastic dual ascent for solving linear systems, 2015.

Preprint Code

Randomized iterative methods for linear systems, SIAM Journal on Matrix Analysis and Applications, 2015.

Preprint Journal Slides Code Most downloaded on SIMAX

High order reverse automatic differentiation with emphasis on the third order, Mathematical Programming, 2014.

Preprint Journal Slides Code

Computing the sparsity pattern of Hessians using automatic differentiation, ACM Transactions on Mathematical Software, 2014.

Preprint Journal Code

A new framework for Hessian automatic differentiation, Optimization Methods and Software, 2012.

Preprint Journal Slides Code

Reports and Notes

A Very Simple Introduction to Diffusion Models and The Standard Loss Function

Notes

Train Positioning Using Video Odometry, 2014.

Report

Action constrained quasi-Newton methods, Technical Report ERGO 14-020, 2014.

Report Code

Conjugate Gradients: The short and painful explanation with oblique projections

Notes

Hessian matrices via automatic differentiation, State University of Campinas technical report and MSc thesis, 2011.

Report Master's thesis

Efficient calculation of derivatives through graph coloring, State University of Campinas technical report, undergraduate project, 2009.

Report I Report II

Recent & Upcoming Talks

ICCOPT 2019
Aug 5, 2019
Expected smoothness is the key to understanding the mini-batch complexity of stochastic gradient methods
Slides

Teaching

Flatiron: Machine Learning X Science Summer School (Spring 2022)

1) Lecture slides.
2) Jupyter notebooks for generating most of the figures in the slides: SGD_figure.ipynb and Gradient_momentum.ipynb (a minimal SGD sketch follows this list)
3) Exercises on convexity and smoothness, complexity and convergence rates, ridge regression and gradient descent, stochastic methods for ridge regression, and the SGD proof
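
Below is a minimal illustrative sketch (plain NumPy, not taken from the course notebooks) of the kind of SGD run on ridge regression that the exercises and figure notebooks above cover; the data, stepsize schedule and iteration count are assumptions made here for illustration.

    # Illustrative sketch: SGD on ridge regression (not the course notebook code).
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lam = 200, 10, 0.1
    A = rng.standard_normal((n, d))                # features
    x_true = rng.standard_normal(d)                # generating vector
    b = A @ x_true + 0.01 * rng.standard_normal(n)

    x = np.zeros(d)
    for t in range(5000):
        i = rng.integers(n)                        # sample one data point uniformly
        g = (A[i] @ x - b[i]) * A[i] + lam * x     # stochastic gradient of the ridge loss
        x -= g / (10 + t)                          # simple decreasing stepsize
    print(np.linalg.norm(x - x_true))              # distance to the generating vector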

Cornell lecture: Optimization for Machine Learning (Spring 2020)

In case you need it, here is a short probability revision.
1) Details of the course.
2) Lecture slides for SGD.
3) Exercises on stochastic methods for ridge regression (solutions) and SGD convergence proof (solutions).
4) Lecture slides for Variance Reduction.

Master IASD: AI Systems and Data Science (Fall 2019)

For prerequisites see here. For a revision of vector calculus see here.
The course information can be found here.
1) Slides on introduction to SGD and ERM
2) Lecture notes on probability revision
3) Exercise list on stochastic methods for ridge regression (solutions)
4) Slides on SGD and variants
5) Exercise on SGD proof (solutions)
6) Python notebook on SGD (solutions)

Telecom Paris IA317: Large scale machine learning (Fall 2019)

The course information can be found here.
For prerequisites and revision material see here.
1) Lecture notes on dimension reduction tools and sparse matrices
2) Exercise list on dimension reduction tools and sparse matrices
3) Python Notebook graded homework on dimension reduction tools and sparse matrices. Data sets needed for homework: colon-cancer, anthracyclineTaxaneChemotherapy and sector.scale.

MDI210: Optimization and Numerical Analysis (Fall 2020)

The complete lecture notes with examples and exercises (by Irene Charon and Olivier Hudry) are here. Here are my notes and slides (WARNING: these are a work in progress!)
1) Lecture notes on numerical linear algebra
2) Lecture notes on linear and nonlinear optimization (Updated 12/10/2020)
3) Slides on Linear systems and Eigenvalues
4) Slides on Linear Programming
5) Slides on Nonlinear Programming (Updated 12/10/2020)

Master2 Optimization for Data Science (Fall 2019)

For prerequisites see here. For a revision of vector calculus see here.
Lecture notes on gradient descent proofs.

0) Exercises on convexity and smoothness (solutions)
1) Exercises on complexity and convergence rates (solutions)
2) Lecture I: intro to ML, convexity, smoothness and gradient descent
3) Exercises ridge regression and gradient descent (solutions)
4) Lecture II: proximal gradient methods
5) Exercises on proximal operator (solutions)
6) Lab1: Proximal gradient methods
7) Lecture III: Stochastic gradient descent
8) Exercises on stochastic methods for ridge regression (solutions)
9) Exercise on SGD proof (solutions)
10) Lecture IV: Stochastic variance reduced gradient methods
11) Exercise on variance reduction, proof of convergence of SVRG
12) Lecture V: Sampling and momentum
13) Exercise on sampling and momentum
14) Python notebook on momentum
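
As a rough companion to the momentum notebook in item 14, here is a minimal heavy-ball (momentum) gradient descent sketch on a random strongly convex quadratic; the problem, the 1/L stepsize and the 0.9 momentum parameter are illustrative assumptions, not the course settings.

    # Illustrative sketch: gradient descent with heavy-ball momentum on a quadratic.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 10
    M = rng.standard_normal((d, d))
    H = M.T @ M + np.eye(d)              # positive definite Hessian of f(x) = 0.5 x'Hx - b'x
    b = rng.standard_normal(d)
    L = np.linalg.eigvalsh(H).max()      # smoothness constant (largest eigenvalue)

    x = x_prev = np.zeros(d)
    for _ in range(300):
        g = H @ x - b                    # exact gradient of the quadratic
        x, x_prev = x - g / L + 0.9 * (x - x_prev), x
    print(np.linalg.norm(H @ x - b))     # gradient norm at the final iterate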

African Masters of Machine Intelligence (AMMI) (Winter 2019)

1) Lecture I: Introduction into ML and optimization
2) Exercises on convexity, smoothness and gradient descent
3) Lecture II: proximal gradient methods
4) Exercises on proximal operator
5) Lecture III: Stochastic gradient descent
6) Exercises on stochastic methods
7) Lecture IV: Stochastic variance reduced gradient methods
8) Notes on stochastic variance reduced methods

Contact

  • gowerrobert@gmail.com
  • Flatiron Institute, 162 5th Ave, New York, NY 10010, United States. Office: 411
  • Email for an appointment