In most introductory courses on statistics the focus is on classical statistical models in which the number of unknown parameters is small relative to the sample size. If such models depend smoothly on the parameter of interest, then under regularity conditions we have that as the sample size n tends to infinity, maximum likelihood or Bayesian estimators converge at the rate √n to the parameter corresponding to the true distribution that generates the data. Moreover, asymptotically such estimators are normally distributed and efficient, in the sense that they have minimal asymptotic variance.
There are many situations in which it is natural to consider models with an unknown parameter that is very high-dimensional compared to the sample size, or even infinite-dimensional. In this course we will see that in such models statistical procedures usually behave completely differently. Convergence rates are typically slower than √n, asymptotic normality is not guaranteed, and optimality of procedures cannot be assessed in terms of minimal variance.
This course provides a rigorous introduction to the mathematics of high-dimensional and nonparametric statistical models. Topics that are treated include the Stein phenomenon, the bias-variance trade-off, the role of regularisation, smoothing or shrinking, minimax lower bounds for testing and estimation, adaptive estimation, and nonparametric confidence sets.
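To give a first impression of the Stein phenomenon mentioned above, here is a minimal simulation sketch. It is not taken from the lecture notes. It assumes a single observation X ~ N(θ, I_d) in dimension d ≥ 3 and compares the Monte Carlo risk of the maximum likelihood estimator X with that of the classical James-Stein shrinkage estimator. The function names, the choice θ = (1, ..., 1), the dimension d = 10 and the number of repetitions are all illustrative assumptions.

```python
import random


def james_stein(x):
    """James-Stein estimator (1 - (d - 2) / ||x||^2) * x, intended for d >= 3."""
    d = len(x)
    sq_norm = sum(xi * xi for xi in x)
    shrink = 1.0 - (d - 2) / sq_norm
    return [shrink * xi for xi in x]


def squared_error(estimate, theta):
    """Squared Euclidean distance between an estimate and the true parameter."""
    return sum((e - t) ** 2 for e, t in zip(estimate, theta))


def simulate_risks(theta, n_rep=5000, seed=0):
    """Monte Carlo approximation of the risks of the MLE and of James-Stein.

    Each repetition draws one observation X = theta + standard normal noise.
    Returns the pair (risk of the MLE, risk of the James-Stein estimator).
    """
    rng = random.Random(seed)
    total_mle = 0.0
    total_js = 0.0
    for _ in range(n_rep):
        x = [t + rng.gauss(0.0, 1.0) for t in theta]
        total_mle += squared_error(x, theta)  # the MLE is x itself
        total_js += squared_error(james_stein(x), theta)
    return total_mle / n_rep, total_js / n_rep


if __name__ == "__main__":
    theta = [1.0] * 10  # illustrative choice of the true parameter
    risk_mle, risk_js = simulate_risks(theta)
    print(f"approximate risk of the MLE:         {risk_mle:.3f}")
    print(f"approximate risk of James-Stein:     {risk_js:.3f}")
```

In this setting the exact risk of the MLE equals d for every θ, while the James-Stein risk is strictly smaller for every θ when d ≥ 3 and equals 2 at θ = 0. The simulated values should reflect this up to Monte Carlo error.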
- none at this time
Students should have followed an introductory course in statistics and preferably also a more advanced course on asymptotic statistics in parametric models, for instance the UvA courses Stochastiek 2 and Asymptotic Statistics. Throughout the course we use the language and basic results of measure-theoretic probability, at the level of the course Measure Theoretic Probability.
We will also use concepts and results from stochastic process theory (e.g. Brownian motion and its basic properties) and functional analysis (e.g. Hilbert spaces and some of their properties, basic ideas of Fourier theory). These notions will be briefly recalled when needed, but to fully appreciate the course students should have followed courses like Functional Analysis and Stochastic Processes.
Although we focus on theory in this course, it is very useful to do numerical experiments to get some feeling for what is going on. Therefore there are computer exercises scattered throughout the notes, asking you to implement procedures and experiment with them. If you are not yet familiar with the statistical package R, you should get up to speed as soon as possible. R is freely available for all the usual platforms and the internet is full of tutorials and other documentation. The canonical starting point is the official webpage www.r-project.org.
Lectures are on Thursdays, 14:00-16:45. The first lecture is on February 7, in room A1.04, UvA Science Building, Science Park 904. For the latest schedule information, please consult mastermath.nl.
Exercises and assignments from the lecture notes are an integral part of the material. Every week, several students will be asked to present their solutions to the group. These mini presentations will determine a substantial part of the final grade.
Files with MNIST data for Exercise 1.3:
Student solutions to some of the exercises:
We will use lecture notes that are under construction and will be updated and corrected regularly. Please report errors and typos!
An incomplete list of additional background material:
- Giné, E. and Nickl, R. (2015). Mathematical foundations of infinite-dimensional statistical models. Cambridge University Press.
- Johnstone, I.M. (2017). Gaussian estimation: Sequence and wavelet models. Draft of the book, see http://statweb.stanford.edu/~imj/GE_08_09_17.pdf.
- Tsybakov, A. (2009). Introduction to nonparametric estimation. Springer.
What we have done so far
| Date | Material from the notes | Corresponding exercises |
|---|---|---|
| 7/2 | Chapter 1 | 1.1, 1.2, 1.3 |
| 14/2 | Sections 2.1-2.3 | 2.1, 2.2 |