Statistical Learning Theory is centred on finding ways in which random data can be used to approximate an unknown random variable. At the heart of the area is the following question: Let F be a class of functions defined on a probability space (\Omega,\mu) and let Y be an unknown random variable. Find some function that is (almost) as 'close' to Y as the 'best function' in F.
A crucial facet of the problem is the information available: neither Y nor the underlying probability measure \mu is known. Instead, the given data is an independent sample (X_i,Y_i)_{i=1}^N, selected according to the joint distribution of X and Y, where X is distributed according to \mu. One has to design a procedure that receives as input the sample (and the identity of the class F) and returns an approximating function. The success of the procedure is measured by the tradeoff between the accuracy (the level of approximation) and the confidence (the probability with which that accuracy is achieved).
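As an illustration only, the accuracy/confidence tradeoff can be written out under one common choice of 'closeness', the squared loss. The abstract does not fix a loss, so this choice is an assumption, as are the symbols \hat{f} (the function returned by the procedure), \varepsilon (the accuracy) and \delta (one minus the confidence); X denotes a point distributed according to \mu, drawn jointly with Y and independently of the sample:
\[
  \Pr\Big( \mathbb{E}\big[ (\hat{f}(X)-Y)^2 \,\big|\, (X_i,Y_i)_{i=1}^N \big] \;\le\; \inf_{f \in F} \mathbb{E}(f(X)-Y)^2 + \varepsilon \Big) \;\ge\; 1-\delta .
\]
In this form, the question is how small \varepsilon can be made for a given sample size N and a given \delta.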
In this talk, I explore some surprising connections the problem has with high-dimensional geometry. Specifically, I explain how geometric considerations played an instrumental role in the problem's recent solution, leading to the introduction of a prediction procedure that is optimal in a very strong sense and under minimal assumptions.