The field of data science keeps expanding and changing as new technologies and methods become available to data scientists.
Those familiar with point processes know that they model collections of data points, or events, that occur in time or space. A collection of random variables that shows the evolution of a given system over time is called a stochastic process; when what is random is the placement of the points themselves, it is called a stochastic point process.
Vincent Granville, data scientist and author of Stochastic Processes and Simulations, introduces new point processes for data scientists. He discusses their real-world applications, as well as the best ways to teach them to aspiring data scientists.
Granville elaborates on the practical value of stochastic processes, both in the present and future, in this Q&A.
Editor's note: The following interview was edited for length and clarity.
What is the future trajectory for these new point processes? Will they be more widely used? If so, how?
Vincent Granville: There are many recent papers focusing on applications for cellular network optimization. The most obvious application that comes to mind, with future potential, is IoT: optimizing the locations, numbers and strength of sensor devices to provide maximum coverage at the lowest possible cost and deliver optimum data or services.
These processes could also be used in quantitative finance to model commodity price movements. One feature shared by both the theoretical model and real-life price fluctuations is that they are random yet constrained by some underlying rigid structure -- a 'lattice' -- so the distributions can't be arbitrary. These processes could also be used to model cluster structures in traditional machine learning settings -- i.e., supervised or unsupervised learning -- covering a richer class of potential patterns.
For instance, one of the examples in my book features something that is technically a 2D Brownian motion, the random motion of particles caused by collisions with surrounding molecules. Yet, it exhibits an unusually strong cluster structure. Anyone working with Brownian motions, such as physicists or Wall Street quantitative analysts, would benefit from exploring such structures.
Can universities create more advanced college-level courses to cover this content in a way that current courses usually don't? What type of information or training needs to be included?
Granville: Typical academic courses on stochastic processes require a strong foundation in measure theory. This doesn't need to be the case. A less theoretical presentation -- in my case, without any reference to measure theory -- with emphasis on applications would attract a much larger audience. It would leave students feeling they learned something that's useful and much more accessible than it's made out to be.
Still, a standard, one-semester course on statistics is one of the requirements. However, more advanced topics, such as characteristic functions or a mathematical proof of the central limit theorem, are not needed to understand most of the material. The more advanced topics should be included but possibly as optional reading for students interested in pursuing the topic to the next level or attracted by the theoretical aspects.
The practical uses of stochastic point processes include engineering, cellular networks, sensor data, etc. Do you know of concrete examples of this in IoT or elsewhere?
Granville: The distribution, or physical locations, of sensor devices across a large area -- say, to capture weather or pollution data -- requires optimization to minimize the number of devices and determine how sensitive or powerful they must be. The purpose is to get as much good data as possible, at the lowest possible cost.
The same applies to cell towers or satellites. In practice, these devices are located somewhat randomly, as there are restrictions on where you can build cell towers, but as close as possible to some lattice vertices. Typically, hexagonal lattices are used. You want to optimize coverage, but you know that you cannot use the perfect locations, which, in this case, would be the exact locations of these vertices. In practice, the locations are a little perturbed, [meaning] further away from the ideal ones in some random way. This is why the name 'perturbed lattice point process' has gained popularity.
As this example shows, a 'point' is the location of a cell tower or sensor device.
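For readers who want to experiment with the idea, here is a minimal Python sketch of a perturbed lattice of tower locations. It is an illustration under stated assumptions, not code from Granville's book: the function names, the Gaussian perturbation, the grid size, the noise scale and the coverage radius are all hypothetical choices made for this example. It places ideal locations on a hexagonal (triangular) lattice, nudges each one randomly, and then estimates by random sampling how much of the area lies within range of at least one tower.

```python
import math
import random


def hexagonal_lattice(rows, cols, spacing=1.0):
    """Return the ideal locations: vertices of a hexagonal (triangular) lattice."""
    points = []
    for r in range(rows):
        for c in range(cols):
            # Every other row is shifted sideways by half a spacing.
            x = (c + 0.5 * (r % 2)) * spacing
            y = r * spacing * math.sqrt(3) / 2
            points.append((x, y))
    return points


def perturb(points, scale, rng):
    """Shift each ideal location by independent Gaussian noise.

    The Gaussian distribution is an assumption made for this sketch;
    the interview does not say which distribution to use.
    """
    return [(x + rng.gauss(0.0, scale), y + rng.gauss(0.0, scale))
            for x, y in points]


def coverage_fraction(towers, radius, width, height, rng, samples=5000):
    """Monte Carlo estimate of the share of the area within `radius` of a tower."""
    covered = 0
    for _ in range(samples):
        px, py = rng.uniform(0, width), rng.uniform(0, height)
        if any(math.hypot(px - tx, py - ty) <= radius for tx, ty in towers):
            covered += 1
    return covered / samples


rng = random.Random(0)  # fixed seed so runs are repeatable
rows, cols = 8, 8
ideal = hexagonal_lattice(rows, cols)
towers = perturb(ideal, scale=0.15, rng=rng)

# Bounding box of the ideal lattice (spacing = 1.0).
width = cols - 0.5
height = (rows - 1) * math.sqrt(3) / 2

ideal_cov = coverage_fraction(ideal, 0.6, width, height, rng)
perturbed_cov = coverage_fraction(towers, 0.6, width, height, rng)
print(f"coverage with ideal locations:     {ideal_cov:.3f}")
print(f"coverage with perturbed locations: {perturbed_cov:.3f}")
```

Raising `scale` moves the towers farther from the lattice vertices, and comparing the two estimates gives a rough feel for how much coverage is lost when the ideal locations cannot be used.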