Teachers and Students, a Statistically Perfect Collaboration
The article "M statistic commands: interpoint distance distribution analysis" (Stata Journal, vol. 11, no. 2, 2011), authored by M.Sc. student Pietro Tebaldi, describes a software package that allows users of the statistical program Stata to easily implement the "M statistic" method of Bonetti and Pagano. The article and the package are the result of a collaboration among Pietro Tebaldi, Marco Bonetti (Bocconi, Department of Policy Analysis and Public Management), and Marcello Pagano (Department of Biostatistics, Harvard University).
Tebaldi developed the software and the paper during his internship at Harvard University in 2010, while an M.Sc. student in Economics and Social Sciences at Bocconi University.
The M statistic method was developed by Bonetti and Pagano in 2005, and it has been the subject of a series of articles, both theoretical and applied. The M statistic can be used to address one of the key questions in the study of spatial data: the search for empirical evidence that may support the hypothesis that two groups in a population have different spatial distributions. It is applicable to any study in which the spatial dimension of the variables is relevant. The technique consists of a statistical test based on a quantity constructed from all pairwise distances (dissimilarities) between the observations in the data, which are combined to study their distribution. The use of all distances between the observations sets the method apart from other techniques used in this setting, which rely on a smaller number of distances (some only the smallest ones, others only those within a range around zero), as well as from other summary measures of the distribution of the distances.
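The idea of comparing two groups through the distribution of all their pairwise distances can be illustrated with a short Python sketch. This is not the M statistic itself, nor the code of the Stata package: it is a simplified stand-in that bins the interpoint distances of each group, measures the unweighted squared difference between the two sets of bin proportions, and obtains a p-value by permutation. All function names, the number of bins, and the choice of quantile cutoffs are assumptions made for illustration only.

```python
import itertools
import math
import random


def pairwise_distances(points):
    """Return the Euclidean distances between all pairs of points."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


def bin_proportions(distances, cutoffs):
    """Proportion of distances falling in each bin defined by ascending cutoffs."""
    counts = [0] * (len(cutoffs) + 1)
    for d in distances:
        k = sum(d > c for c in cutoffs)  # index of the bin containing d
        counts[k] += 1
    total = len(distances)
    return [c / total for c in counts]


def distance_discrepancy(group_a, group_b, n_bins=5):
    """Unweighted squared difference between the binned distance distributions.

    Illustrative only: a simplified discrepancy, not the M statistic.
    """
    ref = sorted(pairwise_distances(group_b))
    # Cutoffs are empirical quantiles of the reference group's distances.
    cutoffs = [ref[(i * len(ref)) // n_bins] for i in range(1, n_bins)]
    pa = bin_proportions(pairwise_distances(group_a), cutoffs)
    pb = bin_proportions(ref, cutoffs)
    return sum((x - y) ** 2 for x, y in zip(pa, pb))


def permutation_p_value(group_a, group_b, n_perm=200, seed=0):
    """Permutation p-value for the discrepancy between the two groups."""
    rng = random.Random(seed)
    observed = distance_discrepancy(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if distance_discrepancy(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Calling permutation_p_value(cases, controls) with two lists of (x, y) coordinates returns a value between 0 and 1; a small value suggests that the two groups do not share the same spatial distribution. Because every pair of observations contributes a distance, the cost grows quadratically with the group size.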
The method was originally developed for cluster detection in epidemiology and biosurveillance, but it can also be used in problems involving high-dimensional data for which a dissimilarity measure can be defined between observations. Examples of other areas of application are economics (network analysis and impact evaluation), sociology (subpopulation studies), genetics (cluster detection in DNA sequences), and demography (life course studies).
To make the method easy to apply, it needed to be implemented in a standard statistical software package, and this need was turned into an internship opportunity as a 'summer visiting student' for the (then) M.Sc. student in Economics and Social Sciences Pietro Tebaldi. After working on a Stata programming project under the supervision of Bonetti in Milan, Tebaldi spent three months in 2010 in the Department of Biostatistics of Harvard University (School of Public Health). While there, he worked as a Research Assistant, on the same footing as the department's Ph.D. students in Biostatistics, with his own office and access to the many resources of the host institution. His programming work, directed by Pagano, produced the current statistical package, which lets Stata users run the M test and produce the associated graphical output.
The article "M statistic commands: interpoint distance distribution analysis" contains a description of the new commands and their use, as well as an application to breast cancer data from the state of Massachusetts. The mtest command is used in that context to explore the presence of areas with a higher-than-average incidence of the disease.
After graduating from his M.Sc. program at Bocconi University, Pietro Tebaldi is now continuing his studies in the Ph.D. program in Economics at Stanford University.