That's Life: A Predictable Sequence of States

by Raffaella Piccarreta and Marco Bonetti
Bonetti, Piccarreta and coauthor Salford build a model aimed at predicting life courses in a paper published in Demography

Sequence analysis (SA) has become one of the most widely used and discussed techniques to describe life courses. For each individual, the activities or states experienced during a period are tracked, and the life course is represented as a finite sequence (ordered collection) of states.

In Parametric and Nonparametric Analysis of Life Courses: An Application to Family Formation Patterns (Demography, June 2013, Volume 50, Issue 3, pp 881-902, doi: 10.1007/s13524-012-0191-z), Marco Bonetti (Department of Policy Analysis and Public Management), Raffaella Piccarreta (Dipartimento di Scienze delle Decisioni) and Gaia Salford (Santander Private Banking) focus on women's childbearing and family formation patterns (data from the Fertility and Family Surveys, FFS). For each woman, the states experienced month by month between 18 and 30 years of age are considered: living without a partner (N), in unmarried cohabitation (U) or married (M), each either without children or with at least one child (NC, UC, MC). Usually a woman remains in a state for a certain length of time (the duration of the state); hence the sequence can be represented as the list of the visited states, v, and of their durations, t. For example, if a woman lived without a partner for 22 months, then cohabited for 27, lived as single again for 31 months, and then married and remained in that state for 64 more months, then v = (N,U,N,M) and t = (22,27,31,64).
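The passage from a month-by-month record to the pair (v, t) is a run-length encoding. A minimal sketch in Python, using the example above; the function name `to_states_and_durations` is an illustrative choice, not something taken from the paper:

```python
from itertools import groupby

def to_states_and_durations(monthly_states):
    """Collapse a month-by-month record into visited states v and durations t."""
    v, t = [], []
    for state, run in groupby(monthly_states):
        v.append(state)
        t.append(sum(1 for _ in run))  # length of the run of identical months
    return tuple(v), tuple(t)

# The example from the text, written out month by month.
monthly = ["N"] * 22 + ["U"] * 27 + ["N"] * 31 + ["M"] * 64
v, t = to_states_and_durations(monthly)
print(v)  # ('N', 'U', 'N', 'M')
print(t)  # (22, 27, 31, 64)
```

The durations add up to 144 months, that is, the twelve years between ages 18 and 30.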

The authors focus on the dependence of sequences on a set of categorical covariates, to evaluate whether significantly different sequences are observed for groups of women who differ in a set of socio-demographic characteristics: birth cohort, level of education (years of education received after the age of 15, in classes), religious status, and whether or not their parents separated or divorced.

One possible approach is to define a suitable criterion to measure the dissimilarity between sequences, cluster the trajectories, identify them by their cluster membership, and predict cluster membership using multinomial regression models. The simplification from sequences to clusters is sensible if cases in the same cluster experience very similar trajectories. In many applications, this can be achieved only by increasing the number of clusters: a trade-off exists between the necessity of obtaining a potentially large number of highly homogeneous clusters and the opportunity to build a dependent variable with few levels to avoid estimation inaccuracy in the multinomial model.

Another possibility is testing for differences between groups of sequences with different levels of one categorical variable. This can be approached using a nonparametric extension of ANOVA to the analysis of dissimilarity (ANODI) or through the nonparametric study of the full distribution of the inter-sequence dissimilarities. A limitation of these approaches is that they indicate which covariates are statistically significant but do not provide a quantitative assessment of their effect.
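To give a flavour of a dissimilarity-based analysis of variance, here is a rough sketch that computes a pseudo-F statistic from pairwise dissimilarities and assesses it by permuting group labels. It rests on several assumptions that do not come from the paper: the Hamming distance stands in for the dissimilarity criterion, the sums of squares use squared dissimilarities (one common convention), and the six short sequences are made-up toy data. The exact ANODI statistic used by the authors may differ.

```python
import random

def hamming(a, b):
    """Number of months in which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def sum_of_squares(members, d):
    """Sum of squared pairwise dissimilarities among members, divided by their count."""
    pairs = ((i, j) for k, i in enumerate(members) for j in members[k + 1:])
    return sum(d[i][j] ** 2 for i, j in pairs) / len(members)

def pseudo_f(d, labels):
    """Between-group over within-group variability, from dissimilarities only."""
    groups = {}
    for i, g in enumerate(labels):
        groups.setdefault(g, []).append(i)
    n, a = len(labels), len(groups)
    ss_total = sum_of_squares(list(range(n)), d)
    ss_within = sum(sum_of_squares(m, d) for m in groups.values())
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

def permutation_test(d, labels, n_perm=999, seed=0):
    """Share of label permutations with a pseudo-F at least as large as observed."""
    rng = random.Random(seed)
    observed = pseudo_f(d, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        hits += pseudo_f(d, shuffled) >= observed
    return observed, (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): six-month sequences for two groups of three women.
sequences = ["NNNNUU", "NNNUUU", "NNNNNU", "NMMMMM", "NNMMMM", "NNNMMM"]
labels = ["a", "a", "a", "b", "b", "b"]
d = [[hamming(s, r) for r in sequences] for s in sequences]
f_stat, p_value = permutation_test(d, labels)
print(f_stat, p_value)
```

A test of this kind says whether the grouping matters, but, as noted above, it does not say how a covariate shifts durations or transitions.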

The authors introduce a parametric state change model that describes the evolution of life courses conditional on covariates and on having observed an initial segment of the trajectory. The model, which fits within the broad family of multistate event history models, fully exploits the longitudinal nature of sequence data and uses regression models to describe the time to the next generic state transition as well as the transition probabilities, conditional on a transition occurring.

More precisely, the authors express the probability of a sequence as the combination of discrete time-to-event distributions and transition probabilities conditional on a transition having occurred. They study the time to the next transition as a function of past information, including one's state before the transition. Conditional on a transition occurring, they then model the probability of moving to one of the other states as a function of covariates, which also include the duration in the previous state and the age at the time of the transition. This modelling approach allows the authors to describe processes that produce recurrent events (i.e., the return to states that had been visited before).
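The factorization can be illustrated with a deliberately simplified sketch. The assumptions are loud ones: each state has a constant monthly exit hazard, transition probabilities depend only on the current state, states with children are ignored, and all numbers are hypothetical placeholders. In the paper both components are regression models that depend on covariates, past durations and age.

```python
import math

# Hypothetical monthly probabilities of leaving each state (not estimates).
HAZARD = {"N": 0.03, "U": 0.04, "M": 0.005}
# Hypothetical probabilities of the destination, given that a transition occurs.
TRANSITION = {
    "N": {"U": 0.6, "M": 0.4},
    "U": {"N": 0.3, "M": 0.7},
    "M": {"N": 0.8, "U": 0.2},
}

def log_prob(v, t):
    """Log-probability of a sequence (v, t), given its initial state.

    Each completed spell contributes a discrete time-to-event term and a
    transition term. The last spell is censored by the end of the observation
    window, so it contributes only a 'no exit so far' term.
    """
    total = 0.0
    last = len(v) - 1
    for k, (state, months) in enumerate(zip(v, t)):
        h = HAZARD[state]
        total += (months - 1) * math.log(1 - h)  # stayed after each earlier month
        if k < last:
            total += math.log(h)                              # left after this month
            total += math.log(TRANSITION[state][v[k + 1]])    # chose the next state
    return total

print(log_prob(("N", "U", "N", "M"), (22, 27, 31, 64)))
```

Because N appears twice in the example, the sketch also shows how a return to a previously visited state is handled: it is simply another spell with its own duration and transition terms.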

The number of parameters in the model can be large: the authors propose using ANODI to preliminarily screen the potential covariates and to construct a more parsimonious initial parametric model.

The authors also propose criteria to validate the results.

To explore whether the estimated model represents a reasonable data generating process, they use it to generate a synthetic population of life courses. The marginal distributions of some quantities of interest in the simulated data are compared to those in the observed ones.
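A toy version of this check might look as follows. It is only a sketch: the hazards and transition probabilities are hypothetical placeholders rather than fitted values, states with children are ignored, covariates are left out, and every simulated woman is assumed to start in state N at age 18.

```python
import random
from collections import Counter

# Hypothetical parameters standing in for a fitted model (not estimates).
HAZARD = {"N": 0.03, "U": 0.04, "M": 0.005}
TRANSITION = {
    "N": {"U": 0.6, "M": 0.4},
    "U": {"N": 0.3, "M": 0.7},
    "M": {"N": 0.8, "U": 0.2},
}

def simulate_life_course(rng, n_months=144, start="N"):
    """Generate one month-by-month sequence from age 18 to age 30."""
    state, months = start, []
    for _ in range(n_months):
        months.append(state)
        if rng.random() < HAZARD[state]:  # a transition occurs after this month
            options = TRANSITION[state]
            state = rng.choices(list(options), weights=list(options.values()))[0]
    return months

def state_shares_at_end(population):
    """Marginal distribution of the state occupied in the last observed month."""
    counts = Counter(seq[-1] for seq in population)
    return {s: counts[s] / len(population) for s in HAZARD}

rng = random.Random(42)
population = [simulate_life_course(rng) for _ in range(2000)]
print(state_shares_at_end(population))
```

With real data, the same summary would be computed on the observed sequences and set side by side with the simulated one; other marginal quantities, such as the number of spells or the age at first marriage, can be compared in the same way.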

As a further examination of the interplay between the parametric model and ANODI (as a screening procedure), the authors also use the data generated from the fitted model and perform ANODI to confirm the statistical significance of the same covariates as in the original model.