This page collects the main methodological advances developed by the BODaI-Lab research group for the analysis of rating data with statistical models in the CUB class. The original CUB (Combination of Uniform and Binomial) model was introduced in 2005 by D’Elia and Piccolo, and several extensions have since been proposed worldwide. The BODaI-Lab research group introduced NLCUB (NonLinear CUB) in 2014 and CUM (Combination of Uniform and Multinomial) in 2022. The group has also addressed “don’t know” (DK) responses from a methodological point of view, proposing an approach that treats DK as valuable information rather than as missing data, as is usually done.

**Scientific coordinators:** Domenico Piccolo, Paola Zuccolotto, Marica Manisera, Rosaria Simone

**Researchers (editors of the webpage):** Ambra Macis, Matteo Ventura

D’Elia A., Piccolo D. (2005), A mixture model for preferences data analysis. *Computational Statistics & Data Analysis*, **49**(3), 917-934.

A mixture model for preferences data, which adequately represents the composite nature of the elicitation mechanism in ranking processes, is proposed. Both probabilistic features of the mixture distribution and inferential and computational issues arising from the maximum likelihood parameters estimation are addressed. Moreover, empirical evidence from different data sets confirming the goodness of fit of the proposed model to many real preferences data is shown.
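
As a minimal sketch of the mixture described above, the snippet below computes the probability mass function of a CUB model without covariates in its standard parametrization: a shifted Binomial with weight `pi` plus a discrete Uniform with weight `1 - pi`. The function name `cub_pmf` and the argument names are illustrative choices and are not taken from the group's software. In the CUB literature, `1 - pi` is usually read as the uncertainty component and `1 - xi` as the feeling component.

```python
from math import comb


def cub_pmf(m, pi, xi):
    """Return [P(R=1), ..., P(R=m)] for a CUB model without covariates.

    m  : number of ordered response categories
    pi : weight of the shifted Binomial component, in (0, 1]
    xi : Binomial parameter, in [0, 1]
    """
    if m < 2:
        raise ValueError("m must be at least 2")
    if not (0 < pi <= 1 and 0 <= xi <= 1):
        raise ValueError("pi must lie in (0, 1] and xi in [0, 1]")
    probs = []
    for r in range(1, m + 1):
        # Shifted Binomial: r - 1 "successes" out of m - 1 trials,
        # each with success probability 1 - xi.
        shifted_binomial = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        # Mixture with the discrete Uniform over the m categories.
        probs.append(pi * shifted_binomial + (1 - pi) / m)
    return probs


if __name__ == "__main__":
    for r, p in enumerate(cub_pmf(m=7, pi=0.6, xi=0.3), start=1):
        print(f"P(R={r}) = {p:.4f}")
```

With `pi = 1` the model reduces to the shifted Binomial alone, and as `pi` approaches 0 the distribution flattens towards the Uniform.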

Piccolo D., Simone R. (2019). The class of cub models: statistical foundations, inferential issues and empirical evidence. *Statistical Methods & Applications*, **28**(3), 389-435.

This paper discusses a general framework for the analysis of rating and preference data that is rooted on a class of mixtures of discrete random variables. These models have been extensively studied and applied in the last 15 years thanks to a flexible and parsimonious parametrization of data generating process and to prompt interpretation of results. The approach considers the final response as the combination of feeling and uncertainty, by allowing for finer model specifications to include refuge options, response styles and possible overdispersion, also in relation to subjects’ and objects’ covariates. The article establishes the state of art of the research inherent to this paradigm, in terms of methodology, inferential procedures and fitting measures, by emphasizing capabilities and limitations yet establishing new findings. In particular, explicative power and predictive performances of cub statistical models for ordinal data are examined and new topics that could boost and support the modelling of uncertainty in this framework are provided. Possible developments are outlined throughout the whole presentation and final comments conclude the paper.

**Comments to the paper**

**Rejoinder to the discussion by Domenico Piccolo & Rosaria Simone** (477-493)

CUB

FastCUB: see Simone R. (2021). An accelerated EM algorithm for mixture models with uncertainty for rating data. *Computational Statistics*, **36**, 691-714.
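
For orientation, the sketch below shows a plain (non-accelerated) EM iteration for a CUB model without covariates, fitted to observed rating frequencies. It is not the accelerated algorithm of the cited paper and not the FastCUB code; it only illustrates the baseline E-step and M-step for this two-component mixture. The names `shifted_binomial` and `fit_cub_em`, the starting values and the stopping rule are assumptions made for the example.

```python
from math import comb


def shifted_binomial(m, xi):
    """Shifted Binomial probabilities for ratings 1..m."""
    return [comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
            for r in range(1, m + 1)]


def fit_cub_em(freq, pi=0.5, xi=0.5, tol=1e-8, max_iter=5000):
    """Plain EM for CUB; freq[r - 1] is the count (or weight) of rating r.

    Returns the estimated (pi, xi). Illustrative sketch only.
    """
    m = len(freq)
    n = sum(freq)
    for _ in range(max_iter):
        b = shifted_binomial(m, xi)
        # E-step: posterior probability that a respondent who chose rating r
        # belongs to the shifted Binomial component.
        tau = [pi * b[k] / (pi * b[k] + (1 - pi) / m) for k in range(m)]
        weight = sum(f * t for f, t in zip(freq, tau))
        # M-step: closed-form updates for the mixing weight and for xi.
        new_pi = weight / n
        weighted_mean_rating = sum(
            f * t * (k + 1) for k, (f, t) in enumerate(zip(freq, tau))
        ) / weight
        new_xi = (m - weighted_mean_rating) / (m - 1)
        converged = abs(new_pi - pi) + abs(new_xi - xi) < tol
        pi, xi = new_pi, new_xi
        if converged:
            break
    return pi, xi


if __name__ == "__main__":
    # Hypothetical frequencies on a 7-point scale, made up for illustration.
    observed = [12, 20, 35, 60, 95, 110, 68]
    pi_hat, xi_hat = fit_cub_em(observed)
    print(f"pi = {pi_hat:.3f}, xi = {xi_hat:.3f}")
```

Plain EM can need many iterations on this kind of mixture, which is the motivation for acceleration schemes.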

Manisera M., Zuccolotto P. (2014), Modeling rating data with Nonlinear CUB models. *Computational Statistics & Data Analysis*, **78**, 100-118.

A general statistical model for ordinal or rating data, which includes some existing approaches as special cases, is proposed. The focus is on the CUB models and a new class of models, called Nonlinear CUB, which generalize CUB. In the framework of the Nonlinear CUB models, it is possible to express a transition probability, i.e. the probability of increasing one rating point at a given step of the decision process. Transition probabilities and the related transition plots are able to describe the state of mind of the respondents about the response scale used to express judgments. Unlike classical CUB, the Nonlinear CUB models are able to model decision processes with non-constant transition probabilities.

Manisera M., Zuccolotto P. (2022), A mixture model for ordinal variables measured on semantic differential scales. *Econometrics and Statistics*, **22**, 98-123.

Subjective perceptions and attitudes are usually measured by administering questionnaires with ordered response scales. Among them, a particular case are semantic differential scales, where the respondent has to declare his/her position between two bipolar adjectives. To model ordinal variables measured on semantic differential scales, a novel model is introduced as an extension in the framework of the CUB (Combination of discrete Uniform and shifted Binomial random variables) class of models. The proposed model addresses the analysis of ordinal variables measured on semantic differential scales. However, it is definitely well suited to all the rating scales that have a middle option that means indifference between two extremes. This is a circumstance that occurs in the main part of the most commonly used Likert scales. The proposal is based on a mixture of a discrete Uniform and a – linearly transformed – Multinomial random variable, so it is called CUM. Parameter estimation is carried out using the expectation-maximization algorithm, and the parameters can be represented in a triangular space with a ternary plot. A simulation study is carried out and, finally, applications on real data are examined in order to show limits and potentialities of the proposal.

Manisera M., Zuccolotto P. (2014), Modeling “don’t know” responses in rating scales. *Pattern Recognition Letters*, **45**, 226-234.

We propose a probabilistic framework for the treatment of “don’t know” responses in surveys aimed at investigating human perceptions through expressed ratings. The rationale behind the proposal is that “don’t know” is a valid response to all extents because it informs about a specific state of mind of the respondent, and therefore, it is not correct to treat it as a missing value, as it is usually treated. The actual insightfulness of the proposed model depends on the chosen probability distributions. The required assumptions of these distributions first pertain to the expressed ratings and then to the state of mind of “don’t know” respondents toward the ratings. Regarding the former, we worked in the CUB model framework, while for the latter, we proposed using the Uniform distribution for formal and empirical reasons. We show that these two choices provide a solution that is both tractable and easy to interpret, where “don’t know” responses can be taken into account by simply adjusting one parameter in the model.
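
One way to read the "adjusting one parameter" statement is sketched below. It assumes that expressed ratings follow a CUB model with mixing weight `pi`, that DK respondents are represented by a discrete Uniform, and that the two groups are combined in proportion to their observed shares. With DK share δ, shifted Binomial B and Uniform U, the combined distribution is (1 − δ)[πB + (1 − π)U] + δU = (1 − δ)πB + [1 − (1 − δ)π]U, which is again a CUB with the mixing weight rescaled by (1 − δ). The function name and arguments are hypothetical; the cited paper should be consulted for the exact estimator.

```python
def dk_adjusted_pi(pi_hat, n_ratings, n_dk):
    """Rescale the CUB mixing weight to account for "don't know" responses.

    pi_hat    : mixing weight estimated on the expressed ratings only
    n_ratings : number of respondents who gave a rating
    n_dk      : number of "don't know" respondents
    Assumes DK respondents are modelled by a discrete Uniform (illustrative).
    """
    if n_ratings <= 0 or n_dk < 0:
        raise ValueError("n_ratings must be positive and n_dk non-negative")
    share_answering = n_ratings / (n_ratings + n_dk)
    return pi_hat * share_answering


if __name__ == "__main__":
    # Hypothetical survey: 900 ratings, 100 DK answers, pi estimated at 0.8.
    print(dk_adjusted_pi(0.8, 900, 100))
```

Under these assumptions a larger share of DK answers lowers the adjusted weight, which corresponds to higher uncertainty in the CUB interpretation.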

Slides of the presented talks:

- Domenico Piccolo – An Introduction to Model Based Approaches
- Julien Jacques – Clustering Longitudinal Ordinal Data
- Rosaria Simone – Tree Methods for Ordinal Data

Analysis of Rating Data in the CUB class framework – An Introduction to Model Based Approaches

by Domenico Piccolo (Introduction to the Workshop “*Statistical Methods and Models for Ordinal Data*”, University of Brescia, 25th May 2023)

With animated slides (9-11)

Analysis of Rating Data in the CUB class framework – A brief overview of methods and models proposed by the BODaI-Lab research group

- R script (script.R) and functions for NLCUB and CUM (NLCUB.R, CUM.R) to run the examples in the slides (zip file)
- data of the examples in the slides: customer satisfaction (CS) and opinions about distance teaching (DT) (zip file)

Carpita M., Ciavolino E., Nitti M. (2019). The MIMIC–CUB Model for the Prediction of the Economic Public Opinions in Europe. *Social Indicators Research*, **146**(1), 287-305.

To study the Europeans’ perception on the economic conditions, a model that combines Multiple Indicators Multiple Causes (MIMIC) and Combination of Uniform and shifted Binomial (CUB) is proposed. The MIMIC–CUB Model, estimated at country-level using the Partial Least Squares, specifies the influence of the economic forecast news on a latent variable named “Citizens’ perception of the European economics health state”. The survey is related, at both national and EU level, to the period 2005–2014.

Iannario M., Manisera M., Piccolo D., Zuccolotto P. (2012). Sensory analysis in the food industry as a tool for marketing decisions. *Advances in Data Analysis and Classification*, **6**(4), 303-321.

In the food industry, sensory analysis can be useful to direct marketing decisions concerning not only products, for example product positioning with respect to competitors, but also market segmentation, customer relationship management, advertising strategies and price policies. In this paper we show how interesting information useful for marketing management can be obtained by combining the results from cub models and algorithmic data mining techniques (specifically, variable importance measurements from Random Forest). A case study on sensory evaluation of different varieties of Italian espresso is presented.