Layman’s Abstract for Quality and Reliability Engineering International article: An analytic journey in an industrial classification problem: How to use models to sharpen your questions

Each week, we publish layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.
 
The article featured today is from Quality and Reliability Engineering International with the full article now available to read here.
 
Kenett, RS, Gotwalt, C, Poggi, J-M. An analytic journey in an industrial classification problem: How to use models to sharpen your questions. Qual Reliab Engng Int. 2023; 1-16. https://doi.org/10.1002/qre.3449

The paper describes a journey between questions, models and data analysis to reach specific goals. Such a journey is typical in industrial, engineering, biology and social science applications. It contrasts with regulated clinical research, where a statistical analysis plan is declared before data collection. The methods considered include random forests, ridge regression, lasso and elastic nets. The case study used in the paper consists of data from 63 sensors collected in the testing of an electronic system; a car is an everyday example of a system tracked by many sensors. In the paper, the analyst asks a sequence of questions, and the paper shows how each was tackled by statistical analysis to meet the analysis goal. Eventually, the analyst was able to provide a robust, parsimonious and effective model for predicting the system condition using a subset of the 63 sensors. In handling this problem, the paper develops and applies several innovative methods and insights that can prove useful in a general data analysis life cycle. The questions, the life cycle steps and the analytic tools used in the case study are listed in the table below.

| Question | Life cycle step | Analytic tool |
| --- | --- | --- |
| Question 1: Is the sensor data differentiating between good and failed systems? | (1) problem elicitation | Multivariate T² |
| Question 2: Which sensors contribute to the T² control chart? | (8) impact assessment | Contribution plots |
| Question 3: Can we predict failure modes from sensor data? | (6) operationalization of findings | Bootstrap forest |
| Question 4: How robust are the column contributions to alternative choices of training and validation subsets? | (5) formulation of findings | Sensitivity analysis of column contribution |
| Question 5: Is there a subset of sensors we can focus on to adequately predict test results? | (5) formulation of findings | Variable clustering |
| Question 6: What is the performance of penalized regression models using only Cluster 1 sensors? | (8) impact assessment | Penalized regression |
| Question 7: Can we have simple rules that translate sensor data into test result predictions? | (6) operationalization of findings | Decision tree |
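To give a flavour of what a "simple rule" in Question 7 can look like, here is a minimal sketch of a one-split decision tree (a "stump") in plain Python. It is an illustration only, not the paper's code: the paper works with a full decision tree on the real sensor data, whereas the readings, the labels (0 = pass, 1 = fail) and the names `gini`, `fit_stump` and `predict` below are all invented for this example.

```python
from typing import List, Sequence, Tuple

# A rule is (sensor index, threshold, label if reading <= threshold, label otherwise).
Rule = Tuple[int, float, int, int]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of 0/1 labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels: Sequence[int]) -> int:
    """Most common label in a non-empty group (ties go to 1 = fail)."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(X: List[List[float]], y: List[int]) -> Rule:
    """Search every sensor and every midpoint threshold for the purest split."""
    n = len(y)
    best_score = float("inf")
    best_rule = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct readings.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_score = score
                best_rule = (j, t, majority(left), majority(right))
    if best_rule is None:
        raise ValueError("no split possible: every sensor is constant")
    return best_rule


def predict(rule: Rule, row: Sequence[float]) -> int:
    """Apply the rule to one set of sensor readings."""
    j, t, left_label, right_label = rule
    return left_label if row[j] <= t else right_label


# Invented toy data: two hypothetical sensors, six tested systems.
X = [[1.0, 5.0], [1.25, 4.5], [0.75, 5.5],
     [1.0, 8.0], [1.25, 8.5], [0.75, 7.5]]
y = [0, 0, 0, 1, 1, 1]

rule = fit_stump(X, y)
print(rule)  # (1, 6.5, 0, 1)
```

On this toy data the rule reads: "if sensor 1 is above 6.5, predict a failure". A full decision tree repeats the same search within each branch, which gives a short list of such rules.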

 

This journey demonstrates how the study goal, data, questions and analysis interact. To quote the famous mathematician and bio-scientist Sam Karlin: “The purpose of models is not to fit the data but to sharpen the question”. https://onlinelibrary.wiley.com/doi/10.1002/9781118445112.stat08377
