Each week, we will be publishing layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.
The article featured today is from Applied Stochastic Models in Business and Industry, with the full article now available to read in Early View here.
A fundamental problem of hypothesis testing with finite inventory in e-commerce. Appl Stochastic Models Bus Ind. 2020; 1–21. https://doi.org/10.1002/asmb.2574
The paper challenges the current industry standard for A/B testing in e-commerce with limited inventory. It shows that testing with limited inventory can lead to arbitrarily high false positive and false negative rates. More precisely, it investigates an experiment in which visitors to a website are randomly shown one of two versions of the website. The products sold on the website fall into two groups: standard goods, and the most attractive goods, which are available only in limited quantities. For each visitor, the data record which version of the website was displayed and which goods were purchased, and in what quantity.
The paper explains why standard statistical procedures that assume independent samples are not applicable for deciding which version of the website is better, i.e. which one leads to a higher probability that a generic customer makes a purchase. A more complex model is set up for this purpose, taking into account the limited supply of attractive goods. Under the new, more realistic model assumptions, the asymptotic distribution of the test statistic of the chi-squared test is calculated.
This result is then applied to an example in which the new version of a website is not better than the old one, but sells the attractive goods faster. The false positive rate of the test is examined as a function of the number of attractive goods, and it is shown that this rate can be arbitrarily close to one. A similar example, in which the new version of the website is substantially better, demonstrates that the false negative rate can also be as close to one as desired. The bottom line is that in an experiment with shared and limited inventory, the assumptions underlying the chi-squared test (and similar tests) can be violated to such an extent that applying the test becomes meaningless.
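To make the false positive effect concrete, here is a minimal simulation sketch in Python. It is a toy model written for this summary, not the model or the examples from the paper. The purchase probabilities, the visitor count, and the names `run_experiment` and `rejection_rate` are all illustrative assumptions. In the toy model both versions convert equally well once the attractive goods are sold out, and that is what "not better" is taken to mean here. Version B simply sells the shared stock of attractive goods faster while it lasts.

```python
import random

# 95% quantile of the chi-squared distribution with 1 degree of freedom.
CHI2_CRIT_5PCT = 3.841


def chi2_stat(buy_a, n_a, buy_b, n_b):
    """Pearson chi-squared statistic for the 2x2 table (bought / not bought)."""
    total_buy = buy_a + buy_b
    n = n_a + n_b
    if n_a == 0 or n_b == 0 or total_buy == 0 or total_buy == n:
        return 0.0  # degenerate table: treat as no evidence of a difference
    num = n * (buy_a * (n_b - buy_b) - buy_b * (n_a - buy_a)) ** 2
    den = n_a * n_b * total_buy * (n - total_buy)
    return num / den


def run_experiment(rng, n_visitors, stock, p_attr, p_std):
    """Simulate one A/B test in which both versions draw on one shared stock."""
    buys = {"A": 0, "B": 0}
    counts = {"A": 0, "B": 0}
    for _ in range(n_visitors):
        version = rng.choice("AB")  # random assignment of the visitor
        counts[version] += 1
        if stock > 0 and rng.random() < p_attr[version]:
            stock -= 1              # attractive good bought, shared stock shrinks
            buys[version] += 1
        elif rng.random() < p_std:  # otherwise maybe buy a standard good
            buys[version] += 1
    return chi2_stat(buys["A"], counts["A"], buys["B"], counts["B"])


def rejection_rate(stock, n_runs=300, n_visitors=2000, seed=0):
    """Share of simulated experiments in which the 5% chi-squared test rejects."""
    rng = random.Random(seed)
    # Assumed toy parameters: B sells attractive goods faster than A,
    # but both versions sell standard goods with the same probability.
    p_attr = {"A": 0.05, "B": 0.30}
    p_std = 0.10
    rejections = sum(
        run_experiment(rng, n_visitors, stock, p_attr, p_std) > CHI2_CRIT_5PCT
        for _ in range(n_runs)
    )
    return rejections / n_runs


if __name__ == "__main__":
    for stock in (0, 50, 200):
        print(f"attractive stock = {stock:3d}: rejection rate = {rejection_rate(stock):.2f}")
```

With no attractive stock, the two samples are independent and the rejection rate should stay near the nominal 5%. With a few hundred attractive items, the toy test rejects in most runs, even though the two versions are identical once the stock is gone.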