Lay abstract for Stat article: The development of a mobile app-focused deduplication strategy for the Apple Heart Study that informs recommendations for future digital trials

Each week, we publish lay abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.
 
The article featured today is from Stat; the full article is open access and available at the DOI below.
 
Garcia, A, Lee, J, Balasubramanian, V, et al. The development of a mobile app-focused deduplication strategy for the Apple Heart Study that informs recommendations for future digital trials. Stat. 2022; e470. Accepted Author Manuscript. https://doi.org/10.1002/sta4.470
 
The Apple Heart Study (AHS) was a pragmatic digital clinical trial (DCT) that employed an app to recruit participants. The digital nature of AHS proved advantageous on many fronts and enabled the recruitment of over 400,000 participants in only 8 months. However, it also led to some data management challenges. Specifically, there were occurrences where the same individual was represented under multiple participant identifiers. This phenomenon, referred to as duplicated records, can be problematic, particularly in our case, where there was interest in determining how many individuals among those enrolled were alerted to an irregular heart rate. For this purpose, it was critical (1) to understand how many participants were enrolled, and (2) to link records over time to the same individual. To achieve these goals, a deduplication algorithm was developed while the trial was ongoing. The algorithm involved scoring the differences in demographic information for a given pair of records. To reduce the computational time of comparing all pairs of records, the algorithm was applied only to those pairs of records that were considered at high risk of being duplicated. Re-sampling methods were then used to derive and validate a decision rule to classify a pair of records as duplicated or not. Overall, the algorithm yielded a high positive predictive value of 96% and identified a total of 4% of records as duplicated. Details of the algorithm are described along with principled guidelines for future digital trials. In addition, the code for implementing the algorithm, along with annotation, is provided so that other study teams can easily adopt these methods for similar challenges.
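
For readers who want a feel for the general idea, the sketch below illustrates the three ingredients mentioned above: restricting comparisons to high-risk pairs, scoring demographic differences within each pair, and applying a decision rule. It is not the study's algorithm or code. The field names, the use of date of birth to select high-risk pairs, the simple mismatch count, and the cutoff of 1 are all assumptions made for illustration; in the study, the decision rule was derived and validated with re-sampling methods rather than fixed by hand.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; names, fields, and values are made up for illustration.
records = [
    {"id": "p1", "first": "Maria", "last": "Lopez", "dob": "1980-02-14", "zip": "94305"},
    {"id": "p2", "first": "Maria", "last": "Lopes", "dob": "1980-02-14", "zip": "94305"},
    {"id": "p3", "first": "John", "last": "Smith", "dob": "1980-02-14", "zip": "10001"},
    {"id": "p4", "first": "Ann", "last": "Chen", "dob": "1975-07-01", "zip": "60601"},
]

FIELDS = ("first", "last", "zip")  # assumed demographic fields to compare
THRESHOLD = 1  # assumed cutoff; a placeholder for a data-derived decision rule


def candidate_pairs(recs):
    """Yield only pairs that share a date of birth (assumed 'high-risk' criterion)."""
    blocks = defaultdict(list)
    for rec in recs:
        blocks[rec["dob"]].append(rec)
    for block in blocks.values():
        # Comparing within blocks avoids scoring every possible pair of records.
        yield from combinations(block, 2)


def difference_score(a, b):
    """Count the compared fields on which two records disagree (case-insensitive)."""
    return sum(a[f].strip().lower() != b[f].strip().lower() for f in FIELDS)


def find_duplicates(recs, threshold=THRESHOLD):
    """Classify a candidate pair as duplicated when its score is at or below the cutoff."""
    return [
        (a["id"], b["id"])
        for a, b in candidate_pairs(recs)
        if difference_score(a, b) <= threshold
    ]


duplicates = find_duplicates(records)
print(duplicates)
```

In this toy data, only three of the six possible pairs are scored because just three records share a date of birth. A real implementation would likely use richer comparisons than exact field mismatches and would check the chosen cutoff against pairs whose true status is known.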
 