
Skeptical Learning: Empowering Users to Clean Their Own Data for Smarter Personal Assistants

TL;DR: A study evaluated Skeptical Learning (skel), an interactive machine learning approach, in a real-world setting with university students using a mobile app to track their location. Skel aims to reduce user annotation effort and improve data quality by predicting context and challenging suspicious user inputs. While facing challenges like missing data and user consistency, the study showed skel’s potential to reduce user burden, with participants rating a high percentage of machine predictions as correct, paving the way for more reliable personal assistants and longitudinal data collection.

In our increasingly digital world, personal assistants, from navigation apps to smart home devices, rely heavily on accurate user data to function effectively. This data, whether actively provided by users or passively collected from sensors, is often prone to errors and noise. Imagine your fitness tracker misinterpreting your activity or a calendar app sending notifications for the wrong location. These inaccuracies can significantly degrade the user experience and the assistant’s overall performance.

A recent study, titled Help the machine to help you: an evaluation in the wild of egocentric data cleaning via skeptical learning, delves into a novel approach called Skeptical Learning (skel) to tackle this pervasive issue. Conducted by Andrea Bontempelli, Matteo Busso, Leonardo Javier Malcotti, and Fausto Giunchiglia from the University of Trento, Italy, this research evaluates skel’s performance in real-world conditions, empowering users to refine their own contextual data.

Understanding Skeptical Learning (skel)

Skeptical Learning is designed to improve the quality of user annotations and reduce the effort required from participants in data collection. It works by having a machine learning model predict a user’s context (e.g., location) based on sensor data. If the model is uncertain about its prediction, it queries the user for the correct label. Crucially, if the model is confident that its prediction is correct but the user’s initial annotation is different, skel becomes “skeptical” and challenges the user, prompting them to revise their input. This interactive process aims to create a cleaner, more accurate dataset and train a more reliable model.
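The three-way choice described above — trust, query, or challenge — can be sketched as a small decision function. This is a simplified illustration, not the paper's algorithm: the threshold values and the function and field names here are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical confidence thresholds; skel's actual criteria may differ.
UNCERTAIN_BELOW = 0.5   # below this, ask the user for the correct label
CHALLENGE_ABOVE = 0.9   # above this, challenge a contradicting user label

@dataclass
class Decision:
    action: str          # "accept", "query_user", or "challenge_user"
    label: str

def skeptical_decision(predicted_label: str, confidence: float,
                       user_label: Optional[str]) -> Decision:
    """Decide how to handle one context annotation, in the spirit of skel."""
    if confidence < UNCERTAIN_BELOW:
        # Model is unsure of its prediction: query the user.
        return Decision("query_user", user_label or predicted_label)
    if (user_label is not None and user_label != predicted_label
            and confidence >= CHALLENGE_ABOVE):
        # Model is confident and disagrees: be skeptical, prompt a revision.
        return Decision("challenge_user", predicted_label)
    # Otherwise trust the user's label (or the prediction if none was given).
    return Decision("accept", user_label or predicted_label)
```

The key design point is the asymmetry: the user is only challenged when the model is both confident and in disagreement, which keeps interruptions rare.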

The skel algorithm operates in two main phases. Initially, a bootstrap phase collects annotations and trusts them to build a foundational model, addressing the ‘cold start’ problem where insufficient data exists. Following this, the second phase introduces the skeptical mechanism, where the model actively challenges suspicious labels. To minimize user interruption, all skeptical questions generated throughout the day are sent together in the evening, allowing users to address them in an aggregated manner.
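The evening aggregation of skeptical questions could be modeled as a simple queue that holds questions during the day and releases them in one batch. The 6 pm cutoff and the class interface below are assumptions for illustration; the study only states that questions were delivered together in the evening.

```python
from datetime import datetime, time
from typing import List

class QuestionQueue:
    """Collect skeptical questions during the day and release them as one
    evening batch, mirroring skel's aggregated delivery."""
    EVENING = time(18, 0)  # assumed cutoff; not specified in the article

    def __init__(self) -> None:
        self.pending: List[str] = []

    def add(self, question: str) -> None:
        # Queue the question instead of interrupting the user immediately.
        self.pending.append(question)

    def flush_if_evening(self, now: datetime) -> List[str]:
        # Release all pending questions only once the evening cutoff passes.
        if now.time() >= self.EVENING and self.pending:
            batch, self.pending = self.pending, []
            return batch
        return []
```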

The Study in the Wild

The researchers conducted a longitudinal study involving university students from the University of Trento over six weeks, with four weeks dedicated to data collection. Participants used the iLog mobile application on their personal Android devices, ensuring data collection reflected their daily lives without altering routines. The study focused specifically on recognizing participants’ locations, since location is relatively easy for machines to recognize and tracking a single context dimension reduces user burden compared to tracking several.

The research protocol was structured into several parts: an intake phase for onboarding, followed by three data collection phases. In the first week, the app collected sensor data and asked participants for their location every 30 minutes. The second and third weeks introduced skeptical questions, where the model challenged suspicious labels. In the final week of data collection, participants evaluated the machine’s predictions, identifying incorrect location labels from a daily list. Sensor data, including Bluetooth, WiFi, accelerometer, activity recognition, GPS, and battery information, was continuously collected and aggregated into 30-minute feature vectors to feed the skel model.
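Windowing raw sensor streams into fixed 30-minute feature vectors is the one preprocessing step the article describes. A minimal, single-sensor sketch of that aggregation (averaging per window; the real pipeline combines many sensors and richer features) might look like this:

```python
from collections import defaultdict
from statistics import mean

def aggregate_windows(readings, window_minutes=30):
    """Group (timestamp_in_minutes, value) sensor readings into fixed-size
    windows and average each window into a single feature value — a
    simplified stand-in for the multi-sensor 30-minute feature vectors
    fed to the skel model."""
    windows = defaultdict(list)
    for ts_min, value in readings:
        # Integer division assigns each reading to its 30-minute window.
        windows[ts_min // window_minutes].append(value)
    return {w: mean(vals) for w, vals in sorted(windows.items())}
```

For example, readings at minutes 0 and 15 fall into window 0, while a reading at minute 30 starts window 1.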

Key Findings and Challenges

The study involved 77 participants, with 58 uploading sensor data and answers. While the average percentage of correct predicted labels rated by users was 76%, indicating skel’s potential to reduce answering effort, the direct performance comparison between skel and a non-skeptical variant (gpnever) showed no significant average advantage in this particular study. This was attributed to several factors:

  • User Consistency: Participants were largely consistent with their initial annotations, even when contradicted. When challenged, they confirmed their original label in 80% of cases.
  • Missing Data: A high fraction of missing sensor data and unanswered questions (over 50% for skeptical and evaluation questions on most days) impacted the model’s predictive power.
  • Study Design: The relatively short experiment period and focus on a single context question (location) might have limited the visible benefits of skel, which could be more pronounced in longer studies with multiple questions.

The research highlighted the inherent difficulties of conducting ‘in the wild’ studies with real users, including participant attrition and technical issues leading to missing data. Despite these challenges, the fact that users rated a majority of the machine’s predictions as correct underscores the potential of skel to reduce the manual annotation burden in longitudinal studies.


Future Directions

The authors acknowledge several limitations, such as the small and homogenous participant pool (students from one department) and the removal of active queries in this specific skel version. Future work will investigate the psychological and behavioral implications of interacting with a machine that learns and contradicts user input. Additionally, researchers plan to explore other factors influencing answer quality, improve question scheduling, assess the iLog app’s usability, and implement per-user hyperparameter tuning to adapt the model to individual user behaviors. Expanding skel to other domains and multi-modal inputs is also on the horizon.

In conclusion, this evaluation of Skeptical Learning in a real-world setting demonstrates its promise in enhancing data quality and reducing user effort in longitudinal studies. By empowering users to refine their own data, skel offers a path towards more accurate personal assistants and more reliable data collection in social sciences, even as it navigates the complexities of human-machine interaction and real-world data challenges.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
