AI for literature searches, manuscript drafting, and data analysis

This article is part of a series of weekly updates on new developments in AI methods and tools applied to surveys (households, individuals, farms…) and administrative data for official statistics.

Coverage Period: 24–30 November 2025

Key words: AI, survey research, official statistics, machine learning, data quality, household surveys, data analysis

Executive Summary

This weekly update provides a summary of recent developments in the application of Artificial Intelligence (AI) within survey research and household surveys. The report covers key areas including data editing, cleaning, processing, analysis, reporting, and dissemination, with a focus on implications for researchers and national statistical offices [1].

Key Developments This Week

This week’s developments highlight a significant and accelerating trend: the rapid integration of AI into all stages of the survey lifecycle. From automated data processing to sophisticated analytical models and new methods for dissemination, AI is reshaping the landscape of survey research. However, this rapid adoption is accompanied by growing concerns about research integrity, data quality, and the potential for misuse.

Recent surveys of research professionals reveal a dramatic increase in the use of AI tools for a variety of tasks. A global survey of over 1,100 research office staff and 1,400 researchers, conducted by Research Professional News, found that AI is now a key driver of change in the field [1, 2]. The report indicates that 57% of research office staff now consider AI a top-three change driver, a significant increase from 25% in 2023 [2].

The most common applications of AI in research offices include identifying funding opportunities (35%), editing and improving grant applications (33%), and data management (30%) [1]. Among researchers, nearly half report using AI for literature searches, manuscript drafting, and data analysis [1].

Despite the widespread adoption, a significant majority of research office staff (60%) identify AI as the greatest threat to research integrity, citing concerns about the erosion of critical thinking and the potential for overreliance on automated systems [1, 2]. This highlights a critical need for robust governance frameworks and training programs to ensure the responsible and ethical use of AI in research.

The Rise of AI in Household Survey Analysis

A new study published in Archives of Public Health demonstrates the practical application of machine learning models for analyzing large-scale household survey data [5]. Using data from the Demographic and Health Surveys (DHS) across 34 Sub-Saharan African countries, the research team successfully predicted household sanitation facility access with a high degree of accuracy. The Random Forest model proved to be the most effective, achieving an accuracy of 80.61% [5].

This research underscores the potential of machine learning to not only predict survey outcomes but also to identify the most influential factors driving those outcomes. The use of Shapley Additive Explanations (SHAP) allowed for the interpretation of feature importance, providing actionable insights for policy interventions [5]. This study serves as a valuable model for national statistical offices seeking to leverage machine learning for more granular and impactful analysis of household survey data.
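
To make the idea behind Shapley-based attribution concrete, the following minimal Python sketch computes exact Shapley values for a single household by averaging each feature's marginal contribution over every feature ordering. It is an illustration only, not the study's code and not the SHAP library, which relies on faster estimators. The scoring function `toy_model`, the feature names, and the baseline household are hypothetical stand-ins for a trained classifier and DHS variables.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    Averages each feature's marginal contribution over every feature
    ordering, so it is only practical for a handful of features.
    """
    features = list(instance)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)      # start from the reference household
        previous = predict(current)
        for name in order:
            current[name] = instance[name]   # switch feature to its real value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(h):
    """Hypothetical score standing in for a trained model's prediction."""
    return (0.1 + 0.2 * h["urban"] + 0.05 * h["wealth_quintile"]
            + 0.05 * h["urban"] * h["wealth_quintile"]   # interaction term
            + 0.1 * h["head_educated"])


# Hypothetical reference household and household to explain.
baseline = {"urban": 0, "wealth_quintile": 1, "head_educated": 0}
household = {"urban": 1, "wealth_quintile": 4, "head_educated": 1}

phi = shapley_values(toy_model, household, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions add up to the difference between the household's score and the baseline score, which is the property that makes Shapley-based explanations convenient for ranking the drivers of a prediction.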

LLMs Revolutionize Qualitative Data Analysis and Pose New Threats

Large Language Models (LLMs) are emerging as a transformative force in the analysis of qualitative survey data. A recent article in Scientific Reports found that LLMs consistently outperform outsourced human coders in complex textual analysis tasks, including named entity recognition and sentiment analysis [3]. This suggests that LLMs can provide a cost-effective and highly accurate solution for coding open-ended survey responses at scale.

However, the increasing sophistication of LLMs also presents a significant challenge to survey data quality. A paper in the Proceedings of the National Academy of Sciences warns of an “existential threat” to online survey research, demonstrating that AI agents can generate plausible and coherent survey responses that are indistinguishable from human responses and can evade current data quality checks [4]. This development necessitates a fundamental re-evaluation of online data collection methods and the development of new validation techniques.

Innovations in Data Processing, Reporting, and Dissemination

Beyond data analysis, AI is also driving innovation in the later stages of the survey lifecycle. New tools and techniques are emerging for automated data cleaning, reporting, and dissemination.

Automated Data Cleaning and Processing: AI-powered tools are now available to automate the traditionally labor-intensive tasks of data cleaning and preparation. These tools can identify and correct errors, handle missing data, and detect outliers, significantly improving the efficiency and quality of survey data processing [7].
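
As a point of reference for the tasks listed above, the sketch below shows a simple rule-based baseline in Python: median imputation for missing values and Tukey's interquartile-range fences for outlier flagging. This is a classical technique swapped in for illustration, not the AI-powered tools described in [7]. The function name `clean_numeric`, the multiplier `k`, and the expenditure figures are assumptions.

```python
from statistics import median, quantiles


def clean_numeric(values, k=1.5):
    """Impute missing entries with the median and flag IQR outliers.

    Returns (cleaned, flags): cleaned has None replaced by the median of
    the observed values; flags marks observed values outside Tukey's fences.
    """
    observed = [v for v in values if v is not None]
    mid = median(observed)
    q1, _, q3 = quantiles(observed, n=4)          # quartile cut points
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    cleaned = [mid if v is None else v for v in values]
    flags = [v is not None and not (low <= v <= high) for v in values]
    return cleaned, flags


# Hypothetical monthly household expenditure with gaps and one entry error.
monthly_expenditure = [120, 150, None, 95, 180, 130, 4500, 160, None, 125]
cleaned, flags = clean_numeric(monthly_expenditure)

for original, value, flagged in zip(monthly_expenditure, cleaned, flags):
    note = "outlier, review" if flagged else ""
    print(original, value, note)
```

The sketch flags suspicious values rather than overwriting them, on the assumption that a reviewer or a downstream editing rule decides whether a flagged value is a genuine error.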

Synthetic Data Generation: A study in JAMIA Open explores the use of generative models, such as Conditional Tabular GAN (CTGAN), to create synthetic datasets from sensitive health and demographic data [6]. This approach allows for the public dissemination of valuable data while preserving the privacy of individuals. The study found that synthetic data generated by CTGAN maintained a high degree of fidelity and utility, with no statistically significant loss in predictive performance compared to real data [6]. This has significant implications for national statistical offices seeking to broaden data access without compromising confidentiality.
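
One simple way to check the fidelity of a synthetic dataset is to compare the distribution of each categorical variable in the real and synthetic files. The Python sketch below uses total variation distance for this purpose. It is an assumed illustrative metric, not necessarily the one used in the study, and the variable name and category counts are hypothetical.

```python
from collections import Counter


def total_variation_distance(real, synthetic):
    """Half the summed absolute gap between two category distributions.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    p, q = Counter(real), Counter(synthetic)
    categories = set(p) | set(q)
    return 0.5 * sum(
        abs(p[c] / len(real) - q[c] / len(synthetic)) for c in categories
    )


# Hypothetical water-source variable in a real and a synthetic file.
real_water_source = ["piped"] * 50 + ["well"] * 30 + ["surface"] * 20
synthetic_water_source = ["piped"] * 46 + ["well"] * 33 + ["surface"] * 21

tvd = total_variation_distance(real_water_source, synthetic_water_source)
print(f"Total variation distance: {tvd:.3f}")
```

A value close to zero suggests the synthetic file reproduces that variable's marginal distribution. It says nothing about relationships between variables or about disclosure risk, which need separate checks.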

Automated Reporting and Visualization: The process of generating reports and visualizations from survey data is also being transformed by AI. Visual Natural Language Interfaces (V-NLIs) allow users to query data and generate visualizations using natural language, making data exploration more intuitive and accessible [8]. Generative AI can also be used to automate the drafting of report sections, executive summaries, and narrative summaries of key findings, with the cited source reporting reductions in reporting timelines of 40-60% [9].

U.S. Census Bureau Embraces AI in Business Survey

In a significant development for official statistics, the U.S. Census Bureau has added new questions on artificial intelligence to its Business Trends and Outlook Survey (BTOS) [10]. This large-scale, biweekly survey of 1.2 million businesses will now collect data on AI adoption and its impact on the economy. The inclusion of these questions signals the growing recognition of AI’s importance in business operations and the need for official statistics to track its impact.

Conclusion

The developments from the past week demonstrate a clear and accelerating trend of AI integration across the entire survey research lifecycle. While the potential for efficiency gains and deeper insights is immense, the field must also grapple with the significant challenges posed by AI, particularly in the areas of research integrity and data quality. For researchers and statistical offices, the key will be to embrace these new technologies while simultaneously developing the necessary governance frameworks, validation techniques, and ethical guidelines to ensure the continued production of high-quality, reliable data.

References

[1] Anadolu Agency. (2025, November 20). AI use surges in global research offices but staff warn it poses major integrity risks. https://www.aa.com.tr/en/science-technology/ai-use-surges-in-global-research-offices-but-staff-warn-it-poses-major-integrity-risks/3749193

[2] Clarivate. (2025, November 20). Research Offices of the Future: Key findings from the 2025 report. https://clarivate.com/academia-government/blog/research-offices-of-the-future-key-findings-from-the-2025-report/

[3] Bermejo, V. J., Gago, A., Gálvez, R. H., & Harari, N. (2025). LLMs outperform outsourced human coders on complex textual analysis. Scientific Reports, 15(1), 40122. https://www.nature.com/articles/s41598-025-23798-y

[4] Westwood, S. J. (2025). The potential existential threat of large language models to online survey research. Proceedings of the National Academy of Sciences, 122(47), e2518075122. https://www.pnas.org/doi/10.1073/pnas.2518075122

[5] Yitageasu, G., Alemu, E. A., Worede, E. A., Tigabie, M., & Demoze, L. (2025). Machine learning-based prediction of household sanitation facility access in Sub-Saharan Africa: insights from DHS data (2012–2024). Archives of Public Health, 83(1), 1-13. https://archpublichealth.biomedcentral.com/articles/10.1186/s13690-025-01780-4

[6] Mwigereri, D. G., Kamotho, N. T., Waljee, A. K., Rego, R. T., Weinheimer-Haus, E. M., Alarakhiya, F., … & Siwo, G. H. (2025). Synthetic data generation of health and demographic surveillance systems data: a case study in a low- and middle-income country. JAMIA Open, 8(6), ooaf137. https://pmc.ncbi.nlm.nih.gov/articles/PMC12628187/

[7] Bowen, J. (2025, November 18). How does AI assist in qualitative data analysis? CoLoop. https://www.coloop.ai/blog/how-ai-assists-qualitative-data-analysis

[8] Szudejko, M. (2025, November 21). Natural Language Visualization and the Future of Data Analysis and Presentation. Towards Data Science. https://towardsdatascience.com/natural-language-visualization-and-the-future-of-data-analysis-and-presentation/

[9] EvalCommunity Academy. (2025, November 18). Generative AI in Evaluation. https://academy.evalcommunity.com/generative-ai-in-evaluation/

[10] U.S. Census Bureau. (2025, November 20). Business Trends and Outlook Survey Data Release — November 20, 2025. https://www.census.gov/newsroom/press-releases/2025/btos-nov-20.html

Contact: bakodramane@gmail.com