Jeffrey Michler, University of Arizona
"Mining Meaning: Conducting AI-Assisted Reviews of Economic Literature"
Abstract
The emergence of large language models (LLMs) such as ChatGPT presents new opportunities for streamlining academic tasks, including literature reviews. This paper evaluates whether LLMs can effectively synthesize complex and extensive bodies of economic research by focusing on the use of weather as an instrumental variable (IV). Naturally occurring variation in weather is widely used to identify causal relationships in economics. However, how weather is quantified and what it instruments for vary widely in the literature, making it a useful test case for LLM performance in understanding and classifying a single concept operationalized under heterogeneous regimes. We fine-tune one of OpenAI's GPTs and assess its ability to synthesize requested information from a corpus of over 3,700 papers. Our findings suggest that LLMs struggle to parse complex concepts, particularly when a concept can be described in many different ways. They can significantly reduce researcher effort but at a non-trivial cost in accuracy. At this time, fine-tuning LLMs requires careful prompt engineering and a substantial amount of human-labeled training data. Ultimately, LLMs are best viewed as collaborative tools that assist, but do not replace, domain expertise in the literature review process.
Contact person: Neda Trifkovic