TLDR: A new study demonstrates that Large Language Models (LLMs) can accurately and cost-effectively geolocate historical land grants from colonial Virginia. The best LLM achieved a mean error of 23 km, outperforming human GIS analysts and traditional geoparsers, and was significantly faster and cheaper. This research highlights LLMs’ potential to transform historical spatial analysis, enabling large-scale mapping of colonial-era land records.
A groundbreaking study has explored the potential of large language models (LLMs) to accurately pinpoint the locations of historical land grants from seventeenth- and eighteenth-century colonial Virginia. These early land records, which survive primarily as narrative descriptions of boundaries (known as metes and bounds), have long posed a challenge for historians and archaeologists seeking to visualize early settlement patterns and land ownership using modern Geographic Information System (GIS) tools.
Traditionally, converting these prose descriptions into precise latitude and longitude coordinates is an incredibly labor-intensive task. Even professional GIS analysts can spend hours on a single grant, grappling with archaic place-names, inconsistent spellings, and low-resolution boundary details. This manual effort has severely limited large-scale spatial analysis of colonial history.
The research, titled “Benchmarking Large Language Models for Geolocating Colonial Virginia Land Grants,” aimed to systematically evaluate whether modern LLMs could automate this complex process accurately and cost-effectively. The study used a digitized collection of 5,471 Virginia patent abstracts from 1695 to 1732, with 43 rigorously verified test cases serving as a benchmark.
How the Models Were Tested
Six OpenAI models, spanning different architectures (o-series, GPT-4-class, and GPT-3.5), were put to the test. They operated under two main approaches: directly predicting coordinates from the text, and a “tool-augmented” method where models could use external geocoding APIs (like Google Geocoding) and a centroid calculation tool to refine their predictions. The results were then compared against a baseline established by a professional GIS analyst, as well as other automated geoparsing systems like Stanford NER and Mordecai-3, and a simple county-centroid heuristic.
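The centroid tool mentioned above is easy to picture: given several candidate points returned by a geocoder, it reduces them to a single representative coordinate. The paper’s implementation is not reproduced here, so the function below is a hypothetical sketch that averages points on the unit sphere (slightly more robust than naively averaging degrees):

```python
import math

def centroid(points):
    """Spherical centroid of (lat, lon) pairs given in degrees.

    Hypothetical sketch of a centroid tool for geocoded candidates;
    the study's actual tool interface may differ.
    """
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon
```

For example, the centroid of two equatorial points at longitudes 0° and 90° comes out at (0°, 45°), as expected.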
Key Findings: Accuracy and Efficiency
The study yielded impressive results. The top-performing single-call LLM, o3-2025-04-16, achieved a mean error of just 23 kilometers (median 14 km). This significantly outperformed the human GIS analyst baseline, whose mean error was 71 km, and the Stanford NER geoparser at 79 km. In other words, the LLM cut mean error by 67% relative to the human expert and by 70% relative to a leading automated system.
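The error figures above are great-circle distances between predicted and verified coordinates. A minimal sketch of that metric using the haversine formula, with made-up coordinates rather than the study’s data:

```python
import math
from statistics import mean, median

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    R = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

# Illustrative predicted-vs-verified pairs (not the study's data):
pairs = [((37.54, -77.44), (37.41, -77.30)),
         ((36.85, -76.29), (36.99, -76.45))]
errors = [haversine_km(p[0], p[1], t[0], t[1]) for p, t in pairs]
print(f"mean error: {mean(errors):.1f} km, median: {median(errors):.1f} km")
```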
Further enhancing accuracy, a five-call ensemble method (where the model made five independent predictions and then clustered them) reduced the mean error even further to 19 km (median 12 km). This ensemble approach managed to place nearly 40% of its predictions within a 10 km radius of the true location.
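The exact clustering rule is not spelled out in this summary, but the idea can be sketched: sample several independent predictions, find the tightest group, and average it, so that a single wildly wrong answer cannot drag the final estimate away. A minimal, hypothetical version (the 25 km radius is an illustrative parameter, not the study’s):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    R = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

def ensemble_estimate(preds, radius_km=25.0):
    """Average the largest proximity cluster among predictions.

    Hypothetical sketch of the five-call ensemble idea; the study's
    actual clustering rule may differ.
    """
    clusters = (
        [q for q in preds if haversine_km(p[0], p[1], q[0], q[1]) <= radius_km]
        for p in preds
    )
    best = max(clusters, key=len)  # keep the tightest, largest group
    lat = sum(p[0] for p in best) / len(best)
    lon = sum(p[1] for p in best) / len(best)
    return lat, lon
```

With five predictions of which one is a far-off outlier, only the four mutually close points are averaged, which is the robustness property an ensemble of this kind is after.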
Interestingly, the tool-augmented approach, which allowed LLMs to interact with external geocoding APIs, did not provide a measurable benefit in accuracy. In some cases, it even led to worse performance, suggesting that the models’ internal understanding of historical geography was often more reliable than external tools optimized for modern place names.
Beyond accuracy, the LLMs demonstrated a massive advantage in cost and speed. All automated methods were orders of magnitude cheaper and faster than the traditional human GIS workflow. For instance, the cost-effective gpt-4o-2024-08-06 model maintained a 28 km mean error at a mere USD 1.09 per 1,000 grants, establishing a strong cost-accuracy benchmark. This contrasts sharply with the human GIS baseline, which cost over USD 3,200 per 1,000 grants.
The speed gains were equally dramatic, with LLMs generating coordinates in seconds compared to the human analyst’s average of 502 seconds per grant.
Implications for Historical Research
These findings highlight the immense potential of LLMs for scalable, accurate, and cost-effective historical georeferencing. The ability to quickly and precisely map thousands of historical land grants can unlock new quantitative approaches for studying colonial Virginia’s social and environmental history, including settlement patterns, plantation economies, and Indigenous land dispossession.
The study also suggests a future for “machine-assisted reading” in digital humanities, where historians can delegate repetitive data extraction tasks to AI, freeing them to focus on interpretation and analysis. While acknowledging limitations such as the specific corpus used and potential data contamination, the research paves the way for a new era of spatially enabled colonial archives worldwide.