TLDR: A new study demonstrates that Large Language Models (LLMs) can accurately and cost-effectively geolocate historical land grants from colonial Virginia. The best LLM achieved a mean error of 23 km, outperforming human GIS analysts and traditional geoparsers, and was significantly faster and cheaper. This research highlights LLMs’ potential to transform historical spatial analysis, enabling large-scale mapping of colonial-era land records.
A groundbreaking study has explored the potential of large language models (LLMs) to accurately pinpoint the locations of historical land grants from seventeenth- and eighteenth-century colonial Virginia. These early land records, which survive primarily as narrative descriptions of boundaries (known as metes and bounds), have long posed a challenge for historians and archaeologists seeking to visualize early settlement patterns and land ownership using modern Geographic Information System (GIS) tools.
Traditionally, converting these prose descriptions into precise latitude and longitude coordinates is an incredibly labor-intensive task. Even professional GIS analysts can spend hours on a single grant, grappling with archaic place-names, inconsistent spellings, and low-resolution boundary details. This manual effort has severely limited large-scale spatial analysis of colonial history.
The research, titled “Benchmarking Large Language Models for Geolocating Colonial Virginia Land Grants,” aimed to systematically evaluate whether modern LLMs could automate this complex process accurately and cost-effectively. The study used a digitized collection of 5,471 Virginia patent abstracts from 1695 to 1732, with 43 rigorously verified test cases serving as a benchmark.
How the Models Were Tested
Six OpenAI models, spanning different architectures (o-series, GPT-4-class, and GPT-3.5), were put to the test. They operated under two main approaches: directly predicting coordinates from the text, and a “tool-augmented” method where models could use external geocoding APIs (like Google Geocoding) and a centroid calculation tool to refine their predictions. The results were then compared against a baseline established by a professional GIS analyst, as well as other automated geoparsing systems like Stanford NER and Mordecai-3, and a simple county-centroid heuristic.
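The centroid tool mentioned above is easy to picture: given several candidate points returned by a geocoder, it reduces them to a single representative coordinate. The paper’s implementation is not reproduced here, so the function below is a hypothetical sketch that averages points on the unit sphere (slightly more robust than naively averaging degrees):

```python
import math

def centroid(points):
    """Spherical centroid of (lat, lon) pairs given in degrees.

    Hypothetical sketch of a centroid tool for geocoded candidates;
    the study's actual tool interface may differ.
    """
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon
```

For example, the centroid of two equatorial points at longitudes 0° and 90° comes out at (0°, 45°), as expected.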
Key Findings: Accuracy and Efficiency
The study yielded impressive results. The top-performing single-call LLM, o3-2025-04-16, achieved a mean error of just 23 kilometers (median 14 km). This significantly outperformed the human GIS analyst baseline, whose mean error was 71 km, and the Stanford NER geoparser at 79 km. In other words, the LLM cut mean error by 67% relative to the human expert and by 70% relative to a leading automated system.
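The error figures above are great-circle distances between predicted and verified coordinates. A minimal sketch of that metric using the haversine formula, with made-up coordinates rather than the study’s data:

```python
import math
from statistics import mean, median

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    R = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

# Illustrative predicted-vs-verified pairs (not the study's data):
pairs = [((37.54, -77.44), (37.41, -77.30)),
         ((36.85, -76.29), (36.99, -76.45))]
errors = [haversine_km(p[0], p[1], t[0], t[1]) for p, t in pairs]
print(f"mean error: {mean(errors):.1f} km, median: {median(errors):.1f} km")
```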
Further enhancing accuracy, a five-call ensemble method (where the model made five independent predictions and then clustered them) reduced the mean error even further to 19 km (median 12 km). This ensemble approach managed to place nearly 40% of its predictions within a 10 km radius of the true location.
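The exact clustering rule is not spelled out in this summary, but the idea can be sketched: sample several independent predictions, find the tightest group, and average it, so that a single wildly wrong answer cannot drag the final estimate away. A minimal, hypothetical version (the 25 km radius is an illustrative parameter, not the study’s):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    R = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

def ensemble_estimate(preds, radius_km=25.0):
    """Average the largest proximity cluster among predictions.

    Hypothetical sketch of the five-call ensemble idea; the study's
    actual clustering rule may differ.
    """
    clusters = (
        [q for q in preds if haversine_km(p[0], p[1], q[0], q[1]) <= radius_km]
        for p in preds
    )
    best = max(clusters, key=len)  # keep the tightest, largest group
    lat = sum(p[0] for p in best) / len(best)
    lon = sum(p[1] for p in best) / len(best)
    return lat, lon
```

With five predictions of which one is a far-off outlier, only the four mutually close points are averaged, which is the robustness property an ensemble of this kind is after.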
Interestingly, the tool-augmented approach, which allowed LLMs to interact with external geocoding APIs, did not provide a measurable benefit in accuracy. In some cases, it even led to worse performance, suggesting that the models’ internal understanding of historical geography was often more reliable than external tools optimized for modern place names.
Beyond accuracy, the LLMs demonstrated a massive advantage in cost and speed. All automated methods were orders of magnitude cheaper and faster than the traditional human GIS workflow. For instance, the cost-effective gpt-4o-2024-08-06 model maintained a 28 km mean error at a mere USD 1.09 per 1,000 grants, establishing a strong cost-accuracy benchmark. This contrasts sharply with the human GIS baseline, which cost over USD 3,200 per 1,000 grants.
The speed gains were equally dramatic, with LLMs generating coordinates in seconds compared to the human analyst’s average of 502 seconds per grant.
Implications for Historical Research
These findings highlight the immense potential of LLMs for scalable, accurate, and cost-effective historical georeferencing. The ability to quickly and precisely map thousands of historical land grants can unlock new quantitative approaches for studying colonial Virginia’s social and environmental history, including settlement patterns, plantation economies, and Indigenous land dispossession.
The study also suggests a future for “machine-assisted reading” in digital humanities, where historians can delegate repetitive data extraction tasks to AI, freeing them to focus on interpretation and analysis. While acknowledging limitations such as the specific corpus used and potential data contamination, the research paves the way for a new era of spatially enabled colonial archives worldwide.