TLDR: The World-POI paper introduces a new global Point-of-Interest (POI) dataset that combines Foursquare’s verified business listings with OpenStreetMap’s (OSM) rich, user-contributed metadata. The methodology uses spatial proximity and name similarity to create high-confidence matches. The result is a cleaner, more accurate dataset available in both tabular and graph formats. The dataset is validated against external sources and shown to provide a realistic representation of human activity, making it valuable for urban analytics, mobility modeling, and geographic knowledge graph construction.
A new research paper introduces World-POI, a comprehensive global dataset designed to enhance our understanding of real-world locations and human activity. This innovative dataset integrates information from two major sources: Foursquare, known for its verified business listings, and OpenStreetMap (OSM), celebrated for its rich, community-contributed metadata.
Existing Point-of-Interest (POI) datasets often present a trade-off between extensive spatial coverage, detailed semantic information, and cost. Commercial options like Google Places API offer broad coverage and verified listings but come with significant licensing fees. Free alternatives such as Foursquare and OSM are accessible but can suffer from inconsistencies, incompleteness, or a lack of formal business verification.
World-POI addresses these limitations by merging the strengths of Foursquare and OSM. Its methodology is a record linkage process that pairs POIs from the two sources using spatial distance and name similarity, and keeps only high-confidence matches. The aim is for every integrated record to correspond to an actual business or location, with noise and inaccurate entries filtered out.
How World-POI is Constructed
The World-POI dataset is built with a multi-step pipeline. First, POI data is collected from Foursquare’s cloud storage and the official OpenStreetMap website. The raw data is then preprocessed: records are cleaned, attributes are harmonized, and inconsistencies are resolved. Finally, both datasets are imported into a PostgreSQL database with the PostGIS extension, which supports advanced spatial operations.
Spatial indexes are built to speed up queries, and a spatial join then links the two sources: for each Foursquare record, the nearest OSM feature within a 50-meter radius is identified. This radius balances precision and recall, keeping linkages accurate while managing the size of the dataset. Name similarity between each matched pair is then computed with both a trigram-based similarity and the Levenshtein distance metric. Only pairs whose Levenshtein name-similarity score exceeds 0.5 are retained, forming a high-confidence integrated dataset.
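The matching step described above can be illustrated with a small, in-memory Python sketch. It is not the World-POI code, only a sketch under stated assumptions. The paper runs the join inside PostGIS with spatial indexes, whereas this sketch scans every OSM point and uses a brute-force haversine distance. The 50 m radius and the 0.5 threshold come from the text. The `POI` record, the `match` helper, its parameter names and the sample places are hypothetical. The Levenshtein score is assumed to be normalized by the length of the longer name, and the trigram score only approximates PostgreSQL's `pg_trgm` similarity.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0

@dataclass
class POI:  # hypothetical minimal record; real rows carry many more attributes
    poi_id: str
    name: str
    lat: float
    lon: float

def haversine_m(a: POI, b: POI) -> float:
    """Great-circle distance in metres between two POIs."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi, dlmb = phi2 - phi1, math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def levenshtein(s: str, t: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]

def levenshtein_similarity(s: str, t: str) -> float:
    """Edit distance mapped to [0, 1]; normalizing by the longer name is an assumption."""
    s, t = s.lower().strip(), t.lower().strip()
    if not s and not t:
        return 1.0
    return 1.0 - levenshtein(s, t) / max(len(s), len(t))

def trigrams(s: str) -> set:
    """Word trigrams with pg_trgm-style padding (two leading spaces, one trailing)."""
    grams = set()
    for word in "".join(c if c.isalnum() else " " for c in s.lower()).split():
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def trigram_similarity(s: str, t: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both names."""
    a, b = trigrams(s), trigrams(t)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def match(fsq, osm, max_dist_m=50.0, min_lev_sim=0.5):
    """Link each Foursquare POI to its nearest OSM POI within max_dist_m,
    keeping the pair only if the Levenshtein similarity exceeds min_lev_sim."""
    matches = []
    for f in fsq:
        nearby = [(haversine_m(f, o), o) for o in osm]
        nearby = [c for c in nearby if c[0] <= max_dist_m]
        if not nearby:
            continue  # no OSM feature inside the radius
        dist, nearest = min(nearby, key=lambda c: c[0])
        lev = levenshtein_similarity(f.name, nearest.name)
        if lev > min_lev_sim:
            matches.append((f.poi_id, nearest.poi_id, round(dist, 1), round(lev, 2),
                            round(trigram_similarity(f.name, nearest.name), 2)))
    return matches

# Hypothetical sample records
fsq_pois = [POI("fsq1", "Café Norden", 55.6790, 12.5780),
            POI("fsq2", "Joe's Pizza", 55.6800, 12.5800)]
osm_pois = [POI("osm1", "Cafe Norden", 55.67902, 12.57803),
            POI("osm2", "City Pharmacy", 55.68001, 12.58002)]

print(match(fsq_pois, osm_pois))  # [('fsq1', 'osm1', 2.9, 0.91, 0.71)]
```

In this sample both Foursquare points have an OSM neighbour within a few metres. Only the café pair passes the name filter. The second pair is spatially close, but its names are unrelated. A brute-force scan costs O(n·m) and only suits small samples. At global scale, a spatial index such as the one the pipeline builds in PostGIS is what makes nearest-neighbour lookups tractable.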
The final World-POI dataset is released in two primary formats: tabular and graph-based. The tabular format provides detailed spatial, semantic, and contextual metadata from both Foursquare and OSM, including unique identifiers, names, coordinates, addresses, categories, and contact details. The graph-based representation models each POI as a node connected to its N nearest neighbors based on geographic proximity, with edge weights corresponding to spatial distances. This dual format supports diverse analytical workflows, from geographic knowledge graph construction to spatial clustering and network analysis.
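The graph representation described above can be sketched as a k-nearest-neighbour graph in plain Python. Several details are assumptions and are not taken from the paper. The `knn_graph` helper and the sample coordinates are hypothetical. Distances are haversine great-circle metres. Edges are stored undirected, so a pair linked from either side appears once. The paper's value of N and whether its edges are directed are not specified in this summary.

```python
import heapq
import math

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def knn_graph(pois, n_neighbors):
    """Return {(u, v): distance_m} linking each POI to its n_neighbors nearest POIs.

    Keys are sorted id pairs, so the graph is undirected (an assumption).
    """
    edges = {}
    for pid, (lat, lon) in pois.items():
        dists = ((haversine_m(lat, lon, qlat, qlon), qid)
                 for qid, (qlat, qlon) in pois.items() if qid != pid)
        for d, qid in heapq.nsmallest(n_neighbors, dists):
            edges[tuple(sorted((pid, qid)))] = round(d, 1)  # edge weight = distance
    return edges

# Hypothetical POIs on one meridian: id -> (lat, lon)
pois = {"a": (0.000, 0.0), "b": (0.001, 0.0), "c": (0.003, 0.0), "d": (0.010, 0.0)}
edges = knn_graph(pois, n_neighbors=2)
for (u, v), w in edges.items():
    print(u, v, w)
```

With N = 2 the sample produces five edges. For example, `a`–`b` is about 111.2 m. The outlying POI `d` links to `c` and `b` but not to `a`. The all-pairs scan is O(n²). Real-world use would replace it with a spatial index or a KD-tree over projected coordinates.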
Validation and Accuracy
To assess the quality and correctness of the integration, World-POI underwent extensive technical validation. A visual comparison with city population distributions showed a strong spatial correlation, indicating that World-POI realistically represents human activity patterns in populated regions. Noise is reduced because only POIs that appear in both Foursquare and OSM and meet the similarity criteria are retained.
A focused case study of Greenland further highlighted the dataset’s accuracy. There, Foursquare POIs were sparsely scattered, and OSM POIs formed dense clusters in uninhabited areas. By contrast, the World-POI subset for Greenland, filtered by the Levenshtein name-similarity score, aligned closely with the known populated coastal regions. Manual validation of samples from Foursquare, OSM, and World-POI confirmed these findings. World-POI consistently yielded high-confidence, real venues with accurate names and coordinates, significantly outperforming the individual source datasets in terms of verifiable real-world locations.
Applications of World-POI
The World-POI dataset is a robust and versatile resource for researchers and practitioners across various domains:
- Geographic Knowledge Graph Construction: Provides a foundation for building knowledge graphs that link semantically and spatially related POIs.
- Location-Based Clustering and Classification: Enables the identification of spatial clusters of economic activity, cultural landmarks, or service facilities.
- Semantic Enrichment of Commercial POI Data: Allows cross-linking business listings with OSM attributes to improve data completeness.
- Urban and Regional Planning Analysis: Useful for assessing spatial accessibility, infrastructure distribution, and land-use balance.
- Mobility and Human Behavior Modeling: Supports travel demand forecasting and agent-based simulations by accurately reflecting real-world activity centers.
- Data Quality Assessment and Benchmarking: Serves as a benchmark for evaluating geocoding accuracy and record linkage algorithms.
- Network and Graph-Based Analyses: Facilitates graph-theoretic analyses to reveal patterns of urban connectivity and identify local hubs of activity.
The World-POI dataset and its accompanying codebase are publicly available, offering a reproducible design and global coverage for both methodological research and applied spatial analysis. For more details, you can refer to the full research paper: World-POI: Global Point-of-Interest Data Enriched from Foursquare and OpenStreetMap as Tabular and Graph Data.


