TLDR: This research introduces SDG polarity detection, a new task to determine if news text indicates positive, neutral, or negative progress towards specific Sustainable Development Goals (SDGs). The study presents SDG-POD, a benchmark dataset combining human and LLM-generated annotations. It evaluates six LLMs, finding that fine-tuned models, especially QWQ-32B, perform better than zero-shot counterparts, particularly when augmented with synthetic data. The task remains challenging but fine-tuning significantly reduces critical misclassifications, offering valuable tools for sustainability monitoring.
The United Nations’ Sustainable Development Goals (SDGs) provide a crucial framework for addressing global challenges related to society, environment, and economy. While natural language processing (NLP) and large language models (LLMs) have made it easier to classify text based on its relevance to specific SDGs, understanding the direction of this relevance—whether the impact described is positive, neutral, or negative—has remained a significant challenge.
This research introduces a new task called SDG polarity detection. This task aims to determine if a text segment indicates progress towards a specific SDG or conveys an intention to achieve such progress. For example, a text discussing a new policy to reduce hunger would be positive for SDG 2 (“Zero Hunger”), while one describing an emerging famine crisis would be negative, even though both relate to the same SDG.
To support research in this area, the authors developed SDG-POD, a new benchmark dataset. This dataset combines original and synthetically generated data, specifically designed for the SDG polarity detection task. It includes 6,400 texts, each annotated with a polarity label (positive, neutral, or negative) for a given SDG. The training set of SDG-POD was automatically labeled using a majority voting system from five different LLMs, while the test set was meticulously annotated by human experts.
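The majority-voting aggregation over the five LLM annotators can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: it assumes a strict majority is required and that texts without one are left unlabeled, details the summary above does not specify.

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate polarity labels from multiple LLM annotators.

    Returns the label chosen by a strict majority of annotators,
    or None when no label wins more than half the votes
    (assumed tie-breaking behaviour, not confirmed by the paper).
    """
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label if count > len(labels) / 2 else None

# Five hypothetical LLM annotations for one text/SDG pair:
print(majority_vote(["positive", "positive", "neutral",
                     "positive", "negative"]))   # -> positive
print(majority_vote(["positive", "negative", "neutral",
                     "positive", "negative"]))   # -> None (no majority)
```

Requiring a strict majority rather than a simple plurality trades coverage for label reliability, which matters when the aggregated labels become training data.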
The study performed a comprehensive evaluation of six state-of-the-art LLMs, testing them in both zero-shot (without task-specific training) and fine-tuned configurations. The results indicate that the task remains challenging for current LLMs. However, some fine-tuned models, particularly QWQ-32B, achieved strong performance, especially on specific SDGs such as SDG-9 (Industry, Innovation and Infrastructure), SDG-12 (Responsible Consumption and Production), and SDG-15 (Life on Land).
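A zero-shot setup for this task might be prompted along the following lines. The template below is purely illustrative; the paper's actual prompt wording, model interface, and output parsing are not described in this summary.

```python
# Hypothetical zero-shot prompt template for SDG polarity detection.
PROMPT_TEMPLATE = """You are monitoring progress on the UN Sustainable \
Development Goals. Given the news excerpt below, decide whether it \
indicates positive, neutral, or negative progress towards {sdg}.
Answer with exactly one word: positive, neutral, or negative.

Excerpt: {text}
Answer:"""

def build_prompt(text: str, sdg: str) -> str:
    """Fill the template with a text segment and a target SDG."""
    return PROMPT_TEMPLATE.format(text=text, sdg=sdg)

prompt = build_prompt(
    "A new irrigation programme cut regional crop failures sharply.",
    "SDG 2 (Zero Hunger)",
)
print(prompt)
```

Constraining the model to a one-word answer is a common way to make free-form LLM output easy to map onto a fixed label set.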
A key finding was that augmenting the fine-tuning dataset with synthetically generated examples significantly improved model performance. This highlights the effectiveness of data enrichment techniques in domains where annotated data is scarce. The researchers used an innovative method to generate synthetic data, integrating outputs from multiple LLMs and applying a majority voting strategy to ensure reliability.
The paper emphasizes that SDG polarity detection is distinct from traditional sentiment analysis. While sentiment analysis focuses on the emotional tone of a text, polarity detection evaluates the actual impact of described actions in relation to sustainability targets. A text could have a positive sentiment but convey negative polarity regarding an SDG, or vice versa.
The evaluation also revealed that fine-tuned models not only achieved higher overall performance but also demonstrated greater robustness. They significantly reduced “critical errors,” such as incorrectly predicting negative labels as positive and vice versa, which are more severe than confusing them with a neutral class. This improvement was particularly evident when using error-weighted F1 metrics, which assign heavier penalties to these critical misclassifications.
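An error-weighted score of this kind can be sketched by assigning a higher cost to sign flips (positive predicted as negative, or vice versa) than to confusions involving the neutral class. The cost values below are illustrative assumptions; the paper's exact error-weighted F1 formulation may differ.

```python
# Misclassification costs: (gold, predicted) -> penalty.
# Sign flips are assumed to cost twice as much as neutral confusions.
COST = {
    ("negative", "positive"): 1.0,  # critical error
    ("positive", "negative"): 1.0,  # critical error
    ("negative", "neutral"):  0.5,
    ("neutral",  "negative"): 0.5,
    ("neutral",  "positive"): 0.5,
    ("positive", "neutral"):  0.5,
}

def error_weighted_score(gold, pred):
    """Return a score in [0, 1]: 1.0 for perfect predictions,
    with critical sign flips penalised more heavily than
    confusions with the neutral class."""
    total = sum(COST.get((g, p), 0.0) for g, p in zip(gold, pred))
    return 1.0 - total / len(gold)

gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "positive", "neutral", "neutral"]
print(error_weighted_score(gold, pred))  # 1 - (1.0 + 0.5)/4 = 0.625
```

Under such a weighting, a model that mistakes negatives for neutrals scores higher than one that mistakes them for positives, which matches the intuition that flipping the direction of an SDG's progress is the most damaging error for downstream monitoring.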
This work advances the methodological toolkit for monitoring sustainability efforts and offers practical insights for developing efficient, high-performing polarity detection systems. The complete codebase for the experiments has been made publicly available to ensure reproducibility and encourage further community research; full details are available in the research paper.
Future research aims to expand the benchmark to include multilingual data and explore real-world deployment settings for policy monitoring, media analysis, and decision support in the sustainability domain.


