TLDR: This paper introduces a two-stage system for generating scientific figure captions that are both accurate and stylistically consistent with the author’s writing. It uses the LaMP-Cap dataset, combining context filtering and category-specific prompt optimization in the first stage, and then applies few-shot prompting with author profile figures for stylistic refinement in the second stage. Experiments show significant improvements in ROUGE-1 recall and BLEU scores, demonstrating the effectiveness of integrating contextual understanding with author-specific stylistic adaptation.
Scientific figures are essential for conveying complex information, but writing accurate and stylistically consistent captions for them can be a time-consuming and challenging task for researchers. This is where automated caption generation systems come into play, offering a promising solution to enhance scientific communication efficiency.
A recent research paper, “Leveraging Author-Specific Context for Scientific Figure Caption Generation: 3rd SciCap Challenge,” by Watcharapong Timklaypachara, Monrada Chiewhawan, Nopporn Lekuthai, and Titipat Achakulvisut, introduces an innovative system designed to generate high-quality scientific figure captions. The system focuses on integrating figure-related textual context with the unique writing styles of individual authors.
The core of their approach is a two-stage pipeline. The first stage generates content-grounded captions, keeping them relevant to the figure while filtering out extraneous information. It involves several key steps:
Stage 1: Content-Grounded Caption Generation
First, the system employs a sentence-based filtering mechanism to identify and retain only the most relevant information from input paragraphs. This helps in reducing noise and focusing on the core content related to the figure.
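The paper does not include its filtering code, so the sketch below is a hypothetical illustration of what sentence-level filtering can look like: it keeps only sentences that mention the figure or share vocabulary with a set of query terms. All function names and heuristics here are invented for illustration, not the authors' implementation.

```python
import re

def split_sentences(text):
    # Naive splitter on sentence-ending punctuation; a real system
    # would use a proper sentence tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def filter_relevant(paragraphs, figure_ref, query_terms):
    """Keep sentences that mention the figure or overlap with query terms."""
    kept = []
    for para in paragraphs:
        for sent in split_sentences(para):
            words = {w.lower() for w in re.findall(r"[A-Za-z]+", sent)}
            if figure_ref.lower() in sent.lower() or words & set(query_terms):
                kept.append(sent)
    return kept
```

Even a crude filter like this removes most off-topic sentences before they reach the caption-generation prompt, which is the point of this stage.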
Next, it utilizes category-level prompt optimization. Scientific papers often fall into specific domains like Computer Science, Mathematics, or Biology. The researchers developed category-focused prompt templates using advanced tools like MIPROv2 and SIMBA from the DSPy Toolkit. These tools help create instruction-example pairs and apply feedback-driven optimization to generate more precise and relevant captions for each paper category.
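MIPROv2 and SIMBA search over instructions and examples automatically. As a rough illustration of what a category-keyed prompt might look like, here is a hand-written sketch; the templates and the `build_prompt` helper are hypothetical, not the paper's optimized prompts.

```python
# Hypothetical category-specific templates; the paper derives these
# automatically with DSPy's MIPROv2/SIMBA optimizers rather than by hand.
CATEGORY_PROMPTS = {
    "cs": ("Write a one-sentence caption for the figure, naming the "
           "method, dataset, and metric shown."),
    "math": ("Write a precise caption stating what quantity is plotted "
             "and the result the figure demonstrates."),
    "default": "Write a concise, accurate caption for the figure.",
}

def build_prompt(category, figure_context):
    # Fall back to a general template for unrecognized categories.
    template = CATEGORY_PROMPTS.get(category, CATEGORY_PROMPTS["default"])
    return f"{template}\n\nContext:\n{figure_context}\n\nCaption:"
```

The optimizers' job is essentially to fill in and refine the strings in such a table from data, guided by a caption-quality metric.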
Finally, for papers that might span multiple categories, a caption candidate selection process is used. An advanced language model, Gemini-2.5 Flash, acts as a reranker, evaluating multiple caption candidates generated for different categories and selecting the single best one based on clarity, relevance, accuracy, and tone.
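Candidate selection reduces to scoring each category's caption and keeping the best one. In the paper the scorer is Gemini-2.5 Flash judging clarity, relevance, accuracy, and tone; the sketch below replaces that LLM call with a pluggable scoring function, with hypothetical names throughout.

```python
def select_best_caption(candidates, score_fn):
    """Return the highest-scoring caption.

    `score_fn` stands in for the LLM reranker: in the real pipeline it
    would ask Gemini-2.5 Flash to rate clarity, relevance, accuracy,
    and tone; here any callable returning a number will do.
    """
    if not candidates:
        raise ValueError("no caption candidates to rerank")
    return max(candidates, key=score_fn)
```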
Stage 2: Profile-Informed Stylistic Refinement
The second stage addresses personalization. While the first stage ensures content accuracy, this stage refines the captions to match the author’s specific writing style. This is achieved through few-shot prompting with ‘profile figures’ from the same paper: the profile captions serve as structural references, helping the system match the author’s consistent writing style while enforcing a caption length limit for conciseness.
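In code, profile-informed refinement amounts to assembling a few-shot prompt from the paper's other figure–caption pairs. A minimal sketch, with hypothetical names and an assumed word-count limit:

```python
def build_fewshot_prompt(profile_pairs, target_context, max_words=30):
    """Assemble a few-shot prompt from (context, caption) profile pairs.

    The profile captions come from other figures in the same paper, so
    the model sees the author's own style before writing the new caption.
    """
    blocks = [f"Figure context: {ctx}\nCaption: {cap}" for ctx, cap in profile_pairs]
    blocks.append(
        f"Figure context: {target_context}\n"
        f"Caption (at most {max_words} words):"
    )
    return "\n\n".join(blocks)
```

Because the examples and the target share an author, the model's usual few-shot imitation behavior doubles as style transfer.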
The researchers evaluated their system using metrics such as BLEU and ROUGE, which measure how closely generated captions match reference captions in lexical overlap and structure. Category-specific prompts outperformed a general-purpose prompt, boosting ROUGE-1 recall by 8.3%.
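ROUGE-1 recall, for example, is simply the fraction of reference unigrams (counted with multiplicity) that also appear in the generated caption. A minimal sketch:

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams, with multiplicity, found in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    # Clipped overlap: each candidate token can match at most its count.
    overlap = sum(min(cand[w], n) for w, n in ref.items())
    return overlap / sum(ref.values())
```

For instance, the candidate "the cat sat" against the reference "the cat sat on the mat" recalls 3 of 6 reference tokens, giving 0.5.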
Even more impressively, the profile-informed stylistic refinement led to substantial gains, with BLEU scores improving by 40–48% and ROUGE precision by 25–27%. This indicates that the system can generate captions that are not only scientifically accurate but also stylistically faithful to the source paper, making them more consistent with the author’s overall manuscript.
This work highlights the power of combining a deep understanding of contextual information with an adaptation to author-specific writing styles. It represents a significant step forward in automating scientific communication, potentially reducing the manual burden on researchers and improving the consistency of scientific manuscripts. You can read the full paper here.