
Enhancing Scientific Figure Captions with Context and Author Style

TLDR: This paper introduces a two-stage system for generating scientific figure captions that are both accurate and stylistically consistent with the author’s writing. Built on the LaMP-Cap dataset, its first stage combines context filtering with category-specific prompt optimization, and its second stage applies few-shot prompting with the author’s profile figures for stylistic refinement. Experiments show significant improvements in ROUGE-1 recall and BLEU scores, demonstrating the value of pairing contextual understanding with author-specific stylistic adaptation.

Scientific figures are essential for conveying complex information, but writing accurate, stylistically consistent captions for them is time-consuming for researchers. Automated caption generation systems offer a way to ease this burden and make scientific communication more efficient.

A recent research paper, “Leveraging Author-Specific Context for Scientific Figure Caption Generation: 3rd SciCap Challenge,” by Watcharapong Timklaypachara, Monrada Chiewhawan, Nopporn Lekuthai, and Titipat Achakulvisut, introduces an innovative system designed to generate high-quality scientific figure captions. The system focuses on integrating figure-related textual context with the unique writing styles of individual authors.

The core of the approach is a two-stage pipeline. The first stage generates content-grounded captions, keeping them anchored to the figure’s textual context while filtering out irrelevant information. It involves three key steps:

Stage 1: Content-Grounded Caption Generation

First, the system employs a sentence-based filtering mechanism to identify and retain only the most relevant information from input paragraphs. This helps in reducing noise and focusing on the core content related to the figure.
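The article does not spell out the exact filtering criterion, so the following is only a minimal Python sketch under stated assumptions: a sentence is kept if it explicitly mentions the target figure (e.g. "Figure 2"), plus up to `top_k` other sentences with the highest word overlap (Jaccard similarity) with those mentions. The names `split_sentences` and `filter_context` are hypothetical and not taken from the authors’ code.

```python
import re


def split_sentences(text):
    # Naive splitter (an assumption): break after ".", "!" or "?" followed by
    # whitespace. Abbreviations such as "Fig." would be split incorrectly.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokens(text):
    # Lowercased alphanumeric word set used for overlap scoring.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_context(paragraphs, figure_number, top_k=3):
    """Hypothetical sentence filter: keep sentences that mention the figure,
    plus the top_k other sentences that overlap most with those mentions."""
    sentences = [s for p in paragraphs for s in split_sentences(p)]
    mention = re.compile(rf"\bFigure\s+{figure_number}\b", re.IGNORECASE)
    anchors = {i for i, s in enumerate(sentences) if mention.search(s)}
    anchor_vocab = set().union(*(tokens(sentences[i]) for i in anchors))

    scored = []
    for i, sentence in enumerate(sentences):
        if i in anchors:
            continue
        words = tokens(sentence)
        union = words | anchor_vocab
        score = len(words & anchor_vocab) / len(union) if union else 0.0
        if score > 0:  # drop sentences that share no vocabulary with the mentions
            scored.append((score, i))

    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    keep = anchors | {i for _, i in scored[:top_k]}
    # Return the kept sentences in their original document order.
    return [sentences[i] for i in sorted(keep)]


paragraphs = [
    "Figure 2 shows training loss over epochs. We thank the reviewers.",
    "The loss drops sharply after 10 epochs. Code is released online.",
]
print(filter_context(paragraphs, figure_number=2, top_k=1))
# → ['Figure 2 shows training loss over epochs.', 'The loss drops sharply after 10 epochs.']
```

Real systems would more likely score relevance with embeddings or an LLM; word overlap is used here only because it keeps the example self-contained.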

Next, it applies category-level prompt optimization. Scientific papers often fall into specific domains such as Computer Science, Mathematics, or Biology, so the researchers built category-specific prompt templates using the MIPROv2 and SIMBA optimizers from the DSPy framework. These optimizers search for effective instruction and example pairs and refine them with feedback from an evaluation metric, producing prompts that yield more precise and relevant captions for each paper category.
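DSPy’s MIPROv2 and SIMBA optimizers search over instructions and few-shot demonstrations with the help of an LLM and a metric, which requires model access to reproduce. The sketch below therefore swaps in a much simpler technique to illustrate the underlying idea: for each paper category, brute-force over a few candidate instructions and keep the one with the best mean ROUGE-1 recall on that category’s development examples. `fake_generate` is a deterministic stand-in for an LLM call that treats the instruction as a caption prefix, and all names, categories, and examples are hypothetical.

```python
from collections import Counter
from statistics import mean


def rouge1_recall(candidate, reference):
    # Share of reference unigrams (clipped counts) that also appear in the candidate.
    cand, ref = Counter(candidate.lower().split()), Counter(reference.lower().split())
    total = sum(ref.values())
    return sum((cand & ref).values()) / total if total else 0.0


def optimize_category_prompts(candidates, dev_sets, generate):
    """Return {category: best instruction}, chosen by exhaustive search over
    `candidates` using mean ROUGE-1 recall on each category's dev examples."""
    best = {}
    for category, examples in dev_sets.items():
        def score(instruction):
            return mean(
                rouge1_recall(generate(instruction, ex["context"]), ex["reference"])
                for ex in examples
            )
        best[category] = max(candidates, key=score)
    return best


def fake_generate(instruction, context):
    # Deterministic LLM stand-in: treats the instruction as a caption prefix.
    return f"{instruction} {context}"


candidates = ["Plot of", "Diagram of"]
dev_sets = {
    "Computer Science": [{"context": "training loss", "reference": "Plot of training loss"}],
    "Mathematics": [{"context": "the proof", "reference": "Diagram of the proof"}],
}
print(optimize_category_prompts(candidates, dev_sets, fake_generate))
# → {'Computer Science': 'Plot of', 'Mathematics': 'Diagram of'}
```

The point of the toy data is that the winning instruction differs by category, which is the motivation for optimizing prompts per category rather than once globally.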

Finally, because a paper can span multiple categories, the system selects among the resulting candidates. Gemini 2.5 Flash acts as a reranker: it evaluates the caption candidates generated with different category prompts and selects the single best one based on clarity, relevance, accuracy, and tone.
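A minimal sketch of the selection step, assuming the reranker receives the figure context plus a numbered list of category-specific candidates and replies with the number of its choice. `ask_llm` is a hypothetical callable standing in for a Gemini 2.5 Flash request; the prompt wording and the fallback rule are illustrative, not the authors’ exact setup.

```python
import re

CRITERIA = ("clarity", "relevance", "accuracy", "tone")


def build_rerank_prompt(context, candidates):
    # candidates: list of (category, caption) pairs, numbered from 1.
    lines = [
        f"Figure context:\n{context}",
        "Pick the best caption, judging " + ", ".join(CRITERIA) + ".",
        "Reply with the number of the best candidate only.",
    ]
    lines += [f"[{i}] ({category}) {caption}"
              for i, (category, caption) in enumerate(candidates, start=1)]
    return "\n".join(lines)


def select_caption(context, candidates, ask_llm):
    """Return the caption chosen by the reranker `ask_llm` (prompt -> reply text)."""
    if len(candidates) == 1:
        return candidates[0][1]  # single-category paper: nothing to rerank
    reply = ask_llm(build_rerank_prompt(context, candidates))
    match = re.search(r"\d+", reply)
    if match and 1 <= int(match.group()) <= len(candidates):
        return candidates[int(match.group()) - 1][1]
    return candidates[0][1]  # unparseable or out-of-range reply: use the first candidate


candidates = [
    ("Computer Science", "Loss curve."),
    ("Mathematics", "Training loss versus epochs for three optimizers."),
]
print(select_caption("Figure 2 compares optimizers.", candidates, lambda prompt: "2"))
# → Training loss versus epochs for three optimizers.
```

Asking for a bare index keeps parsing trivial; the explicit fallback matters because LLM replies do not always follow the requested format.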


Stage 2: Profile-Informed Stylistic Refinement

The second stage addresses personalization and precision. While the first stage ensures content accuracy, this stage refines each caption to match the author’s writing style. It does so through few-shot prompting with ‘profile figures’ from the same paper: the captions of these other figures serve as structural and stylistic references for the target caption. The stage also enforces a caption length limit to keep captions concise.
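The refinement step can be sketched as a few-shot prompt built from the captions of the paper’s profile figures, followed by a hard word cap. The 30-word default, the prompt wording, and the `ask_llm` callable are assumptions made for illustration; the source says only that a length limit is enforced, not its value or how it is applied.

```python
def build_style_prompt(draft, profile_captions, max_words=30):
    # Few-shot prompt: the author's own captions serve as style examples.
    lines = [
        "Rewrite the draft caption to match the author's style in the examples.",
        f"Use at most {max_words} words.",
    ]
    lines += [f"Example {i}: {cap}" for i, cap in enumerate(profile_captions, start=1)]
    lines += [f"Draft: {draft}", "Rewritten caption:"]
    return "\n".join(lines)


def enforce_length(caption, max_words=30):
    # Safety net in case the model ignores the limit: keep the first max_words words.
    words = caption.split()
    return caption if len(words) <= max_words else " ".join(words[:max_words])


def refine_caption(draft, profile_captions, ask_llm, max_words=30):
    """Hypothetical Stage 2: restyle a Stage 1 draft using profile captions."""
    reply = ask_llm(build_style_prompt(draft, profile_captions, max_words))
    return enforce_length(reply.strip(), max_words)


def stub_llm(prompt):
    # Stand-in for a real model call; always returns the same rewrite.
    return "Training loss of the three optimizers over 50 epochs."


profile = ["Test accuracy of all models on CIFAR-10.", "Ablation of the filtering step."]
print(refine_caption("A plot showing how loss changes.", profile, stub_llm, max_words=5))
# → Training loss of the three
```

Stating the limit in the prompt and also truncating afterwards is a belt-and-braces choice; a real system might instead re-prompt when the reply is too long.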

The researchers evaluated the system with BLEU and ROUGE, which measure how closely generated captions match the author-written reference captions through overlapping words and word sequences (n-grams). The experiments showed significant improvements. Category-specific prompts, for instance, outperformed general approaches, raising ROUGE-1 recall by 8.3%.
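To make the two metrics concrete, here is a simplified sentence-level sketch: ROUGE-1 precision and recall from clipped unigram overlap, and BLEU as the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. It uses whitespace tokenization and no smoothing, so its numbers will not match standard toolkits such as sacreBLEU or `rouge-score`, nor necessarily the paper’s evaluation setup.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # Multiset of n-grams; Counter "&" later gives clipped (min) match counts.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall) of n-gram overlap between candidate and reference."""
    c, r = ngrams(candidate.lower().split(), n), ngrams(reference.lower().split(), n)
    overlap = sum((c & r).values())
    return overlap / max(sum(c.values()), 1), overlap / max(sum(r.values()), 1)


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    log_sum = 0.0
    for n in range(1, max_n + 1):
        c, r = ngrams(cand, n), ngrams(ref, n)
        total, matched = sum(c.values()), sum((c & r).values())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any empty n-gram order zeroes the score
        log_sum += math.log(matched / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_sum / max_n)


reference = "training loss versus epochs"
candidate = "training loss over epochs"
print(rouge_n(candidate, reference))              # → (0.75, 0.75)
print(round(bleu(candidate, reference, max_n=2), 3))  # → 0.5
```

Precision rewards captions that avoid extra words, recall rewards covering the reference, and BLEU’s higher-order n-grams reward matching word order, which is why a style-matching stage can move BLEU and ROUGE precision more than ROUGE recall.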

The profile-informed stylistic refinement produced even larger gains, improving BLEU scores by 40–48% and ROUGE precision by 25–27%. This indicates that the system can generate captions that are not only grounded in the paper’s content but also stylistically faithful to it, making them more consistent with the rest of the author’s manuscript.

This work shows the value of combining contextual understanding with adaptation to an author’s writing style. It is a significant step toward automating scientific communication, potentially reducing the manual burden on researchers and improving the consistency of scientific manuscripts. You can read the full paper here.

Karthik Mehta
https://blogs.edgentiq.com
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
