TLDR: This research introduces two novel methods, KEwLTM and KEwRAG, that enable large language models (LLMs) to automatically identify cancer stages from pathology reports. KEwLTM learns domain-specific rules iteratively from unannotated reports, making it ideal for data-scarce clinical settings. KEwRAG extracts and synthesizes rules from external guidelines once, providing an interpretable and auditable knowledge base. Both methods offer transparent rule sets and reduce reliance on large annotated datasets, showing promising performance in breast cancer staging while highlighting challenges like numerical reasoning for future improvement.
Cancer staging is a crucial step in determining a patient’s prognosis and guiding their treatment plan. Traditionally, this involves medical professionals manually sifting through complex, unstructured pathology reports to extract vital information. This process is time-consuming and prone to inconsistencies, highlighting a significant need for automated solutions.
Existing methods, including traditional Natural Language Processing (NLP) and machine learning (ML) techniques, often require vast amounts of annotated data for training. This dependency makes them expensive to develop, difficult to scale, and less adaptable to variations in reporting styles across different hospitals or cancer types. The emergence of large language models (LLMs) like Mixtral and Llama has opened new avenues, as these models possess broad general medical knowledge from their extensive pre-training. However, they often lack exposure to the specific nuances and varied terminology found in real-world patient pathology reports, which are typically protected under privacy regulations.
To bridge this gap, a recent study introduces two innovative Knowledge Elicitation methods designed to enable LLMs to learn and apply domain-specific rules for cancer staging directly from pathology reports. These methods aim to enhance interpretability and overcome the limitations of data dependency. You can read the full research paper here.
Knowledge Elicitation with Long-Term Memory (KEwLTM)
The first method, Knowledge Elicitation with Long-Term Memory (KEwLTM), allows LLMs to derive specific staging rules from a small number of unannotated pathology reports. What makes KEwLTM particularly valuable is its “label-free induction” process. It doesn’t require ground-truth labels or human annotations. Instead, the LLM iteratively induces and refines high-level staging rules from the content of the reports themselves, storing these rules in a persistent long-term memory. This approach is highly beneficial in clinical settings where large annotated datasets are scarce or restricted due to privacy concerns. The explicit rules generated also make the model’s decisions more transparent and understandable.
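The induction loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `llm` callable, the prompt wording, and the one-rule-per-line memory format are all assumptions, and the toy stand-in model only exists to make the example runnable.

```python
def induce_rules(llm, reports, memory=None):
    """KEwLTM-style sketch: iteratively refine staging rules in a
    persistent long-term memory, with no ground-truth labels."""
    memory = list(memory or [])  # long-term memory: the running rule set
    for report in reports:
        prompt = (
            "Current staging rules:\n" + "\n".join(memory) +
            "\n\nPathology report:\n" + report +
            "\n\nRevise or extend the rules, one per line."
        )
        revised = llm(prompt)  # rules are induced from report content alone
        memory = [r.strip() for r in revised.splitlines() if r.strip()]
    return memory

def toy_llm(prompt):
    """Hypothetical stand-in model: keeps prior rules and adds one
    rule when it 'recognizes' a tumor size in the report."""
    rules_part = prompt.split("Pathology report:")[0]
    report = prompt.split("Pathology report:\n")[1].split("\n\nRevise")[0]
    existing = [r for r in rules_part.splitlines()[1:] if r.strip()]
    if "2.5 cm" in report:
        existing.append("T2 if tumor size > 2 cm and <= 5 cm")
    return "\n".join(existing)

reports = [
    "Invasive carcinoma, 2.5 cm, margins clear.",
    "No residual tumor identified.",
]
rules = induce_rules(toy_llm, reports)
```

Note how the second report leaves the memory unchanged: rules persist across iterations instead of being recomputed per query, which is what makes the elicited rule set inspectable after training.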
Knowledge Elicitation with Retrieval-Augmented Generation (KEwRAG)
The second method, Knowledge Elicitation with Retrieval-Augmented Generation (KEwRAG), adapts the standard RAG framework. Instead of retrieving raw text chunks for every query, KEwRAG first retrieves relevant information from external sources, such as clinical guidelines (e.g., the AJCC cancer staging manual), in a single step. It then prompts the LLM to synthesize these retrieved texts into a concise, structured set of rules. This stable set of rules is then used for all subsequent inferences, eliminating the need for repeated retrieval and providing a more coherent, auditable knowledge base that clinicians can easily review and validate.
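The one-shot retrieve-then-synthesize step might look like the following sketch. The keyword-overlap retriever stands in for a real embedding index, and the guideline snippets and `llm` callable are illustrative assumptions, not the AJCC manual or the study's actual pipeline.

```python
def synthesize_rule_base(llm, guideline_chunks, query, top_k=2):
    """KEwRAG-style sketch: retrieve relevant guideline text once,
    then distill it into a stable rule set reused for all inferences."""
    query_tokens = set(query.lower().split())
    scored = sorted(
        guideline_chunks,
        key=lambda c: len(query_tokens & set(c.lower().split())),
        reverse=True,
    )  # toy retriever: rank chunks by word overlap with the query
    retrieved = scored[:top_k]
    prompt = "Synthesize concise staging rules from:\n" + "\n".join(retrieved)
    return llm(prompt)  # called once; no per-query retrieval afterwards

# Hypothetical guideline fragments (not verbatim AJCC text).
chunks = [
    "T1: tumor 2 cm or less in greatest dimension.",
    "N1: metastasis in 1-3 axillary lymph nodes.",
    "Margins should be reported for all excisions.",
]
# Identity "LLM" that just returns the retrieved text as the rule set.
rules = synthesize_rule_base(lambda p: p.split("from:\n")[1],
                             chunks, "tumor size T stage lymph nodes")
```

Because synthesis happens once up front, the resulting rule set is a fixed artifact that clinicians can review line by line, which is the auditability advantage the authors highlight over per-query RAG.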
Experimental Findings
The researchers evaluated both KEwLTM and KEwRAG using breast cancer pathology reports from The Cancer Genome Atlas (TCGA) dataset, focusing on identifying T (tumor size) and N (lymph node involvement) stages. They compared the methods against baselines such as Zero-Shot Chain-of-Thought (ZSCOT) and standard Retrieval-Augmented Generation (RAG) on two open-source LLMs: Mixtral-8x7B-Instruct-v0.1 and Llama3-Med42-70B.
The results showed that the effectiveness of KEwRAG and KEwLTM is somewhat dependent on the base LLM’s performance. KEwLTM tended to outperform KEwRAG when the Zero-Shot Chain-of-Thought inference was already effective for the base model. Conversely, KEwRAG achieved better performance when ZSCOT inference was less effective, suggesting it benefits more from external knowledge retrieval. A significant advantage of KEwLTM is its label-free induction process, making it suitable for environments with limited annotated data, while KEwRAG offers a more auditable knowledge base by distilling rules from guidelines upfront.
Challenges and Future Directions
Despite their promising performance, the study identified common error patterns. “Numerical Incompetence” was a prevalent issue, where LLMs struggled with precise numerical comparisons (e.g., misinterpreting tumor sizes). “Incorrect Information Extraction” also occurred, with models sometimes overlooking crucial details in reports. Future work aims to address these limitations by incorporating reinforcement-based feedback mechanisms to improve memory accuracy and reduce “hallucinations.” A key strategy to tackle numerical incompetence involves integrating external calculation tools, allowing the LLM to delegate precise computations to specialized functions, thereby enhancing reliability.
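The tool-delegation idea for numerical comparisons can be illustrated with a small helper: the LLM would only extract the size string, while the threshold comparison happens in plain code. The 2 cm and 5 cm cutoffs follow the standard AJCC breast T-category boundaries; the function name and regex are illustrative assumptions.

```python
import re

def t_stage_from_size(report_text):
    """Sketch of an external calculation tool: map an extracted tumor
    size in cm to a T category with exact numeric comparison, instead
    of asking the LLM to compare numbers."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*cm", report_text)
    if not match:
        return None  # no stated size: defer back to the LLM
    size = float(match.group(1))
    if size <= 2.0:
        return "T1"  # <= 2 cm
    if size <= 5.0:
        return "T2"  # > 2 cm and <= 5 cm
    return "T3"      # > 5 cm

stage = t_stage_from_size("Invasive ductal carcinoma measuring 2.5 cm.")
```

Delegating the comparison this way removes the failure mode the authors call Numerical Incompetence, since a misread like "2.5 cm is less than 2 cm" cannot occur in arithmetic code.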
The current study focused exclusively on breast cancer reports from TCGA, limiting the direct generalizability of the findings to other cancer types or diverse clinical environments. Future research will expand the evaluation to a broader range of cancer types and datasets from multiple institutions to assess real-world applicability and robustness. The goal is to develop specialized LLM agents, each tailored to a specific cancer type, potentially collaborating to provide comprehensive cancer staging across various pathology reports.
In conclusion, these Knowledge Elicitation methods represent a significant step forward in automating cancer staging. By enabling LLMs to induce and apply domain-specific rules with minimal data and expert supervision, they pave the way for more scalable, transparent, and trustworthy AI solutions in complex clinical tasks, ultimately promoting effective human-AI collaboration in healthcare.