Towards Fully Automated Systematic Reviews 2

Continuing to build the ultimate tool for systematic reviews, this time switching to LangGraph.

While working on a new systematic review, I realized I needed a more advanced and thorough workflow, with multiple stages like search, screening, extraction, and validation. To make things smoother and more automated, I started thinking about restructuring the whole pipeline as a multi-agent system using LangGraph.

Here is a diagram of the top-level LangGraph workflow.


Complete Pipeline
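In LangGraph, a workflow is a set of nodes connected by edges. Each node reads a shared state and returns a partial update to it. The dependency-free sketch below mimics that pattern for the four stages, running them in a fixed linear order. It is not LangGraph's API. The stage functions and the state keys (`papers`, `included`, `extractions`, `qa_passed`) are hypothetical placeholders. In LangGraph itself, you would build the same shape with `StateGraph`, `add_node`, `add_edge` and `compile()`, then run it with `invoke()`.

```python
from typing import Any, Callable, Dict, List

# Shared state passed between stages; each node returns only the keys it updates.
State = Dict[str, Any]
Node = Callable[[State], State]


def run_pipeline(nodes: Dict[str, Node], order: List[str], state: State) -> State:
    """Run nodes in a fixed order, merging each node's partial update into the state."""
    for name in order:
        update = nodes[name](state)
        state = {**state, **update}
    return state


# Hypothetical placeholder stages; the real ones would call an LLM or a search API.
def fetch(state: State) -> State:
    return {"papers": [f"paper about {state['query']}"]}


def screen(state: State) -> State:
    return {"included": list(state["papers"])}


def extract(state: State) -> State:
    return {"extractions": [{"paper": p, "insights": []} for p in state["included"]]}


def qa(state: State) -> State:
    return {"qa_passed": all("insights" in e for e in state["extractions"])}


NODES = {"fetch": fetch, "screen": screen, "extract": extract, "qa": qa}
ORDER = ["fetch", "screen", "extract", "qa"]

if __name__ == "__main__":
    final = run_pipeline(NODES, ORDER, {"query": "sleep and memory"})
    print(final["qa_passed"])  # True
```

Because nodes return only the keys they change, any stage can be swapped or rerun in isolation. The same property is what makes running a single stage practical.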

🔍 Fetch

Given an initial search query, an LLM expands it into multiple related queries to improve recall. These queries are then used to retrieve papers from multiple sources: PubMed Central (PMC), arXiv, and Semantic Scholar.

The retrieved results are:

  • Deduplicated
  • Normalized into a consistent format
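The sketch below shows the Fetch stage after query expansion. It assumes each source is wrapped as a callable that takes a query and returns raw records as dictionaries. The `Paper` fields, the raw key names and the DOI-or-title deduplication rule are illustrative assumptions. They are not the actual PMC, arXiv or Semantic Scholar schemas.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical interface: each source maps a query to a list of raw records.
Source = Callable[[str], List[Dict[str, str]]]


@dataclass(frozen=True)
class Paper:
    """Common record shape shared by all sources (field names are illustrative)."""
    title: str
    abstract: str
    source: str
    doi: Optional[str] = None


def normalize(raw: Dict[str, str], source: str) -> Paper:
    """Map a source-specific record onto Paper; the raw key names are assumptions."""
    title = raw.get("title") or raw.get("name") or ""
    abstract = raw.get("abstract") or raw.get("summary") or ""
    doi = raw.get("doi")
    return Paper(
        title=" ".join(title.split()),  # collapse stray whitespace and newlines
        abstract=" ".join(abstract.split()),
        source=source,
        doi=doi.strip().lower() if doi else None,  # DOIs are case-insensitive
    )


def _title_key(title: str) -> str:
    """Lowercase and turn punctuation runs into single spaces for loose title matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(papers: Iterable[Paper]) -> List[Paper]:
    """Keep the first record per paper; a shared DOI or normalized title marks a duplicate."""
    seen_dois, seen_titles = set(), set()
    unique: List[Paper] = []
    for p in papers:
        key = _title_key(p.title)
        if (p.doi and p.doi in seen_dois) or (key and key in seen_titles):
            continue
        if p.doi:
            seen_dois.add(p.doi)
        if key:
            seen_titles.add(key)
        unique.append(p)
    return unique


def fetch_all(queries: List[str], sources: Dict[str, Source]) -> List[Paper]:
    """Run every expanded query against every source, then normalize and deduplicate."""
    records = [
        normalize(raw, name)
        for query in queries
        for name, search in sources.items()
        for raw in search(query)
    ]
    return deduplicate(records)
```

A paper counts as a duplicate if either its DOI or its normalized title matches one already seen. This catches the common case where one source reports a DOI and another does not. Fuzzy title matching would be a natural extension.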

🧹 Screen

At this stage, the system applies inclusion and exclusion criteria based on the PICO framework (Population, Intervention, Comparison, Outcome).

The pipeline:

  • Generates screening criteria dynamically
  • Evaluates abstracts against these criteria
  • Produces an inclusion/exclusion decision
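The sketch below shows the screening decision. It assumes the LLM-generated PICO criteria arrive as plain-text inclusion and exclusion statements. A `judge` callable stands in for the LLM call and answers whether an abstract satisfies one criterion. The rule "every inclusion criterion holds and no exclusion criterion holds" is an assumption, and every name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Criteria:
    """Screening criteria derived from PICO; in the pipeline these would come from an LLM."""
    inclusion: List[str]
    exclusion: List[str]


@dataclass
class Decision:
    include: bool
    reasons: List[str] = field(default_factory=list)


# (abstract, criterion) -> does the abstract satisfy the criterion? Stands in for an LLM call.
Judge = Callable[[str, str], bool]


def screen(abstract: str, criteria: Criteria, judge: Judge) -> Decision:
    """Include only if all inclusion criteria hold and no exclusion criterion holds."""
    reasons: List[str] = []
    for c in criteria.inclusion:
        if not judge(abstract, c):
            reasons.append(f"fails inclusion: {c}")
    for c in criteria.exclusion:
        if judge(abstract, c):
            reasons.append(f"meets exclusion: {c}")
    return Decision(include=not reasons, reasons=reasons)
```

For a quick local test, a toy keyword judge such as `lambda a, c: c.lower() in a.lower()` can replace the LLM. Recording a reason per failed criterion makes each exclusion auditable when a human spot-checks the screening.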

This automates one of the most time-consuming steps in systematic reviews.

📖 Review/Extract

Each paper is processed and transformed into a structured representation. The system:

  • Extracts key information (e.g., objectives, methods, results)
  • Identifies research gaps
  • Synthesizes summaries across studies
  • Links every extracted insight to a supporting quote from the original text

This ensures both the structure and the traceability of the extracted knowledge.
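Here is one possible representation of a structured, traceable extraction. The post uses Pydantic for field- and model-level validation. To stay dependency-free, this sketch uses standard-library dataclasses with `__post_init__` checks instead. In Pydantic v2, the analogous hooks are `field_validator` and `model_validator`. The JSON shape, the category names and the rule that objective, methods and results must all be present are assumptions.

```python
import json
from dataclasses import dataclass
from typing import List

REQUIRED = {"objective", "methods", "results"}  # hypothetical minimum per paper


@dataclass
class Insight:
    category: str  # e.g. "objective", "methods", "results", "gap"
    claim: str
    quote: str     # verbatim supporting text from the paper

    def __post_init__(self) -> None:
        # Field-level checks: each insight must carry a claim and a supporting quote.
        if not self.claim.strip():
            raise ValueError("claim must not be empty")
        if not self.quote.strip():
            raise ValueError(f"claim {self.claim!r} has no supporting quote")


@dataclass
class Extraction:
    paper_id: str
    insights: List[Insight]

    def __post_init__(self) -> None:
        # Model-level check: taken together, the insights must cover the required categories.
        missing = REQUIRED - {i.category for i in self.insights}
        if missing:
            raise ValueError(f"{self.paper_id}: missing {sorted(missing)}")


def parse_extraction(paper_id: str, llm_json: str) -> Extraction:
    """Turn the LLM's JSON output into validated records; raises ValueError when a check fails."""
    data = json.loads(llm_json)
    return Extraction(paper_id, [Insight(**item) for item in data["insights"]])
```

Rejecting an insight without a quote at parse time means every record that reaches QA already carries the evidence QA needs to verify.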

🧪 QA

To check reliability, a final QA step validates the outputs:

  • Verifies that extracted quotes actually exist in the source text
  • Measures citation coverage (how well claims are supported by evidence)
  • Flags hallucinations

This step is critical for trusting the pipeline's outputs.
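The quote check and coverage metric could work as follows. The sketch assumes a quote must appear verbatim in the source once both are lowercased and their whitespace is normalized. It defines coverage as the share of claims whose quote is found. Both are hypothetical choices. A real pipeline might use fuzzy matching to tolerate PDF extraction noise.

```python
from dataclasses import dataclass
from typing import List, Tuple


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks in the source don't cause false misses."""
    return " ".join(text.lower().split())


@dataclass
class QAReport:
    coverage: float     # share of claims whose quote was found in the source text
    flagged: List[str]  # claims whose quote was not found: possible hallucinations


def check_quotes(source_text: str, claims: List[Tuple[str, str]]) -> QAReport:
    """Verify (claim, quote) pairs from the extraction step against the paper's text."""
    haystack = _norm(source_text)
    flagged = [
        claim
        for claim, quote in claims
        if not quote.strip() or _norm(quote) not in haystack  # empty quotes never count
    ]
    coverage = (len(claims) - len(flagged)) / len(claims) if claims else 0.0
    return QAReport(coverage=coverage, flagged=flagged)
```

The explicit empty-quote check matters because in Python the empty string is a substring of every string. Without it, a blank quote would pass verification.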

Flexible by Design

The system does not have to run as one large end-to-end workflow. The pipeline is exposed through a CLI, which lets users control how they run it. To keep results organized and reusable, every run is checkpointed and the outputs of each stage are saved separately.

Users can choose whether to run:

  • the full pipeline
  • only fetch
  • only screen
  • only extract
  • only synthesize
  • only qa

For example, you may want to retrieve papers for a new topic only, resume a previous run from a checkpoint, or run QA on already extracted results without repeating the earlier steps.
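The sketch below shows stage selection with per-stage outputs. It assumes each stage is a function from the accumulated state to a JSON-serializable dictionary, and that each stage's output is stored as its own file under a run directory. The file layout, the `--run-dir` flag and the skip-if-output-exists resume rule are hypothetical. LangGraph's built-in checkpointers are another way to persist runs.

```python
import argparse
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional

STAGES = ["fetch", "screen", "extract", "synthesize", "qa"]
Runner = Callable[[dict], dict]  # accumulated state -> this stage's JSON-serializable output


def run(stage: str, run_dir: Path, runners: Dict[str, Runner]) -> List[str]:
    """Run one stage (or all for "full") and save each stage's output as run_dir/<stage>.json.

    Stages whose output file already exists are loaded instead of re-run, so an
    interrupted run resumes where it stopped. Returns the stages actually executed.
    """
    selected = STAGES if stage == "full" else [stage]
    run_dir.mkdir(parents=True, exist_ok=True)
    state: dict = {}
    executed: List[str] = []
    for name in STAGES:  # always walk stages in pipeline order
        out = run_dir / f"{name}.json"
        if out.exists():
            state.update(json.loads(out.read_text()))  # reuse an earlier stage's saved output
        elif name in selected:
            result = runners[name](state)
            out.write_text(json.dumps(result))
            state.update(result)
            executed.append(name)
    return executed


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="Systematic review pipeline (sketch)")
    parser.add_argument("stage", choices=["full", *STAGES])
    parser.add_argument("--run-dir", type=Path, default=Path("runs/latest"))
    args = parser.parse_args(argv)
    # Placeholder runners; the real stages would call the LLM and the search APIs.
    runners = {name: (lambda state, n=name: {n: "done"}) for name in STAGES}
    print("executed:", run(args.stage, args.run_dir, runners))
```

To turn this into a command such as `python review.py screen --run-dir runs/sleep` (script name hypothetical), call `main()` from an `if __name__ == "__main__":` block. Because existing outputs are reused, deleting a stage's file is enough to force that stage to run again.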

🛠 Tech Stack:

  • LangGraph: for multi-agent workflows
  • Pydantic: to validate data at the field and model level
  • OpenAI GPT-4: to screen papers and extract structured insights

🛠 Supporting Databases:

  • PMC
  • Semantic Scholar
  • arXiv

🧑‍💻 Repo Is Loading

The current tool works as a CLI.


© 2022. Makan Farhoodi.
