Towards Fully Automated Systematic Reviews 2

03 Apr 2026 in Posts / Development

Continuing work on the ultimate tool for systematic reviews, this time switching to LangGraph.

While working on a new systematic review, I realized I needed a more advanced and thorough workflow, with multiple stages like search, screening, extraction, and validation. To make things smoother and more automated, I started thinking about restructuring the whole pipeline as a multi-agent system using LangGraph.

Here is a diagram of the top-level LangGraph workflow.

[Diagram: top-level LangGraph workflow]

Complete Pipeline

🔍 Fetch

Given an initial search query, an LLM expands it into multiple related queries to improve recall. These queries are then used to retrieve papers from multiple sources: PubMed Central (PMC), arXiv, and Semantic Scholar.

The retrieved results are:

Deduplicated
Normalized into a consistent format
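The normalization and deduplication step could look roughly like the sketch below. This is a minimal illustration, not the tool's actual code: the `Paper` fields, the raw record keys (`title`, `doi`, `abstract`), and the matching rule (same DOI or same normalized title) are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Paper:
    """Hypothetical common record format shared by all sources."""
    title: str
    source: str
    doi: Optional[str] = None
    abstract: str = ""


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def normalize(raw: Dict[str, Optional[str]], source: str) -> Paper:
    """Map a raw record (assumed keys: title, doi, abstract) to the common format."""
    doi = (raw.get("doi") or "").strip().lower() or None
    return Paper(
        title=(raw.get("title") or "").strip(),
        source=source,
        doi=doi,
        abstract=(raw.get("abstract") or "").strip(),
    )


def deduplicate(papers: List[Paper]) -> List[Paper]:
    """Keep the first occurrence of each paper, matching on DOI or normalized title."""
    seen = set()
    unique = []
    for paper in papers:
        keys = {("title", normalize_title(paper.title))}
        if paper.doi:
            keys.add(("doi", paper.doi))
        if seen.isdisjoint(keys):
            unique.append(paper)
        # Remember every key, so a later copy with only a DOI or only a title still matches.
        seen.update(keys)
    return unique
```

Matching on either key is a deliberate choice here: the same paper often arrives with a DOI from one source and without one from another, so a DOI-only check would let duplicates through.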

🧹 Screen

At this stage, the system applies inclusion and exclusion criteria based on the PICO framework (Population, Intervention, Comparison, Outcome).

The pipeline:

Generates screening criteria dynamically
Evaluates abstracts against these criteria
Produces an inclusion/exclusion decision

This automates one of the most time-consuming steps in systematic reviews.
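To make the screening logic concrete, here is a small sketch of how criteria and decisions might be represented. Everything in it is hypothetical: the `Criterion` and `ScreeningDecision` types, the `screen_abstract` function, and especially `keyword_judge`, which is a toy stand-in for the LLM call that evaluates an abstract against a criterion.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Criterion:
    pico_element: str        # "population", "intervention", "comparison", or "outcome"
    description: str
    kind: str = "inclusion"  # "inclusion" or "exclusion"


@dataclass
class ScreeningDecision:
    include: bool
    reasons: List[str] = field(default_factory=list)


def screen_abstract(
    abstract: str,
    criteria: List[Criterion],
    judge: Callable[[str, Criterion], bool],
) -> ScreeningDecision:
    """Include a paper only if it meets every inclusion criterion and no exclusion criterion."""
    reasons = []
    for criterion in criteria:
        met = judge(abstract, criterion)
        if criterion.kind == "inclusion" and not met:
            reasons.append(f"fails inclusion ({criterion.pico_element}): {criterion.description}")
        elif criterion.kind == "exclusion" and met:
            reasons.append(f"meets exclusion ({criterion.pico_element}): {criterion.description}")
    return ScreeningDecision(include=not reasons, reasons=reasons)


def keyword_judge(abstract: str, criterion: Criterion) -> bool:
    """Toy stand-in for the LLM: true if the criterion text appears in the abstract."""
    return criterion.description.lower() in abstract.lower()


criteria = [
    Criterion("population", "adults"),
    Criterion("intervention", "metformin"),
    Criterion("population", "animal model", kind="exclusion"),
]
decision = screen_abstract("A trial of metformin in adults with diabetes.", criteria, keyword_judge)
```

Recording the reasons alongside the boolean keeps each exclusion auditable, which matters when a human reviewer later spot-checks the screening.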

📖 Review/Extract

Each paper is processed and transformed into a structured representation. The system:

Extracts key information (e.g., objectives, methods, results)
Identifies research gaps
Synthesizes summaries across studies
Links every extracted insight to a supporting quote from the original text

Here we ensure both structure and traceability of extracted knowledge.

🧪 QA

To check reliability, a final QA step validates the outputs:

Verifies that extracted quotes actually exist in the source text
Measures citation coverage (how well claims are supported by evidence)
Flags hallucinations

This step is critical for trustworthiness.
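As a rough illustration of the QA checks, the sketch below verifies quotes by whitespace-insensitive, case-insensitive substring matching and reports coverage as the share of claims with a verified quote. Both definitions are assumptions for the example, as are the `Claim` type and the `run_qa` function; the real pipeline may match quotes and score coverage differently.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Claim:
    text: str
    quote: Optional[str] = None  # supporting quote, if the extractor provided one


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks and casing do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_in_source(quote: str, source_text: str) -> bool:
    """True if the (normalized) quote appears verbatim in the (normalized) source."""
    normalized_quote = _normalize(quote)
    return bool(normalized_quote) and normalized_quote in _normalize(source_text)


def run_qa(claims: List[Claim], source_text: str) -> Dict[str, object]:
    """Split claims into supported and flagged, and compute citation coverage."""
    supported, flagged = [], []
    for claim in claims:
        if claim.quote and quote_in_source(claim.quote, source_text):
            supported.append(claim)
        else:
            # Missing or unverifiable quote: treat as a possible hallucination.
            flagged.append(claim)
    coverage = len(supported) / len(claims) if claims else 0.0
    return {"coverage": coverage, "supported": supported, "flagged": flagged}
```

Exact matching is strict on purpose: a paraphrased "quote" is flagged rather than accepted, which errs on the side of sending more items to human review.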

Flexible by Design

This system is not limited to running only as one large end-to-end workflow. The pipeline is exposed through a CLI, which gives users control over how they want to run it. To keep results organized and reusable, every run is saved in a checkpoint and the pipeline saves outputs from each stage separately.

Users can choose whether to run:

the full pipeline
only fetch
only screen
only extract
only synthesize
only qa

For example, you may want to only retrieve papers for a new topic, resume a previous run from a checkpoint, or run QA on already extracted results without repeating the earlier steps.
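A stage selector like this can be sketched with `argparse`. The program name `review` and the flags `--stage`, `--query`, and `--resume` are invented for illustration and are not the tool's real interface; only the stage names come from the list above.

```python
import argparse
from typing import List

STAGES = ["fetch", "screen", "extract", "synthesize", "qa"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical `review` command."""
    parser = argparse.ArgumentParser(
        prog="review", description="Run all or part of the review pipeline."
    )
    parser.add_argument(
        "--stage",
        choices=["all"] + STAGES,
        default="all",
        help="run the full pipeline or a single stage",
    )
    parser.add_argument("--query", help="initial search query for a new run")
    parser.add_argument(
        "--resume", metavar="CHECKPOINT", help="checkpoint of a previous run to continue from"
    )
    return parser


def stages_to_run(stage: str) -> List[str]:
    """Expand the --stage value into the ordered list of stages to execute."""
    return list(STAGES) if stage == "all" else [stage]


# Example: run only QA on results saved by an earlier run (the path is made up).
args = build_parser().parse_args(["--stage", "qa", "--resume", "runs/demo"])
selected = stages_to_run(args.stage)
```

Because each stage saves its outputs separately, a single-stage run only needs the checkpoint to locate the results of the stages before it.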

🛠 Tech Stack:

LangGraph: for multi-agent workflows
Pydantic: to validate at the field and model level
OpenAI GPT-4: to screen papers and extract structured insights

🛠 Supporting Databases:

PMC
Semantic Scholar
arXiv

🧑‍💻 Repo Is Loading

The current tool works as a CLI.

Makan Farhoodi
