SanoMap's Radiomics Layer closes the gap between gut microbiome evidence and imaging-derived disease markers. Radiomic features and body-composition measurements are explicit intermediate nodes, neither inferred nor implied. Each is grounded in direct evidence extracted from the literature and in verified quantitative correlations read from figures.
Every edge in the graph is grounded in direct evidence. No inferred bridge matches become asserted relationships. The imaging backbone (BodyLocation, ImagingModality, ImageRef) is now fully wired.
Each node type maps to a specific evidence source. No speculative nodes.
Taxon-specific entities from microbial NER. Examples: Fusobacterium nucleatum, Akkermansia muciniphila.
Non-taxon microbiome states: dysbiosis, alpha diversity, beta diversity, intratumoral microbiome.
IBSI-backed quantitative imaging features. Examples: glcm_entropy, first_order_kurtosis.
Imaging-derived phenotype markers: skeletal_muscle_index, visceral_adipose_tissue, sarcopenia, myosteatosis.
Disease or outcome concepts grounded in paper text. Filtered through shared span cleanup before edge promotion.
12 anatomical sites where imaging measurements are taken: liver, lung, colon, abdomen, muscle, bone, and more.
4 modalities with DICOM codes: CT/CT, MRI/MR, PET/PT, DXA/DXA.
Verified figure references from the Vision Track. Stores PMCID, figure ID, topology, and image path. Completes the chain to representative evidence.
The Vision Track uses a VLM to propose r-values from heatmap figures, then gates every proposal through a deterministic pixel-level verifier before any edge is asserted. Figures that are not continuous gradient heatmaps, or that lack a microbe entity, are rejected.
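The propose-then-verify gate described above can be illustrated with a minimal sketch. The `Proposal` fields, the `measure_r` callback (standing in for the pixel-level verifier), and the `tolerance` value are all assumptions made for illustration; they are not taken from `src/verify_heatmap.py`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Proposal:
    """A VLM-proposed correlation for one heatmap cell (hypothetical shape)."""
    pmcid: str
    figure_id: str
    microbe: Optional[str]
    r_value: float
    is_continuous_heatmap: bool


def gate(proposal: Proposal,
         measure_r: Callable[[Proposal], float],
         tolerance: float = 0.1) -> bool:
    """Return True only when the deterministic check confirms the proposal.

    `measure_r` is a placeholder for a pixel-level reader that recovers the
    r-value from the figure itself; `tolerance` is an assumed threshold.
    """
    # Reject figures that are not continuous gradient heatmaps.
    if not proposal.is_continuous_heatmap:
        return False
    # Reject proposals without a microbe entity.
    if not proposal.microbe:
        return False
    # A correlation coefficient must lie in [-1, 1].
    if not -1.0 <= proposal.r_value <= 1.0:
        return False
    # Accept only if the independently measured value agrees with the VLM.
    return abs(measure_r(proposal) - proposal.r_value) <= tolerance
```

The point of the design is that the VLM output is never trusted on its own: a proposal becomes evidence only when an independent, deterministic measurement agrees with it.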
Aligned with MINERVA methodology. Each stage produces validated JSONL artifacts.
src/harvest_pubmed.py
Split query profiles: strict radiomics, adjacent imaging, body-composition. 640-paper expanded corpus.
src/merge_paper_corpora.py · src/download_pmc_fulltext.py
Deduplicated merge. Full-text preferred over abstract-only for downstream NER.
src/extract_radiomics_text.py
IBSI-backed radiomics + body-composition vocabulary. Detects BodyLocation (42.8% coverage) and ImagingModality. Stopword-guarded disease detection.
src/text_ner_minerva.py
MINERVA-aligned sentence-level NER. MPS-accelerated microbe NER on Apple Silicon. BC5CDR disease NER.
src/build_relation_input.py
Joins sentence evidence with phenotype context. Threads subject_node_type for graph typing.
src/relation_extract_stage.py
Self-consistency classification over three labels (Positive / Negative / Unrelated), run on hosted Gemini 2.5 Flash-Lite. 9 clean validated microbe–disease pairs.
src/span_cleanup.py
Pre-inference entity cleanup. Normalizes genus-containing products, finding-in-disease patterns, clause fragments.
src/index_figures.py · src/propose_vision_qwen.py · src/verify_heatmap.py
VLM proposes r-values from PMC figures. Deterministic pixel-level verifier gates every proposal. Only verified figures become ImageRef nodes.
src/assemble_edges.py
Emits Neo4j-ready edge CSV, BodyLocation/ImagingModality/ImageRef nodes, and audit-only phenotype-axis candidates. Bridge hypotheses are explicitly not ingested.
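As an illustration of the deduplicated merge stage listed above, where full text is preferred over abstract-only records, here is a minimal sketch. The record keys `pmid` and `full_text` and the function name `merge_corpora` are hypothetical and may not match `src/merge_paper_corpora.py`.

```python
from typing import Dict, Iterable, List


def merge_corpora(*corpora: Iterable[dict]) -> List[dict]:
    """Merge paper records by PMID, preferring records that carry full text.

    Assumes each record is a dict with a "pmid" key and an optional
    "full_text" key (both names are illustrative).
    """
    best: Dict[str, dict] = {}
    for corpus in corpora:
        for paper in corpus:
            pmid = paper["pmid"]
            current = best.get(pmid)
            # Keep the first record seen, unless a later duplicate has
            # full text and the stored one is abstract-only.
            if current is None or (paper.get("full_text")
                                   and not current.get("full_text")):
                best[pmid] = paper
    return list(best.values())
```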
All artifacts are schema-validated JSONL. The test suite runs in Conda base on Apple Silicon MPS.
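A minimal sketch of what line-by-line JSONL validation can look like, using only the standard library. The two required fields shown (`pmid`, `evidence`) are an assumed minimal schema chosen for illustration; the project's actual schemas are not described here.

```python
import json
from typing import Iterable, Iterator

# Hypothetical minimal schema: field name -> required Python type.
REQUIRED_FIELDS = {"pmid": str, "evidence": str}


def validate_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each JSONL record, raising ValueError on the first bad line."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)  # JSONDecodeError is a ValueError subclass
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        for field, expected in REQUIRED_FIELDS.items():
            if not isinstance(record.get(field), expected):
                raise ValueError(
                    f"line {lineno}: field {field!r} missing "
                    f"or not {expected.__name__}"
                )
        yield record
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, for example `list(validate_jsonl(open(path, encoding="utf-8")))`.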
| Artifact | Count | Status |
|---|---|---|
| Validated microbe–disease pairs (Gemini, self-consistency) | 9 | Graph-ready |
| Phenotype-to-disease text edges (ASSOCIATED_WITH) | 23 | Graph-ready |
| BodyLocation nodes | 12 | Graph-ready |
| ImagingModality nodes (CT, MRI, PET, DXA) | 4 | Graph-ready |
| MEASURED_AT + ACQUIRED_VIA backbone rows | 50 | Graph-ready |
| ImageRef nodes (Vision Track verified) | 1 | Graph-ready |
| Phenotype-axis candidates (audit-only) | 233 | Audit-only |
| Bridge hypotheses (audit-only, not ingested) | 232 | Audit-only |
| pytest checks passing | 156 | Green |
The initial 640-paper corpus spanned 3 query lanes (strict radiomics, adjacent imaging, body composition). Four domain-specific lanes were added: liver radiomics, bone DXA, lung CT phenotypes, and colorectal imaging.
Every edge carries a source PMID and evidence string. Text-derived edges use ASSOCIATED_WITH; Vision Track-verified edges use CORRELATES_WITH. Bridge hypotheses are excluded from graph import.
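The edge-typing rule above can be sketched as a small function. The track names `"text"`, `"vision"`, and `"bridge"` and the function `edge_relation` are hypothetical labels introduced for illustration; only the relationship names ASSOCIATED_WITH and CORRELATES_WITH come from the text.

```python
from typing import Optional

# Relationship type by evidence track (track keys are illustrative).
REL_BY_TRACK = {"text": "ASSOCIATED_WITH", "vision": "CORRELATES_WITH"}


def edge_relation(track: str, pmid: str, evidence: str) -> Optional[str]:
    """Return the relationship type for an edge, or None if it must not be imported."""
    # Bridge hypotheses are audit-only and never enter the graph.
    if track == "bridge":
        return None
    # Every imported edge needs a source PMID and an evidence string.
    if not pmid or not evidence:
        return None
    # Unknown tracks fall through to None rather than guessing a type.
    return REL_BY_TRACK.get(track)
```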
Each pair was extracted with Gemini 2.5 Flash-Lite using 7-sample self-consistency (full agreement required). Sign direction: positive (enrichment associated with disease) or negative (depletion / protective association).
| Microbe | Direction | Disease | PMID | Confidence |
|---|---|---|---|---|
| Proteobacteria | POSITIVE | Cirrhosis | 39539377 | 0.70 |
| Proteobacteria | POSITIVE | Cirrhosis | 35978666 | 0.70 |
| Peptostreptococcus stomatis | POSITIVE | Cirrhosis | 36536957 | 0.70 |
| Ruminococcus | POSITIVE | Cirrhosis | 36536957 | 0.70 |
| Lactobacillus-based probiotics | NEGATIVE | Inflammatory bowel disease | 37998334 | 0.70 |
| Bacteroidetes | NEGATIVE | Inflammatory bowel disease | 37998334 | 0.70 |
| Bifidobacterium bifidum | NEGATIVE | Obesity | 36358288 | 0.70 |
| Bifidobacterium lactis | NEGATIVE | Obesity | 36358288 | 0.70 |
| Catenibacterium | NEGATIVE | Cirrhosis | 35978666 | 0.70 |
| Actinobacteria species | NEGATIVE | Obesity | 35126309 | 0.70 |
| Dysosmobacter | NEGATIVE | Obesity | 34108237 | 0.70 |
| Lactobacillus-containing probiotic | NEGATIVE | Systemic inflammation | 33633246 | 0.70 |
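The acceptance rule behind the table (7-sample self-consistency with full agreement required) can be sketched as follows. The function names and the decision to drop unanimous "Unrelated" results are assumptions made for illustration; the model call that produces the samples is omitted.

```python
from typing import Optional, Sequence

LABELS = {"Positive", "Negative", "Unrelated"}


def unanimous_label(samples: Sequence[str], n_required: int = 7) -> Optional[str]:
    """Return the shared label if all samples agree, otherwise None."""
    if len(samples) != n_required:
        raise ValueError(f"expected {n_required} samples, got {len(samples)}")
    unknown = set(samples) - LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    first = samples[0]
    return first if all(s == first for s in samples) else None


def accept_pair(samples: Sequence[str]) -> Optional[str]:
    """Return "POSITIVE" or "NEGATIVE" for a graph-ready pair, else None.

    Assumption: a unanimous "Unrelated" verdict yields no edge.
    """
    label = unanimous_label(samples)
    if label in ("Positive", "Negative"):
        return label.upper()
    return None
```

Requiring full agreement trades recall for precision: a single dissenting sample is enough to keep a pair out of the graph.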