Chest ImaGenome Dataset for Clinical Reasoning

Source: [NeurIPS 2021 Datasets and Benchmarks]

Keywords:

Problem addressed and proposed solution:

Methods

Silver Dataset Construction

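An atlas-based method extracts bounding boxes (bboxes) to construct the anatomies, and a rule-based text analysis method links each anatomy to its related CXR attributes (findings, diseases, technical assessments, devices).
The goal is not only to locally label the attributes tied to key anatomical locations on the CXR image, but also to extract the radiology knowledge documented in a large corpus of CXR reports, in support of future semantics-driven and multimodal clinical reasoning work.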

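Two key assumptions were made when constructing the Chest ImaGenome dataset:

CXR imaging observations can be normalized into relations between visualized anatomical locations (object nodes) and the abnormalities, devices, or other CXR descriptions found at those locations (attribute nodes). Consequently, the types of objects that can be detected are limited by the granularity of anatomical location detection in the images and reports.
The extracted comparison relations are intended to enable longitudinal modeling of disease progression across different CXR anatomical structures.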

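The figure above shows a radiology knowledge graph. It contains the patient history from the exam indication (orange), the anatomical locations (blue), and their related attributes: anatomical finding (pink), disease (yellow), technical assessment (purple), and device (green) nodes. The blue anatomy nodes (objects) also have corresponding bounding box coordinates on the CXR image.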
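The graph described above can be sketched as a small data structure: object nodes that carry a bbox, attribute nodes that carry a category, and relations between the two. This is only an illustrative sketch. The class names, field names, the category string "device", and the example coordinates are assumptions made here, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Category labels follow the notes; "device" and all class/field names are assumed.
CATEGORIES = {"anatomicalfinding", "disease", "technicalassessment", "device"}


@dataclass
class AttributeNode:
    name: str
    category: str

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


@dataclass
class ObjectNode:
    name: str
    bbox: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in image pixels
    attributes: List[AttributeNode] = field(default_factory=list)


@dataclass
class SceneGraph:
    indication: str  # patient history from the exam indication
    objects: Dict[str, ObjectNode] = field(default_factory=dict)

    def add_object(self, name, bbox):
        self.objects[name] = ObjectNode(name, bbox)

    def relate(self, object_name, attribute):
        # Object-to-attribute relation: attach the attribute to its anatomy.
        self.objects[object_name].attributes.append(attribute)

    def objects_with(self, attribute_name):
        # All anatomical locations that carry the given attribute.
        return sorted(
            obj.name
            for obj in self.objects.values()
            if any(attr.name == attribute_name for attr in obj.attributes)
        )


# Made-up example values, for illustration only.
graph = SceneGraph(indication="shortness of breath")
graph.add_object("left lung", (120, 40, 220, 200))
graph.add_object("right lung", (10, 40, 110, 200))
graph.relate("left lung", AttributeNode("lung opacity", "anatomicalfinding"))
graph.relate("right lung", AttributeNode("lung opacity", "anatomicalfinding"))
graph.relate("left lung", AttributeNode("pneumonia", "disease"))
print(graph.objects_with("lung opacity"))  # ['left lung', 'right lung']
```

Keying objects by anatomy name keeps the lookup from attribute to location simple, which is the direction the localization task needs.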

Gold Standard Dataset Collection

A) the object-to-attribute relations (i.e., CXR knowledge graph) extracted from individual reports
B) the object-to-object comparison relations extracted between sequential CXR reports
C) the anatomical location detection (i.e., the bounding box extraction pipeline) for the CXR images

Data description

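The Chest ImaGenome dataset has two parts. The first is the automatically generated scene graphs ("silver_dataset"): 242,072 scene graphs generated automatically from 217,013 unique CXR studies. The second is a subset of 500 unique patients that was manually validated and corrected ("gold_dataset"). Each report contains roughly 7 anatomical objects and 5 attributes.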

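anatomicalfinding: anatomical findings, some of them subjective, that are grouped by phrase and used to extract labels.
disease: more diagnostic-level descriptions, which usually require patient information beyond the image and are the most subjective to the reading radiologist's inference/impression.
technicalassessment: image quality issues that affect the interpretation of CXR observations.
texture: present only in the 'texture-cues' field. It retains a set of highly non-specific attributes (such as opacity, lucency, interstitial, airspace) that tend to form the most objective initial descriptions of what radiologists observe in the image.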

Gold Standard Dataset Tables

1. gold_attributes_relations_500pts_500studies1st.txt
21,594 object-to-attribute relations manually annotated for 3,042 sentences from the first CXR study for 500 unique patients.
Corresponding notebook: 'object-attribute-relation_evaluation.ipynb'.
2. gold_comparison_relations_500pts_500studies2nd.txt
5,156 object-object (per attribute) comparison relations for 638 sentences from the second CXR study for the same 500 unique patients.
Corresponding notebook: 'object-object-comparison-relation_evaluation.ipynb'.
3. four bbox_coordinate_annotations.csv
Manually annotated bounding box coordinates for the objects on the corresponding 1,000 unique CXR images.
Corresponding notebook: 'object-bbox-coordinates_evaluation.ipynb'.

4. final_merging_report_and_bbox_ground_truth.ipynb combines the manual text and anatomical bbox annotations as gold_object_attribute_with_coordinates.txt and gold_object_comparison_with_coordinates.txt
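To make the bbox evaluation concrete, here is a minimal sketch of comparing an automatically extracted box with a manually annotated one using intersection over union (IoU). These notes do not say which metric the evaluation notebook uses, so IoU is an assumption here (it is a common choice for box agreement), and the coordinates are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


auto_box = (10, 10, 110, 110)  # hypothetical pipeline output
gold_box = (20, 20, 120, 120)  # hypothetical manual annotation
print(round(iou(auto_box, gold_box), 3))  # 0.681
```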

Downstream tasks for Chest ImaGenome

Task 1: Change between sequential CXR exams

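Using temporal (longitudinal) CXR images from the same patient, the task is to automatically assess how disease changes over time based on two consecutive CXR exams.
The task focuses on change relations in the 'left lung' and 'right lung' objects that are related to the 'pulmonary edema/hazy opacity' and 'fluid overload/heart failure' attributes. The training, validation, and test sets contain 10,515, 1,493, and 2,987 labeled instances, respectively.
The paper designs a siamese architecture. It first extracts the localized bboxes from the images, then encodes the extracted image patches with a ResNet101 pretrained on the NIH, CheXpert, and MIMIC datasets. The model predicts the change in local anatomical findings between two consecutive exams, with an accuracy of 75.3%.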
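The siamese idea can be illustrated with a toy sketch: the same encoder weights are applied to the patch from each exam, the two feature vectors are concatenated, and a small head scores the pair. This is not the paper's model. A single linear layer with ReLU stands in for the pretrained ResNet101, the weights are random and untrained, and the single probability output is an assumption made for illustration.

```python
import math
import random


def encode(patch, weights):
    """Shared encoder: one linear layer followed by ReLU over a flattened patch."""
    return [max(0.0, sum(w * x for w, x in zip(row, patch))) for row in weights]


def siamese_change_score(patch_prev, patch_curr, enc_weights, head_weights):
    """Encode both patches with the SAME weights, concatenate, apply a logistic head."""
    features = encode(patch_prev, enc_weights) + encode(patch_curr, enc_weights)
    logit = sum(w * f for w, f in zip(head_weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
patch_dim, feat_dim = 16, 4  # toy sizes, chosen arbitrarily
enc_weights = [[rng.uniform(-1, 1) for _ in range(patch_dim)] for _ in range(feat_dim)]
head_weights = [rng.uniform(-1, 1) for _ in range(2 * feat_dim)]

# Flattened patches cropped from the same anatomical bbox in two consecutive exams.
prev_patch = [rng.random() for _ in range(patch_dim)]
curr_patch = [rng.random() for _ in range(patch_dim)]

score = siamese_change_score(prev_patch, curr_patch, enc_weights, head_weights)
print(0.0 < score < 1.0)
```

Sharing the encoder weights between the two branches is what makes the structure siamese: both exams are mapped into the same feature space before they are compared.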

Task 2: Localization of CXR attributes

Knowing the anatomical location of non-specific findings/attributes on a CXR image helps narrow down the possible disease diagnoses and guides the next step, such as requesting a more specific imaging exam or treatment.
The task covers 18 anatomical locations and 9 common CXR attributes. The 9 attributes are lung opacity, pleural effusion, atelectasis, enlarged cardiac silhouette, pulmonary edema/hazy opacity, pneumothorax, consolidation, fluid overload/heart failure, and pneumonia.
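One way to picture the localization target is a binary matrix with one row per anatomical location and one column per attribute. The sketch below builds such a matrix from (location, attribute) pairs. The function name and the example relations are hypothetical, and only two locations are used because the 18 location names are not listed in these notes.

```python
# The 9 attributes named in the text, in the order given there.
ATTRIBUTES = [
    "lung opacity",
    "pleural effusion",
    "atelectasis",
    "enlarged cardiac silhouette",
    "pulmonary edema/hazy opacity",
    "pneumothorax",
    "consolidation",
    "fluid overload/heart failure",
    "pneumonia",
]


def build_label_matrix(locations, relations):
    """Return a len(locations) x len(ATTRIBUTES) matrix of 0/1 labels."""
    loc_index = {name: i for i, name in enumerate(locations)}
    attr_index = {name: j for j, name in enumerate(ATTRIBUTES)}
    matrix = [[0] * len(ATTRIBUTES) for _ in locations]
    for location, attribute in relations:
        matrix[loc_index[location]][attr_index[attribute]] = 1
    return matrix


# Only two locations here for brevity; the full task would pass all 18.
locations = ["left lung", "right lung"]
relations = [
    ("left lung", "lung opacity"),
    ("right lung", "pleural effusion"),
    ("left lung", "pneumonia"),
]
labels = build_label_matrix(locations, relations)
print(labels[0])  # [1, 0, 0, 0, 0, 0, 0, 0, 1]
```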