Title
Deep learning for differential diagnosis of malignant hepatic tumors based on multi-phase contrast-enhanced CT and clinical data
Abstract
Liver cancer remains a leading cause of cancer death globally, and treatment strategies differ for each type of malignant hepatic tumor. However, differential diagnosis before surgery is challenging and subjective. This study aims to build an automatic diagnostic model for differentiating malignant hepatic tumors based on patients’ multimodal medical data, including multi-phase contrast-enhanced computed tomography (CECT) and clinical features.
Conclusions
We incorporated a deep CNN and a gated RNN into the design of the STIC model for differentiating malignant hepatic tumors based on multi-phase CECT and clinical features. Our model can help doctors achieve better diagnostic performance and is expected to serve as an AI assistance system that promotes the precise treatment of liver cancer.
Results
The STIC model achieved an accuracy of 86.2% and an AUC of 0.893 for classifying HCC and ICC on the test set. When extended to the differential diagnosis of malignant hepatic tumors, the STIC model achieved an accuracy of 72.6% on the test set, comparable with the diagnostic level of the doctors’ consensus (70.8%). With the assistance of the STIC model, doctors performed better than the doctors’ consensus diagnosis, with an average increase of 8.3% in accuracy and of 26.9% in sensitivity for ICC diagnosis. On the external test set from center 2, the STIC model achieved an accuracy of 82.9%, which verifies the model’s generalization ability.
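The 8.3% figure is consistent with a difference in percentage points between the mean total accuracy of the three STIC-assisted doctors and the consensus accuracy, using the per-doctor values given in the Fig. 2C caption. A quick arithmetic check, assuming that interpretation (the variable names are illustrative and not from the paper):

```python
# Total accuracies (%) as reported in the Fig. 2C caption.
consensus_accuracy = 70.8
assisted_accuracies = [77.0, 78.8, 81.4]  # three STIC-assisted doctors

# Assumption: "on average" means the arithmetic mean over the three doctors.
mean_assisted = sum(assisted_accuracies) / len(assisted_accuracies)
gain = mean_assisted - consensus_accuracy

print(f"mean assisted accuracy: {mean_assisted:.1f}%")
print(f"gain over consensus: {gain:.1f} percentage points")
```

Under this reading the gain rounds to 8.3, matching the reported value. The 26.9% sensitivity gain cannot be checked the same way because the per-doctor ICC sensitivities are not listed here.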
Method
Our study included 723 patients from two centers who were pathologically diagnosed with hepatocellular carcinoma (HCC), intrahepatic cholangiocarcinoma (ICC) or metastatic liver cancer. The training set and the test set consisted of 499 and 113 patients from center 1, respectively. The external test set consisted of 111 patients from center 2. We proposed a deep learning model with a modular SpatialExtractor-TemporalEncoder-Integration-Classifier (STIC) design, which takes advantage of a deep CNN and a gated RNN to effectively extract and integrate the diagnosis-related radiological and clinical features of patients.
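The last two stages of the modular design (Integration, then Classifier) can be illustrated with a small standard-library sketch. It is not the authors' implementation: the feature values, the clinical variable `marker_status`, the weights, and the single linear layer placed before the softmax are all hypothetical, chosen only to show concatenation of a temporal encoding with dummy-encoded clinical variables followed by a softmax over three classes.

```python
import math

CLASSES = ["HCC", "ICC", "metastasis"]

def one_hot(value, categories):
    """Dummy-encode one categorical clinical variable."""
    return [1.0 if value == c else 0.0 for c in categories]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def integrate_and_classify(temporal_features, clinical_dummies, weights, biases):
    # Integration: concatenate the temporal encoding with the clinical vector.
    x = temporal_features + clinical_dummies
    # Classifier: a linear layer (an assumption) followed by softmax.
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

# Hypothetical stand-in for the gated RNN output over the CECT phases.
temporal = [0.2, -0.1, 0.7]
# Hypothetical clinical variable with two categories.
clinical = one_hot("positive", ["negative", "positive"])
# Hypothetical weights: one row per class, one column per input feature.
W = [[0.5, -0.2, 0.1, 0.0, 0.3],
     [-0.4, 0.6, 0.2, 0.1, -0.1],
     [0.1, 0.1, -0.3, 0.2, 0.0]]
b = [0.0, 0.1, -0.1]

probs = integrate_and_classify(temporal, clinical, W, b)
print(dict(zip(CLASSES, (round(p, 3) for p in probs))))
```

In the described model the temporal features would come from the SpatialExtractor and TemporalEncoder modules rather than being hard-coded.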
Figure
Fig. 1 The flowchart of dataset setup, the architecture of the STIC model and the performance on primary malignant hepatic tumor classification.
A This study included 612 patients in the method development cohort and 111 patients in the external validation cohort, who were pathologically diagnosed with HCC, ICC or metastatic liver cancer.
B The STIC model contains four modules. The SpatialExtractor module is a deep CNN that uses convolutional layers to extract detailed spatial features of CECT images. The TemporalEncoder module uses a gated RNN to mine the changing pattern among different CECT phases. In the Integration module, the output of the TemporalEncoder module is concatenated with the vector of dummy-encoded clinical variables. Finally, in the Classifier module, the Integration output is passed through the softmax activation function to perform the classification task.
C The ROC curves of five-fold cross-validation of the STIC model for classifying benign and malignant hepatic tumors in the preliminary study. The mean ROC curve was obtained by interpolating the ROC curves of each fold, with a mean AUC of 0.987.
D Comparison of the performance for differentiating HCC and ICC on the test set by ROC curve analysis. The AUC of the STIC model was 0.893 (95% CI, 0.803–0.982), higher than 0.709 (95% CI, 0.573–0.845) for the Naive RGB model and 0.766 (95% CI, 0.644–0.888) for the Naive joint model.
E Among the three models, the STIC model performed best in distinguishing the two primary malignant hepatic tumors, with an accuracy of 86.2% (95% CI, 74.6%–93.9%), a sensitivity of 0.892 (95% CI, 0.746–0.970) and a specificity of 0.810 (95% CI, 0.581–0.946), where sensitivity and specificity are defined by viewing HCC as positive and ICC as negative. The error bars represent 95% CIs calculated by the Wald Z method with continuity correction for accuracy, sensitivity and specificity and by the DeLong method for AUC.
F Using McNemar’s chi-squared test, the STIC model outperformed the Naive RGB model with an increase of 25.9% (95% CI, 11.0%–40.7%; p value = 0.001) in accuracy and 0.270 (95% CI, 0.082–0.459; p value = 0.009) in sensitivity. It also outperformed the Naive joint model with an increase of 17.2% (95% CI, 3.7%–30.8%; p value = 0.016) in accuracy and 0.189 (95% CI, 0.015–0.363; p value = 0.046) in sensitivity.
G The distribution of the predicted scores for HCC and ICC under the three models. For the two benchmark models, the predicted scores were much more widely distributed. The proposed STIC model produced a more concentrated distribution of predicted scores for both HCC and ICC.
H Comparison of the performance of the STIC model and the two benchmark models using different extractor backbones for binary classification of primary malignant hepatic tumors. Using Cochran’s Q test, there were no significant differences in diagnostic level among STIC models with different extractor backbones. For Naive RGB models with different extractor backbones, there were significant differences in sensitivity (p value < 0.001) and specificity (p value = 0.012). For Naive joint models with different extractor backbones, there were also significant differences in sensitivity (p value < 0.001) and specificity (p value < 0.001).
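The accuracy, sensitivity and specificity in panel E follow the usual confusion-matrix definitions, with HCC as the positive class and ICC as the negative class. The sketch below shows the computation. The counts are not stated in the text: they are hypothetical values, picked only because they reproduce the reported rates, and the function name is illustrative.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate (HCC as positive)
        "specificity": tn / (tn + fp),  # true negative rate (ICC as negative)
    }

# Hypothetical counts (an assumption, not reported in the caption):
# 37 HCC cases with 33 correct, 21 ICC cases with 17 correct.
metrics = binary_metrics(tp=33, fn=4, tn=17, fp=4)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the three values round to 0.862, 0.892 and 0.810. The confidence intervals are not reproduced here.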
Fig. 2 Model performance on the multinomial classification of malignant hepatic tumors.
A Micro-average and macro-average ROC curves of the STIC model for differentiating HCC, ICC and metastasis on the test set.
B The ROC curves of the STIC model for HCC, ICC and metastasis diagnosis on the test set, with the corresponding diagnosis points of the doctors’ consensus and three STIC-assisted doctors. The orange star represents the diagnostic performance of the doctors’ consensus. The three triangles in different colors represent the diagnostic performance of the three STIC-assisted doctors, and the red pentagon represents the average diagnostic level of these three doctors. For ICC diagnosis, the performance of the doctors’ consensus was below the ROC curve of the STIC model, and the performances of the three STIC-assisted doctors were all above the ROC curve.
C The total accuracy of the STIC model was 72.6% (95% CI, 63.4%–80.5%), and the total accuracy of the doctors’ consensus was 70.8% (95% CI, 61.5%–79.0%). The three STIC-assisted doctors achieved total accuracies of 77.0% (95% CI, 68.1%–84.4%), 78.8% (95% CI, 70.1%–85.9%) and 81.4% (95% CI, 73.0%–88.1%) on the test set, respectively. Using Cochran’s Q test, there were no significant differences in diagnostic level among the three STIC-assisted doctors. When comparing the diagnostic level of the three STIC-assisted doctors with the doctors’ consensus diagnosis, there was a significant difference in sensitivity for ICC (p value = 0.038).
D Case study of three test samples pathologically diagnosed with ICC. For case 1, the CECT enhancement pattern was typical: the ICC tumor showed homogeneously low attenuation on the non-contrast (NC) phase, faint peripheral enhancement on the arterial (ART) phase and gradual centripetal enhancement on the portal venous (PV) phase. The doctors’ consensus diagnosis was ICC. The output of the STIC model was {HCC: 0.067, ICC: 0.646, metastasis: 0.287}. All three STIC-assisted doctors independently diagnosed it as ICC. For case 2, the CECT enhancement pattern was similar to the typical pattern of an HCC tumor, exhibiting low attenuation on the NC phase and an early peak of enhancement on the ART phase, followed by a continuous decrease on the PV phase. The doctors’ consensus misdiagnosed it as HCC. The output of the STIC model was {HCC: 0.881, ICC: 0.067, metastasis: 0.052}, which also incorrectly diagnosed it as HCC. All three STIC-assisted doctors misdiagnosed it as HCC. For case 3, there was peripheral enhancement on the ART phase, but it was not obvious to the human eye. The doctors’ consensus misdiagnosed it as metastasis. The output of the STIC model was {HCC: 0.114, ICC: 0.587, metastasis: 0.299}, which correctly diagnosed it as ICC. All three STIC-assisted doctors correctly diagnosed it as ICC.
E Case study of three test samples pathologically diagnosed with metastasis. For case 1, the doctors’ consensus misdiagnosed it as ICC. The output of the STIC model was {HCC: 0.031, ICC: 0.343, metastasis: 0.626}. Two STIC-assisted doctors independently diagnosed it correctly as metastasis. One STIC-assisted doctor misdiagnosed it. For case 2, the doctors’ consensus misdiagnosed it as ICC. The output of the STIC model was {HCC: 0.306, ICC: 0.240, metastasis: 0.454}. All three STIC-assisted doctors independently diagnosed it correctly as metastasis. For case 3, the doctors’ consensus misdiagnosed it as ICC. The output of the STIC model was {HCC: 0.173, ICC: 0.176, metastasis: 0.651}. All three STIC-assisted doctors independently diagnosed it correctly as metastasis.
F ROC curve analysis of the STIC model for HCC, ICC and metastasis diagnosis on the external test set for additional verification. The AUCs for the diagnosis of HCC, ICC and metastasis on the external test set were 0.986, 0.881 and 0.920, respectively.
G Comparison of the performance of the STIC model on the test set from center 1 and on the external test set from center 2 for differentiating malignant hepatic tumors. Using McNemar’s chi-squared test, the STIC model’s performance did not differ significantly between center 1 and center 2 in accuracy, sensitivity or specificity for any type of malignant tumor. Using the DeLong test to compare two ROC curves, the STIC model achieved significantly better performance on the external test set from center 2 than on the test set from center 1 for the AUC of HCC diagnosis (p value = 0.048) and ICC diagnosis (p value = 0.039).
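The model outputs quoted in panels D and E are three-class score vectors, and the stated diagnoses match the class with the highest score in each case. A minimal sketch of that mapping, using the values from the caption (the dictionary keys and the helper name are illustrative, and taking the arg max as the decision rule is an assumption consistent with the six cases):

```python
# Softmax outputs quoted in the Fig. 2 caption, keyed by true class and case.
cases = {
    ("ICC", 1): {"HCC": 0.067, "ICC": 0.646, "metastasis": 0.287},
    ("ICC", 2): {"HCC": 0.881, "ICC": 0.067, "metastasis": 0.052},
    ("ICC", 3): {"HCC": 0.114, "ICC": 0.587, "metastasis": 0.299},
    ("metastasis", 1): {"HCC": 0.031, "ICC": 0.343, "metastasis": 0.626},
    ("metastasis", 2): {"HCC": 0.306, "ICC": 0.240, "metastasis": 0.454},
    ("metastasis", 3): {"HCC": 0.173, "ICC": 0.176, "metastasis": 0.651},
}

def predicted_label(scores):
    """Return the class with the highest predicted score."""
    return max(scores, key=scores.get)

results = {}
for (truth, case_id), scores in cases.items():
    label = predicted_label(scores)
    results[(truth, case_id)] = label
    status = "correct" if label == truth else "incorrect"
    print(f"{truth} case {case_id}: predicted {label} ({status})")
```

Only ICC case 2 is classified incorrectly by this rule, which agrees with the caption.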