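Contents
1. Dataset processing
1.1 Implementation script
1.2 The JSON file
2. Setting the read paths
2.1 Setting the paths
2.2 Dataset conversion
2.3 Dataset preprocessing
2.4 Training (3d_fullres)
3. Training results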
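For nnUnet dataset preparation and environment setup, see the previous article: "Chapter 4: nnUnet Large Model: Environment Configuration and Dataset Creation" (CSDN blog)
1. Dataset processing
The labels in the previous article's dataset have several problems. Fixing them is simple, but the fix is recorded here in case it is needed later.
Looking at that data, the labels cover 19 classes, but the mask values are not the consecutive integers 0, 1, 2, 3, ... This is not allowed in image segmentation, so a gray-level mapping is needed.
Adaptive gray-level mapping was already covered in the UNet series on multi-class segmentation, so it is only introduced briefly here. For details see: "Unet hands-on segmentation project, multi-scale training, multi-class segmentation" (CSDN blog)
If your data does not have this problem, skip straight to Section 2.
1.1 Implementation script
The script is as follows: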
import os
import shutil

import numpy as np
import SimpleITK as sitk
from tqdm import tqdm


def collect_labels(paths, desc):
    # Gather the sorted set of gray values used across all masks
    values = set()
    for p in tqdm(paths, desc=desc):
        arr = sitk.GetArrayFromImage(sitk.ReadImage(p))
        values.update(int(v) for v in np.unique(arr))
    return sorted(values)


def main():
    root = 'labelsTr'
    images = [os.path.join(root, u) for u in os.listdir(root)]

    root_ret = 'ret_labelsTr'
    if os.path.exists(root_ret):
        shutil.rmtree(root_ret)
    os.mkdir(root_ret)

    # Collect the gray values
    cl = collect_labels(images, 'scan')
    n = len(cl)
    print(cl)    # [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
    print('Number of classes:', n)

    # Labels are already the consecutive values 0..n-1: nothing to do
    if cl == list(range(n)):
        return

    # Gray-level mapping
    for i in tqdm(images, desc='remap'):
        img = sitk.ReadImage(i)
        mask = sitk.GetArrayFromImage(img)
        # cl is sorted, so index <= h and in-place remapping cannot collide
        for index, h in enumerate(cl):
            mask[mask == h] = index
        out = sitk.GetImageFromArray(mask)
        out.CopyInformation(img)    # keep spacing, origin and direction
        sitk.WriteImage(out, os.path.join(root_ret, os.path.basename(i)))

    # Check the gray values after mapping
    images = [os.path.join(root_ret, u) for u in os.listdir(root_ret)]
    cl_ret = collect_labels(images, 'check')
    print(cl_ret)    # expected: [0, 1, 2, ..., 18]
    print('Number of classes after processing:', len(cl_ret))


if __name__ == '__main__':
    main()
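The files are laid out as follows. The script automatically maps the labels in labelsTr to the consecutive values 0, 1, 2, 3, ... and saves them in the newly created ret_labelsTr folder.
Running it gives:
As you can see, the mask gray values have been remapped.
Opening the result in itk shows that the mask regions themselves have not changed. Only the stored values differ, so the display colors change as well.
Original labels:
After processing: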
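As a side note, the same gray-level mapping can be written without the per-class loop. The sketch below is not the script used in this article; it is a minimal alternative that assumes the list of gray values is sorted and unique, and uses `np.searchsorted` to look up the index of every voxel at once. The function name `remap_labels` is made up for illustration.

```python
import numpy as np


def remap_labels(mask, values):
    """Map each value in `mask` to its index in the sorted, unique `values`."""
    values = np.asarray(values)
    # For sorted unique values, searchsorted returns the position of each voxel value
    idx = np.searchsorted(values, mask)
    # Guard against gray values that are not listed in `values`
    safe = np.clip(idx, 0, len(values) - 1)
    if not np.array_equal(values[safe], mask):
        raise ValueError("mask contains a gray value that is not in `values`")
    return idx.astype(mask.dtype)


if __name__ == "__main__":
    demo = np.array([[0, 5], [7, 6]], dtype=np.uint8)
    print(remap_labels(demo, [0, 5, 6, 7]))
```

For a real label file the array would come from `sitk.GetArrayFromImage`, as in the script above, and the result would be written back the same way.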
1.2 The JSON file
Update it as follows. The new JSON file can of course also be generated with the script from the previous article.
{
"labels": {
"0": "background",
"1": "L1",
"2": "L2",
"3": "L3",
"4": "L4",
"5": "L5",
"6": "L6",
"7": "L7",
"8": "L8",
"9": "L9",
"10": "L10",
"11": "L11",
"12": "L12",
"13": "L13",
"14": "L14",
"15": "L15",
"16": "L16",
"17": "L17",
"18": "L18"
},
"modality": {
"0": "CT"
},
"numTest": 0,
"numTraining": 40,
"tensorImageSize": "3D",
"test": [],
"training": [
{
"image": "./imagesTr/spine_001.nii.gz",
"label": "./labelsTr/spine_001.nii.gz"
},
{
"image": "./imagesTr/spine_002.nii.gz",
"label": "./labelsTr/spine_002.nii.gz"
},
{
"image": "./imagesTr/spine_003.nii.gz",
"label": "./labelsTr/spine_003.nii.gz"
},
{
"image": "./imagesTr/spine_004.nii.gz",
"label": "./labelsTr/spine_004.nii.gz"
},
{
"image": "./imagesTr/spine_005.nii.gz",
"label": "./labelsTr/spine_005.nii.gz"
},
{
"image": "./imagesTr/spine_006.nii.gz",
"label": "./labelsTr/spine_006.nii.gz"
},
{
"image": "./imagesTr/spine_007.nii.gz",
"label": "./labelsTr/spine_007.nii.gz"
},
{
"image": "./imagesTr/spine_008.nii.gz",
"label": "./labelsTr/spine_008.nii.gz"
},
{
"image": "./imagesTr/spine_009.nii.gz",
"label": "./labelsTr/spine_009.nii.gz"
},
{
"image": "./imagesTr/spine_010.nii.gz",
"label": "./labelsTr/spine_010.nii.gz"
},
{
"image": "./imagesTr/spine_011.nii.gz",
"label": "./labelsTr/spine_011.nii.gz"
},
{
"image": "./imagesTr/spine_012.nii.gz",
"label": "./labelsTr/spine_012.nii.gz"
},
{
"image": "./imagesTr/spine_013.nii.gz",
"label": "./labelsTr/spine_013.nii.gz"
},
{
"image": "./imagesTr/spine_014.nii.gz",
"label": "./labelsTr/spine_014.nii.gz"
},
{
"image": "./imagesTr/spine_015.nii.gz",
"label": "./labelsTr/spine_015.nii.gz"
},
{
"image": "./imagesTr/spine_016.nii.gz",
"label": "./labelsTr/spine_016.nii.gz"
},
{
"image": "./imagesTr/spine_017.nii.gz",
"label": "./labelsTr/spine_017.nii.gz"
},
{
"image": "./imagesTr/spine_018.nii.gz",
"label": "./labelsTr/spine_018.nii.gz"
},
{
"image": "./imagesTr/spine_019.nii.gz",
"label": "./labelsTr/spine_019.nii.gz"
},
{
"image": "./imagesTr/spine_020.nii.gz",
"label": "./labelsTr/spine_020.nii.gz"
},
{
"image": "./imagesTr/spine_021.nii.gz",
"label": "./labelsTr/spine_021.nii.gz"
},
{
"image": "./imagesTr/spine_022.nii.gz",
"label": "./labelsTr/spine_022.nii.gz"
},
{
"image": "./imagesTr/spine_023.nii.gz",
"label": "./labelsTr/spine_023.nii.gz"
},
{
"image": "./imagesTr/spine_024.nii.gz",
"label": "./labelsTr/spine_024.nii.gz"
},
{
"image": "./imagesTr/spine_025.nii.gz",
"label": "./labelsTr/spine_025.nii.gz"
},
{
"image": "./imagesTr/spine_026.nii.gz",
"label": "./labelsTr/spine_026.nii.gz"
},
{
"image": "./imagesTr/spine_027.nii.gz",
"label": "./labelsTr/spine_027.nii.gz"
},
{
"image": "./imagesTr/spine_028.nii.gz",
"label": "./labelsTr/spine_028.nii.gz"
},
{
"image": "./imagesTr/spine_029.nii.gz",
"label": "./labelsTr/spine_029.nii.gz"
},
{
"image": "./imagesTr/spine_030.nii.gz",
"label": "./labelsTr/spine_030.nii.gz"
},
{
"image": "./imagesTr/spine_031.nii.gz",
"label": "./labelsTr/spine_031.nii.gz"
},
{
"image": "./imagesTr/spine_032.nii.gz",
"label": "./labelsTr/spine_032.nii.gz"
},
{
"image": "./imagesTr/spine_033.nii.gz",
"label": "./labelsTr/spine_033.nii.gz"
},
{
"image": "./imagesTr/spine_034.nii.gz",
"label": "./labelsTr/spine_034.nii.gz"
},
{
"image": "./imagesTr/spine_035.nii.gz",
"label": "./labelsTr/spine_035.nii.gz"
},
{
"image": "./imagesTr/spine_036.nii.gz",
"label": "./labelsTr/spine_036.nii.gz"
},
{
"image": "./imagesTr/spine_037.nii.gz",
"label": "./labelsTr/spine_037.nii.gz"
},
{
"image": "./imagesTr/spine_038.nii.gz",
"label": "./labelsTr/spine_038.nii.gz"
},
{
"image": "./imagesTr/spine_039.nii.gz",
"label": "./labelsTr/spine_039.nii.gz"
},
{
"image": "./imagesTr/spine_040.nii.gz",
"label": "./labelsTr/spine_040.nii.gz"
}
]
}
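For reference, a file with this shape can also be produced programmatically. The sketch below is not the script from the previous article; it is a minimal illustration that assumes the `spine_XXX.nii.gz` naming and the `L1` to `L18` class names shown above. The function name `build_dataset_json` and its parameters are made up for this example.

```python
import json


def build_dataset_json(num_cases, num_classes, prefix="spine"):
    """Build a dataset description dict with the same keys as the file above."""
    labels = {"0": "background"}
    for k in range(1, num_classes):
        labels[str(k)] = f"L{k}"
    training = [
        {
            "image": f"./imagesTr/{prefix}_{i:03d}.nii.gz",
            "label": f"./labelsTr/{prefix}_{i:03d}.nii.gz",
        }
        for i in range(1, num_cases + 1)
    ]
    return {
        "labels": labels,
        "modality": {"0": "CT"},
        "numTest": 0,
        "numTraining": num_cases,
        "tensorImageSize": "3D",
        "test": [],
        "training": training,
    }


if __name__ == "__main__":
    # 40 training cases, 19 classes including background
    text = json.dumps(build_dataset_json(40, 19), indent=4)
    print(text[:200])
```

Passing the returned dict to `json.dump` with an open file handle would write it out as dataset.json.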
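2. Setting the read paths
Back to the main topic. The Task folder contains the data shown below. Run source nnunet/bin/activate to activate the nnunet environment.
Tips: the labelsTr folder and dataset.json used here are the versions processed in Section 1.
The task name is Task01_Spine.
2.1 Setting the paths
Use absolute paths here. Everything from DATASET onward stays as shown; the part before it must be set for each machine.
Edit ~/.bashrc (vim ~/.bashrc) and append the following lines at the very end of the file: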
export nnUNet_raw_data_base="/*/DATASET/nnUNet_raw"
export nnUNet_preprocessed="/*/DATASET/nnUNet_preprocessed"
export RESULTS_FOLDER="/*/DATASET/nnUNet_trained_models"
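Once this is set, it does not need to be changed again to train other models.
After adding the lines, save the file and run source ~/.bashrc to reload the environment variables. You can check that the change took effect with echo $RESULTS_FOLDER.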
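The echo check covers one variable at a time. As a small convenience, the sketch below checks all three at once from Python. It only relies on the three variable names from the export lines above; the helper name `missing_nnunet_vars` is made up for illustration.

```python
import os

# The three variable names come from the export lines above
REQUIRED_VARS = ("nnUNet_raw_data_base", "nnUNet_preprocessed", "RESULTS_FOLDER")


def missing_nnunet_vars(env):
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_nnunet_vars(os.environ)
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        for name in REQUIRED_VARS:
            print(name, "=", os.environ[name])
```

Run it in a new shell after source ~/.bashrc; an empty "Not set" list means all three variables are visible to Python.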
2.2 Dataset conversion
All of the following commands are run from the environments directory.
The conversion command is:
nnUNet_convert_decathlon_task -i DATASET/nnUNet_raw/nnUNet_raw_data/Task01_Spine/
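The converted data is located at:
Images may have multiple modalities. nnU-Net identifies the imaging modality by its suffix (a four-digit integer at the end of the file name), so image files must follow the naming convention case_identifier_XXXX.nii.gz.
Here XXXX is the modality identifier. The dataset.json file specifies which modality each identifier belongs to.
Label files are saved as case_identifier.nii.gz.
For example, BrainTumor: each image has four modalities, FLAIR (0000), T1w (0001), T1gd (0002), and T2w (0003).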
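The naming convention above can be expressed in a few lines. This is only an illustrative sketch: the helper names `image_filename` and `label_filename` are made up here, and the case identifier `BRATS_001` is a hypothetical example rather than a file from this article.

```python
def image_filename(case_identifier, modality_index):
    """Image name: case identifier plus a zero-padded four-digit modality suffix."""
    return f"{case_identifier}_{modality_index:04d}.nii.gz"


def label_filename(case_identifier):
    """Label name: case identifier only, with no modality suffix."""
    return f"{case_identifier}.nii.gz"


if __name__ == "__main__":
    # Single-modality CT case from this article's dataset
    print(image_filename("spine_001", 0), label_filename("spine_001"))
    # Four-modality example; the case identifier is hypothetical
    for idx, name in enumerate(["FLAIR", "T1w", "T1gd", "T2w"]):
        print(name, image_filename("BRATS_001", idx))
```

For the spine data in this article there is only one modality (CT), so every image carries the suffix 0000.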
2.3 Dataset preprocessing
The command is as follows (only the training set is preprocessed; the test set is not):
nnUNet_plan_and_preprocess -t 1
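A single command is enough. Since the task ID is 1, the number passed to -t is 1. This step takes a long time, because every case goes through interpolation and other operations.
The generated data can be found in the cropped and preprocessed data folders.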
2.4 Training (3d_fullres)
The command is as follows:
nnUNet_train 3d_fullres nnUNetTrainerV2 1 0
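Here 1 is the task ID, and the trailing 0 selects which fold of the 5-fold cross-validation to train (folds are numbered 0 to 4).
Results are written in real time under nnUNet_trained_models: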
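Since the last argument is the fold index, training the full 5-fold cross-validation means running the same command with 0 through 4. The sketch below only builds the command strings, in the same argument order as the command above; the helper name `train_commands` is made up for illustration, and nothing is executed.

```python
def train_commands(task_id, network="3d_fullres", trainer="nnUNetTrainerV2", folds=5):
    """Build one training command string per cross-validation fold."""
    return [
        f"nnUNet_train {network} {trainer} {task_id} {fold}"
        for fold in range(folds)
    ]


if __name__ == "__main__":
    for cmd in train_commands(1):
        print(cmd)
```

Each printed line can then be run in the activated nnunet environment, one fold at a time.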
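3. Training results
On an RTX 3090 one epoch takes about 100 s, so 1000 epochs come to roughly 100,000 s (about 28 hours), i.e. one to two days. I will post the training results in the next article once the run finishes.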