monai
medical
katielink committed · Commit 967fa85 · 1 Parent(s): 42e62c6

add the ONNX-TensorRT way of model conversion

README.md CHANGED
@@ -70,6 +70,33 @@ The validation accuracy in this curve is the mean of mAP, mAR, AP(IoU=0.1), and
 
 ![A graph showing the detection val accuracy](https://developer.download.nvidia.com/assets/Clara/Images/monai_retinanet_detection_val_acc_v2.png)
 
+#### TensorRT speedup
+The `lung_nodule_ct_detection` bundle supports acceleration with TensorRT through the ONNX-TensorRT method. The table below displays the speedup ratios observed on an A100 80G GPU. Please note that when using the TensorRT model for inference, the `force_sliding_window` parameter in the `inference.json` file must be set to `true`. This ensures that the bundle uses the `SlidingWindowInferer` during inference and maintains the input spatial size of the network. Otherwise, if given an input with spatial size less than the `infer_patch_size`, the input spatial size of the network would be changed.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 7449.84 | 996.08 | 976.67 | 626.90 | 7.63 | 7.63 | 11.88 | 1.56 |
+| end2end | 36458.26 | 7259.35 | 6420.60 | 4698.34 | 5.02 | 5.68 | 7.76 | 1.55 |
+
+Where:
+- `model computation` means the speedup ratio of model's inference with a random input without preprocessing and postprocessing
+- `end2end` means run the bundle end-to-end with the TensorRT based model.
+- `torch_fp32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_fp32` and `trt_fp16` are for the TensorRT based models converted in corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+Currently, the only available method to accelerate this model is through ONNX-TensorRT. However, the Torch-TensorRT method is under development and will be available in the near future.
+
+This result is benchmarked under:
+ - TensorRT: 8.5.3+cuda11.8
+ - Torch-TensorRT Version: 1.4.0
+ - CPU Architecture: x86-64
+ - OS: ubuntu 20.04
+ - Python version:3.8.10
+ - CUDA version: 12.0
+ - GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands
 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
 
@@ -98,6 +125,18 @@ Note that in inference.json, the transform "LoadImaged" in "preprocessing" and "
 This depends on the input images. LUNA16 needs `"affine_lps_to_ras": true`.
 It is possible that your inference dataset should set `"affine_lps_to_ras": false`.
 
+#### Export checkpoint to TensorRT based models with fp32 or fp16 precision
+
+```bash
+python -m monai.bundle trt_export --net_id network_def --filepath models/model_trt.ts --ckpt_file models/model.pt --meta_file configs/metadata.json --config_file configs/inference.json --precision <fp32/fp16> --input_shape "[1, 1, 512, 512, 192]" --use_onnx "True" --use_trace "True" --onnx_output_names "['output_0', 'output_1', 'output_2', 'output_3', 'output_4', 'output_5']" --network_def#use_list_output "True"
+```
+
+#### Execute inference with the TensorRT model
+
+```
+python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"
+```
+
 # References
 [1] Lin, Tsung-Yi, et al. "Focal loss for dense object detection." ICCV 2017. https://arxiv.org/abs/1708.02002)
 
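As a quick check on how the ratio columns in the README table are defined, the sketch below recomputes them for the `end2end` row from the listed timings. The function name `speedup_ratios` and the dictionary keys are illustrative only and are not part of the bundle.

```python
def speedup_ratios(torch_fp32, torch_amp, trt_fp32, trt_fp16):
    """Return the ratio columns of the table from the four timings (ms)."""
    return {
        # The three "speedup" columns compare against the PyTorch float32 model.
        "speedup amp": torch_fp32 / torch_amp,
        "speedup fp32": torch_fp32 / trt_fp32,
        "speedup fp16": torch_fp32 / trt_fp16,
        # The last column compares PyTorch amp against TensorRT float16.
        "amp vs fp16": torch_amp / trt_fp16,
    }


# Timings (ms) copied from the `end2end` row of the table.
end2end = speedup_ratios(36458.26, 7259.35, 6420.60, 4698.34)
for name, value in end2end.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these reproduce the 5.02, 5.68, 7.76, and 1.55 listed for the `end2end` row.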
configs/inference.json CHANGED
@@ -13,6 +13,14 @@
     "test_datalist": "$monai.data.load_decathlon_datalist(@data_list_file_path, is_segmentation=True, data_list_key='validation', base_dir=@dataset_dir)",
     "device": "$torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')",
     "amp": true,
+    "spatial_dims": 3,
+    "num_classes": 1,
+    "force_sliding_window": false,
+    "size_divisible": [
+        16,
+        16,
+        8
+    ],
     "infer_patch_size": [
         512,
         512,
@@ -47,22 +55,22 @@
     "feature_extractor": "$monai.apps.detection.networks.retinanet_network.resnet_fpn_feature_extractor(@backbone,3,False,[1,2],None)",
     "network_def": {
         "_target_": "RetinaNet",
-        "spatial_dims": 3,
-        "num_classes": 1,
+        "spatial_dims": "@spatial_dims",
+        "num_classes": "@num_classes",
         "num_anchors": 3,
         "feature_extractor": "@feature_extractor",
-        "size_divisible": [
-            16,
-            16,
-            8
-        ]
+        "size_divisible": "@size_divisible",
+        "use_list_output": false
     },
     "network": "$@network_def.to(@device)",
     "detector": {
         "_target_": "RetinaNetDetector",
         "network": "@network",
         "anchor_generator": "@anchor_generator",
-        "debug": false
+        "debug": false,
+        "spatial_dims": "@spatial_dims",
+        "num_classes": "@num_classes",
+        "size_divisible": "@size_divisible"
     },
     "detector_ops": [
         "$@detector.set_target_keys(box_key='box', label_key='label')",
@@ -136,7 +144,8 @@
     },
     "inferer": {
         "_target_": "scripts.detection_inferer.RetinaNetInferer",
-        "detector": "@detector"
+        "detector": "@detector",
+        "force_sliding_window": "@force_sliding_window"
     },
     "postprocessing": {
         "_target_": "Compose",
configs/inference_trt.json ADDED
@@ -0,0 +1,11 @@
+{
+    "imports": [
+        "$import glob",
+        "$import os",
+        "$import torch_tensorrt"
+    ],
+    "force_sliding_window": true,
+    "handlers#0#_disabled_": true,
+    "network_def": "$torch.jit.load(@bundle_root + '/models/model_trt.ts')",
+    "evaluator#amp": false
+}
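Because `inference_trt.json` is passed after `inference.json` in the `run` command, its keys override the base config, with `#` separating nested keys and list indices. The toy function below illustrates that override idea in plain Python. It is only a sketch: `apply_overrides` is a hypothetical helper rather than MONAI's `ConfigParser`, and the `base` dictionary is a trimmed stand-in with placeholder values, not the real `inference.json`.

```python
import copy


def apply_overrides(base, overrides):
    """Return a copy of `base` with `#`-separated override keys applied."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        parts = key.split("#")
        node = merged
        for part in parts[:-1]:
            # Numeric path components index into lists, others into dicts.
            node = node[int(part)] if isinstance(node, list) else node[part]
        last = parts[-1]
        if isinstance(node, list):
            node[int(last)] = value
        else:
            node[last] = value
    return merged


# Trimmed stand-in for inference.json (placeholder values, assumed for illustration).
base = {
    "force_sliding_window": False,
    "handlers": [{"_target_": "PlaceholderHandler"}],
    "evaluator": {"amp": True},
}
# Subset of the overrides from inference_trt.json.
trt = {
    "force_sliding_window": True,
    "handlers#0#_disabled_": True,
    "evaluator#amp": False,
}
merged = apply_overrides(base, trt)
print(merged)
```

The base dictionary is left untouched, so the same `inference.json` content can still be used on its own for the PyTorch path.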
configs/metadata.json CHANGED
@@ -1,7 +1,8 @@
 {
     "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
-    "version": "0.5.5",
+    "version": "0.5.6",
     "changelog": {
+        "0.5.6": "add the ONNX-TensorRT way of model conversion",
         "0.5.5": "update retrained validation results and training curve",
         "0.5.4": "add non-deterministic note",
         "0.5.3": "adapt to BundleWorkflow interface",
@@ -19,7 +20,7 @@
         "0.1.1": "add reference for LIDC dataset",
         "0.1.0": "complete the model package"
     },
-    "monai_version": "1.2.0rc4",
+    "monai_version": "1.2.0rc5",
     "pytorch_version": "1.13.1",
     "numpy_version": "1.22.2",
     "optional_packages_version": {
docs/README.md CHANGED
@@ -63,6 +63,33 @@ The validation accuracy in this curve is the mean of mAP, mAR, AP(IoU=0.1), and
 
 ![A graph showing the detection val accuracy](https://developer.download.nvidia.com/assets/Clara/Images/monai_retinanet_detection_val_acc_v2.png)
 
+#### TensorRT speedup
+The `lung_nodule_ct_detection` bundle supports acceleration with TensorRT through the ONNX-TensorRT method. The table below displays the speedup ratios observed on an A100 80G GPU. Please note that when using the TensorRT model for inference, the `force_sliding_window` parameter in the `inference.json` file must be set to `true`. This ensures that the bundle uses the `SlidingWindowInferer` during inference and maintains the input spatial size of the network. Otherwise, if given an input with spatial size less than the `infer_patch_size`, the input spatial size of the network would be changed.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 7449.84 | 996.08 | 976.67 | 626.90 | 7.63 | 7.63 | 11.88 | 1.56 |
+| end2end | 36458.26 | 7259.35 | 6420.60 | 4698.34 | 5.02 | 5.68 | 7.76 | 1.55 |
+
+Where:
+- `model computation` means the speedup ratio of model's inference with a random input without preprocessing and postprocessing
+- `end2end` means run the bundle end-to-end with the TensorRT based model.
+- `torch_fp32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_fp32` and `trt_fp16` are for the TensorRT based models converted in corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+Currently, the only available method to accelerate this model is through ONNX-TensorRT. However, the Torch-TensorRT method is under development and will be available in the near future.
+
+This result is benchmarked under:
+ - TensorRT: 8.5.3+cuda11.8
+ - Torch-TensorRT Version: 1.4.0
+ - CPU Architecture: x86-64
+ - OS: ubuntu 20.04
+ - Python version:3.8.10
+ - CUDA version: 12.0
+ - GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands
 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
 
@@ -91,6 +118,18 @@ Note that in inference.json, the transform "LoadImaged" in "preprocessing" and "
 This depends on the input images. LUNA16 needs `"affine_lps_to_ras": true`.
 It is possible that your inference dataset should set `"affine_lps_to_ras": false`.
 
+#### Export checkpoint to TensorRT based models with fp32 or fp16 precision
+
+```bash
+python -m monai.bundle trt_export --net_id network_def --filepath models/model_trt.ts --ckpt_file models/model.pt --meta_file configs/metadata.json --config_file configs/inference.json --precision <fp32/fp16> --input_shape "[1, 1, 512, 512, 192]" --use_onnx "True" --use_trace "True" --onnx_output_names "['output_0', 'output_1', 'output_2', 'output_3', 'output_4', 'output_5']" --network_def#use_list_output "True"
+```
+
+#### Execute inference with the TensorRT model
+
+```
+python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"
+```
+
 # References
 [1] Lin, Tsung-Yi, et al. "Focal loss for dense object detection." ICCV 2017. https://arxiv.org/abs/1708.02002)
 
scripts/detection_inferer.py CHANGED
@@ -25,14 +25,19 @@ class RetinaNetInferer(Inferer):
     Args:
         detector: the RetinaNetDetector that converts network output BxCxMxN or BxCxMxNxP
             map into boxes and classification scores.
+        force_sliding_window: whether to force using a SlidingWindowInferer to do the inference.
+            If False, will check the input spatial size to decide whether to simply
+            forward the network or using SlidingWindowInferer.
+            If True, will force using SlidingWindowInferer to do the inference.
         args: other optional args to be passed to detector.
         kwargs: other optional keyword args to be passed to detector.
     """
 
-    def __init__(self, detector: RetinaNetDetector, *args, **kwargs) -> None:
+    def __init__(self, detector: RetinaNetDetector, force_sliding_window: bool = False) -> None:
         Inferer.__init__(self)
         self.detector = detector
         self.sliding_window_size = None
+        self.force_sliding_window = force_sliding_window
         if self.detector.inferer is not None:
             if hasattr(self.detector.inferer, "roi_size"):
                 self.sliding_window_size = np.prod(self.detector.inferer.roi_size)
@@ -52,8 +57,10 @@ class RetinaNetInferer(Inferer):
 
         # if image smaller than sliding window roi size, no need to use sliding window inferer
         # use sliding window inferer only when image is large
-        use_inferer = self.sliding_window_size is not None and not all(
-            [data_i[0, ...].numel() < self.sliding_window_size for data_i in inputs]
+        use_inferer = (
+            self.force_sliding_window
+            or self.sliding_window_size is not None
+            and not all([data_i[0, ...].numel() < self.sliding_window_size for data_i in inputs])
         )
 
         return self.detector(inputs, use_inferer=use_inferer, *args, **kwargs)
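The new `use_inferer` expression can be reduced to a small pure-Python function to reason about its cases. This is a sketch only: tensors are replaced by plain voxel counts, and `should_use_sliding_window` and its parameter names are hypothetical, not part of the bundle.

```python
def should_use_sliding_window(force, window_size, voxel_counts):
    """Mirror the `use_inferer` decision with voxel counts instead of tensors."""
    if force:
        # force_sliding_window=True always selects the sliding window inferer.
        return True
    if window_size is None:
        # No roi_size known on the detector's inferer: forward the network directly.
        return False
    # Skip the sliding window only when every input is smaller than one window.
    return not all(n < window_size for n in voxel_counts)


roi = 512 * 512 * 192            # voxels in one inference patch
small = 256 * 256 * 96           # smaller than the patch
large = 512 * 512 * 384          # larger than the patch

print(should_use_sliding_window(False, roi, [small]))         # False
print(should_use_sliding_window(True, roi, [small]))          # True
print(should_use_sliding_window(False, roi, [small, large]))  # True
```

The second call is the case the TensorRT config relies on: a small input still goes through the sliding window, so the network keeps seeing the fixed input size it was exported with.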