Commit dc709da (verified) by amonfortc · 1 Parent(s): 19f2b5b

Upload README.md

Files changed (1)
  1. README.md +45 -66
README.md CHANGED
@@ -45,53 +45,29 @@ The system provides an easy-to-use interface built with Gradio, allowing users t
45
  #### **Predictive Features**
46
  Below are the features used for prediction across all targets:
47
 
48
- 1. **Pedigree** (0 - 67):
49
- Represents the familial history related to fibrotic conditions.
50
 
51
- 2. **Age at diagnosis** (36.0 - 92.0):
52
- Age of the patient at the time of diagnosis. A critical factor as progression and treatment response vary with age.
53
 
54
- 3. **FVC (L) at diagnosis** (0.0 - 5.0):
55
- Forced vital capacity in liters at the time of diagnosis, reflecting lung function.
56
 
57
- 4. **FVC (%) at diagnosis** (0.0 - 200.0):
58
- Forced vital capacity as a percentage of the expected value for the patient’s age and sex.
59
 
60
- 5. **DLCO (%) at diagnosis** (0.0 - 200.0):
61
- Diffusion capacity for carbon monoxide as a percentage, measuring gas exchange efficiency in the lungs.
62
 
63
- 6. **RadioWorsening2y** (0 - 3):
64
- Radiological assessment of lung deterioration over two years. Higher values indicate significant progression.
65
 
66
- 7. **Severity of telomere shortening - Transform 4** (1 - 6):
67
- Indicates the degree of telomere shortening.
68
 
69
- 8. **Progressive disease** (0 - 1):
70
- Binary variable indicating whether the disease is progressive (1) or stable (0).
71
 
72
- 9. **Antifibrotic Drug** (0 - 1):
73
- Binary variable representing the use of antifibrotic drugs. 1 indicates use; 0 indicates none.
74
 
75
- 10. **Prednisone** (0 - 1):
76
- Binary variable reflecting prednisone usage. 1 indicates use; 0 indicates none.
77
 
78
- 11. **Mycophenolate** (0 - 1):
79
- Binary variable indicating mycophenolate usage. 1 indicates use; 0 indicates none.
80
 
81
- 12. **FVC (L) 1 year after diagnosis** (0.0 - 5.0):
82
- Forced vital capacity in liters one year after diagnosis, used to evaluate changes in lung function.
83
-
84
- 13. **FVC (%) 1 year after diagnosis** (0.0 - 200.0):
85
- Forced vital capacity as a percentage one year after diagnosis.
86
-
87
- 14. **DLCO (%) 1 year after diagnosis** (0.0 - 200.0):
88
- Diffusion capacity for carbon monoxide as a percentage one year after diagnosis.
89
-
90
- 15. **Genetic mutation studied in patient** (0 - 1):
91
- Binary variable indicating the presence of specific genetic mutations. 1 indicates mutation found; 0 indicates none.
92
-
93
- 16. **Comorbidities** (0 - 1):
94
- Binary variable representing the presence of relevant comorbidities. 1 indicates presence; 0 indicates absence.
95
 
96
  ---
97
 
@@ -152,101 +128,103 @@ The performance and feature importance for the prediction targets are detailed b
152
  ![Cross-validation Accuracy for Death](Figures/Figure_1.png)
153
 
154
  - **Cross-validation Accuracy**:
155
- The cross-validation results for "Death" show some variability across folds, but overall, the model achieves consistently high accuracy, indicating good generalization ability across subsets of the data.
156
 
157
  - **Train vs Test Accuracy**:
158
- The train-test accuracy comparison shows minimal overfitting, as the performance on both sets is closely aligned.
159
 
160
- ![Feature Importance for Death](Figures/Figure_2.png)
161
 
162
  - **Feature Importance**:
163
- Features such as "Progressive disease" and "DLCO (%) at diagnosis" are the most influential, highlighting their significant role in predicting mortality.
164
 
165
- ![ROC-AUC Curve for Death](Figures/Figure_3.png)
166
 
167
  - **ROC-AUC Curve**:
168
- The ROC-AUC curve illustrates strong model performance, with an area under the curve (AUC) of 0.92, confirming the model's ability to distinguish between positive and negative cases effectively.
169
 
170
  ##### **Prediction Target: Binary Diagnosis**
171
 
172
- ![Cross-validation Accuracy for Binary Diagnosis](Figures/Figure_4.png)
173
 
174
  - **Cross-validation Accuracy**:
175
  Variability in cross-validation accuracy is observed, but the model maintains high performance across most folds.
176
 
177
- ![Feature Importance for Binary Diagnosis](Figures/Figure_5.png)
 
 
178
 
179
  - **Feature Importance**:
180
- Key predictors include "Prednisone" and "Antifibrotic Drug", reflecting the importance of treatment factors in classification.
181
 
182
- ![ROC-AUC Curve for Death](Figures/Figure_6.png)
183
 
184
  - **ROC-AUC Curve**:
185
- The high AUC value of 0.95 indicates excellent discrimination ability for the binary classification task.
186
 
187
  ##### **Prediction Target: Progressive Disease**
188
 
189
- ![Cross-validation Accuracy for Progressive Disease](Figures/Figure_10.png)
190
 
191
  - **Cross-validation Accuracy**:
192
- Accuracy scores across folds highlight variability, but peaks show strong model performance, suggesting robustness in certain data subsets.
193
 
194
  - **Train vs Test Accuracy**:
195
- The parity between training and testing accuracy confirms minimal overfitting and reliable generalization.
196
 
197
  ![Feature Importance for Progressive Disease](Figures/Figure_11.png)
198
 
199
  - **Feature Importance**:
200
- "RadioWorsening2y" emerges as the dominant predictor, supported by secondary factors like "FVC (%) 1 year after diagnosis".
201
 
202
-
203
  ![ROC-AUC Curve for Progressive Disease](Figures/Figure_12.png)
204
 
205
  - **ROC-AUC Curve**:
206
- With an AUC of 0.98, the model demonstrates exceptional predictive power for disease progression.
207
 
208
  ##### **Prediction Target: Necessity of Transplantation**
209
 
210
  ![Cross-validation Accuracy for Necessity of Transplantation](Figures/Figure_13.png)
211
 
212
-
213
  - **Cross-validation Accuracy**:
214
- Cross-validation reveals excellent model accuracy, with minimal performance dips across folds.
215
 
216
- ![Feature Importance for Necessity of Transplantation](Figures/Figure_14.png)
 
 
217
 
218
  - **Feature Importance**:
219
- "Age at diagnosis" and "FVC (%) 1 year after diagnosis" are the most critical variables, underscoring their importance in assessing transplantation need.
220
 
221
- ![ROC-AUC Curve for Necessity of Transplantation](Figures/Figure_15.png)
222
 
223
  - **ROC-AUC Curve**:
224
- The model achieves an AUC of 1.00, reflecting perfect discrimination between cases where transplantation is needed and those where it is not.
225
 
226
  ---
227
 
228
  #### **Future Improvements**
229
 
230
  - **Optimizing Variable Names**:
231
- Review and refine the naming conventions for variables to improve clarity and consistency, facilitating better understanding for medical practitioners and data scientists.
232
 
233
  - **Improving Model Precision**:
234
- Retrain the model with a larger and more diverse dataset, incorporating data from additional patients to enhance accuracy and generalization.
235
 
236
  - **Identifying Optimal Medical Variables**:
237
- Conduct a detailed analysis to identify which medical variables contribute most significantly to prediction accuracy and consider eliminating less relevant ones to simplify the model.
238
 
239
  - **Testing Model Performance with Reduced Variables**:
240
- Assess whether the model maintains strong predictive performance with a reduced set of optimized medical variables, which could enhance interpretability and efficiency.
241
 
242
  - **Expanding Dataset Diversity**:
243
- Incorporate data from different demographics, regions, and clinical conditions to ensure the model performs well across diverse patient groups.
244
 
245
  - **Adding Longitudinal Data Analysis**:
246
- Integrate longitudinal data to capture temporal patterns in disease progression, which could significantly enhance prediction capabilities.
247
 
248
  - **Real-time Model Retraining**:
249
- Develop an interface or mechanism for users to upload new patient data and retrain the model seamlessly, keeping it up-to-date with the latest insights.
250
 
251
  ### Associated Space
252
 
@@ -257,3 +235,4 @@ Check out the interactive demo of this model on Hugging Face Spaces:
257
  ---
258
 
259
  This README provides a comprehensive guide to understanding and using the **FibroPred** predictive system effectively.
 
 
45
  #### **Predictive Features**
46
  Below are the features used for prediction across all targets:
47
 
48
+ 1. **Pedigree**: Represents the familial history related to fibrotic conditions.
 
49
 
50
+ 2. **Age at diagnosis**: Age of the patient at the time of diagnosis.
 
51
 
52
+ 3. **FVC (L) at diagnosis**: Forced vital capacity in liters at the time of diagnosis, reflecting lung function.
 
53
 
54
+ 4. **FVC (%) at diagnosis**: Forced vital capacity as a percentage of the expected value for the patient’s age and sex.
 
55
 
56
+ 5. **DLCO (%) at diagnosis**: Diffusion capacity for carbon monoxide as a percentage, measuring gas exchange efficiency in the lungs.
 
57
 
58
+ 6. **RadioWorsening2y**: Radiological assessment of lung deterioration over two years.
 
59
 
60
+ 7. **Severity of telomere shortening - Transform 4**: Indicates the degree of telomere shortening.
 
61
 
62
+ 8. **Progressive disease**: Binary variable indicating whether the disease is progressive or stable.
 
63
 
64
+ 9. **Biopsy**: Binary variable indicating whether a biopsy was performed.
 
65
 
66
+ 10. **Genetic mutation studied in patient**: Binary variable indicating the presence of specific genetic mutations.
 
67
 
68
+ 11. **Comorbidities**: Binary variable representing the presence of relevant comorbidities.
 
69
 
70
+ 12. **Tobacco use**: Binary variable reflecting whether the patient has a history of tobacco use.

71
 
72
  ---
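
The twelve features listed above could be collected into a single typed record before being passed to a model. The sketch below is illustrative only: the class name `PatientFeatures`, the helper `to_row`, the field names, their order, and the 0/1 encoding of the binary fields are assumptions, not the schema FibroPred actually uses.

```python
from dataclasses import asdict, dataclass


# Field names paraphrase the README's feature list; types, ordering, and
# encodings are assumptions made for illustration.
@dataclass
class PatientFeatures:
    pedigree: int
    age_at_diagnosis: float
    fvc_l_at_diagnosis: float
    fvc_pct_at_diagnosis: float
    dlco_pct_at_diagnosis: float
    radio_worsening_2y: int
    telomere_shortening_severity: int
    progressive_disease: int  # assumed encoding: 1 = progressive, 0 = stable
    biopsy: int               # assumed encoding: 1 = performed, 0 = not performed
    genetic_mutation: int     # assumed encoding: 1 = mutation found, 0 = none
    comorbidities: int        # assumed encoding: 1 = present, 0 = absent
    tobacco_use: int          # assumed encoding: 1 = history of use, 0 = none


BINARY_FIELDS = (
    "progressive_disease",
    "biopsy",
    "genetic_mutation",
    "comorbidities",
    "tobacco_use",
)


def to_row(patient):
    """Validate the binary fields and return the features as a flat list of floats."""
    values = asdict(patient)  # preserves the field declaration order
    for name in BINARY_FIELDS:
        if values[name] not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {values[name]!r}")
    return [float(v) for v in values.values()]
```

Keeping the feature order in one place like this makes it harder for the Gradio inputs and the model's expected column order to drift apart, whatever the real column names turn out to be.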
73
 
 
128
  ![Cross-validation Accuracy for Death](Figures/Figure_1.png)
129
 
130
  - **Cross-validation Accuracy**:
131
+ The cross-validation results for "Death" show some variability across folds, but overall, the model achieves consistently high accuracy.
132
 
133
  - **Train vs Test Accuracy**:
134
+ ![Train vs Test Accuracy for Death](Figures/Figure_2.png)
135
 
136
+ ![Feature Importance for Death](Figures/Figure_3.png)
137
 
138
  - **Feature Importance**:
139
+ Features such as "Progressive disease", "DLCO (%) at diagnosis", and "FVC (%) at diagnosis" are the most influential.
140
 
141
+ ![ROC-AUC Curve for Death](Figures/Figure_4.png)
142
 
143
  - **ROC-AUC Curve**:
144
+ The ROC-AUC curve illustrates strong model performance, with an AUC of 0.92.
145
 
146
  ##### **Prediction Target: Binary Diagnosis**
147
 
148
+ ![Cross-validation Accuracy for Binary Diagnosis](Figures/Figure_5.png)
149
 
150
  - **Cross-validation Accuracy**:
151
  Variability in cross-validation accuracy is observed, but the model maintains high performance across most folds.
152
 
153
+ ![Train vs Test Accuracy for Binary Diagnosis](Figures/Figure_6.png)
154
+
155
+ ![Feature Importance for Binary Diagnosis](Figures/Figure_7.png)
156
 
157
  - **Feature Importance**:
158
+ Key predictors include "Age at diagnosis", "Pedigree", and "Tobacco use".
159
 
160
+ ![ROC-AUC Curve for Binary Diagnosis](Figures/Figure_8.png)
161
 
162
  - **ROC-AUC Curve**:
163
+ The high AUC value of 0.95 indicates excellent discrimination ability.
164
 
165
  ##### **Prediction Target: Progressive Disease**
166
 
167
+ ![Cross-validation Accuracy for Progressive Disease](Figures/Figure_9.png)
168
 
169
  - **Cross-validation Accuracy**:
170
+ Accuracy scores across folds highlight variability, but peaks show strong model performance.
171
 
172
  - **Train vs Test Accuracy**:
173
+ ![Train vs Test Accuracy for Progressive Disease](Figures/Figure_10.png)
174
 
175
  ![Feature Importance for Progressive Disease](Figures/Figure_11.png)
176
 
177
  - **Feature Importance**:
178
+ "DLCO (%) at diagnosis", "Age at diagnosis", and "Pedigree" emerge as the dominant predictors.
179
 
 
180
  ![ROC-AUC Curve for Progressive Disease](Figures/Figure_12.png)
181
 
182
  - **ROC-AUC Curve**:
183
+ With an AUC of 0.98, the model demonstrates exceptional predictive power.
184
 
185
  ##### **Prediction Target: Necessity of Transplantation**
186
 
187
  ![Cross-validation Accuracy for Necessity of Transplantation](Figures/Figure_13.png)
188
 
 
189
  - **Cross-validation Accuracy**:
190
+ Cross-validation reveals excellent model accuracy.
191
 
192
+ ![Train vs Test Accuracy for Necessity of Transplantation](Figures/Figure_14.png)
193
+
194
+ ![Feature Importance for Necessity of Transplantation](Figures/Figure_15.png)
195
 
196
  - **Feature Importance**:
197
+ "RadioWorsening2y", "FVC (%) 1 year after diagnosis", and "Comorbidities" are critical.
198
 
199
+ ![ROC-AUC Curve for Necessity of Transplantation](Figures/Figure_16.png)
200
 
201
  - **ROC-AUC Curve**:
202
+ The model achieves an AUC of 1.00.
203
 
204
  ---
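
The evaluation sections above report cross-validation accuracy and ROC-AUC for each target. As a reminder of what those numbers measure, here is a minimal sketch using only the Python standard library. The helper names (`k_fold_indices`, `accuracy`, `roc_auc`) are hypothetical, and this is not the project's evaluation code.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    base, extra = divmod(n_samples, k)
    indices = list(range(n_samples))
    start = 0
    for fold in range(k):
        # Spread any remainder over the first `extra` folds.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        yield indices[:start] + indices[stop:], indices[start:stop]
        start = stop


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under this definition, an AUC of 1.00 means every positive case received a higher score than every negative case in the evaluated data, and the spread of `accuracy` across the folds from `k_fold_indices` is the fold-to-fold variability the cross-validation plots show.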
205
 
206
  #### **Future Improvements**
207
 
208
  - **Optimizing Variable Names**:
209
+ Review and refine variable naming conventions for clarity.
210
 
211
  - **Improving Model Precision**:
212
+ Retrain the model with a larger dataset.
213
 
214
  - **Identifying Optimal Medical Variables**:
215
+ Simplify the model by removing less relevant variables.
216
 
217
  - **Testing Model Performance with Reduced Variables**:
218
+ Assess predictive performance with a reduced set of variables.
219
 
220
  - **Expanding Dataset Diversity**:
221
+ Incorporate data from diverse demographics.
222
 
223
  - **Adding Longitudinal Data Analysis**:
224
+ Integrate temporal patterns in disease progression.
225
 
226
  - **Real-time Model Retraining**:
227
+ Develop mechanisms for real-time updates.
228
 
229
  ### Associated Space
230
 
 
235
  ---
236
 
237
  This README provides a comprehensive guide to understanding and using the **FibroPred** predictive system effectively.
238
+