Update README.md
README.md
CHANGED
@@ -227,13 +227,13 @@ Raw complexity score: 1.5163
227 |
228 | **2. Normalization**
229 | The raw score is then normalized to the range 0-1 using predefined minimum (1.39) and maximum (1.69) normalization values, which were determined from the dataset's score distribution:
230 | -
231 | -
230 | +
231 | + $$Sc_{norm} = max(0, min(1, (Sc - min_{norm}) / (max_{norm} - min_{norm})))$$
232 |
233 | **3. Mapping to Masking Probability**
234 | I decided to use a quadratic mapping with a coefficient of 0.3, so that the masking probability varies smoothly between 15% and 45%, with more complex molecules receiving a higher masking probability:
235 |
236 | -
236 | + $$P_{\text{mask}} = 0.15 + 0.3 * (Sc_{norm})^2$$
237 |
238 | **4. Multi-Strategy Masking**
239 | Three different masking strategies are employed for each SELFIES string:
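
The two formulas added in this change (clamped min-max normalization, then the quadratic mapping to a masking probability) can be sketched in Python as below. This is a minimal illustration, not the repository's actual code: the function names `normalize_score` and `masking_probability` are hypothetical, while the constants 1.39, 1.69, 0.15, and 0.3 and the sample raw score 1.5163 are taken from the README text and the hunk header.

```python
# Normalization bounds quoted in the README (derived from the dataset's
# score distribution, according to the text).
MIN_NORM = 1.39
MAX_NORM = 1.69


def normalize_score(raw_score, min_norm=MIN_NORM, max_norm=MAX_NORM):
    """Min-max normalize a raw complexity score and clamp it to [0, 1].

    Hypothetical helper name; mirrors the Sc_norm formula in the diff.
    """
    scaled = (raw_score - min_norm) / (max_norm - min_norm)
    return max(0.0, min(1.0, scaled))


def masking_probability(sc_norm):
    """Map a normalized score in [0, 1] to a masking probability.

    Hypothetical helper name; mirrors P_mask = 0.15 + 0.3 * Sc_norm^2,
    so the result lies between 0.15 and 0.45.
    """
    return 0.15 + 0.3 * sc_norm ** 2


if __name__ == "__main__":
    raw = 1.5163  # sample raw complexity score from the hunk header
    sc_norm = normalize_score(raw)
    print(f"normalized: {sc_norm:.3f}, p_mask: {masking_probability(sc_norm):.3f}")
```

For the sample raw score 1.5163, the normalized score is about 0.421 and the masking probability about 0.203. Because the mapping is quadratic, scores near the lower bound stay close to the 15% floor, and the probability rises more steeply only as the normalized score approaches 1.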