---
license: other
language:
- en
metrics:
- precision
- f1
pipeline_tag: tabular-classification
tags:
- agriculture
- remote sensing
- satellite imagery
- crop classification
library_name: XGBoost
library_version: 2.0.3
---
# Model Card for AI-Enhanced Crop Field Data Curation
## Summary
This repository primarily features models for classifying crop types for the Kharif and Rabi seasons. These models have been trained and fine-tuned by Wadhwani AI on open-source Sentinel-1 and Sentinel-2 image datasets, with ground truth data supplied by the [Mahalanobis National Crop Forecast Center](https://www.ncfc.gov.in/about-us.html).
## Model Details
The scalers were built with `StandardScaler` from the [scikit-learn](https://scikit-learn.org/stable/) library, and the ML classifiers were trained with [XGBoost](https://xgboost.readthedocs.io/en/latest/).
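For readers unfamiliar with standard scaling: `StandardScaler` centers each feature column on the mean and divides by the (population) standard deviation learned at fit time. The plain-Python sketch below illustrates that transform for a single column. The function names `fit_standardizer` and `standardize` are hypothetical and the numbers are made up; this is not the code used to build the published scalers.

```python
from statistics import fmean, pstdev


def fit_standardizer(column):
    """Learn the mean and scale of one feature column (hypothetical helper)."""
    mean = fmean(column)
    std = pstdev(column)  # population standard deviation (ddof=0)
    # A constant column has zero deviation; fall back to a scale of 1.0
    return mean, (std if std > 0 else 1.0)


def standardize(column, mean, scale):
    """Apply z = (x - mean) / scale to every value in the column."""
    return [(x - mean) / scale for x in column]


# Made-up NDVI values for one fortnight across three fields
train = [110.0, 120.0, 130.0]
mean, scale = fit_standardizer(train)
print(standardize([120.0, 140.0], mean, scale))
```

In practice you would not refit anything: load the published `.pkl` scaler and call its `transform` method, as shown in the Usage section.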
## Training Data
- Sowing Year:
  - `rabi`: 2022
  - `kharif`: 2023
- Location: Please find the location-wise crop distribution [here](https://drive.google.com/file/d/1HEC9r3cu17eeXOssxjDfHg-x8cidMH8j/view?usp=sharing)
- Predictors:
  - Source: Sentinel-2 and Sentinel-1 image data from Google Earth Engine
  - Data type:
    - `rabi`: NDVI (Normalized Difference Vegetation Index) values recorded at fortnightly intervals for individual crop lands throughout the entire Rabi season (October to April).
    - `kharif`: NDVI values and VH (Vertical-Horizontal Polarization) values recorded at fortnightly intervals for individual crop lands throughout the entire Kharif season (May to November).
- GT Source: Ground truth data curated by [Mahalanobis National Crop Forecast Center](https://www.ncfc.gov.in/about-us.html)
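As a reminder of what the NDVI predictor measures, the sketch below computes it from near-infrared and red reflectance using the standard definition, (NIR - Red) / (NIR + Red). The `rescale_to_0_200` helper is hypothetical: the sample code in this card expects NDVI "normalized to 0-200", but the exact rescaling used during training is not documented here, so the linear mapping shown is an assumption to confirm before using it for inference.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # convention chosen here to avoid dividing by zero
    return (nir - red) / total


def rescale_to_0_200(value: float) -> float:
    """ASSUMPTION: linear map of NDVI in [-1, 1] onto [0, 200]."""
    return (value + 1.0) * 100.0


# Made-up reflectance values for a vegetated pixel
sample = ndvi(nir=0.45, red=0.15)
print(round(sample, 3), round(rescale_to_0_200(sample), 1))
```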
## Kharif Season Models
### Crop Type Classifiers
These files can be used to scale features and predict crop types for the Kharif season. For predictions, you can use either the entire season's NDVI and VH data (from the 1st fortnight of May to the 2nd fortnight of November) or a subset of this data for early crop type identification. Make sure the classifier you pick was trained on the same fortnight range as the data you pass to it. Target labels are: `{0: Paddy, 1: Sugarcane, 2: Cotton}`.
- `kharif_ctc_scaler.pkl`: Standard scaler for Kharif crop type classification.
- `may_1f-jul_2f_kharif_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of May to 2nd fortnight of July.
- `may_1f-aug_1f_kharif_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of May to 1st fortnight of August.
- `may_1f-aug_2f_kharif_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of May to 2nd fortnight of August.
- `may_1f-sep_1f_kharif_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of May to 1st fortnight of September.
- `may_1f-sep_2f_kharif_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of May to 2nd fortnight of September.
- `may_1f-oct_1f_kharif_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of May to 1st fortnight of October.
- `may_1f-oct_2f_kharif_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of May to 2nd fortnight of October.
- `may_1f-nov_1f_kharif_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of May to 1st fortnight of November.
- `may_1f-nov_2f_kharif_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of May to 2nd fortnight of November.
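Because each classifier is tied to a fortnight range, it can help to derive the filename and the matching feature columns from the last fortnight you have data for. The sketch below does that using the file naming pattern above and the `<fortnight>_vh` / `<fortnight>_ndvi` column names from the Kharif usage example. The helper `kharif_window` is hypothetical. Whether `kharif_ctc_scaler.pkl` expects the full season's columns or only the subset is not stated in this card, so the sketch stops at selecting names.

```python
MONTHS = ["may", "jun", "jul", "aug", "sep", "oct", "nov"]
# All fortnights of the Kharif season in order: may_1f, may_2f, ..., nov_2f
FORTNIGHTS = [f"{month}_{half}" for month in MONTHS for half in ("1f", "2f")]
EARLIEST_END = "jul_2f"  # the shortest window in the model list ends here


def kharif_window(end: str):
    """Return (model filename, feature columns) for data from may_1f to `end`."""
    if end not in FORTNIGHTS:
        raise ValueError(f"unknown fortnight label: {end!r}")
    if FORTNIGHTS.index(end) < FORTNIGHTS.index(EARLIEST_END):
        raise ValueError(f"no classifier listed for a window ending at {end}")
    selected = FORTNIGHTS[: FORTNIGHTS.index(end) + 1]
    # VH first, then NDVI, for each fortnight (order taken from the usage example)
    columns = [f"{f}_{band}" for f in selected for band in ("vh", "ndvi")]
    return f"may_1f-{end}_kharif_ctc.pkl", columns


filename, columns = kharif_window("aug_1f")
print(filename)
print(len(columns), columns[:2], columns[-2:])
```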
## Rabi Season Models
### Crop Type Classifiers
These files can be used to scale features and predict crop types for the Rabi season. For predictions, you can use either the entire season's NDVI data (from the 1st fortnight of October to the 2nd fortnight of April) or a subset of this data for early crop type identification. Make sure the classifier you pick was trained on the same fortnight range as the data you pass to it. Target labels are: `{0: Mustard, 1: Wheat, 2: Potato, 3: Bengal Gram}`.
- `rabi_ctc_scaler.pkl`: Standard Scaler for Rabi crop type classification.
- `oct_1f-dec_2f_rabi_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of October to 2nd fortnight of December.
- `oct_1f-jan_1f_rabi_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of October to 1st fortnight of January.
- `oct_1f-jan_2f_rabi_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of October to 2nd fortnight of January.
- `oct_1f-feb_1f_rabi_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of October to 1st fortnight of February.
- `oct_1f-feb_2f_rabi_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of October to 2nd fortnight of February.
- `oct_1f-mar_1f_rabi_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of October to 1st fortnight of March.
- `oct_1f-mar_2f_rabi_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of October to 2nd fortnight of March.
- `oct_1f-apr_1f_rabi_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of October to 1st fortnight of April.
- `oct_1f-apr_2f_rabi_ctc.pkl`: Crop type classifier trained on data from 1st fortnight of October to 2nd fortnight of April.
### Other Crop Rejection Classifiers
These files can be used to scale features and reject other crop types for the Rabi season. For predictions, you can use either the entire season's NDVI data (from the 1st fortnight of October to the 2nd fortnight of April) or a subset of this data for rejection. Make sure the classifier you pick was trained on the same fortnight range as the data you pass to it. Target labels are: `{0: Desired, 1: Others}`.
- `rabi_ocr_scaler.pkl`: Standard Scaler for Rabi other crop rejection.
- `oct_1f-dec_2f_rabi_ocr.pkl`: Other crop rejection classifier trained on data from 1st fortnight of October to 2nd fortnight of December.
- `oct_1f-jan_1f_rabi_ocr.pkl`: Other crop rejection classifier trained on data from 1st fortnight of October to 1st fortnight of January.
- `oct_1f-jan_2f_rabi_ocr.pkl`: Other crop rejection classifier trained on data from 1st fortnight of October to 2nd fortnight of January.
- `oct_1f-feb_1f_rabi_ocr.pkl`: Other crop rejection classifier trained on data from 1st fortnight of October to 1st fortnight of February.
- `oct_1f-feb_2f_rabi_ocr.pkl`: Other crop rejection classifier trained on data from 1st fortnight of October to 2nd fortnight of February.
- `oct_1f-mar_1f_rabi_ocr.pkl`: Other crop rejection classifier trained on data from 1st fortnight of October to 1st fortnight of March.
- `oct_1f-mar_2f_rabi_ocr.pkl`: Other crop rejection classifier trained on data from 1st fortnight of October to 2nd fortnight of March.
- `oct_1f-apr_1f_rabi_ocr.pkl`: Other crop rejection classifier trained on data from 1st fortnight of October to 1st fortnight of April.
- `oct_1f-apr_2f_rabi_ocr.pkl`: Other crop rejection classifier trained on data from 1st fortnight of October to 2nd fortnight of April.
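One plausible way to use the two Rabi model families together is to run the `ocr` classifier first and keep the crop type prediction only for fields it labels as `Desired`. This card does not document such a workflow, so treat the sketch below as an assumption. It works on label arrays you have already obtained from the two models; `combine_predictions` is a hypothetical helper.

```python
RABI_CROPS = {0: "Mustard", 1: "Wheat", 2: "Potato", 3: "Bengal Gram"}
OTHERS = 1  # label from the `ocr` models: {0: Desired, 1: Others}


def combine_predictions(ocr_labels, ctc_labels):
    """Return a crop name per field, or "Others" where the ocr model rejected it."""
    if len(ocr_labels) != len(ctc_labels):
        raise ValueError("both label sequences must describe the same fields")
    return [
        "Others" if int(ocr) == OTHERS else RABI_CROPS[int(ctc)]
        for ocr, ctc in zip(ocr_labels, ctc_labels)
    ]


# Made-up labels for three fields: the second one is rejected as "Others"
print(combine_predictions([0, 1, 0], [1, 2, 3]))
```

Both label sequences should come from models trained on the same fortnight range, for example `oct_1f-jan_2f_rabi_ocr.pkl` paired with `oct_1f-jan_2f_rabi_ctc.pkl`.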
## Usage
### Installation
To use the models, you need the XGBoost, scikit-learn and pandas libraries installed. You can install them via pip:
```bash
pip install xgboost~=2.0.3
pip install scikit-learn~=1.5.0
pip install pandas~=2.2.2
```
### Inference
Here is a sample code snippet that uses the full-season model to classify Rabi season crop types:
```python
import pickle

import pandas as pd
import sklearn  # scikit-learn must be importable to unpickle the scaler
import xgboost  # xgboost must be importable to unpickle the classifier

# Load your desired scaler and model
with open('path/to/rabi_ctc_scaler.pkl', 'rb') as f:
    scaler = pickle.load(f)
with open('path/to/oct_1f-apr_2f_rabi_ctc.pkl', 'rb') as f:
    model = pickle.load(f)

# Prepare your data
# Assuming data is a pandas DataFrame with features corresponding to NDVI (normalized to 0-200)
# from the 1st fortnight of October to the 2nd fortnight of April.
# Illustrative values only: one row per crop field, one value per fortnight (14 in total).
data = pd.DataFrame(data=[[115, 122, 138, 145, 152, 159, 165, 172, 140, 130, 110, 100, 95, 90],
                          [116, 123, 139, 146, 153, 160, 166, 173, 141, 131, 111, 101, 96, 91]],
                    columns=['oct_1f', 'oct_2f', 'nov_1f', 'nov_2f', 'dec_1f', 'dec_2f', 'jan_1f',
                             'jan_2f', 'feb_1f', 'feb_2f', 'mar_1f', 'mar_2f', 'apr_1f', 'apr_2f'])

# Scale your data
scaled_data = scaler.transform(data)

# Make predictions
predictions = model.predict(scaled_data)

# Interpret predictions
# Rabi Season Crops
class_mapping = {0: 'Mustard', 1: 'Wheat', 2: 'Potato', 3: 'Bengal Gram'}
classified_crops = [class_mapping[int(label)] for label in predictions]
print(classified_crops)
```
For the Kharif season:
```python
import pickle

import pandas as pd
import sklearn  # scikit-learn must be importable to unpickle the scaler
import xgboost  # xgboost must be importable to unpickle the classifier

# Load your desired scaler and model
with open('path/to/kharif_ctc_scaler.pkl', 'rb') as f:
    scaler = pickle.load(f)
with open('path/to/may_1f-nov_2f_kharif_ctc.pkl', 'rb') as f:
    model = pickle.load(f)

# Prepare your data
# Assuming data is a pandas DataFrame with VH (in dB) and NDVI (normalized to 0-200) features
# alternating from the 1st fortnight of May to the 2nd fortnight of November.
# Illustrative values only: one row per crop field, 28 values (14 fortnights x 2 features).
data = pd.DataFrame(data=[[-10.0000, 110.0000, -11.6667, 118.3333, -13.3333, 126.6667, -15.0000,
                           135.0000, -16.6667, 143.3333, -18.3333, 151.6667, -20.0000, 160.0000,
                           -21.6667, 168.3333, -23.3333, 176.6667, -25.0000, 185.0000, -26.6667,
                           186.6667, -28.3333, 188.3333, -30.0000, 190.0000, -31.6667, 191.6667]],
                    columns=['may_1f_vh', 'may_1f_ndvi', 'may_2f_vh', 'may_2f_ndvi', 'jun_1f_vh',
                             'jun_1f_ndvi', 'jun_2f_vh', 'jun_2f_ndvi', 'jul_1f_vh', 'jul_1f_ndvi',
                             'jul_2f_vh', 'jul_2f_ndvi', 'aug_1f_vh', 'aug_1f_ndvi', 'aug_2f_vh',
                             'aug_2f_ndvi', 'sep_1f_vh', 'sep_1f_ndvi', 'sep_2f_vh', 'sep_2f_ndvi',
                             'oct_1f_vh', 'oct_1f_ndvi', 'oct_2f_vh', 'oct_2f_ndvi', 'nov_1f_vh',
                             'nov_1f_ndvi', 'nov_2f_vh', 'nov_2f_ndvi'])

# Scale your data
scaled_data = scaler.transform(data)

# Make predictions
predictions = model.predict(scaled_data)

# Interpret predictions
# Kharif Season Crops
class_mapping = {0: 'Paddy', 1: 'Sugarcane', 2: 'Cotton'}
classified_crops = [class_mapping[int(label)] for label in predictions]
print(classified_crops)
```
## Out-of-Scope
The models are not designed to handle crops other than those listed under the target labels. Our `ocr` models for the Rabi season may misclassify some fields because the "Others" crop data they were trained on is unrefined. For the Kharif season, the lack of ground truth data limits our model to classifying only sugarcane, paddy, and cotton, with no out-of-distribution rejection mechanism. Although the models are effective in the tested regions, further validation is needed to ensure performance across diverse geographical areas.
## Abbreviations
- `NDVI`: Normalized Difference Vegetation Index
- `VH`: Vertical-Horizontal Polarization
- `Kharif`: Season of heavy rainfall (May to November)
- `Rabi`: Season of scanty rainfall (October to April)
- `Crop lands`: Areas of land that are planted with crops
- `Ground truth data`: Data that is considered to be true and accurate
- `1f`: First fortnight of the month
- `2f`: Second fortnight of the month
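The `1f`/`2f` suffixes appear in both the model filenames and the feature column names. The sketch below derives such a label from an observation date. It assumes days 1 to 15 form the first fortnight and the remaining days the second; the exact cutoff is not stated in this card, and `fortnight_label` is a hypothetical helper.

```python
from datetime import date

MONTH_ABBR = ["jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"]


def fortnight_label(day: date) -> str:
    """Map a date to a label such as 'may_1f' (ASSUMPTION: cutoff after day 15)."""
    half = "1f" if day.day <= 15 else "2f"
    return f"{MONTH_ABBR[day.month - 1]}_{half}"


print(fortnight_label(date(2023, 5, 10)))
print(fortnight_label(date(2023, 11, 16)))
```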
## Contact
For any queries, please feel free to reach out to us at this email: [[email protected]]([email protected])