<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>spaCy NER Training Guide</title>
    <link
      rel="stylesheet"
      href="https://maxcdn.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css"
    />
    <style>
      body {
        background-color: #121212;
        font-family: "Poppins", sans-serif;
        color: #e0e0e0;
        margin: 0;
        padding: 0;
      }
      h1,
      h2 {
        color: #007bff;
      }
      .step {
        margin-bottom: 30px;
        border: 1px solid #007bff;
        border-radius: 5px;
        padding: 20px;
        background-color: #1e1e1e;
      }
      .btn-primary {
        color: #fff;
        background-color: #007bff;
        border: 1px solid #007bff;
      }
      .btn-primary:hover {
        background-color: transparent;
        border: 1px solid #007bff;
      }
    </style>
  </head>
  <body>
    <div class="container">
      <h1>spaCy NER Model Training Guide</h1>
      <div class="step">
        <h2>Step 1: Upload Your Resume File</h2>
        <p>
          Upload a resume or other document for text extraction. Supported
          formats:
        </p>
        <ul>
          <li>PDF</li>
          <li>DOCX (Word Document)</li>
          <li>RTF (Rich Text Format)</li>
          <li>ODT (Open Document Text)</li>
          <li>PNG, JPG, JPEG (Image Formats)</li>
          <li>JSON</li>
        </ul>
        <p>
          Make sure your file is in one of the supported formats before
          uploading. The system extracts and processes the text from your
          document automatically.
        </p>
        <a href="{{ url_for('index') }}" class="btn btn-primary"
          >Proceed to Upload</a
        >
      </div>
      <div class="step">
        <h2>Step 2: Preview and Edit Extracted Text</h2>
        <p>
          After uploading your document, you will be shown a preview of the
          extracted text. This preview allows you to edit the text if needed to
          correct any extraction errors or remove unwanted content. Once you're
          satisfied, click "Next" to proceed to Named Entity Recognition (NER)
          annotations.
        </p>
        <a href="{{ url_for('text_preview') }}" class="btn btn-primary"
          >Proceed to Text Preview</a
        >
      </div>
      <div class="step">
        <h2>Step 3: Annotate Named Entities</h2>
        <p>
          In this step, you will review the Named Entity Recognition (NER)
          results generated from your text. You can add new entity labels,
          select the relevant text for each label, and make manual adjustments.
          Once you have annotated the text with the appropriate labels, save
          your annotations and export the data in JSON format for model
          training.
        </p>
        <p>
          Note: the following labels are available: ABOUT, CERTIFICATE,
          COMPANY, CONTACT, COURSE, DOB, EMAIL, EXPERIENCE, HOBBIES, INSTITUTE,
          JOB_TITLE, LANGUAGE, LAST_QUALIFICATION_YEAR, LINK, LOCATION, PERSON,
          PROJECTS, QUALIFICATION, SCHOOL, SKILL, SOFT_SKILL, UNIVERSITY,
          YEARS_EXPERIENCE.
        </p>
        <p>Instructions:</p>
        <ul>
          <li>Click "Begin!" to load the extracted text.</li>
          <li>
            Highlight sections of the text and assign them to the available
            labels.
          </li>
          <li>Add new labels if necessary.</li>
          <li>
            Once done, click "Export" to download your annotations as a JSON
            file.
          </li>
        </ul>
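        <p>
          If you want to sanity-check annotations outside the tool, a minimal
          sketch along the following lines may help. It assumes each entity is
          a (start, end, label) triple of character offsets, which is a
          hypothetical shape and not necessarily what "Export" produces. The
          function name is also made up for illustration. The check flags
          labels outside the list in the note above (useful for catching typos
          such as SKILLS), spans that fall outside the text, and overlapping
          spans, which spaCy's entity recognizer does not accept.
        </p>
        <pre class="text-light">
```python
# Labels taken from the note in Step 3 of this guide.
ALLOWED_LABELS = {
    "ABOUT", "CERTIFICATE", "COMPANY", "CONTACT", "COURSE", "DOB", "EMAIL",
    "EXPERIENCE", "HOBBIES", "INSTITUTE", "JOB_TITLE", "LANGUAGE",
    "LAST_QUALIFICATION_YEAR", "LINK", "LOCATION", "PERSON", "PROJECTS",
    "QUALIFICATION", "SCHOOL", "SKILL", "SOFT_SKILL", "UNIVERSITY",
    "YEARS_EXPERIENCE",
}


def find_annotation_problems(text, entities):
    """Return a list of readable problems for one annotated text.

    `entities` is assumed to hold (start, end, label) triples that use
    character offsets into `text` (a hypothetical layout).
    """
    problems = []
    last_end = 0
    for start, end, label in sorted(entities):
        if label not in ALLOWED_LABELS:
            problems.append(f"label {label!r} is not in the suggested list")
        in_bounds = start >= 0 and end > start and len(text) >= end
        if not in_bounds:
            problems.append(f"span ({start}, {end}) does not fit the text")
            continue
        if last_end > start:
            problems.append(f"span ({start}, {end}) overlaps an earlier span")
        last_end = max(last_end, end)
    return problems


text = "Jane Doe, Python developer at Acme"
entities = [(0, 8, "PERSON"), (10, 16, "SKILLS"), (30, 40, "COMPANY")]
problems = find_annotation_problems(text, entities)
for problem in problems:
    print(problem)
```
</pre>
        <p>
          Labels you added on purpose will also be reported by this check, so
          treat its output as a list of things to review, not as errors.
        </p>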
        <a href="{{ url_for('ner_preview') }}" class="btn btn-primary"
          >Proceed to NER Annotation</a
        >
      </div>
      <div class="step">
        <h2>Step 4: Save and Format JSON Data</h2>
        <p>
          Upload the annotated JSON file from the previous step. The system
          processes and reformats the file so that it is compatible with the
          spaCy model training process. After formatting, you can proceed to
          the model training step.
        </p>
        <p>Instructions:</p>
        <ul>
          <li>
            Upload the JSON file you downloaded after the annotation step.
          </li>
          <li>Click "Process" to reformat the file.</li>
          <li>
            Once processing is complete, click "Next" to proceed with training.
          </li>
        </ul>
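        <p>
          To give a rough idea of what such a reformatting step can look like,
          here is a minimal sketch. The input layout (a list of records with
          "text" and "annotations" keys) and the function name are assumptions
          made for illustration; this guide does not show the tool's real
          export format. The output uses the (text, {"entities": [...]})
          pairs commonly used as spaCy NER training data.
        </p>
        <pre class="text-light">
```python
import json


def to_spacy_examples(records):
    """Convert annotation records into (text, {"entities": [...]}) pairs.

    The record layout is hypothetical: each record has a "text" string and
    an "annotations" list of objects with "start", "end" and "label".
    """
    examples = []
    for record in records:
        text = record["text"]
        entities = []
        for ann in record.get("annotations", []):
            start, end = int(ann["start"]), int(ann["end"])
            # Skip empty or whitespace-only selections.
            if not text[start:end].strip():
                continue
            entities.append((start, end, ann["label"]))
        # Sort by offset so the entities appear in reading order.
        examples.append((text, {"entities": sorted(entities)}))
    return examples


raw = json.loads("""
[
  {"text": "Jane Doe, Python developer at Acme",
   "annotations": [
     {"start": 30, "end": 34, "label": "COMPANY"},
     {"start": 0, "end": 8, "label": "PERSON"}
   ]}
]
""")
examples = to_spacy_examples(raw)
print(examples[0][1]["entities"])  # [(0, 8, 'PERSON'), (30, 34, 'COMPANY')]
```
</pre>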
        <a href="{{ url_for('json_file') }}" class="btn btn-primary"
          >Proceed to Save JSON</a
        >
      </div>
      <div class="step">
        <h2>Step 5: Train the NER Model</h2>
        <p>
          In this final step, you will convert the formatted JSON data into the
          spaCy format and begin training the NER model. You can customize the
          training by choosing the number of epochs (full passes over the
          training data) and setting a version for the trained model.
        </p>
        <p>Guidelines:</p>
        <ul>
          <li>
            Number of epochs: Each epoch is one full pass over the training
            data. More epochs give the model more chances to learn from the
            data, but too many can lead to overfitting. Start with 10 epochs
            as a reasonable baseline.
          </li>
          <li>
            Model versioning: Provide a version name for this training session
            so you can keep track of different versions of the model.
          </li>
        </ul>
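        <p>
          One common way to judge whether an epoch count is too high is to
          hold out a few annotated documents and watch the loss on them after
          each epoch. The sketch below assumes you have such per-epoch losses
          available, which the training page itself may not expose. The
          function name and the numbers are made up purely for illustration.
        </p>
        <pre class="text-light">
```python
def best_epoch(validation_losses, patience=2):
    """Return the 1-based epoch with the lowest held-out loss.

    Scanning stops once the loss has failed to improve for `patience`
    epochs in a row, which mimics early stopping.
    """
    best, best_loss, stale = 0, float("inf"), 0
    for epoch, loss in enumerate(validation_losses, start=1):
        if best_loss > loss:
            best, best_loss, stale = epoch, loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best


# Illustrative values only: the loss falls, then rises as the model overfits.
losses = [0.90, 0.61, 0.48, 0.44, 0.47, 0.52, 0.60]
print(best_epoch(losses))  # 4
```
</pre>
        <p>
          In this made-up run, training beyond epoch 4 only makes the held-out
          loss worse, so a lower epoch count would be the better choice.
        </p>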
        <p>
          Once the training is complete, you can download the latest version of
          the trained model for use in production.
        </p>
        <a href="{{ url_for('spacy_file') }}" class="btn btn-primary"
          >Proceed to Model Training</a
        >
      </div>
    </div>
    <script src="https://code.jquery.com/jquery-3.5.1.slim.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/@popperjs/[email protected]/dist/umd/popper.min.js"></script>
    <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/js/bootstrap.min.js"></script>
  </body>
</html>