<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>spaCy NER Training Guide</title>
    <link
      rel="stylesheet"
      href="https://maxcdn.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css"
    />
    <style>
      body {
        background-color: #121212;
        font-family: "Poppins", sans-serif;
        color: #e0e0e0;
        margin: 0;
        padding: 0;
      }

      h1,
      h2 {
        color: #007bff;
      }

      .step {
        margin-bottom: 30px;
        border: 1px solid #007bff;
        border-radius: 5px;
        padding: 20px;
        background-color: #1e1e1e;
      }

      .btn-primary {
        color: #fff;
        background-color: #007bff;
        border: 1px solid #007bff;
      }

      .btn-primary:hover {
        background-color: transparent;
        border: 1px solid #007bff;
      }
    </style>
  </head>
  <body>
    <div class="container">
      <h1>spaCy NER Model Training Guide</h1>

      <div class="step">
        <h2>Step 1: Upload Your Resume File</h2>
        <p>
          Upload a resume or document file for text extraction. Supported
          formats include:
        </p>
        <ul>
          <li>PDF</li>
          <li>DOCX (Word Document)</li>
          <li>RTF (Rich Text Format)</li>
          <li>ODT (OpenDocument Text)</li>
          <li>PNG, JPG, JPEG (image formats)</li>
          <li>JSON</li>
        </ul>
        <p>
          Ensure that your file is in one of the supported formats before
          uploading. The system will extract and process the text from your
          document automatically.
        </p>
        <a href="{{ url_for('index') }}" class="btn btn-primary">Proceed to Upload</a>
      </div>

      <div class="step">
        <h2>Step 2: Preview and Edit Extracted Text</h2>
        <p>
          After you upload your document, a preview of the extracted text is
          shown. You can edit this text to correct extraction errors or remove
          unwanted content. Once you're satisfied, click "Next" to proceed to
          Named Entity Recognition (NER) annotation.
        </p>
        <a href="{{ url_for('text_preview') }}" class="btn btn-primary">Proceed to Text Preview</a>
      </div>

      <div class="step">
        <h2>Step 3: Annotate Named Entities</h2>
        <p>
          In this step, you review the Named Entity Recognition (NER) results
          generated from your text. You can add new entity labels, select the
          relevant text for each label, and make manual adjustments. Once
          you’ve annotated the text with the appropriate labels, save your
          annotations and export them as a JSON file for model training.
        </p>
        <p>Instructions:</p>
        <ul>
          <li>Click "Begin!" to load the extracted text.</li>
          <li>
            Highlight sections of the text and assign them to the available
            labels.
          </li>
          <li>Add new labels if necessary.</li>
          <li>
            Once done, click "Export" to download your annotations as a JSON
            file.
          </li>
        </ul>
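        <p>
          Each annotation links a piece of text to character offsets for every
          entity. The page does not document the exported JSON layout, so the
          sketch below assumes a hypothetical record of the form
          <code>{"text": ..., "entities": [[start, end, label], ...]}</code>
          with end-exclusive offsets (as in Python slicing). The helper
          <code>check_entities</code> is illustrative, not part of this tool.
          It flags spans that are empty or out of bounds, spans that overlap
          (spaCy's NER does not allow overlapping entities), and spans padded
          with whitespace, which may not line up with spaCy's token boundaries.
        </p>
        <pre class="text-light"><code class="language-python">def check_entities(text, entities):
    """Return a list of problems found in (start, end, label) entity spans.

    Offsets are assumed to be character indices with an exclusive end,
    matching Python slicing: text[start:end] is the entity's text.
    """
    problems = []
    previous_end = 0
    # Sort by start offset so overlaps can be found in a single pass.
    for start, end, label in sorted(entities, key=lambda span: span[0]):
        if not (start >= 0 and end > start and len(text) >= end):
            problems.append(f"{label} ({start}, {end}): empty or out of bounds")
            continue
        if previous_end > start:
            problems.append(f"{label} ({start}, {end}): overlaps another entity")
        span_text = text[start:end]
        if span_text != span_text.strip():
            problems.append(f"{label} ({start}, {end}): leading/trailing whitespace")
        previous_end = max(previous_end, end)
    return problems


text = "Jane Doe worked at Acme Corp in Berlin."
entities = [[0, 8, "NAME"], [19, 28, "COMPANY"], [32, 38, "LOCATION"]]
print(check_entities(text, entities))  # []
</code></pre>
        <p>
          Changing the first span to <code>[0, 9, "NAME"]</code> would include
          the trailing space after "Doe" and produce a whitespace warning.
        </p>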
        <a href="{{ url_for('ner_preview') }}" class="btn btn-primary">Proceed to NER Annotation</a>
      </div>

      <div class="step">
        <h2>Step 4: Save and Format JSON Data</h2>
        <p>
          Upload the annotated JSON file from the previous step. The system
          reformats it into the structure expected by the spaCy training
          process. After formatting, you can proceed to the model training
          step.
        </p>
        <p>Instructions:</p>
        <ul>
          <li>
            Upload the JSON file you downloaded after the annotation step.
          </li>
          <li>Click "Process" to reformat the file.</li>
          <li>
            Once processing is complete, click "Next" to proceed with training.
          </li>
        </ul>
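        <p>
          The page does not specify the input or output layout of this
          reformatting step. As a minimal sketch, assume the annotation export
          is a hypothetical object with an <code>"annotations"</code> list of
          <code>[text, {"entities": [[start, end, label], ...]}]</code> pairs,
          and that the target is the
          <code>(text, {"entities": [(start, end, label), ...]})</code> tuple
          format used in many spaCy training examples. The function names
          <code>trim_span</code> and <code>to_training_data</code> are
          illustrative. The sketch skips null records, trims whitespace from
          span boundaries, and drops spans that contain only whitespace.
        </p>
        <pre class="text-light"><code class="language-python">import json


def trim_span(text, start, end):
    """Move span boundaries inward past any leading/trailing whitespace."""
    while end > start and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


def to_training_data(raw_json):
    """Convert a hypothetical annotator export into (text, annotations) tuples."""
    export = json.loads(raw_json)
    training_data = []
    for record in export.get("annotations", []):
        if not record:  # assumption: skipped texts may be exported as null
            continue
        text, annotation = record
        entities = []
        for start, end, label in annotation.get("entities", []):
            start, end = trim_span(text, start, end)
            if end > start:  # drop spans that were only whitespace
                entities.append((start, end, label))
        if text.strip():  # drop empty documents
            training_data.append((text, {"entities": entities}))
    return training_data


raw = json.dumps({
    "classes": ["NAME", "SKILL"],
    "annotations": [
        ["Jane Doe knows Python.", {"entities": [[0, 9, "NAME"], [15, 21, "SKILL"]]}],
        None,
    ],
})
print(to_training_data(raw))
# [('Jane Doe knows Python.', {'entities': [(0, 8, 'NAME'), (15, 21, 'SKILL')]})]
</code></pre>
        <p>
          From tuples like these, a spaCy v3 training file in the binary
          <code>.spacy</code> format can be built with
          <code>spacy.tokens.DocBin</code>, creating each entity span with
          <code>Doc.char_span</code>.
        </p>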
        <a href="{{ url_for('json_file') }}" class="btn btn-primary">Proceed to Save JSON</a>
      </div>

      <div class="step">
        <h2>Step 5: Train the NER Model</h2>
        <p>
          In this final step, you will convert the formatted JSON data into
          spaCy's training format and train the NER model. You can customize
          training by choosing the number of epochs (full passes over the
          training data) and by setting a version name for the trained model.
        </p>
        <p>Guidelines:</p>
        <ul>
          <li>
            Number of epochs: each epoch is one full pass over the training
            data. More epochs give the model more chances to fit the data, but
            too many can lead to overfitting, where the model memorizes the
            training examples and performs worse on new text. Start with 10
            epochs as a balanced default.
          </li>
          <li>
            Model versioning: Provide a version name for this training session,
            so you can keep track of different versions of the model.
          </li>
        </ul>
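        <p>
          To show what the epoch setting controls, here is a minimal sketch of
          an epoch loop in plain Python: each epoch shuffles the examples and
          visits every one of them once, in mini-batches. The names
          <code>train</code> and <code>update_model</code> are hypothetical.
          <code>update_model</code> stands in for the real training step; in
          spaCy v3 that step would typically call <code>nlp.update</code>
          with a batch of <code>spacy.training.Example</code> objects. This is
          not the training code used by this tool.
        </p>
        <pre class="text-light"><code class="language-python">import random


def minibatches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def train(examples, n_epochs=10, batch_size=8, update_model=None, seed=0):
    """Run n_epochs full passes over examples, shuffling before each pass.

    update_model is a hypothetical callback standing in for the real
    training step; it receives one batch and returns that batch's loss.
    Returns a list of (epoch number, examples seen, total loss) tuples.
    """
    rng = random.Random(seed)
    data = list(examples)
    history = []
    for epoch in range(n_epochs):
        rng.shuffle(data)  # a new order each epoch, so batches differ
        epoch_loss = 0.0
        seen = 0
        for batch in minibatches(data, batch_size):
            if update_model is not None:
                epoch_loss += update_model(batch)
            seen += len(batch)
        history.append((epoch + 1, seen, epoch_loss))
    return history


history = train([f"example {i}" for i in range(20)], n_epochs=3, batch_size=8,
                update_model=lambda batch: 0.0)
print(history)  # [(1, 20, 0.0), (2, 20, 0.0), (3, 20, 0.0)]
</code></pre>
        <p>
          Every epoch sees all 20 examples. Raising <code>n_epochs</code>
          repeats these passes, which is why too high a value can overfit.
        </p>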
        <p>
          Once the training is complete, you can download the latest version of
          the trained model for use in production.
        </p>
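        <p>
          The page does not say how version names are formatted or how the
          "latest" version is chosen. Suppose, hypothetically, that versions
          are named like <code>v1.2</code> or <code>v1.10</code>. Picking the
          latest then needs a numeric comparison, because plain string sorting
          places <code>v1.10</code> before <code>v1.2</code>. The helpers
          <code>version_key</code> and <code>latest_version</code> are
          illustrative only.
        </p>
        <pre class="text-light"><code class="language-python">import re


def version_key(name):
    """Turn a version name like 'v1.10' into (1, 10) for numeric comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", name))


def latest_version(names):
    """Return the name with the highest numeric version, or None if empty."""
    return max(names, key=version_key, default=None)


versions = ["v1.2", "v1.10", "v1.9"]
print(sorted(versions))          # string order: ['v1.10', 'v1.2', 'v1.9']
print(latest_version(versions))  # numeric order picks v1.10
</code></pre>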
        <a href="{{ url_for('spacy_file') }}" class="btn btn-primary">Proceed to Model Training</a>
      </div>
    </div>

    <script src="https://code.jquery.com/jquery-3.5.1.slim.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/popper.js@1.16.1/dist/umd/popper.min.js"></script>
    <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/js/bootstrap.min.js"></script>
  </body>
</html>