lfoppiano committed on
Commit
182ca2f
·
1 Parent(s): a7ac5d0

update documentation

Files changed (2)
  1. README.md +16 -9
  2. streamlit_app.py +6 -11
README.md CHANGED
@@ -1,19 +1,26 @@
1
- # DocumentIQA: Scientific Document Insight Question/Answer
2
 
3
  ## Introduction
4
 
5
- Question/Answering on scientific documents.
6
- In our implementation we use [Grobid](https://github.com/kermitt2/grobid) for text extraction instead of the raw PDF2Text converter.
7
- Thanks to Grobid we are able to precisely extract abstract and full-text.
8
- This is just the beginning and publishing might help gathering more feedback.
9
-
10
- **NOTE**: This project focus on scientific articles. Uploading books or other large document might not work as expected.
11
 
12
  **Work in progress**
13
 
14
- https://document-insights.streamlit.app/
15
 
16
- **OpenAI or HuggingFace API KEY required**
17
 
18
 
19
  ### Acknowledgement
 
1
+ # DocumentIQA: Scientific Document Insight QA
2
 
3
  ## Introduction
4
 
5
+ Question/Answering on scientific documents using LLMs (OpenAI, Mistral, Llama 2).
6
+ This application is a frontend for testing the RAG (Retrieval Augmented Generation) approach to scientific documents that we are developing at NIMS.
7
+ Unlike most similar projects, we focus on scientific articles and use [Grobid](https://github.com/kermitt2/grobid) for text extraction instead of a raw PDF-to-text converter, which allows us to extract only the full text.
8
 
9
  **Work in progress**
10
 
11
+ - Select the model+embedding combination you want to use.
12
+ - Enter your API key (OpenAI or Hugging Face).
13
+ - Upload a scientific article as a PDF document. A spinner or loading indicator is shown while the document is being processed.
14
+ - Once the spinner stops, you can proceed to ask your questions.
15
+
16
+ ### Query mode (LLM vs Embeddings)
17
+ By default, the mode is set to LLM (Large Language Model), which enables question answering. You can ask questions about the document directly, and the system will answer them using content from the document.
18
+ If you switch the mode to "Embeddings", the system will return specific chunks from the document that are semantically related to your query. This mode helps diagnose why answers are sometimes unsatisfying or incomplete.
19
 
20
+ ## Demo
21
+ The demo is deployed with Streamlit and, depending on the model used, requires either an OpenAI or a Hugging Face **API key**.
22
+
23
+ https://document-insights.streamlit.app/
24
 
25
 
26
  ### Acknowledgement
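Note on the README change above: the new introduction says Grobid is used so that only the full text is extracted. The commit does not show how the application consumes Grobid's output, so the following is only a minimal sketch under stated assumptions: it assumes the TEI XML that Grobid's full-text processing returns, and keeps only paragraphs under `<text><body>`. The function name `extract_body_paragraphs` and the sample XML are hypothetical and written for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# TEI namespace used in Grobid's XML output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_body_paragraphs(tei_xml: str) -> List[str]:
    """Return the text of each <p> under <text><body>, skipping header and back matter."""
    root = ET.fromstring(tei_xml)
    paragraphs = []
    for p in root.findall("./tei:text/tei:body//tei:p", TEI_NS):
        # itertext() flattens inline markup such as <ref> into plain text;
        # split/join normalises whitespace.
        text = " ".join("".join(p.itertext()).split())
        if text:
            paragraphs.append(text)
    return paragraphs


# Hypothetical, hand-written TEI fragment standing in for a real Grobid response.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text>
    <body>
      <div><head>Introduction</head>
        <p>Superconductors have zero resistance <ref type="bibr">[1]</ref>.</p>
        <p>We study MgB2.</p>
      </div>
    </body>
    <back><div><p>Acknowledgements text.</p></div></back>
  </text>
</TEI>"""

paragraphs = extract_body_paragraphs(SAMPLE_TEI)
```

Restricting the path to `text/body` is what drops the header and back matter here; a real pipeline may also want section titles or the abstract, which this sketch ignores.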
streamlit_app.py CHANGED
@@ -118,16 +118,14 @@ if not st.session_state['api_key']:
118
  else:
119
  is_api_key_provided = st.session_state['api_key']
120
 
121
- st.title("📝 Document insight Q&A")
122
- st.subheader("Upload a PDF document, ask questions, get insights.")
123
 
124
  upload_col, radio_col, context_col = st.columns([7, 2, 2])
125
  with upload_col:
126
  uploaded_file = st.file_uploader("Upload an article", type=("pdf", "txt"), on_change=new_file,
127
  disabled=not is_api_key_provided,
128
- help="The file will be uploaded to Grobid, extracted the text and calculated "
129
- "embeddings of each paragraph which are then stored to a Db for be picked "
130
- "to answer specific questions. ")
131
  with radio_col:
132
  mode = st.radio("Query mode", ("LLM", "Embeddings"), disabled=not uploaded_file, index=0,
133
  help="LLM will respond the question, Embedding will show the "
@@ -147,20 +145,17 @@ with st.sidebar:
147
  st.header("Documentation")
148
  st.markdown("https://github.com/lfoppiano/document-qa")
149
  st.markdown(
150
- """After entering your API Key (Open AI or Huggingface). Upload a scientific article as PDF document, click on the designated button and select the file from your device.""")
151
-
152
- st.markdown(
153
- """After uploading, please wait for the PDF to be processed. You will see a spinner or loading indicator while the processing is in progress. Once the spinner stops, you can proceed to ask your questions.""")
154
 
155
  st.markdown("**Revision number**: [" + st.session_state[
156
  'git_rev'] + "](https://github.com/lfoppiano/grobid-magneto/commit/" + st.session_state['git_rev'] + ")")
157
 
158
  st.header("Query mode (Advanced use)")
159
  st.markdown(
160
- """By default, the mode is set to LLM (Language Model) which enables question/answering. You can directly ask questions related to the PDF content, and the system will provide relevant answers.""")
161
 
162
  st.markdown(
163
- """If you switch the mode to "Embedding," the system will return specific paragraphs from the document that are semantically similar to your query. This mode focuses on providing relevant excerpts rather than answering specific questions.""")
164
 
165
  if uploaded_file and not st.session_state.loaded_embeddings:
166
  with st.spinner('Reading file, calling Grobid, and creating memory embeddings...'):
 
118
  else:
119
  is_api_key_provided = st.session_state['api_key']
120
 
121
+ st.title("📝 Scientific Document Insight Q&A")
122
+ st.subheader("Upload a scientific article in PDF, ask questions, get insights.")
123
 
124
  upload_col, radio_col, context_col = st.columns([7, 2, 2])
125
  with upload_col:
126
  uploaded_file = st.file_uploader("Upload an article", type=("pdf", "txt"), on_change=new_file,
127
  disabled=not is_api_key_provided,
128
+ help="The full text is extracted using Grobid.")
129
  with radio_col:
130
  mode = st.radio("Query mode", ("LLM", "Embeddings"), disabled=not uploaded_file, index=0,
131
  help="LLM will respond the question, Embedding will show the "
 
145
  st.header("Documentation")
146
  st.markdown("https://github.com/lfoppiano/document-qa")
147
  st.markdown(
148
+ """After entering your API key (OpenAI or Hugging Face), upload a scientific article as a PDF document. A spinner or loading indicator is shown while the document is being processed. Once the spinner stops, you can proceed to ask your questions.""")
149
 
150
  st.markdown("**Revision number**: [" + st.session_state[
151
  'git_rev'] + "](https://github.com/lfoppiano/grobid-magneto/commit/" + st.session_state['git_rev'] + ")")
152
 
153
  st.header("Query mode (Advanced use)")
154
  st.markdown(
155
+ """By default, the mode is set to LLM (Large Language Model), which enables question answering. You can ask questions about the document directly, and the system will answer them using content from the document.""")
156
 
157
  st.markdown(
158
+ """If you switch the mode to "Embeddings", the system will return specific chunks from the document that are semantically related to your query. This mode helps diagnose why answers are sometimes unsatisfying or incomplete.""")
159
 
160
  if uploaded_file and not st.session_state.loaded_embeddings:
161
  with st.spinner('Reading file, calling Grobid, and creating memory embeddings...'):
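Note on the two query modes described in this diff: the retrieval code itself is not part of the commit, so the sketch below only illustrates the idea under assumptions. "Embeddings" mode returns the chunks most similar to the query, and "LLM" mode would pass those same chunks to a model as context. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and all names (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """'Embeddings' mode: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """'LLM' mode: the retrieved chunks become the context handed to the model."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Made-up chunks standing in for paragraphs extracted from an uploaded article.
chunks = [
    "The sample was annealed at 900 K for two hours.",
    "The critical temperature of the sample is 39 K.",
    "Funding was provided by the host institute.",
]
question = "What is the critical temperature?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, chunks, k=1)
```

Inspecting `top` before reading the model's answer is the debugging use the help text describes: if the relevant chunk is not retrieved, the LLM cannot answer from it.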