|
---
license: apache-2.0
datasets:
- PipableAI/pip-txt-to-sql-spider-bird-dataset
language:
- en
metrics:
- accuracy
tags:
- document
- code
- text2sql
- instruction_tuned
- basemodel
- jax
- pytorch
- tensorflow
- text-generation-inference
library_name: transformers
pipeline_tag: text-generation
widget:
- text: "<schema>CREATE TABLE system(JobID: String,GID: String, UID: String, Start:Time(yyyy/mm/dd), End: Time,ElapsedRaw: Time, CPUTimeRAW: Time,NCPUS: Number,NNodes: Number, NodeList: List, State:String, Timelimit: Time);</schema><question>Get UID and job id for Jobs that started on Jan 20 , 2023 ended on feb 14 2023 and has job id 20</question><sql>"
  example_title: "example"
---
|
# pip-parse |
|
|
|
[PipableAI](https://www.linkedin.com/company/pipable.ai/about/)
|
|
|
[colab_notebook]() |
|
|
|
## What have we built? |
|
A 1.3B-parameter code-documentation model that outperforms most models at documenting code and at making your in-house libraries ready for LLM and RAG pipelines.

We have also open-sourced a parsing library for the same purpose; together, the library and the model can turn your codebase into a functional parse tree ready to be consumed by LLMs to execute complex tasks.

This is a further-trained version of pip-sql-1.3b.
|
|
|
|
|
|
|
## How did we build it?
|
|
|
We used softmax cross-entropy and a modified form of policy gradient, along with a Q loss, optimized in an EM (expectation-maximization) setup.
|
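As a rough illustration, a composite objective of this shape can be sketched in PyTorch as follows. The function name, tensor shapes, and weighting are assumptions made for exposition, not our actual training code:

```python
import torch
import torch.nn.functional as F

def composite_loss(logits, target_ids, rewards, q_pred, q_target,
                   pg_weight=0.5, q_weight=0.5):
    # logits: (batch, seq, vocab); target_ids: (batch, seq);
    # rewards, q_pred, q_target: (batch,). Shapes and weights are illustrative.

    # Supervised term: softmax cross-entropy over next-token predictions.
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         target_ids.reshape(-1))

    # Policy-gradient term: sequence log-likelihood weighted by a scalar reward.
    log_probs = F.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)
    pg = -(rewards * token_logp.sum(dim=-1)).mean()

    # Q loss: regress a learned value estimate toward its target.
    q = F.mse_loss(q_pred, q_target)

    return ce + pg_weight * pg + q_weight * q
```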
|
|
|
|
|
## License |
|
The model is open source under the Apache 2.0 license.
|
|
|
## Usage |
|
|
|
### Installation |
|
|
|
```bash |
|
pip install transformers |
|
``` |
|
|
|
### Prompt |
|
```python |
|
prompt = f"""<code>{code}</code> |
|
<question>Document the code above</question> |
|
<doc>""" |
|
``` |
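Here `code` is the raw source string you want documented, for example (an arbitrary snippet, purely for illustration):

```python
code = '''def add(a, b):
    return a + b'''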
|
|
|
### PyTorch |
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model = AutoModelForCausalLM.from_pretrained("PipableAI/pip-parser").to(device)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pip-parser")

# `prompt` is constructed as shown in the Prompt section above.
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=300)
doc = tokenizer.decode(outputs[0], skip_special_tokens=True).split('<doc>')[-1].split('</doc>')[0]
print(doc)
|
``` |
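The decoded output repeats the prompt and then continues after `<doc>`, so the final `split` calls extract just the text the model generated between the `<doc>` and `</doc>` tags.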
|
|
|
|
|
|
|
## Examples |
|
|
|
### Code |
|
```python |
|
<code>
import numpy as np

# `float_eps` is used below but was not defined in the original snippet;
# machine epsilon is an assumed, reasonable choice.
float_eps = np.finfo(float).eps
|
########################### |
|
# Generate Analytical Model |
|
########################### |
|
################################################## |
|
# func: get_np_array_transition_probability_matrix |
|
################################################## |
|
def get_np_array_transition_probability_matrix(int_num_states, np_array_A_matrix): |
|
print('np_array_A_matrix:') |
|
print(np_array_A_matrix) |
|
##################################################### |
|
# Perturb the adjacency matrix to avoid singularities |
|
##################################################### |
|
np_array_A_matrix += (np.full((int_num_states, int_num_states), float_eps) - (np.identity(int_num_states) * float_eps)) |
|
print('np_array_A_matrix:') |
|
print(np_array_A_matrix) |
|
print('np_array_D_matrix:') |
|
np_array_D_matrix = np.diag(np.sum(np_array_A_matrix, axis=1)) |
|
print(np_array_D_matrix) |
|
print('np_array_D_matrix_inv:') |
|
np_array_D_matrix_inv = np.linalg.inv(np_array_D_matrix) |
|
print(np_array_D_matrix_inv) |
|
print('\n\n') |
|
print('np_array_P_matrix:') |
|
np_array_P_matrix = np.dot(np_array_D_matrix_inv, np_array_A_matrix) |
|
print(np_array_P_matrix) |
|
print('np.sum(np_array_P_matrix, axis=1):') |
|
print(np.sum(np_array_P_matrix, axis=1)) |
|
print('\n\n') |
|
return np_array_P_matrix |
|
################################################## |
|
# func: get_np_array_perron_frobenius_matrix
|
################################################## |
|
def get_np_array_perron_frobenius_matrix(int_num_states, np_array_P_matrix): |
|
np_array_perron_frobenius_matrix = np.linalg.matrix_power(np_array_P_matrix,1000) |
|
np_array_perron_frobenius_vector = np_array_perron_frobenius_matrix[0,:] |
|
print('np_array_perron_frobenius_matrix:') |
|
print(np_array_perron_frobenius_matrix) |
|
print('np.sum(np_array_perron_frobenius_matrix, axis=1):') |
|
print(np.sum(np_array_perron_frobenius_matrix, axis=1)) |
|
print('np.sum(np_array_perron_frobenius_matrix, axis=0):') |
|
print(np.sum(np_array_perron_frobenius_matrix, axis=0)) |
|
print('np.sum(np_array_perron_frobenius_matrix, axis=0)/int_num_states:') |
|
print(np.sum(np_array_perron_frobenius_matrix, axis=0)/int_num_states) |
|
print('np.dot(np_array_perron_frobenius_vector, np_array_P_matrix):') |
|
print(np.dot(np_array_perron_frobenius_vector, np_array_P_matrix)) |
|
print('np_array_perron_frobenius_vector:') |
|
print(np_array_perron_frobenius_vector) |
|
print('\n\n') |
|
return np_array_perron_frobenius_vector, np_array_perron_frobenius_matrix |
|
############################# |
|
# func: get_np_array_Z_matrix |
|
############################# |
|
def get_np_array_Z_matrix(int_num_states, np_array_P_matrix, np_array_perron_frobenius_matrix): |
|
np_array_Z_matrix = np.linalg.inv(np.identity(int_num_states) - np_array_P_matrix + np_array_perron_frobenius_matrix) |
|
print('np_array_Z_matrix:') |
|
print(np_array_Z_matrix) |
|
print('\n\n') |
|
return(np_array_Z_matrix) |
|
############################# |
|
# func: get_np_array_H_matrix |
|
############################# |
|
def get_np_array_H_matrix(int_num_states, np_array_Z_matrix, np_array_perron_frobenius_vector): |
|
np_array_H_matrix = np.zeros([int_num_states, int_num_states]) |
|
for i in range(int_num_states): |
|
for j in range(int_num_states): |
|
np_array_H_matrix[i][j] = (np_array_Z_matrix[j][j] - np_array_Z_matrix[i][j])/np_array_perron_frobenius_vector[j] |
|
print('np_array_H_matrix:') |
|
print(np_array_H_matrix) |
|
print('\n\n') |
|
return np_array_H_matrix |
|
########### |
|
# func: run |
|
########### |
|
def run(np_array_A_matrix): |
|
int_num_states = len(np_array_A_matrix) |
|
np_array_P_matrix = get_np_array_transition_probability_matrix(int_num_states, np_array_A_matrix) |
|
np_array_perron_frobenius_vector, np_array_perron_frobenius_matrix = get_np_array_perron_frobenius_matrix(int_num_states, np_array_P_matrix) |
|
np_array_Z_matrix = get_np_array_Z_matrix(int_num_states, np_array_P_matrix, np_array_perron_frobenius_matrix) |
|
np_array_H_matrix = get_np_array_H_matrix(int_num_states, np_array_Z_matrix, np_array_perron_frobenius_vector) |
|
return(np_array_H_matrix) |
|
<question>Document the Python code above.</question><doc>
|
``` |
|
|
|
### Response |
|
|
```text
The Python code provided is used to generate an analytical model for a Markov chain with a given adjacency matrix. The model is then used to compute the Perron-Frobenius eigenvector and the corresponding matrix. The resulting matrices are then used to compute the Z-matrix and the H-matrix. The H-matrix is then returned as the output of the function. The code is designed to handle large matrices and perform computations efficiently. The matrices are manipulated using numpy's powerful and efficient numerical computation library. The code also includes comments to explain the functionality of each part of the code.
```
|
|
|
### Team |
|
Avi Kothari, Pratham Gupta, Ritvik Aryan Kalra, Rohan Bhatial, Soham Acharya |
|
|