---
license: mit
---
# PyAutoCode: GPT-2 Based Python Auto-Completion
PyAutoCode is a cut-down Python autosuggestion model built on **GPT-2** *(motivated by GPyT)*. This baby model *(trained for only 3 epochs)* has not been **fine-tuned** yet, so I highly recommend against using it in a production environment or incorporating PyAutoCode into any of your projects. It has been trained on **112GB** of Python data sourced from the best crowdsourcing platform ever -- **GitHub**.
*NOTE: Further training and fine-tuning would be highly appreciated, and I firmly believe it would significantly improve PyAutoCode's ability.*
## Some Model Features
- Built on *GPT-2*
- Tokenized with *ByteLevelBPETokenizer*
- Data sourced from *GitHub (almost 5 consecutive days of the latest Python repositories)*
- Makes use of *GPT2LMHeadModel* and *DataCollatorForLanguageModeling* for training
- Newline characters are custom-encoded as `<N>` *(see the sketch below)*
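As a quick illustration of that last point, here is a minimal sketch of the `<N>` round-trip (plain string operations, no model required; the snippet itself is just an example):
```python
# encode: replace real newlines with the model's <N> marker before tokenizing
snippet = "import os\nprint(os.getcwd())"
encoded = snippet.replace("\n", "<N>")   # "import os<N>print(os.getcwd())"

# decode: restore newlines in whatever the model generates
decoded = encoded.replace("<N>", "\n")
assert decoded == snippet
```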
## Get a Glimpse of the Model
You can use huggingface's **Inference API** *(in the right sidebar)* to load the model and check the result. Just enter any code snippet as input, for example:
```python
for i in range(
```
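If you would rather query the hosted model programmatically, here is a minimal sketch using the standard Inference API route (`<HF_TOKEN>` stands in for your own access token; the exact response shape depends on the API):
```python
import requests

# standard Hugging Face Inference API route for this model
API_URL = "https://api-inference.huggingface.co/models/P0intMaN/PyAutoCode"
headers = {"Authorization": "Bearer <HF_TOKEN>"}  # substitute your own token

# send the snippet exactly as you would type it into the widget
response = requests.post(API_URL, headers=headers, json={"inputs": "for i in range("})
print(response.json())
```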
## Usage
You can use my model too! Here's a quick tour of how to achieve this:
Install transformers:
```sh
$ pip install transformers
```
Load the model and tokenizer, and get them to work!
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("P0intMaN/PyAutoCode")
model = AutoModelForCausalLM.from_pretrained("P0intMaN/PyAutoCode")
# input: single line or multi-line; triple-quoted strings are highly
# recommended for multi-line input
inp = """import pandas"""

# encode newlines with the model's custom <N> token before tokenizing
format_inp = inp.replace('\n', "<N>")
tokenize_inp = tokenizer.encode(format_inp, return_tensors='pt')

# generate a completion, then decode it and restore real newlines
result = model.generate(tokenize_inp)
decode_result = tokenizer.decode(result[0])
format_result = decode_result.replace('<N>', "\n")
# printing the result
print(format_result)
```
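For multi-line prompts, the same pipeline applies. Below is a minimal sketch reusing the `tokenizer` and `model` from above; the prompt and the `max_length` value are illustrative choices, not settings from this model card:
```python
# multi-line input as a triple-quoted string, as recommended above
inp = """import pandas as pd
df = pd."""

# encode newlines with <N>, tokenize, and generate a bounded completion
tokenize_inp = tokenizer.encode(inp.replace('\n', "<N>"), return_tensors='pt')
result = model.generate(tokenize_inp, max_length=50)  # max_length is illustrative

# decode and restore real newlines
print(tokenizer.decode(result[0]).replace('<N>', "\n"))
```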
Upon successful execution, the above should produce something like the following *(your results may vary once this model is fine-tuned)*:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```
## Credits
##### *Developed as part of a university project by [Pratheek U](https://www.github.com/P0intMaN) and [Sourav Singh](https://github.com/Sourav11902312lpu)*