---
license: mit
---
# PyAutoCode: GPT-2 Based Python Auto-Completion
PyAutoCode is a cut-down Python autosuggestion model built on **GPT-2** *(motivated by GPyT)*. This baby model *(trained for only 3 epochs)* has not been **fine-tuned** yet, so I highly recommend against using it in a production environment or incorporating PyAutoCode into any of your projects. It has been trained on **112GB** of Python data sourced from the best crowdsourcing platform ever -- **GitHub**.
*NOTE: Further training and fine-tuning would be highly appreciated, and I firmly believe it would significantly improve PyAutoCode's ability.*
## Some Model Features
- Built on *GPT-2*
- Tokenized with *ByteLevelBPETokenizer*
- Data sourced from *GitHub (almost 5 consecutive days of the latest Python repositories)*
- Makes use of *GPT2LMHeadModel* and *DataCollatorForLanguageModeling* for training
- Newline characters are custom-encoded as `<N>` *(see the sketch below)*
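As a quick illustration of that last point, here is a minimal sketch of the `<N>` round-trip (plain string operations, no model required; the snippet itself is just an example):
```python
# encode: replace real newlines with the model's <N> marker before tokenizing
snippet = "import os\nprint(os.getcwd())"
encoded = snippet.replace("\n", "<N>")   # "import os<N>print(os.getcwd())"

# decode: restore newlines in whatever the model generates
decoded = encoded.replace("<N>", "\n")
assert decoded == snippet
```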
## Get a Glimpse of the Model
You can use huggingface's **Inference API** *(in the right sidebar)* to load the model and check the result. Just enter any code snippet as input, for example:
```python
for i in range(
```
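If you would rather query the hosted model programmatically, here is a minimal sketch using the standard Inference API route (`<HF_TOKEN>` stands in for your own access token; the exact response shape depends on the API):
```python
import requests

# standard Hugging Face Inference API route for this model
API_URL = "https://api-inference.huggingface.co/models/P0intMaN/PyAutoCode"
headers = {"Authorization": "Bearer <HF_TOKEN>"}  # substitute your own token

# send the snippet exactly as you would type it into the widget
response = requests.post(API_URL, headers=headers, json={"inputs": "for i in range("})
print(response.json())
```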
## Usage
You can use my model too! Here's a quick tour of how to achieve this:
Install transformers:
```sh
$ pip install transformers
```
Load the model and tokenizer, and get them to work!
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("P0intMaN/PyAutoCode")
model = AutoModelForCausalLM.from_pretrained("P0intMaN/PyAutoCode")
# input: single line or multi-line; triple-quoted strings are highly
# recommended for multi-line input
inp = """import pandas"""

# encode newlines with the model's custom <N> token before tokenizing
format_inp = inp.replace('\n', "<N>")
tokenize_inp = tokenizer.encode(format_inp, return_tensors='pt')

# generate a completion, then decode it and restore real newlines
result = model.generate(tokenize_inp)
decode_result = tokenizer.decode(result[0])
format_result = decode_result.replace('<N>', "\n")
# printing the result
print(format_result)
```
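For multi-line prompts, the same pipeline applies. Below is a minimal sketch reusing the `tokenizer` and `model` from above; the prompt and the `max_length` value are illustrative choices, not settings from this model card:
```python
# multi-line input as a triple-quoted string, as recommended above
inp = """import pandas as pd
df = pd."""

# encode newlines with <N>, tokenize, and generate a bounded completion
tokenize_inp = tokenizer.encode(inp.replace('\n', "<N>"), return_tensors='pt')
result = model.generate(tokenize_inp, max_length=50)  # max_length is illustrative

# decode and restore real newlines
print(tokenizer.decode(result[0]).replace('<N>', "\n"))
```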
Upon successful execution, the above should produce something like the following *(your results may vary once this model is fine-tuned)*:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```
## Credits
##### *Developed as part of a university project by [Pratheek U](https://www.github.com/P0intMaN) and [Sourav Singh](https://github.com/Sourav11902312lpu)*