zuminghuang committed on
Commit
0a05754
·
verified ·
1 Parent(s): 18c3a4b

Update README.md

Files changed (1)
  1. README.md +1 -216
README.md CHANGED
@@ -46,222 +46,7 @@ Overview of Infinity-Parser training framework. Our model is optimized via reinf
 
  # Quick Start
 
- ## Install Infinity_Parser
- ```shell
- conda create -n Infinity_Parser python=3.11
- conda activate Infinity_Parser
-
- git clone https://github.com/infly-ai/INF-MLLM.git
- cd INF-MLLM/Infinity-Parser
- # Install PyTorch; see https://pytorch.org/get-started/previous-versions/ for your CUDA version
- conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.1 -c pytorch -c nvidia
- pip install .
- ```
- Before starting, make sure that **PyTorch** is correctly installed by following the official installation guide at [https://pytorch.org/](https://pytorch.org/).
-
- ## Download Model Weights
-
- ```shell
- pip install -r requirements.txt
-
- python3 tools/download_model.py
- ```
-
- ## vLLM Inference
- We recommend the vLLM backend for accelerated inference.
- It accepts image and PDF inputs, automatically parses the document content, and exports the results in Markdown format to the specified directory.
-
- ```shell
- parser --model /path/model --input dir/PDF/Image --output output_folders --batch_size 128 --tp 1
- ```
-
- Adjust the tensor-parallelism value (`--tp`: 1, 2, or 4) and the batch size to match the number of GPUs and the available memory.
-
- <details>
- <summary> Contents of the result folder </summary>
- The result folder contains the following:
-
- ```
- output_folders/
- ├── <file_name>/output.md
- ├── ...
- ├── ...
- ```
-
- </details>
-
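After a batch run, the per-document Markdown files in the layout above can be gathered with a short helper. This is a sketch: `collect_outputs` is not part of the toolkit, and it assumes the `output_folders/<file_name>/output.md` layout shown.

```python
from pathlib import Path

def collect_outputs(root: str) -> dict[str, str]:
    """Map each parsed document's name to its Markdown content,
    assuming the output_folders/<file_name>/output.md layout."""
    return {
        md.parent.name: md.read_text(encoding="utf-8")
        for md in sorted(Path(root).glob("*/output.md"))
    }
```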
- ### Online Serving
-
- <details>
- <summary> Example </summary>
-
- - Launch the vLLM server
-
- ```shell
- vllm serve /path/to/model --tensor-parallel-size=4 --served-model-name=Infinity_Parser
- ```
-
- - Python client example
-
- ```python
- import base64
-
- from openai import OpenAI
-
- prompt = r'''You are an AI assistant specialized in converting PDF images to Markdown format. Please follow these instructions for the conversion:
-
- 1. Text Processing:
- Accurately recognize all text content in the PDF image without guessing or inferring.
- Convert the recognized text into Markdown format.
- Maintain the original document structure, including headings, paragraphs, lists, etc.
-
- 2. Mathematical Formula Processing:
- Convert all mathematical formulas to LaTeX format.
- Enclose inline formulas with \( \). For example: This is an inline formula \( E = mc^2 \)
- Enclose block formulas with \[ \]. For example: \[ \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \]
-
- 3. Table Processing:
- Convert tables to HTML format.
- Wrap the entire table with <table> and </table>.
-
- 4. Figure Handling:
- Ignore figure content in the PDF image. Do not attempt to describe or convert images.
-
- 5. Output Format:
- Ensure the output Markdown document has a clear structure with appropriate line breaks between elements.
- For complex layouts, try to maintain the original document's structure and format as closely as possible.
-
- Please strictly follow these guidelines to ensure accuracy and consistency in the conversion. Your task is to accurately convert the content of the PDF image into Markdown format without adding any extra explanations or comments.
- '''
-
- def encode_image(image_path):
-     with open(image_path, "rb") as image_file:
-         return base64.b64encode(image_file.read()).decode("utf-8")
-
-
- def build_message(image_path, prompt):
-     content = [
-         {
-             "type": "image_url",
-             "image_url": {
-                 "url": f"data:image/jpeg;base64,{encode_image(image_path)}"
-             }
-         },
-         {"type": "text", "text": prompt},
-     ]
-     messages = [
-         {"role": "system", "content": "You are a helpful assistant."},
-         {"role": "user", "content": content},
-     ]
-     return messages
-
-
- client = OpenAI(
-     api_key="EMPTY",
-     base_url="http://localhost:8000/v1",
- )
-
-
- def request(messages):
-     completion = client.chat.completions.create(
-         messages=messages,
-         model="Infinity_Parser",
-         max_completion_tokens=8192,
-         temperature=0.0,
-         top_p=0.95,
-     )
-     return completion.choices[0].message.content
-
-
- if __name__ == "__main__":
-     img_path = "path/to/image.png"
-     messages = build_message(img_path, prompt)
-     # Send the request to the vLLM server and print the Markdown result
-     print(request(messages))
- ```
- </details>
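The prompt above pins down concrete output conventions: `\( \)` for inline math, `\[ \]` for display math, and HTML `<table>` markup. A returned Markdown string can be given a quick post-hoc check by counting those constructs. `count_conventions` is a hypothetical downstream-validation helper, not part of the client.

```python
import re

def count_conventions(markdown: str) -> dict[str, int]:
    """Count the constructs the parsing prompt asks the model to emit."""
    return {
        "inline_formulas": len(re.findall(r"\\\(.+?\\\)", markdown)),
        "block_formulas": len(re.findall(r"\\\[.+?\\\]", markdown, re.S)),
        "tables": len(re.findall(r"<table>.*?</table>", markdown, re.S)),
    }
```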
-
- ## Inference with Transformers
-
- <details>
- <summary> Transformers Inference Example </summary>
-
- ```python
- import torch
- from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
- from qwen_vl_utils import process_vision_info
-
- model_path = "infly/Infinity-Parser-7B"
- prompt = "Please transform the document's contents into Markdown format."
-
- print("Loading model and processor...")
- # Default: load the model on the available device(s)
- # model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
- #     model_path, torch_dtype="auto", device_map="auto"
- # )
-
- # We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
- model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
-     model_path,
-     torch_dtype=torch.bfloat16,
-     attn_implementation="flash_attention_2",
-     device_map="auto",
- )
-
- # Default processor
- # processor = AutoProcessor.from_pretrained(model_path)
-
- # Recommended processor
- min_pixels = 256 * 28 * 28  # 448 * 448
- max_pixels = 2304 * 28 * 28  # 1344 * 1344
- processor = AutoProcessor.from_pretrained(model_path, min_pixels=min_pixels, max_pixels=max_pixels)
-
- print("Preparing messages for inference...")
- messages = [
-     {
-         "role": "user",
-         "content": [
-             {
-                 "type": "image",
-                 "image": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png",
-             },
-             {"type": "text", "text": prompt},
-         ],
-     }
- ]
-
- text = processor.apply_chat_template(
-     messages, tokenize=False, add_generation_prompt=True
- )
- image_inputs, video_inputs = process_vision_info(messages)
- inputs = processor(
-     text=[text],
-     images=image_inputs,
-     videos=video_inputs,
-     padding=True,
-     return_tensors="pt",
- )
- inputs = inputs.to("cuda")
-
- print("Generating results...")
- generated_ids = model.generate(**inputs, max_new_tokens=4096)
- # Strip the prompt tokens so only newly generated text is decoded
- generated_ids_trimmed = [
-     out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
- ]
- output_text = processor.batch_decode(
-     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
- )
- print(output_text)
- ```
- </details>
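A note on the processor bounds in the snippet above: `min_pixels` and `max_pixels` are expressed in multiples of the 28×28 unit the Qwen2.5-VL processor works in, and the square-resolution equivalents in the code comments can be verified directly:

```python
UNIT = 28  # pixel granularity used by the Qwen2.5-VL processor

min_pixels = 256 * UNIT * UNIT    # lower bound on image area
max_pixels = 2304 * UNIT * UNIT   # upper bound on image area

# These match the 448x448 and 1344x1344 equivalents noted in the comments:
assert min_pixels == 448 * 448      # 200704 pixels
assert max_pixels == 1344 * 1344    # 1806336 pixels
```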
 
  # Visualization
 
 
 
  # Quick Start
 
+ Please refer to <a href="https://github.com/infly-ai/INF-MLLM/tree/main/Infinity-Parser#quick-start">Quick Start</a>.
 
  # Visualization