Implement AI Typesetting in Word

pinxue ma

This article is translated from Chinese; the original text:

https://medium.com/@mapinxuesmail/ai-%E6%8E%92%E7%89%88-word-%E6%80%8E%E4%B9%88%E5%AE%9E%E7%8E%B0-c801ef0b6f66

Recently, I received a fairly complex requirement: use an LLM to format Word files according to a given specification.

The format example is as follows:

  • 1. xxx is the first-level title, 1.1 xxx is the second-level title, using Songti, size 4, bold
  • The volume name uses Heiti font, size 3, right-aligned, fixed value 35pt
  • … and so on

The desired effect: the user uploads a file, the AI typesets it, and the user downloads the result. So the core is the back-end part where the LLM typesets the Word document.

To briefly describe the process:

  1. Use python-docx to read the Word file and extract each valid paragraph
  2. Submit the paragraph's content plus the required rules to the large model, letting it determine which rule applies to that paragraph
  3. Encode each rule as a tool and provide it to the large model to call
  4. Run this loop over all paragraphs to achieve the final goal

Reading Word files using Python-docx

Reading the file is relatively simple. Just follow the official documentation, or have AI write a snippet:

from docx import Document
from docx.text.paragraph import Paragraph

document = Document(record.upload_file_path)
paragraphs = document.paragraphs

# Divide into chunks every ten paragraphs
chunked_list: list[list[Paragraph]] = list(
    chunk_list_loop(paragraphs, 10)
)
total_chunks = len(chunked_list)

The code above uses python-docx to read a Word document and split its paragraphs into chunks, which speeds up execution and can also provide surrounding context for each call.
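The chunk_list_loop helper is not shown in the article; a minimal sketch of what it might look like:

from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

def chunk_list_loop(items: Sequence[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive chunks of at most `size` items (assumed helper, not from the article)."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])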

Now that we have read the Word document's content and split it into chunks of ten paragraphs each, we can hand it over to the large model.

AI Identification

The main job of this part is to have the AI examine each paragraph of text and decide which rule should be used to format it. This process can also be broken down:

  1. Write the rule definitions
  2. Write the prompt and define the large model's task
  3. Inject the rules and the parameters (paragraph chunks) into the prompt
  4. Let the large model select the typesetting rule applicable to the current paragraph

First, let's define our typesetting rules. I will only give an example here:
1. Company logo: fixed pattern
3. Qualification certificate, serial number: Songti size 4, single line spacing
4. Project name: Songti size 2 bold, fixed value 30pt
8. Signature: xxx Co., Ltd. XX year XX month XX day (Songti size 3)
9. Header: bold size 5, 0.5 line spacing after paragraph, single line spacing
10. Header underline: 1.5pt
12. Manual directory: Heiti size 2; level 1 title: Heiti size 4, not bold
13. Table of contents: fixed value 28pt
14. The word "Attachment" after the table of contents: bold size 4; attachment content: Songti size 4, fixed value 24pt
15. Main text level 1 and 2 titles: Songti size 4 bold
16. Main text: Songti size 4, fixed value OR minimum value 28pt, first line indented 2 characters
17. Tables in the main text: first line indented 2 characters

The above is an example definition. Of course, the rules that are actually used in the production environment will definitely be more detailed.

The second step is to write the prompt that tells the large model what to do. However, according to the process described at the beginning, the large model should not only help me choose the rule, but also use the tool behind that rule to operate on the Word document. So we first write tools (code) for the rules, and then write the mapping between these rules and tools.

How do we write these tools? Without any external framework, we define them directly as Python functions that operate on the Word file. We use AI to generate a function for each rule, with the following prompt:

Please help me write functions using python-docx.
Each rule shown below corresponds to one function,
which takes the paragraph as its parameter;
after the code is generated,
also give a JSON that maps each rule
to its function name.

The rules are as follows:
1. xxx...

The above prompt generates a series of functions that provide the ability to operate on Word documents. It also produces a mapping so that the LLM can directly determine which tool to execute based on the paragraph we provide. The generated content looks like this:

from docx.enum.text import WD_ALIGN_PARAGRAPH, WD_LINE_SPACING
from docx.oxml.ns import qn
from docx.shared import Pt
from docx.text.paragraph import Paragraph
from docx.text.run import Run

def set_run_font_style(run: Run, font_name: str, size_pt: int, bold: bool = False):
    """General run style setter: font, size and boldness."""
    font = run.font
    font.name = font_name
    font.size = Pt(size_pt)
    font.bold = bold

    # Chinese (East Asian) fonts in Word need this extra property to display correctly
    r = run._r
    r.rPr.rFonts.set(qn('w:eastAsia'), font_name)

def style_certificate_info(p: Paragraph):
    """Engineering Design Qualification Certificate, No.: Songti Small Four, single line spacing, right-aligned."""
    fmt = p.paragraph_format
    fmt.alignment = WD_ALIGN_PARAGRAPH.RIGHT  # right alignment
    fmt.line_spacing_rule = WD_LINE_SPACING.SINGLE
    for run in p.runs:
        set_run_font_style(run, '宋体', 12)

paragaph_requirement_tools = {
    "3": {
        "description": "Engineering Design Qualification Certificate, Number: Songti Size 4, Single Line Spacing",
        "function_name": "style_certificate_info"
    }
}

Above are some examples of the tools we generated. Now we can add this information to the prompt so the large model can use these tools.
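As an extra illustration (not part of the original generated output), a hypothetical function for rule 16 (main text: Songti size 4, fixed line spacing of 28pt, first line indented 2 characters) might look roughly like this:

from docx.shared import Pt
from docx.text.paragraph import Paragraph

def style_body_text(p: Paragraph):
    """Hypothetical tool for rule 16: Songti size 4 (14pt), line spacing fixed at 28pt,
    first line indented about 2 characters."""
    fmt = p.paragraph_format
    fmt.line_spacing = Pt(28)        # a Length value gives an exact ("fixed value") spacing
    fmt.first_line_indent = Pt(28)   # roughly 2 characters at 14pt
    for run in p.runs:
        set_run_font_style(run, '宋体', 14)  # reuses the helper defined earlier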

Writing the prompt

Let me structure the content of the prompt:

  1. What the LLM needs to do
  2. The mapping between rules and tools
  3. The paragraph chunk list

So the general prompt is as follows:

# Role
You are a professional Word document formatting assistant.

# Task
Your core task is to analyze the given document paragraph list, and accurately match the most appropriate style for each paragraph according to the **context variables** and **core rules** provided below, and only output the tool `function_name` corresponding to the style.

# Context variables (to be filled in before processing)
- **Current volume name**: Volume 4 Substation Engineering 64-W0047K-A04
- **Current book name**: Volume 4 10kV Off-site Power Supply Engineering 64-W0047K-A0404

# Core rules and tool mapping table
Please strictly follow the rules in the table below. Judge from top to bottom by priority; once a rule matches, stop evaluating the remaining rules.

| Priority | Style category | Paragraph features/description | Output `function_name` |
|:---:|:---|:---|:---|
| 1 | **Volume title** | - Starts with `Volume X` and `Book X` at the same time, for example `Volume 4 xxx Book 4 xxx` | `style_volume_title` |
... //To be supplemented

# Context judgment logic
1. **General catalog matching logic**: When judging the general catalog entry (priority 2-5), you need to extract the name part from the paragraph (that is, remove the code at the end), and then compare it with the `[current volume name]` or `[current book name]` variable.
2. **Distinguishing between catalog and body**: The key to judging whether a paragraph is a "Manual Catalog" (priority 8-9) is **context**. If a paragraph and its preceding and succeeding paragraphs together form a "title-page number" list, they belong to the "Manual Catalog" as a whole. Otherwise, even if the format is similar (such as `1. Overview`), it should be considered "body-heading".
3. **Default rule**: If a paragraph does not meet all the characteristics of priorities 1 to 12, it **must** be classified as "body text" (priority 13) and output `style_body_text`. This is the final "fallback" rule.

# Output requirements
- For each paragraph in the list, you must output one and only one `function_name` on a new line.
- Do not output any explanations, descriptions, or any text other than `function_name`.

paragaph_requirement_tools = {
    "3": {
        "description": "Engineering Design Qualification Certificate, Number: Songti Small Four, Single Line Spacing",
        "function_name": "style_certificate_info"
    }
}

Now, please start processing the list of paragraphs I will provide.

Text list: {str(text_seq)}

Please output in the format of List[json], including the fields [text, function_name, reason]
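In code, this prompt can be filled in and sent as chat messages; a minimal sketch, assuming the text above is stored in a hypothetical PROMPT_TEMPLATE string with a {text_seq} placeholder:

import json

# Hypothetical template holding the prompt text shown above
PROMPT_TEMPLATE = """# Role
You are a professional Word document formatting assistant.
...
Text list: {text_seq}

Please output in the format of List[json], including the fields [text, function_name, reason]
"""

def build_messages(chunk_texts: list[str]) -> list[dict]:
    """Inject the current chunk's paragraph texts into the prompt and wrap it as chat messages."""
    prompt = PROMPT_TEMPLATE.format(text_seq=json.dumps(chunk_texts, ensure_ascii=False))
    return [{"role": "user", "content": prompt}]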

We hand this over to the large model for execution: use Python to connect to the endpoint, pass the prompt to the execution function, and wait for the model's output:

from openai import OpenAI

client = OpenAI(
    api_key="sk-xxx",
    base_url="http://xx.ai/v1",
)

def ai(prompts: list[dict]) -> str:
    """Send the chat messages to the model and collect the streamed answer."""
    completion = client.chat.completions.create(
        model="deepseek-r1",
        messages=prompts,
        stream=True,
    )

    # The reasoning model also streams its thinking in a reasoning_content field;
    # here we only accumulate the final answer content
    content = ""
    for chunk in completion:
        if chunk.choices[0].delta.content:
            content += chunk.choices[0].delta.content

    # print(content)
    return content

Generally, the result returned by the large model is JSON like this:

[
    {
        "text": "1.1 Project Overview",
        "function_name": "style_body_heading",
        "reason": "1.1 complies with the second-level title rule"
    }
]
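In practice the model's reply comes back as a plain string, so it has to be parsed before use; a minimal sketch, assuming the reply may be wrapped in a markdown ```json fence:

import json

def parse_workflow_steps(reply: str) -> list[dict]:
    """Parse the model's reply into a list of {text, function_name, reason} dicts.
    Stripping an optional code fence is an assumption about the output format."""
    cleaned = reply.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")
        if cleaned.lower().startswith("json"):
            cleaned = cleaned[4:]  # drop the "json" language tag
    return json.loads(cleaned)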

We then pass this result to the execution tool function:

def use_tools(workflow_steps: list[dict], useful_chunk: list[Paragraph]):
    for index in range(len(workflow_steps)):
        # Traverse the workflow returned by the model;
        # each step's index matches the paragraph at the same position in the chunk
        paragraph = useful_chunk[index]
        step = workflow_steps[index]

        function_name = step.get('function_name')

        print(step)

        if not function_name or function_name == "None":
            continue

        # Look up the styling function by name and apply it to the paragraph
        globals()[function_name](paragraph)

At this point, all the functionality has been implemented, and what remains is testing, optimizing, and improving stability.
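Putting the pieces together, a rough sketch of the overall loop might look like this (build_messages and parse_workflow_steps are the hypothetical helpers sketched above, and the output path is an assumption):

document = Document(record.upload_file_path)
chunked_list = list(chunk_list_loop(document.paragraphs, 10))

for useful_chunk in chunked_list:
    # Filtering down to "valid" (non-empty) paragraphs is omitted here for brevity
    texts = [p.text for p in useful_chunk]
    reply = ai(build_messages(texts))
    workflow_steps = parse_workflow_steps(reply)
    use_tools(workflow_steps, useful_chunk)

# Save the typeset document so the user can download it
document.save("typeset_output.docx")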

Let’s review this solution again:

  1. The user uploads a Word document, the backend saves the file, and starts Word typesetting.
  2. The paragraphs are chunked, and the LLM then uses the defined rules to determine the specific tool to use for each one.
  3. Execute the tool, save the file, and provide it to the user for download.

With this, we have a complete product. If you have any questions, please contact us at: [email protected]
