Python 3: splitting English sentences by keyword

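Hello!
I have many sentences in the following format that need to be split:
From aaaaa abaTo bbbbb cdc For xxxx
I need to extract "aaaaa aba", "bbbbb cdc", and "xxxx" from each sentence and put them into an Excel sheet, one value per column.
There are n rows in total, with n > 100.
How should I write this? Thanks.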

Jason990420
Best answer

This requires openpyxl:

d:\>pip install openpyxl

Before: [screenshot of Example.xlsx]

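import re
from openpyxl import load_workbook

def symbol(text):
    # Split on any keyword; keep the non-empty pieces with whitespace stripped.
    return [item.strip()
        for item in re.split('|'.join(keywords), text) if item.strip()!='']

keywords = ["From", "To", "For", "On"]
filename = 'Example.xlsx'
wb = load_workbook(filename=filename)
ws = wb['Sheet1']

# The sentences are read from row 1; each one's pieces are written
# below it in the same column, starting at row 2.
for cell in ws[1:1]:
    text = cell.value
    if not isinstance(text, str):
        continue  # skip empty or non-text cells, which re.split cannot handle
    symbols = symbol(text)
    for i, s in enumerate(symbols):
        ws[f'{cell.column_letter}{i+2}'] = s

wb.save(filename)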

After: [screenshot of Example.xlsx]
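The script above takes its sentences from row 1 and writes the pieces downward in each column. If the sentences are instead stacked one per row, as the question describes, a row-wise variant might look like the sketch below. It assumes the sentences sit in column A of 'Sheet1'; `split_fields`, `expand_row`, and `write_fields_beside_sentences` are hypothetical names chosen for illustration, not part of openpyxl.

```python
import re

KEYWORDS = ["From", "To", "For", "On"]
# re.escape is not needed for these four words, but keeps the pattern
# safe if a keyword with special characters is added later.
PATTERN = re.compile("|".join(map(re.escape, KEYWORDS)))

def split_fields(text):
    """Return the non-empty, stripped pieces between keywords."""
    return [piece.strip() for piece in PATTERN.split(text) if piece.strip()]

def expand_row(sentence):
    """Build one output row: the sentence followed by its fields."""
    return [sentence, *split_fields(sentence)]

def write_fields_beside_sentences(filename, sheet_name="Sheet1"):
    """Assumption: sentences are in column A; fields go to B, C, D, ..."""
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    wb = load_workbook(filename)
    ws = wb[sheet_name]
    # Restricting iter_rows to column A yields one-cell tuples.
    for (cell,) in ws.iter_rows(min_col=1, max_col=1):
        if not isinstance(cell.value, str):
            continue  # skip empty or non-text cells
        for offset, field in enumerate(split_fields(cell.value), start=1):
            ws.cell(row=cell.row, column=1 + offset, value=field)
    wb.save(filename)

print(expand_row("From aaaaaTo bbbbb For cccccddddd On xxxxx"))
# ['From aaaaaTo bbbbb For cccccddddd On xxxxx', 'aaaaa', 'bbbbb', 'cccccddddd', 'xxxxx']
```

Calling `write_fields_beside_sentences('Example.xlsx')` would then update the workbook in place. The matching is case-sensitive and purely textual, so a value that itself contains a keyword (for example "Tokyo" or "Ontario") would be split as well.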

3 years ago
Replies: 5
Jason990420

What is the rule for splitting? And what is the rule for deciding which parts to keep and which to drop?

3 years ago

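Each sentence is a description containing keywords. The keywords are From, To, For, and On, and the format is: From aaaaaTo bbbbb For cccccddddd On xxxxx. There is usually no space between aaaaa and To, so the split has to be driven by the keywords. In other words, extract whatever lies between the keywords, whether or not there are spaces.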

3 years ago
Jason990420

This requires xlsxwriter:

D:\>pip install xlsxwriter

Given the splitting rule you describe, the simplest approach is the following code:

import re
import xlsxwriter

def symbol(text):
    """
    Split text on the keywords and return the non-empty pieces, stripped.

    Of course you can also do it with a regular expression:
    return [item.strip() for item in re.split('|'.join(keyword), text) if item.strip()!='']
    """
    # Replace every keyword with one sentinel, then split on the sentinel.
    for key in keyword:
        text = text.replace(key, magic_key)
    return [item.strip() for item in text.split(magic_key) if item.strip() != '']

magic_key = '$'*10  # sentinel, assumed never to occur in the input text
keyword = ["From", "To", "For", "On"]
filename = 'Example.xlsx'

text = "From aaaaa abaTo bbbbb cdc For xxxx From aaaaaTo bbbbb For cccccddddd On xxxxx"
symbols = symbol(text)

# Create a workbook and add a worksheet.
workbook = xlsxwriter.Workbook(filename)
worksheet = workbook.add_worksheet()

# Write one piece per row in the first column.
col = 0
for row, item in enumerate(symbols):  # "item", so the symbol() function is not shadowed
    worksheet.write(row, col, item)

workbook.close()

>>> symbols
['aaaaa aba', 'bbbbb cdc', 'xxxx', 'aaaaa', 'bbbbb', 'cccccddddd', 'xxxxx']
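The flat list above no longer records which keyword introduced each piece. If that matters, for instance because some sentences have an On part and others do not, one option is to put the keywords in a capturing group so that `re.split` keeps them in its result. The following is a minimal sketch of that idea; `labelled_fields` is a hypothetical helper name, not something from the code above.

```python
import re

KEYWORDS = ["From", "To", "For", "On"]
# The capturing group makes re.split return the keywords as well.
PATTERN = re.compile("(" + "|".join(map(re.escape, KEYWORDS)) + ")")

def labelled_fields(sentence):
    """Return (keyword, value) pairs in the order they appear."""
    parts = PATTERN.split(sentence)
    # parts has the shape [prefix, kw1, value1, kw2, value2, ...]
    pairs = zip(parts[1::2], parts[2::2])
    return [(kw, value.strip()) for kw, value in pairs if value.strip()]

text = "From aaaaaTo bbbbb For cccccddddd On xxxxx"
print(labelled_fields(text))
# [('From', 'aaaaa'), ('To', 'bbbbb'), ('For', 'cccccddddd'), ('On', 'xxxxx')]
```

With the labels available, each value can be written to a fixed column per keyword, so a missing keyword leaves a blank cell instead of shifting the later values to the left.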


3 years ago

What if the requirements are as follows?
Input: an Excel file containing n rows of sentences.
Output: the same Excel file, with the parts that remain after removing the keywords placed on the same row as the sentence, one field per column.
That is:

Input
From aaaaaTo bbbbb For cccccddddd On xxxxx
From aaaaaTo bbbbb For cccccddddd On xxxxx
From aaaaaTo bbbbb For cccccddddd On xxxxx

Output
From aaaaaTo bbbbb For cccccddddd On xxxxx, aaaaa, bbbbb, cccccddddd, xxxxx
From aaaaaTo bbbbb For cccccddddd On xxxxx, aaaaa, bbbbb, cccccddddd, xxxxx
From aaaaaTo bbbbb For cccccddddd On xxxxx, aaaaa, bbbbb, cccccddddd, xxxxx

Thanks for the pointers! 👍

3 years ago
