Pattern matching with Python's re module does not give the expected result

import re

text = '<a src="111">111</a> <a src="222">222</a> <p title="test1">test1</p> <a src="333">333</a> <p title="test2">test2</p> <a src="444">444</a> <a src="5">5</a> <a src="6">6</a> <p title="test3">test3</p> <a src="7">7</a>'

pattern = re.compile(r'<a.*?>(.*?)</a>.*?<p.*?>(.*?)</p>', re.S)
items = re.findall(pattern, text)
for item in items:
    print(item)
    print("++++++++++++++++++++++++++++++++++++++")

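I am learning regular expressions in Python. The code above is meant to capture everything between <a> and </a> and between <p> and </p>.
The result I want is:
111 222 test1
333 test2
444 5 6 test3
But what I actually get is:
111 test1
333 test2
444 test3
Please help: how should I change the regular expression?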

shashadefbq

Jason990420

Best answer

A capturing group holds only one substring per match, and a pattern cannot create a variable number of groups. When a capturing group is repeated, most regex flavors, Python's re included, keep only the last iteration; every earlier capture is overwritten. In short, a single re pattern cannot return a variable number of <a> values per match, because all the repeats end up in the same group. Match each tag separately instead and do the grouping in Python code.

refer: www.regular-expressions.info/captu...
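To see the overwriting behaviour described above, here is a minimal sketch. The pattern and the sample string are made up for illustration and are not taken from the question; the point is only that the repeated group reports its last iteration.

```python
import re

# The outer (?:...)+ repeats a whole <a> tag; the inner (.*?) is the only
# capturing group, so there is exactly one slot for every iteration.
pattern = re.compile(r'(?:<a[^>]*>(.*?)</a>\s*)+')

m = pattern.match('<a src="111">111</a> <a src="222">222</a>')
last_capture = m.group(1)     # text kept from the final iteration only
group_count = pattern.groups  # number of capturing groups in the pattern

print(last_capture, group_count)
```

Both tags are consumed by the match, but group 1 only reports the second one, which is why the question's pattern returns a single <a> value per row.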

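import re

text = """
<a src="111">111</a> <a src="222">222</a> <p title="test1">test1</p>
<a src="333">333</a> <p title="test2">test2</p>
<a src="444">444</a> <a src="5">5</a> <a src="6">6</a> <p title="test3">test3</p>
<a src="7">7</a>
"""

# Group 1 is the tag name (a or p), group 2 is the text inside the tag.
# \1 forces the closing tag to match the opening one.
regex = re.compile(r'<(a|p)\s+.+?>(.+?)</\1>', re.S)
result, tmp = [], ()
for tag, content in regex.findall(text):
    # Collect the text of every tag into the current row.
    tmp += (content,)
    # A <p> tag closes the row; a trailing <a> with no <p> is dropped.
    if tag == 'p':
        result.append(tmp)
        tmp = ()

print(result)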

[('111', '222', 'test1'), ('333', 'test2'), ('444', '5', '6', 'test3')]
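If you would rather let the regex do the row splitting, a two-pass variant is another option. This is only a sketch of an alternative, not the method used in the code above: the names block_re, inner_re and rows are my own, and it assumes every row is a run of <a> tags followed by exactly one <p> tag, as in the sample text.

```python
import re

text = """
<a src="111">111</a> <a src="222">222</a> <p title="test1">test1</p>
<a src="333">333</a> <p title="test2">test2</p>
<a src="444">444</a> <a src="5">5</a> <a src="6">6</a> <p title="test3">test3</p>
<a src="7">7</a>
"""

# Pass 1: one or more <a> tags followed by a single <p> tag.
# There are no capturing groups, so findall returns the whole matched block.
block_re = re.compile(r'(?:<a\b[^>]*>.*?</a>\s*)+<p\b[^>]*>.*?</p>', re.S)

# Pass 2: the text inside each individual <a> or <p> tag of a block.
inner_re = re.compile(r'<(?:a|p)\b[^>]*>(.*?)</(?:a|p)>', re.S)

rows = [tuple(inner_re.findall(block)) for block in block_re.findall(text)]
print(rows)
```

The trailing <a src="7"> has no <p> after it, so pass 1 never matches it and it is left out, the same as in the loop version. For real HTML, a parser is usually more robust than regular expressions.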

Commented 2 years ago

Replies: 2

shashadefbq

Thank you.

Commented 1 year ago
