Regex matching with Python's re module does not return the expected result

import re

text = '<a src="111">111</a> <a src="222">222</a><p title="test1">test1</p> <a src="333">333</a> <p title="test2">test2</p> <a src="444">444</a> <a src="5">5</a> <a src="6">6</a> <p title="test3">test3</p> <a src="7">7</a>'

pattern = re.compile('<a.*?>(.*?)</a>.*?<p.*?>(.*?)</p>', re.S)
items = re.findall(pattern, text)
for item in items:
    print(item)
    print("++++++++++++++++++++++++++++++++++++++")

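I'm learning regular expressions in Python. The code above is meant to match the content between every <a> and </a> and between every <p> and </p>.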
The result I want is:
111 222 test1
333 test2
444 5 6 test3
But what I actually get is:
111 test1
333 test2
444 test3
Please help: how should I change the regular expression?

Jason990420
Best answer

A capture group captures only one substring per match, and there is no way to create a dynamic number of capture groups. When a capturing group is repeated, most regex flavors (Python's re included) keep only the last iteration; every earlier capture is overwritten. In short, a single re pattern cannot return a variable number of <a> captures per match, because the number of groups is fixed by the pattern itself. The workaround is to match each tag separately and do the grouping in Python code.

Reference: www.regular-expressions.info/captu...
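To make the point about repeated groups concrete, here is a small sketch (not part of the original answer). The names `sample`, `original` and `repeated` are made up for illustration. It shows that the pattern from the question skips the second <a>, and that repeating the group keeps only its last iteration.

```python
import re

sample = '<a src="111">111</a> <a src="222">222</a> <p title="test1">test1</p>'

# The pattern from the question: the lazy .*? between </a> and <p
# silently swallows the second <a> element.
original = re.compile(r'<a.*?>(.*?)</a>.*?<p.*?>(.*?)</p>', re.S)
print(original.findall(sample))   # [('111', 'test1')]

# Repeating the <a> group does not help: group 1 is overwritten on each
# iteration, so only the last <a> body survives.
repeated = re.compile(r'(?:<a[^>]*>(.*?)</a>\s*)+<p[^>]*>(.*?)</p>', re.S)
match = repeated.search(sample)
print(match.groups())             # ('222', 'test1')
```

Either way one of the two <a> bodies is lost, which is why the grouping has to happen outside the pattern.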

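import re

text = """
<a src="111">111</a> <a src="222">222</a> <p title="test1">test1</p>
<a src="333">333</a> <p title="test2">test2</p>
<a src="444">444</a> <a src="5">5</a> <a src="6">6</a> <p title="test3">test3</p>
<a src="7">7</a>
"""

# Group 1 is the tag name (a or p); \1 requires the matching closing tag.
# [^>]* skips the attributes, so tags without attributes match as well.
regex = re.compile(r'<(a|p)\b[^>]*>(.*?)</\1>', re.S)
result, tmp = [], ()
# Use "content" for the loop variable so the input "text" is not overwritten.
for tag, content in regex.findall(text):
    tmp += (content,)        # collect the body of every <a> and <p>
    if tag == 'p':           # a <p> closes the current group
        result.append(tmp)
        tmp = ()             # a trailing <a> with no <p> after it is dropped

print(result)

Output:
[('111', '222', 'test1'), ('333', 'test2'), ('444', '5', '6', 'test3')]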
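An alternative that stays closer to the original idea is a two-pass match: first capture each run of <a> elements together with the <p> that closes it, then extract the <a> bodies from that run with a second pattern. This is only a sketch under the assumption that the markup is as regular as in the sample. The names `block_re`, `anchor_re` and `groups` are illustrative, and for real-world HTML an HTML parser would likely be more robust than regular expressions.

```python
import re

text = """
<a src="111">111</a> <a src="222">222</a> <p title="test1">test1</p>
<a src="333">333</a> <p title="test2">test2</p>
<a src="444">444</a> <a src="5">5</a> <a src="6">6</a> <p title="test3">test3</p>
<a src="7">7</a>
"""

# Pass 1: one match per block, i.e. a run of <a> elements closed by one <p>.
# Group 1 holds the whole run of <a> elements, group 2 the <p> body.
block_re = re.compile(r'((?:<a\b[^>]*>.*?</a>\s*)+)<p\b[^>]*>(.*?)</p>', re.S)

# Pass 2: pull every <a> body out of the run captured in pass 1.
anchor_re = re.compile(r'<a\b[^>]*>(.*?)</a>', re.S)

groups = [
    tuple(anchor_re.findall(anchors)) + (p_text,)
    for anchors, p_text in block_re.findall(text)
]
print(groups)
# [('111', '222', 'test1'), ('333', 'test2'), ('444', '5', '6', 'test3')]
```

The final `<a src="7">7</a>` is not followed by a <p>, so it belongs to no block and is left out, the same as in the loop-based version.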
10 months ago