A question about using regular expressions

Part of the code is shown below:

                                <tr>
                                    <th>ID</th>
                                    <th>Key Name</th>
                                    <th>Value</th>
                                    <th>Action</th>
                                </tr>
                            </thead>
                            <tbody>
                                <tr id="cancel">
   <td>1</td>
   <td>cancel</td>
   <td id="edit_cancel">Cancel</td>
   <td>
      <button type="button" class="btn btn-default waves-effect btn-lang m-r-20" data-id="cancel" data-toggle="modal" data-target="#defaultModal">EDIT</button>
   </td>
</tr><tr id="delete">
   <td>2</td>
   <td>delete</td>
   <td id="edit_delete">Delete</td>
   <td>
      <button type="button" class="btn btn-default waves-effect btn-lang m-r-20" data-id="delete" data-toggle="modal" data-target="#defaultModal">EDIT</button>
   </td>
</tr><tr id="my_profile">
   <td>3</td>
   <td>my_profile</td>
   <td id="edit_my_profile">My Profile</td>
   <td>
      <button type="button" class="btn btn-default waves-effect btn-lang m-r-20" data-id="my_profile" data-toggle="modal" data-target="#defaultModal">EDIT</button>
   </td>

Q:

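  • How can I extract the Value text from the HTML?
    e.g. extract "Cancel" from the cell below and store it.
    <td id="edit_cancel">Cancel</td>
  • Send the stored strings to Youdao Translate for translation, then save the translated results in a form that can be read and written.
Replies: 2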

For HTML like this, I'd suggest using XPath. Here is a sample:

from lxml import etree
html = """                                <tr>
                                    <th>ID</th>
                                    <th>Key Name</th>
                                    <th>Value</th>
                                    <th>Action</th>
                                </tr>
                            </thead>
                            <tbody>
                                <tr id="cancel">
   <td>1</td>
   <td>cancel</td>
   <td id="edit_cancel">Cancel</td>
   <td>
      <button type="button" class="btn btn-default waves-effect btn-lang m-r-20" data-id="cancel" data-toggle="modal" data-target="#defaultModal">EDIT</button>
   </td>
</tr><tr id="delete">
   <td>2</td>
   <td>delete</td>
   <td id="edit_delete">Delete</td>
   <td>
      <button type="button" class="btn btn-default waves-effect btn-lang m-r-20" data-id="delete" data-toggle="modal" data-target="#defaultModal">EDIT</button>
   </td>
</tr><tr id="my_profile">
   <td>3</td>
   <td>my_profile</td>
   <td id="edit_my_profile">My Profile</td>
   <td>
      <button type="button" class="btn btn-default waves-effect btn-lang m-r-20" data-id="my_profile" data-toggle="modal" data-target="#defaultModal">EDIT</button>
   </td>"""
select = etree.HTML(html)
print(select.xpath('//td[@id="edit_cancel"]/text()')[0])

For more detailed XPath usage, you can read this article of mine.
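
If lxml is not installed, roughly the same extraction can be sketched with the standard library's html.parser. This is only a minimal sketch: the class name ValueCellParser, the shortened sample string, and the assumption that every Value cell carries an id starting with "edit_" are mine, inferred from the rows in the question.

```python
from html.parser import HTMLParser


class ValueCellParser(HTMLParser):
    """Collect the text of every <td> whose id starts with "edit_"."""

    def __init__(self):
        super().__init__()
        self.values = {}       # key name -> displayed value
        self._current = None   # key of the cell currently being read, if any

    def handle_starttag(self, tag, attrs):
        cell_id = dict(attrs).get("id") or ""
        if tag == "td" and cell_id.startswith("edit_"):
            self._current = cell_id[len("edit_"):]
            self.values[self._current] = ""

    def handle_data(self, data):
        # Only keep text that sits inside a matching <td>
        if self._current is not None:
            self.values[self._current] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._current = None


# Shortened stand-in for the table rows in the question
sample = """
<tr id="cancel"><td>1</td><td>cancel</td><td id="edit_cancel">Cancel</td></tr>
<tr id="my_profile"><td>3</td><td>my_profile</td><td id="edit_my_profile">My Profile</td></tr>
"""

parser = ValueCellParser()
parser.feed(sample)
print(parser.values)
```

Unlike the single-cell XPath query above, this collects every Value cell in one pass and keys each one by its row name, which may be handier if the values are going to be translated and stored afterwards.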

2 years ago
Jason990420

Here is an example:

import re
import json
import requests

def translate(word):
    # Youdao Dictionary API
    url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=null'
    # Request parameters; 'i' holds the text to translate
    key = {
        'type': "AUTO",
        'i': word,
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "ue": "UTF-8",
        "action": "FY_BY_CLICKBUTTON",
        "typoResult": "true"
    }
    # The key dict is the payload sent to the Youdao server
    response = requests.post(url, data=key)
    # Check whether the server responded successfully
    if response.status_code == 200:
        # Parse the JSON response and return the translated text
        result = json.loads(response.text)
        return result['translateResult'][0][0]['tgt']
    else:
        print("Youdao Dictionary request failed")
        # Return None when the request fails
        return None

html = """
    <table>
      <thead>
        <tr>
          <th>ID</th>
          <th>Key Name</th>
          <th>Value</th>
          <th>Action</th>
        </tr>
      </thead>
      <tbody>
        <tr id="cancel">
          <td>1</td>
          <td>cancel</td>
          <td id="edit_cancel">Cancel</td>
          <td>button</td>
        </tr>
        <tr id="delete">
          <td>2</td>
          <td>delete</td>
          <td id="edit_delete">Delete</td>
          <td>button</td>
        </tr>
        <tr id="my_profile">
          <td>3</td>
          <td>my_profile</td>
          <td id="edit_my_profile">My Profile</td>
          <td>button</td>
        </tr>
      </tbody>
    </table>
"""

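# Capture the text of every <td> cell; re.S lets '.' match across newlines
match_English = re.findall(r'<td.*?>(.*?)</td>', html, re.M|re.I|re.S)
# Each row has 4 cells; translate only the third one (index 2, the Value column)
match_Chinese = [translate(item) if i%4==2 else item for i, item in enumerate(match_English)]
# Join every group of 4 cells back into one comma-separated row
result_English = [','.join(match_English[i:i+4]) for i in range(0, len(match_English), 4)]
result_Chinese = [','.join(match_Chinese[i:i+4]) for i in range(0, len(match_Chinese), 4)]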

for item1, item2 in zip(result_English, result_Chinese):
    print(item1, "==>", item2)

Output:

1,cancel,Cancel,button ==> 1,cancel,取消,button
2,delete,Delete,button ==> 2,delete,删除,button
3,my_profile,My Profile,button ==> 3,my_profile,我的资料,button
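
The second part of the question, keeping the translations in a form that can be read and written back, is not covered by the code above. One possible sketch is to store key-to-translation pairs in a JSON file. The file name translations.json, the helpers save and load, the shortened html string, and the regex that keys on the "edit_" id prefix are all assumptions for illustration. The translated strings are copied from the output above instead of being fetched from Youdao, so the sketch runs offline.

```python
import json
import re
import tempfile
from pathlib import Path

# Shortened stand-in for the Value cells of the table
html = """
<td id="edit_cancel">Cancel</td>
<td id="edit_delete">Delete</td>
<td id="edit_my_profile">My Profile</td>
"""

# Capture the key (the part after "edit_") and the text of each Value cell
pairs = re.findall(r'<td\s+id="edit_(\w+)"[^>]*>(.*?)</td>', html, re.S)

# Stand-in for translate(); the values are taken from the output shown above
known = {"Cancel": "取消", "Delete": "删除", "My Profile": "我的资料"}
translated = {key: known[text] for key, text in pairs}


def save(path, data):
    # ensure_ascii=False keeps the Chinese text human-readable in the file
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")


def load(path):
    return json.loads(path.read_text(encoding="utf-8"))


path = Path(tempfile.gettempdir()) / "translations.json"
save(path, translated)
restored = load(path)
print(restored)
```

Matching on the id prefix picks out only the Value column, so it does not depend on every row having exactly four cells the way the i%4==2 test does. In real use, known[text] would be replaced by a call to translate(text).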
2 years ago
