Question about reading a file with pandas read_csv

I have the CSV below. The delimiter is a comma, but one of the fields also contains commas. How can I read it correctly?

column1,column2,column3,column4
11,12,13,{“name”:’name1’,”data”:’data1’}
21,22,23,{“name”:’name2’,”data”:’data2’}
31,32,33,{“name”:’name3’,”data”:’data3’}
41,42,43,{“name”:’name4’,”data”:’data4’}

Jason990420
Best answer

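For bracket pairs such as [], {}, (), <>, where the opening and closing characters differ, I don't think there is a built-in option. If the field is wrapped in a single quoting character, such as ' or ", you can use the quotechar option; pandas read_csv has this option too. For example: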

import csv

# `file` is the path to the CSV file
with open(file, newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for line in reader:
        print(line)
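Since read_csv takes the same quotechar option, here is a minimal sketch of that route. It assumes the last field is wrapped in double quotes in the file, which is not the case in the question's data; the sample text below is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample: the fourth field is wrapped in double quotes,
# so the commas inside it are not treated as separators.
text = '''column1,column2,column3,column4
11,12,13,"{'name':'name1','data':'data1'}"
21,22,23,"{'name':'name2','data':'data2'}"
'''

# io.StringIO stands in for a real file path here.
df = pd.read_csv(io.StringIO(text), sep=',', quotechar='"')

print(df.shape)               # (2, 4)
print(df.loc[0, 'column4'])   # {'name':'name1','data':'data1'}
```

The quotes are stripped on read, so column4 holds only the brace-delimited text.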

Since that does not apply to your data, the following uses a regular expression to split the lines instead:

from re import split
from pandas import DataFrame

# To read from a file instead of the inline string below:
# with open(csv_file, 'rt', encoding='utf-8') as f:
#     data = f.read()

data = """
column1,column2,column3,column4
11,12,13,{“name”:’name1’,”data”:’data1’}
21,22,23,{“name”:’name2’,”data”:’data2’}
31,32,33,{“name”:’name3’,”data”:’data3’}
41,42,43,{“name”:’name4’,”data”:’data4’}
"""

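# Split on a comma only if it is not inside {...}: the lookahead rejects
# any comma that is followed by a '}' with no '{' or '}' in between.
lines = [split(r',\s*(?![^{}]*\})', line) for line in data.strip().split('\n')]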

df = DataFrame(lines)
>>> lines
[['column1', 'column2', 'column3', 'column4'],
 ['11', '12', '13', '{“name”:’name1’,”data”:’data1’}'],
 ['21', '22', '23', '{“name”:’name2’,”data”:’data2’}'],
 ['31', '32', '33', '{“name”:’name3’,”data”:’data3’}'],
 ['41', '42', '43', '{“name”:’name4’,”data”:’data4’}']]
>>>
>>> df
         0        1        2                                3
0  column1  column2  column3                          column4
1       11       12       13  {“name”:’name1’,”data”:’data1’}
2       21       22       23  {“name”:’name2’,”data”:’data2’}
3       31       32       33  {“name”:’name3’,”data”:’data3’}
4       41       42       43  {“name”:’name4’,”data”:’data4’}
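In the output above the header line ends up as row 0 of the data. A possible follow-up, sketched below with the same regular expression, is to use the first split line as the column names and convert the numeric columns. The names FIELD_SEP, header, records and num_cols are my own, and the sample uses plain ASCII quotes rather than the curly quotes in the question.

```python
import re
import pandas as pd

# Same pattern as above: a comma is a separator only when it is not inside {...}.
FIELD_SEP = re.compile(r',\s*(?![^{}]*\})')

data = """column1,column2,column3,column4
11,12,13,{"name":'name1',"data":'data1'}
21,22,23,{"name":'name2',"data":'data2'}
"""

rows = [FIELD_SEP.split(line) for line in data.strip().splitlines()]

# The first row holds the column names, the rest are records.
header, *records = rows
df = pd.DataFrame(records, columns=header)

# The split produces strings, so convert the numeric columns explicitly.
num_cols = ['column1', 'column2', 'column3']
df[num_cols] = df[num_cols].astype(int)

print(df.columns.tolist())   # ['column1', 'column2', 'column3', 'column4']
print(df['column1'].tolist())  # [11, 21]
```

Note that the pattern assumes the braces are not nested and that no stray '}' appears in the other fields.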
2 years ago