CSV reading + writing is slow, asking for help to find where the problem is
I have a CSV file with 32,000 rows of data, and each row has 8 columns.
I need to read the data in the 6th column of every row from the CSV file and process it, and that part works correctly.
When I first tested with 1,000 rows, it ran very quickly, but when I switched to the full 32,000-row file it became extremely slow. I don't quite understand where the problem is, and I'm hoping someone can give me some advice.
The code is as follows:
from datetime import date,datetime
import csv
import codecs
import time
import re
import sys
import os
import jieba
from itertools import repeat

sys.setrecursionlimit(100000000)

input_file = 'Data.csv'
output_file = 'All.csv'

with open(input_file, newline='', encoding='utf-8') as csvfile:
    total_line = len(csvfile.readlines())-1
for_loop = total_line + 1
print(for_loop)

with open(output_file, 'a', newline='', encoding='utf-8') as csvfile:
    csvfile.write('回應作者\n')

for post in range(1,for_loop):
    with open(input_file, newline='', encoding='utf-8') as csvfile:
        reader = csv.reader(csvfile)
        column = [row[5] for row in reader]
    for i, rows in enumerate(column):
        if i == post:
            string = rows
            #print(string)
            string = re.sub('@.*?@', ' ', string, 1)
            flag = 1
            print('Post: ',post)
            while(flag):
                if (string.find('@!@') != -1):
                    index = string.find('@!@')
                    output_string = string[1:index]
                    #print(output_string)
                    with open(output_file, 'a', newline='', encoding='utf-8') as csvfile:
                        csvfile.write(output_string+'\n')
                    string = re.sub(' .*?@', '', string, 1)
                    string = re.sub('!.*?@', ' ', string, 1)
                else:
                    if(string != ''):
                        string = re.sub(' ', '', string, 1)
                        output_string = string
                        #print(output_string)
                        with open(output_file, 'a', newline='', encoding='utf-8') as csvfile:
                            csvfile.write(output_string+'\n')
                        string = ''
                    if(string == ''):
                        flag = 0
                    else:
                        print("not found")
                        flag = 0
There is far too much repeated work here: with 32,000 rows, you re-read the entire file 32,000 times and run the inner loop 32,000 times as well. It would be strange if that weren't slow!
A lightly modified version is below; I don't have Data.csv, so it's untested.
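Roughly like this, as a sketch only: open the input and output files once each, walk the rows with a single csv.reader pass, and keep your '@!@' extraction logic unchanged. The file names and the output header are taken from your post; without Data.csv I can't check the field parsing, so try it on a small sample first.

import csv
import re

input_file = 'Data.csv'
output_file = 'All.csv'

with open(input_file, newline='', encoding='utf-8') as fin, \
     open(output_file, 'a', newline='', encoding='utf-8') as fout:
    fout.write('回應作者\n')
    reader = csv.reader(fin)
    next(reader)                                   # skip the header row
    for post, row in enumerate(reader, start=1):
        string = row[5]                            # 6th column
        string = re.sub('@.*?@', ' ', string, 1)
        print('Post: ', post)
        # same extraction logic as before, but the files are never reopened
        while '@!@' in string:
            index = string.find('@!@')
            fout.write(string[1:index] + '\n')
            string = re.sub(' .*?@', '', string, 1)
            string = re.sub('!.*?@', ' ', string, 1)
        if string != '':
            fout.write(re.sub(' ', '', string, 1) + '\n')

Your version re-opens and re-reads the whole file once per row, so the total work grows roughly with the square of the row count; reading the file once keeps it a single pass, which is why the 1,000-row test felt fast but the full 32,000-row file didn't.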