001.00 General File Handling

Jason990420's personal blog / created 6 years ago / updated 6 years ago

Created: 2019/07/28

Updated: 2020/03/28. Corrected errors and added a note on escape characters in path names.

Win 10

Python 3.7.4

File directories:

- Script (.py) location: D:/Python_Work/001.00 文本文件处理

- Text/binary file location: D:/Python_Work/001.00 文本文件处理/文本文件

- Image files: D:/Python_Work/001.00 文本文件处理/图形文件/

<<<Note>>> These are the author's study notes. Apologies for any errors; corrections are welcome. Thank you!

# File: 001.00 一般档案处理.py
# Setup: 档案001.txt is saved as Unicode, 档案002.txt as ANSI
# Print each file's contents to the console.
# File name variables
filename1 = 'D:/Python_Work/001.00 一般档案处理/一般档案/一般档案001(unicode).txt'
filename2 = 'D:/Python_Work/001.00 一般档案处理/一般档案/一般档案002(ansi).txt'

# Define a function that prints a whole file
def print_all(filename, codec):
    # "with" closes the file even if read() raises
    with open(filename, encoding=codec) as fn:
        data = fn.read()
    print(data)

print_all(filename1, "utf_16")
print("")
# "mbcs" is the Windows ANSI code page codec; it exists only on Windows
print_all(filename2, "mbcs")

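1. Data file name strings

Relative path: "一般档案/一般档案001.txt"
Absolute path: "D:/Python_Work/001.00 一般档案处理/一般档案/一般档案001.txt"

The separator can be / or \. In a Python string literal, \ is the escape character, so a literal backslash must be written as \\. Equivalent spellings include:

'd:\\test.txt'
'd:/test.txt'
r'd:\test.txt' (r means raw: \ is not treated as an escape character)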
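
The three spellings above can be checked directly. The following is a minimal sketch; the variable names are illustrative and `PureWindowsPath` is used only so the comparison behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

escaped = 'd:\\test.txt'   # backslash doubled inside a normal literal
forward = 'd:/test.txt'    # forward slash needs no escaping
raw = r'd:\test.txt'       # raw literal: the backslash is kept as-is

# Without escaping, '\t' is read as a TAB character, not backslash + 't'.
broken = 'd:\test.txt'

same_text = (escaped == raw)   # both hold the same 11 characters
has_tab = '\t' in broken       # the unescaped literal silently changed

# PureWindowsPath accepts either separator and compares paths, not text.
same_path = PureWindowsPath(escaped) == PureWindowsPath(forward)

print(same_text, has_tab, same_path)
```

The `broken` case shows why an unescaped backslash is risky: Python raises no error, and the path simply points somewhere else.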

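2. Opening a file

fileobj = open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
- file: the file name string (a path), or an integer file descriptor
- mode: access mode, default 'r' (read only); modes that do not conflict can be combined
  - r - read mode
  - w - write mode (creates a new file or truncates an existing one)
  - x - exclusive creation; fails if the file already exists
  - a - append new data to the end of the file
  - b - binary mode
  - t - text mode
  - + - open for both reading and writing
  - In practice the common flags are 'r', 'w', 't', and 'b': read or write, combined with text or binary.
- buffering: 0 = unbuffered (binary mode only); 1 = line buffered (text mode only); >1 = buffer size in bytes; <0 = system default. In binary mode the default is io.DEFAULT_BUFFER_SIZE, typically 4096 or 8192 bytes. In text mode an interactive file (isatty() returns True) is line buffered; otherwise it behaves as in binary mode.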
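- encoding: text mode only. Python supports many codecs (the author counted at least 113), such as 'ascii', 'big5', 'big5hkscs', 'gbk', 'gb18030', 'hz', 'utf_8', and 'utf_16'. Make sure the codec matches the file, or the data will not read correctly. Notepad, for example, can save a .txt file in four formats: ANSI (mbcs), Unicode (utf_16), Unicode big endian (utf_16_be), and UTF-8 (utf_8). A UTF-8 file saved by Notepad may start with a BOM, which the utf_8_sig codec strips.
- errors: how encoding/decoding errors are handled
  - 'strict': raise a ValueError exception on an encoding error. The default None has the same effect.
  - 'ignore': skip the error; this can lose data.
  - 'replace': insert a replacement marker (such as '?') where the data is malformed.
  - 'surrogateescape': represent each invalid byte as a surrogate code unit (U+DC80 to U+DCFF) when reading; writing with the same handler turns it back into the original byte. Useful for files of unknown encoding.
  - 'xmlcharrefreplace': writing only; characters the encoding does not support are replaced with &#nnn;.
  - 'backslashreplace': replace malformed data with Python backslash escape sequences.
  - 'namereplace': writing only; unsupported characters are replaced with \N{...} escape sequences.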
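- newline:
  - When reading: None: '\n', '\r', and '\r\n' are all translated to '\n'. Empty string: all three are recognized as line endings but returned untranslated. '\n', '\r', or '\r\n': only that string ends a line, and it is returned untranslated.
  - When writing: None: '\n' is translated to os.linesep. '' or '\n': no translation. '\r' or '\r\n': each '\n' is replaced with that string.
- closefd: if closefd is False and a file descriptor was given instead of a file name, the underlying file descriptor stays open when the file is closed. If a file name was given, closefd must be True (the default); otherwise an error is raised.
- opener: a custom opener can be supplied by passing a callable. The underlying file descriptor for the file object is then obtained by calling opener(file, flags). The opener must return an open file descriptor (passing os.open as opener behaves much like passing None).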
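
The effect of the mode, newline, and errors parameters is easier to see on a small scratch file. The following is a minimal sketch: the file name `demo.txt`, the temporary directory, and the helper `read_with` are hypothetical and not part of the original notes.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# newline='' on write: '\n' and '\r\n' are stored exactly as given.
with open(path, 'w', encoding='utf_8', newline='') as f:
    f.write('one\r\ntwo\nthree')

# newline=None (the default) on read: every line ending becomes '\n'.
with open(path, 'r', encoding='utf_8') as f:
    translated = f.read()

# newline='' on read: line endings come back untranslated.
with open(path, 'r', encoding='utf_8', newline='') as f:
    untranslated = f.read()

# 'x' mode refuses to overwrite a file that already exists.
try:
    open(path, 'x').close()
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

# errors: store UTF-8 bytes, then decode them with the wrong codec (ascii).
with open(path, 'wb') as f:
    f.write('A中'.encode('utf_8'))   # b'A\xe4\xb8\xad'

def read_with(errors):
    """Read the scratch file as ASCII using the given error handler."""
    with open(path, 'r', encoding='ascii', errors=errors) as f:
        return f.read()

ignored = read_with('ignore')             # bad bytes are dropped
replaced = read_with('replace')           # each bad byte becomes U+FFFD
escaped = read_with('backslashreplace')   # each bad byte becomes \xNN text

try:
    read_with('strict')
    strict_failed = False
except UnicodeDecodeError:                # a subclass of ValueError
    strict_failed = True

print(repr(translated), repr(untranslated))
print(repr(ignored), repr(replaced), repr(escaped), strict_failed)
```

When decoding, the replacement marker is U+FFFD rather than '?'; the '?' form appears when encoding for output.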

3. Closing a file: fileobj.method()

close() flushes buffered data, then closes the file.

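4. File attributes: fileobj.attribute

These are attributes, not methods, so they are read without parentheses.

closed: whether the file has been closed, True / False
mode: the access mode the file was opened with
name: the file name
softspace: Python 2 only (used by the print statement); Python 3 file objects do not have it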
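
Closing and the attributes can be observed together. The following is a minimal sketch, assuming a hypothetical scratch file `attrs_demo.txt`; it uses a `with` block, which calls close() automatically on exit.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'attrs_demo.txt')

with open(path, 'w', encoding='utf_8') as f:
    f.write('hello')
    open_inside = not f.closed   # still open inside the block
    mode_seen = f.mode           # the mode string passed to open()
    name_seen = f.name           # the path passed to open()

closed_after = f.closed          # the with block has closed the file

# Any I/O on a closed file raises ValueError.
try:
    f.write('more')
    write_failed = False
except ValueError:
    write_failed = True

print(open_inside, mode_seen, closed_after, write_failed)
```

Using `with` is usually preferable to a bare open()/close() pair, because the file is closed even if an exception occurs in between.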

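5. Reading a file: fileobj.method()

read([size]) reads up to size characters (bytes in binary mode). With size omitted it reads the whole file, which is unsuitable for very large files.
readline() reads one line at a time: low memory use, but slower. Each line ends with '\n', except possibly the last; line.strip('\n') removes it. print('\n') emits two newlines; use print("") for a single blank line.
readlines() reads all lines into a list: more memory, but faster.
linecache.method() reads lines by number without calling open(). It is suited only to ASCII or UTF-8 files and keeps whole files cached in memory.
- getline(filename, lineno[, module_globals=None]) returns line number lineno, counting from 1.
- getlines(filename) reads all lines into a list.
- clearcache() clears the cache.
- checkcache(filename=None) checks the cache for one file, or for all files if filename is omitted.
- lazycache(filename, module_globals) captures enough information about a non-file-based module to fetch its lines later.
- updatecache(filename) updates the cache, which also refreshes the list returned by getlines.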

# File: 001.01 一般档案处理.py
# Usage of readline(), readlines(), linecache.getline(), linecache.getlines()

import linecache

# File name variables
filename1 = 'D:/Python_Work/001.00 一般档案处理/一般档案/一般档案001(unicode).txt'
filename2 = 'D:/Python_Work/001.00 一般档案处理/一般档案/一般档案002(ansi).txt'
filename3 = 'D:/Python_Work/001.00 一般档案处理/一般档案/一般档案003(utf-8).txt'

# readline(): returns one line as a string; '' signals end of file
with open(filename1, encoding='utf_16') as fn:
    line = fn.readline()
    while line:
        print(line.strip('\n'))
        line = fn.readline()
print('')

# readlines(): returns all lines as a list
with open(filename2, encoding='mbcs') as fn:
    lines = fn.readlines()
for line in lines:
    print(line.strip('\n'))
print("")

# linecache.getlines(): no open() needed, reads every line; ASCII or UTF-8 only
lines = linecache.getlines(filename3)
for line in lines:
    print(line.strip('\n'))
linecache.clearcache()
print("")

# linecache.getline(): no open() needed, reads one line by number; ASCII or UTF-8 only
# The text file has 9 lines
for lineno in range(1, 10):
    line = linecache.getline(filename3, lineno)
    print(line.strip('\n'))
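
For large files there is a middle ground between readline() and readlines(): a file object is its own iterator, so a for loop (or the built-in next()) fetches one line at a time. The following is a minimal sketch on a hypothetical scratch file `lines.txt`; it also shows how linecache behaves at the edges.

```python
import linecache
import os
import tempfile

# Hypothetical scratch file; note that the last line has no '\n'.
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
with open(path, 'w', encoding='utf_8') as f:
    f.write('first\nsecond\nthird')

# Iterating the file object reads lazily, one line per step.
with open(path, encoding='utf_8') as f:
    stripped = [line.rstrip('\n') for line in f]

# next() on a freshly opened file returns the first line.
with open(path, encoding='utf_8') as f:
    first = next(f)

# readlines() keeps the line endings that are actually in the file.
with open(path, encoding='utf_8') as f:
    raw_lines = f.readlines()

second = linecache.getline(path, 2)    # line numbers start at 1
missing = linecache.getline(path, 99)  # out of range: '' and no exception
linecache.clearcache()

print(stripped, repr(first), raw_lines, repr(second), repr(missing))
```

rstrip('\n') removes only a trailing newline, whereas strip('\n') would also remove leading ones.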

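6. Writing a file: fileobj.method()

write(string) writes a string; no '\n' is appended automatically.
writelines(list) writes every string in a list; no '\n' is appended automatically.
seek(offset[, whence]) moves the file position by offset relative to whence: 0 = start of file (the default), 1 = current position, 2 = end of file. A positive offset moves forward, a negative one backward. In an encoded text file an arbitrary offset can land in the middle of a character, so the decoder can no longer parse the data and reading fails; free seeking is best kept to binary files.
flush() writes out buffered data; closing the file flushes automatically.
tell() returns the current file position.
fileno() returns the integer file descriptor, like the value returned by os.open(), for OS-level file operations.
isatty() whether the file is a TTY device, True or False. A TTY is an interactive input/output device, not a file on disk.
next() returns the next line; right after the file is opened, that is the first line. In Python 3, call the built-in next(fileobj).
truncate([size]) deletes everything after size bytes; without size, it deletes from the current position on. Not available in read-only mode.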
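
The positioning methods are easiest to follow in binary mode, where one offset is one byte. The following is a minimal sketch on a hypothetical 10-byte scratch file `seek_demo.bin`.

```python
import os
import tempfile

# Hypothetical scratch file holding the ten bytes b'0123456789'.
path = os.path.join(tempfile.mkdtemp(), 'seek_demo.bin')
with open(path, 'wb') as f:
    f.write(b'0123456789')

# 'r+b' opens an existing file for reading and writing in binary mode.
with open(path, 'r+b') as f:
    f.seek(3)                   # whence=0: 3 bytes from the start
    chunk = f.read(2)           # reads b'34'
    pos_after_read = f.tell()   # position has advanced to 5

    f.seek(-2, os.SEEK_END)     # whence=2: 2 bytes before the end
    tail = f.read()             # reads b'89', position is now 10

    f.seek(-4, os.SEEK_CUR)     # whence=1: back 4 bytes from position 10
    pos_relative = f.tell()     # position is 6

    f.truncate(5)               # keep only the first 5 bytes
    f.seek(0)
    remaining = f.read()        # reads b'01234'

size_on_disk = os.path.getsize(path)
print(chunk, pos_after_read, tail, pos_relative, remaining, size_on_disk)
```

os.SEEK_END and os.SEEK_CUR are the named forms of whence values 2 and 1.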

# File: 001.02 一般档案处理.py
# Usage of write() & writelines()
# File name variables
filename1 = 'D:/Python_Work/001.00 一般档案处理/一般档案/一般档案001(unicode).txt'
filename2 = 'D:/Python_Work/001.00 一般档案处理/一般档案/一般档案001(unicode)_rev1.txt'
filename3 = 'D:/Python_Work/001.00 一般档案处理/一般档案/一般档案001(unicode)_rev2.txt'

with open(filename1, 'r', encoding='utf_16') as fileobj_read:
    # Note: readlines doesn't trim the line endings
    lines = fileobj_read.readlines()

# An explicit encoding avoids depending on the locale default
with open(filename2, 'w', encoding='utf_16') as fileobj_write1:
    # writelines() writes the whole list in one call
    fileobj_write1.writelines(lines)

with open(filename3, 'w', encoding='utf_16') as fileobj_write2:
    # write() writes one string at a time
    for line in lines:
        fileobj_write2.write(line)

# No close() calls are needed: each with block closes its file on exit

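# File: 001.03 一般档案处理.py
# Usage of write(): copy a binary file
filename1 = "图形文件/图形文件001.jpg"
filename2 = "图形文件/图形文件001_bak.jpg"

# "rb" reads raw bytes, with no decoding and no newline translation
with open(filename1, "rb") as fileobj_read:
    data = fileobj_read.read()

# "wb" writes the bytes back unchanged; with closes (and flushes) the file
with open(filename2, "wb") as fileobj_write:
    fileobj_write.write(data)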
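
Reading a whole file with read() is fine for a small image, but as noted for read([size]), it does not suit very large files. The following is a minimal sketch of copying in fixed-size chunks instead; the function name `copy_in_chunks`, the 64 KiB chunk size, and the scratch file names are assumptions for illustration, not part of the original script.

```python
import os
import tempfile

def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy src to dst chunk by chunk; return the number of bytes copied."""
    copied = 0
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        while True:
            chunk = fin.read(chunk_size)   # at most chunk_size bytes
            if not chunk:                  # b'' means end of file
                break
            fout.write(chunk)
            copied += len(chunk)
    return copied

# Hypothetical scratch files standing in for the .jpg and its backup.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'source.bin')
dst = os.path.join(workdir, 'source_bak.bin')
with open(src, 'wb') as f:
    f.write(os.urandom(200000))

total = copy_in_chunks(src, dst)

with open(src, 'rb') as a, open(dst, 'rb') as b:
    identical = a.read() == b.read()

print(total, identical)
```

Only one chunk is held in memory at a time, so memory use stays flat however large the file is. For a plain copy, the standard library's shutil.copyfile does the same job in one call.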

~ The End ~

The World of Python

This work is licensed under a CC license; reposts must credit the author and link to this article.

Jason Yang
