文件(2) · 零基础学python（第二版）

上一节，对文件有了初步认识。要牢记，文件无非也是一种类型的数据。 ## [](https://github.com/qiwsir/StarterLearningPython/blob/master/127.md#文件的状态)文件的状态很多时候，我们需要获取一个文件的有关状态（也称为属性），比如创建日期，访问日期，修改日期，大小，等等。在os模块中，有这样一个方法，专门让我们查看文件的这些状态参数的。 ~~~ >>> import os >>> file_stat = os.stat("131.txt") #查看这个文件的状态 >>> file_stat #文件状态是这样的。从下面的内容，有不少从英文单词中可以猜测出来。 posix.stat_result(st_mode=33204, st_ino=5772566L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=69L, st_atime=1407897031, st_mtime=1407734600, st_ctime=1407734600) >>> file_stat.st_ctime #这个是文件创建时间 1407734600.0882277 ~~~ 这是什么时间？看不懂！别着急，换一种方式。在python中，有一个模块`time`，是专门针对时间设计的。 ~~~ >>> import time >>> time.localtime(file_stat.st_ctime) #这回看清楚了。 time.struct_time(tm_year=2014, tm_mon=8, tm_mday=11, tm_hour=13, tm_min=23, tm_sec=20, tm_wday=0, tm_yday=223, tm_isdst=0) ~~~ ## [](https://github.com/qiwsir/StarterLearningPython/blob/master/127.md#readreadlinereadlines)read/readline/readlines 上节中，简单演示了如何读取文件内容，但是，在用`dir(file)`的时候，会看到三个函数：read/readline/readlines，它们各自有什么特点，为什么要三个？一个不行吗？在读者向下看下面内容之前，请想一想，如果要回答这个问题，你要用什么方法？注意，我问的是用什么方法能够找到答案，不是问答案内容是什么。因为内容，肯定是在某个地方存放着呢，关键是用什么方法找到。搜索？是一个不错的方法。还有一种，就是在交互模式下使用的，你肯定也想到了。 ~~~ >>> help(file.read) ~~~ 用这样的方法，可以分别得到三个函数的说明： ~~~ read(...) read([size]) -> read at most size bytes, returned as a string. If the size argument is negative or omitted, read until EOF is reached. Notice that when in non-blocking mode, less data than what was requested may be returned, even if no size parameter was given. readline(...) readline([size]) -> next line from the file, as a string. Retain newline. A non-negative size argument limits the maximum number of bytes to return (an incomplete line may be returned then). Return an empty string at EOF. readlines(...) readlines([size]) -> list of strings, each a line from the file. Call readline() repeatedly and return a list of the lines so read. The optional size argument, if given, is an approximate bound on the total number of bytes in the lines returned. ~~~ 对照一下上面的说明，三个的异同就显现了。 EOF什么意思？End-of-file。在[维基百科](http://en.wikipedia.org/wiki/End-of-file)中居然有对它的解释： ~~~ In computing, End Of File (commonly abbreviated EOF[1]) is a condition in a computer operating system where no more data can be read from a data source. The data source is usually called a file or stream. In general, the EOF is either determined when the reader returns null as seen in Java's BufferedReader,[2] or sometimes people will manually insert an EOF character of their choosing to signal when the file has ended. ~~~ 明白EOF之后，就对比一下： * read：如果指定了参数size，就按照该指定长度从文件中读取内容，否则，就读取全文。被读出来的内容，全部塞到一个字符串里面。这样有好处，就是东西都到内存里面了，随时取用，比较快捷；“成也萧何败萧何”，也是因为这点，如果文件内容太多了，内存会吃不消的。文档中已经提醒注意在“non-blocking”模式下的问题，关于这个问题，不是本节的重点，暂时不讨论。 * readline：那个可选参数size的含义同上。它则是以行为单位返回字符串，也就是每次读一行，依次循环，如果不限定size，直到最后一个返回的是空字符串，意味着到文件末尾了(EOF)。 * readlines：size同上。它返回的是以行为单位的列表，即相当于先执行`readline()`，得到每一行，然后把这一行的字符串作为列表中的元素塞到一个列表中，最后将此列表返回。依次演示操作，即可明了。有这样一个文档，名曰：you.md，其内容和基本格式如下： > You Raise Me Up When I am down and, oh my soul, so weary; When troubles come and my heart burdened be; Then, I am still and wait here in the silence, Until you come and sit awhile with me. You raise me up, so I can stand on mountains; You raise me up, to walk on stormy seas; I am strong, when I am on your shoulders; You raise me up: To more than I can be. 分别用上述三种函数读取这个文件。 ~~~ >>> f = open("you.md") >>> content = f.read() >>> content 'You Raise Me Up\nWhen I am down and, oh my soul, so weary;\nWhen troubles come and my heart burdened be;\nThen, I am still and wait here in the silence,\nUntil you come and sit awhile with me.\nYou raise me up, so I can stand on mountains;\nYou raise me up, to walk on stormy seas;\nI am strong, when I am on your shoulders;\nYou raise me up: To more than I can be.\n' >>> f.close() ~~~ **提示：养成一个好习惯，**只要打开文件，不用该文件了，就一定要随手关闭它。如果不关闭它，它还驻留在内存中，后面又没有对它的操作，是不是浪费内存空间了呢？同时也增加了文件安全的风险。 > 注意：在python中，'\n'表示换行，这也是UNIX系统中的规范。但是，在奇葩的windows中，用'\r\n'表示换行。python在处理这个的时候，会自动将'\r\n'转换为'\n'。请仔细观察，得到的就是一个大大的字符串，但是这个字符串里面包含着一些符号`\n`，因为原文中有换行符。如果用print输出这个字符串，就是这样的了，其中的`\n`起作用了。 ~~~ >>> print content You Raise Me Up When I am down and, oh my soul, so weary; When troubles come and my heart burdened be; Then, I am still and wait here in the silence, Until you come and sit awhile with me. You raise me up, so I can stand on mountains; You raise me up, to walk on stormy seas; I am strong, when I am on your shoulders; You raise me up: To more than I can be. ~~~ 用`readline()`读取，则是这样的： ~~~ >>> f = open("you.md") >>> f.readline() 'You Raise Me Up\n' >>> f.readline() 'When I am down and, oh my soul, so weary;\n' >>> f.readline() 'When troubles come and my heart burdened be;\n' >>> f.close() ~~~ 显示出一行一行读取了，每操作一次`f.readline()`，就读取一行，并且将指针向下移动一行，如此循环。显然，这种是一种循环，或者说可迭代的。因此，就可以用循环语句来完成对全文的读取。 ~~~ #!/usr/bin/env python # coding=utf-8 f = open("you.md") while True: line = f.readline() if not line: #到EOF，返回空字符串，则终止循环 break print line , #注意后面的逗号，去掉print语句后面的'\n'，保留原文件中的换行 f.close() #别忘记关闭文件 ~~~ 将其和文件"you.md"保存在同一个目录中，我这里命名的文件名是12701.py，然后在该目录中运行`python 12701.py`，就看到下面的效果了： ~~~ ~/Documents$ python 12701.py You Raise Me Up When I am down and, oh my soul, so weary; When troubles come and my heart burdened be; Then, I am still and wait here in the silence, Until you come and sit awhile with me. You raise me up, so I can stand on mountains; You raise me up, to walk on stormy seas; I am strong, when I am on your shoulders; You raise me up: To more than I can be. ~~~ 也用`readlines()`来读取此文件： ~~~ >>> f = open("you.md") >>> content = f.readlines() >>> content ['You Raise Me Up\n', 'When I am down and, oh my soul, so weary;\n', 'When troubles come and my heart burdened be;\n', 'Then, I am still and wait here in the silence,\n', 'Until you come and sit awhile with me.\n', 'You raise me up, so I can stand on mountains;\n', 'You raise me up, to walk on stormy seas;\n', 'I am strong, when I am on your shoulders;\n', 'You raise me up: To more than I can be.\n'] ~~~ 返回的是一个列表，列表中每个元素都是一个字符串，每个字符串中的内容就是文件的一行文字，含行末的符号。显而易见，它是可以用for来循环的。 ~~~ >>> for line in content: ... print line , ... You Raise Me Up When I am down and, oh my soul, so weary; When troubles come and my heart burdened be; Then, I am still and wait here in the silence, Until you come and sit awhile with me. You raise me up, so I can stand on mountains; You raise me up, to walk on stormy seas; I am strong, when I am on your shoulders; You raise me up: To more than I can be. >>> f.close() ~~~ ## [](https://github.com/qiwsir/StarterLearningPython/blob/master/127.md#读很大的文件)读很大的文件前面已经说明了，如果文件太大，就不能用`read()`或者`readlines()`一次性将全部内容读入内存，可以使用while循环和`readlin()`来完成这个任务。此外，还有一个方法：fileinput模块 ~~~ >>> import fileinput >>> for line in fileinput.input("you.md"): ... print line , ... You Raise Me Up When I am down and, oh my soul, so weary; When troubles come and my heart burdened be; Then, I am still and wait here in the silence, Until you come and sit awhile with me. You raise me up, so I can stand on mountains; You raise me up, to walk on stormy seas; I am strong, when I am on your shoulders; You raise me up: To more than I can be. ~~~ 我比较喜欢这个，用起来是那么得心应手，简洁明快，还用for。对于这个模块的更多内容，读者可以自己在交互模式下利用`dir()`，`help()`去查看明白。还有一种方法，更为常用： ~~~ >>> for line in f: ... print line , ... You Raise Me Up When I am down and, oh my soul, so weary; When troubles come and my heart burdened be; Then, I am still and wait here in the silence, Until you come and sit awhile with me. You raise me up, so I can stand on mountains; You raise me up, to walk on stormy seas; I am strong, when I am on your shoulders; You raise me up: To more than I can be. ~~~ 之所以能够如此，是因为file是可迭代的数据类型，直接用for来迭代即可。 ## [](https://github.com/qiwsir/StarterLearningPython/blob/master/127.md#seek)seek 这个函数的功能就是让指针移动。特别注意，它是以字节为单位进行移动的。比如： ~~~ >>> f = open("you.md") >>> f.readline() 'You Raise Me Up\n' >>> f.readline() 'When I am down and, oh my soul, so weary;\n' ~~~ 现在已经移动到第四行末尾了，看`seek()`的能力： ~~~ >>> f.seek(0) ~~~ 意图是要回到文件的最开头，那么如果用`f.readline()`应该读取第一行。 ~~~ >>> f.readline() 'You Raise Me Up\n' ~~~ 果然如此。此时指针所在的位置，还可以用`tell()`来显示，如 ~~~ >>> f.tell() 17L >>> f.seek(4) ~~~ `f.seek(4)`就将位置定位到从开头算起的第四个字符后面，也就是"You "之后，字母"R"之前的位置。 ~~~ >>> f.tell() 4L ~~~ `tell()`也是这么说的。这时候如果使用`readline()`，得到就是从当前位置开始到行末。 ~~~ >>> f.readline() 'Raise Me Up\n' >>> f.close() ~~~ `seek()`还有别的参数，具体如下： > seek(...) seek(offset[, whence]) -> None. Move to new file position. > > Argument offset is a byte count. Optional argument whence defaults to 0 (offset from start of file, offset should be >= 0); other values are 1 (move relative to current position, positive or negative), and 2 (move relative to end of file, usually negative, although many platforms allow seeking beyond the end of a file). If the file is opened in text mode, only offsets returned by tell() are legal. Use of other offsets causes undefined behavior. Note that not all file objects are seekable. whence的值： * 默认值是0，表示从文件开头开始计算指针偏移的量（简称偏移量）。这是offset必须是大于等于0的整数。 * 是1时，表示从当前位置开始计算偏移量。offset如果是负数，表示从当前位置向前移动，整数表示向后移动。 * 是2时，表示相对文件末尾移动。