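# 4.3. itertools: Iterator Functions

## Merging and Splitting Iterators

The `chain()` function takes several iterators as arguments and returns a single iterator that produces the contents of each input in turn, as though they came from one iterator.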

itertools_chain.py

``````
from itertools import *

for i in chain([1, 2, 3], ['a', 'b', 'c']):
    print(i, end=' ')
print()
``````

``````
$ python3 itertools_chain.py

1 2 3 a b c
``````
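To make the behavior of `chain()` concrete, here is a minimal sketch of a generator that does roughly the same thing. The name `chain_sketch` is hypothetical and only for illustration; it is not how the standard library implements `chain()`.

```python
from itertools import chain


def chain_sketch(*iterables):
    """Yield every item of each iterable, one iterable after another."""
    for iterable in iterables:
        yield from iterable


combined = list(chain_sketch([1, 2, 3], ['a', 'b', 'c']))
print(combined)  # [1, 2, 3, 'a', 'b', 'c']

# The sketch agrees with the real chain() on this input.
assert combined == list(chain([1, 2, 3], ['a', 'b', 'c']))
```

Like `chain()`, the sketch is lazy: nothing is read from an input until the loop reaches it.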

itertools_chain_from_iterable.py

``````
from itertools import *

def make_iterables_to_chain():
    yield [1, 2, 3]
    yield ['a', 'b', 'c']

for i in chain.from_iterable(make_iterables_to_chain()):
    print(i, end=' ')
print()
``````

``````
$ python3 itertools_chain_from_iterable.py

1 2 3 a b c
``````

itertools_zip.py

``````
for i in zip([1, 2, 3], ['a', 'b', 'c']):
    print(i)
``````

``````
$ python3 itertools_zip.py

(1, 'a')
(2, 'b')
(3, 'c')
``````

`zip()` stops as soon as any one of the input iterators is exhausted. To process all of the inputs, even when they produce different numbers of values, use `zip_longest()`.

itertools_zip_longest.py

``````
from itertools import *

r1 = range(3)
r2 = range(2)

print('zip stops early:')
print(list(zip(r1, r2)))

r1 = range(3)
r2 = range(2)

print('\nzip_longest processes all of the values:')
print(list(zip_longest(r1, r2)))
``````

``````
$ python3 itertools_zip_longest.py

zip stops early:
[(0, 0), (1, 1)]

zip_longest processes all of the values:
[(0, 0), (1, 1), (2, None)]
``````
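The output above pads the shorter input with `None`. As a small illustration, `zip_longest()` also accepts a `fillvalue` keyword argument to substitute a different placeholder; the value `-1` used here is an arbitrary choice.

```python
from itertools import zip_longest

# Pad the shorter input with -1 instead of the default None.
padded = list(zip_longest(range(3), range(2), fillvalue=-1))
print(padded)  # [(0, 0), (1, 1), (2, -1)]
```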

The `islice()` function returns an iterator that produces selected items from the input iterator, by index.

itertools_islice.py

``````
from itertools import *

print('Stop at 5:')
for i in islice(range(100), 5):
    print(i, end=' ')
print('\n')

print('Start at 5, Stop at 10:')
for i in islice(range(100), 5, 10):
    print(i, end=' ')
print('\n')

print('By tens to 100:')
for i in islice(range(100), 0, 100, 10):
    print(i, end=' ')
print('\n')
``````

`islice()` takes the same arguments as the slice operator: `start`, `stop`, and `step`. The `start` and `step` arguments are optional.

``````
$ python3 itertools_islice.py

Stop at 5:
0 1 2 3 4

Start at 5, Stop at 10:
5 6 7 8 9

By tens to 100:
0 10 20 30 40 50 60 70 80 90
``````
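The examples above slice a `range`, which could also be sliced directly. A case where `islice()` is more useful is a generator, which does not support the `[]` slice operator at all. The following sketch assumes a hypothetical `squares()` generator that never ends on its own.

```python
from itertools import count, islice


def squares():
    """Hypothetical endless generator of square numbers."""
    for n in count():
        yield n * n


# Only stop is given: take the first five values.
first_five = list(islice(squares(), 5))
print(first_five)  # [0, 1, 4, 9, 16]

# start=1, stop=8, step=2 selects the items at indexes 1, 3, 5, 7.
every_other = list(islice(squares(), 1, 8, 2))
print(every_other)  # [1, 9, 25, 49]
```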

itertools_tee.py

``````
from itertools import *

r = islice(count(), 5)
i1, i2 = tee(r)

print('i1:', list(i1))
print('i2:', list(i2))
``````

``````
$ python3 itertools_tee.py

i1: [0, 1, 2, 3, 4]
i2: [0, 1, 2, 3, 4]
``````

itertools_tee_error.py

``````
from itertools import *

r = islice(count(), 5)
i1, i2 = tee(r)

print('r:', end=' ')
for i in r:
    print(i, end=' ')
    if i > 1:
        break
print()

print('i1:', list(i1))
print('i2:', list(i2))
``````

``````
$ python3 itertools_tee_error.py

r: 0 1 2
i1: [3, 4]
i2: [3, 4]
``````

## Converting Inputs

itertools_map.py

``````
def times_two(x):
    return 2 * x

def multiply(x, y):
    return (x, y, x * y)

print('Doubles:')
for i in map(times_two, range(5)):
    print(i)

print('\nMultiples:')
r1 = range(5)
r2 = range(5, 10)
for i in map(multiply, r1, r2):
    print('{:d} * {:d} = {:d}'.format(*i))

print('\nStopping:')
r1 = range(5)
r2 = range(2)
for i in map(multiply, r1, r2):
    print(i)
``````

``````
$ python3 itertools_map.py

Doubles:
0
2
4
6
8

Multiples:
0 * 5 = 0
1 * 6 = 6
2 * 7 = 14
3 * 8 = 24
4 * 9 = 36

Stopping:
(0, 0, 0)
(1, 1, 1)
``````

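The `starmap()` function is similar to `map()`. Where `map()` calls the function with one argument taken from each of its input iterators, `starmap()` iterates over a single iterator of tuples and unpacks each tuple into the function's arguments using the `*` syntax.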

itertools_starmap.py

``````
from itertools import *

values = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]

for i in starmap(lambda x, y: (x, y, x * y), values):
    print('{} * {} = {}'.format(*i))
``````

``````
$ python3 itertools_starmap.py

0 * 5 = 0
1 * 6 = 6
2 * 7 = 14
3 * 8 = 24
4 * 9 = 36
``````
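One way to see the relationship between the two functions is that `map(f, a, b)` and `starmap(f, zip(a, b))` produce the same results. The short sketch below checks this using `operator.mul`; the variable names are only for illustration.

```python
from itertools import starmap
import operator

r1 = range(5)
r2 = range(5, 10)

# map() pulls one argument from each input iterator.
via_map = list(map(operator.mul, r1, r2))

# starmap() unpacks each tuple produced by a single iterator.
via_starmap = list(starmap(operator.mul, zip(r1, r2)))

print(via_map)      # [0, 6, 14, 24, 36]
print(via_starmap)  # [0, 6, 14, 24, 36]
```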

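## Producing New Values

The `count()` function returns an iterator that produces consecutive integers indefinitely. The first number can be passed as an argument (the default is zero). Unlike the built-in `range()`, there is no upper bound argument.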

itertools_count.py

``````
from itertools import *

for i in zip(count(1), ['a', 'b', 'c']):
    print(i)
``````

``````
$ python3 itertools_count.py

(1, 'a')
(2, 'b')
(3, 'c')
``````

The start and step arguments to `count()` can be any numerical values that can be added together.

itertools_count_step.py

``````
import fractions
from itertools import *

start = fractions.Fraction(1, 3)
step = fractions.Fraction(1, 3)

for i in zip(count(start, step), ['a', 'b', 'c']):
    print('{}: {}'.format(*i))
``````

``````
$ python3 itertools_count_step.py

1/3: a
2/3: b
1: c
``````
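Because `count()` never stops by itself, it is usually paired with something that limits it, such as `zip()` above or `islice()`. As a further illustration, the step may also be negative or a float; the particular numbers below are arbitrary.

```python
from itertools import count, islice

# A negative step counts down; islice() bounds the endless iterator.
countdown = list(islice(count(10, -2), 4))
print(countdown)  # [10, 8, 6, 4]

# Float start and step values work as well.
quarters = list(islice(count(0.5, 0.25), 3))
print(quarters)  # [0.5, 0.75, 1.0]
```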

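The `cycle()` function returns an iterator that repeats the contents of its input indefinitely. Because it has to remember the entire contents of the input, it can consume quite a bit of memory if the input is long.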

itertools_cycle.py

``````
from itertools import *

for i in zip(range(7), cycle(['a', 'b', 'c'])):
    print(i)
``````

``````
$ python3 itertools_cycle.py

(0, 'a')
(1, 'b')
(2, 'c')
(3, 'a')
(4, 'b')
(5, 'c')
(6, 'a')
``````
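The fact that `cycle()` remembers its input can be observed with a generator, which can only be consumed once. In this sketch, `one_shot()` is a hypothetical generator used for illustration; the values still repeat because `cycle()` saved them during the first pass.

```python
from itertools import cycle, islice


def one_shot():
    """Hypothetical generator that can be consumed only once."""
    yield 'a'
    yield 'b'


# The generator is exhausted after two items, yet cycle() keeps
# producing values from the copy it saved.
looped = list(islice(cycle(one_shot()), 5))
print(looped)  # ['a', 'b', 'a', 'b', 'a']
```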

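The `repeat()` function returns an iterator that produces the same value each time it is accessed. It keeps returning the value forever unless the optional `times` argument limits the number of repetitions.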

itertools_repeat.py

``````
from itertools import *

for i in repeat('over-and-over', 5):
    print(i)
``````

``````
$ python3 itertools_repeat.py

over-and-over
over-and-over
over-and-over
over-and-over
over-and-over
``````

Combining `repeat()` with `zip()` or `map()` is a common way to pair an unchanging value with the values from other iterators.

itertools_repeat_zip.py

``````
from itertools import *

for i, s in zip(count(), repeat('over-and-over', 5)):
    print(i, s)
``````

``````
$ python3 itertools_repeat_zip.py

0 over-and-over
1 over-and-over
2 over-and-over
3 over-and-over
4 over-and-over
``````

itertools_repeat_map.py

``````
from itertools import *

for i in map(lambda x, y: (x, y, x * y), repeat(2), range(5)):
    print('{:d} * {:d} = {:d}'.format(*i))
``````

``````
$ python3 itertools_repeat_map.py

2 * 0 = 0
2 * 1 = 2
2 * 2 = 4
2 * 3 = 6
2 * 4 = 8
``````
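The same pattern works with existing functions in place of a `lambda`. As a small sketch, an endless `repeat(2)` can supply the constant exponent to the built-in `pow()`, and `map()` stops when `range(5)` runs out.

```python
from itertools import repeat

# repeat(2) provides the second argument to pow() on every call.
squares = list(map(pow, range(5), repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```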

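## Filtering

The `dropwhile()` function returns an iterator that produces the elements of the input iterator after a condition becomes false for the first time.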

itertools_dropwhile.py

``````
from itertools import *

def should_drop(x):
    print('Testing:', x)
    return x < 1

for i in dropwhile(should_drop, [-1, 0, 1, 2, -2]):
    print('Yielding:', i)
``````

`dropwhile()` does not filter every item of the input. After the condition is false for the first time, all of the remaining items in the input are returned.

``````
$ python3 itertools_dropwhile.py

Testing: -1
Testing: 0
Testing: 1
Yielding: 1
Yielding: 2
Yielding: -2
``````
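A typical use for this behavior is skipping a leading run of items while leaving later matches alone. The sketch below uses made-up `lines` data with `#` comment markers as an assumption for illustration.

```python
from itertools import dropwhile

# Hypothetical input: a comment header followed by data lines.
lines = ['# header', '# more', 'data 1', '# not dropped', 'data 2']

# Only the leading comment lines are dropped; the later comment
# survives because the test is no longer applied.
body = list(dropwhile(lambda line: line.startswith('#'), lines))
print(body)  # ['data 1', '# not dropped', 'data 2']
```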

itertools_takewhile.py

``````
from itertools import *

def should_take(x):
    print('Testing:', x)
    return x < 2

for i in takewhile(should_take, [-1, 0, 1, 2, -2]):
    print('Yielding:', i)
``````

``````
$ python3 itertools_takewhile.py

Testing: -1
Yielding: -1
Testing: 0
Yielding: 0
Testing: 1
Yielding: 1
Testing: 2
``````

itertools_filter.py

``````
from itertools import *

def check_item(x):
    print('Testing:', x)
    return x < 1

for i in filter(check_item, [-1, 0, 1, 2, -2]):
    print('Yielding:', i)
``````

Unlike `dropwhile()` and `takewhile()`, `filter()` tests every item before returning it.

``````
$ python3 itertools_filter.py

Testing: -1
Yielding: -1
Testing: 0
Yielding: 0
Testing: 1
Testing: 2
Testing: -2
Yielding: -2
``````

`filterfalse()` returns an iterator that includes only the items for which the test function returns false.

itertools_filterfalse.py

``````
from itertools import *

def check_item(x):
    print('Testing:', x)
    return x < 1

for i in filterfalse(check_item, [-1, 0, 1, 2, -2]):
    print('Yielding:', i)
``````

The test expression in `check_item()` is the same as in the previous example, so the results produced by `filterfalse()` here are the opposite of the previous results.

``````
$ python3 itertools_filterfalse.py

Testing: -1
Testing: 0
Testing: 1
Yielding: 1
Testing: 2
Yielding: 2
Testing: -2
``````
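Since `filter()` and `filterfalse()` are complementary, they can be combined to split one input into two streams. The following is a sketch of that idea using `tee()`; the helper name `partition` is an assumption for illustration, not a function provided by `itertools`.

```python
from itertools import filterfalse, tee


def partition(predicate, iterable):
    """Split iterable into (items failing predicate, items passing it)."""
    t1, t2 = tee(iterable)
    return filterfalse(predicate, t1), filter(predicate, t2)


rejected, accepted = partition(lambda x: x < 1, [-1, 0, 1, 2, -2])
rejected = list(rejected)
accepted = list(accepted)
print(rejected)  # [1, 2]
print(accepted)  # [-1, 0, -2]
```

Note that the predicate is evaluated twice per item in this sketch, once for each stream.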

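`compress()` offers another way to filter the contents of an iterable. Instead of calling a test function, it uses the values in another iterable to decide which items to keep and which to ignore.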

itertools_compress.py

``````
from itertools import *

every_third = cycle([False, False, True])
data = range(1, 10)

for i in compress(data, every_third):
    print(i, end=' ')
print()
``````

``````
$ python3 itertools_compress.py

3 6 9
``````
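The selectors do not have to follow a fixed pattern. As an illustration, they can be computed from a second, parallel sequence; the `names` and `scores` data and the threshold of 60 below are invented for this sketch.

```python
from itertools import compress

# Hypothetical parallel sequences.
names = ['ann', 'bob', 'cy']
scores = [72, 41, 90]

# Keep each name whose matching score is at least 60.
passed = list(compress(names, (score >= 60 for score in scores)))
print(passed)  # ['ann', 'cy']
```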

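## Grouping Data

The `groupby()` function returns an iterator that produces sets of values organized by a common key. Only consecutive items with the same key are grouped together, so the input usually needs to be sorted on that key first. This example illustrates grouping related values based on an attribute.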

itertools_groupby_seq.py

``````
import functools
from itertools import *
import operator
import pprint

@functools.total_ordering
class Point:

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return '({}, {})'.format(self.x, self.y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __gt__(self, other):
        return (self.x, self.y) > (other.x, other.y)

# Create a dataset of Point instances
data = list(map(Point,
                cycle(islice(count(), 3)),
                islice(count(), 7)))
print('Data:')
pprint.pprint(data, width=35)
print()

# Try to group the unsorted data based on X values
print('Grouped, unsorted:')
for k, g in groupby(data, operator.attrgetter('x')):
    print(k, list(g))
print()

# Sort the data
data.sort()
print('Sorted:')
pprint.pprint(data, width=35)
print()

# Group the sorted data based on X values
print('Grouped, sorted:')
for k, g in groupby(data, operator.attrgetter('x')):
    print(k, list(g))
print()
``````

``````
$ python3 itertools_groupby_seq.py

Data:
[(0, 0),
 (1, 1),
 (2, 2),
 (0, 3),
 (1, 4),
 (2, 5),
 (0, 6)]

Grouped, unsorted:
0 [(0, 0)]
1 [(1, 1)]
2 [(2, 2)]
0 [(0, 3)]
1 [(1, 4)]
2 [(2, 5)]
0 [(0, 6)]

Sorted:
[(0, 0),
 (0, 3),
 (0, 6),
 (1, 1),
 (1, 4),
 (2, 2),
 (2, 5)]

Grouped, sorted:
0 [(0, 0), (0, 3), (0, 6)]
1 [(1, 1), (1, 4)]
2 [(2, 2), (2, 5)]
``````
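The output shows that grouping only collects neighbouring items, which is why the sorted data produces three groups and the unsorted data produces seven. A compact sketch of the usual sort-then-group idiom follows, using the same key function for both steps; the `words` list is made-up sample data.

```python
from itertools import groupby

# Hypothetical sample data.
words = ['apple', 'fig', 'kiwi', 'plum', 'pear', 'date']

# Sort by the grouping key first so equal keys are adjacent,
# then materialize each group before moving to the next one.
by_length = {
    length: list(group)
    for length, group in groupby(sorted(words, key=len), key=len)
}
print(by_length)
# {3: ['fig'], 4: ['kiwi', 'plum', 'pear', 'date'], 5: ['apple']}
```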

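## Combining Inputs

The `accumulate()` function processes the input iterable, passing the result accumulated so far (initially the first item) together with the next item to a function, and producing each return value. By default the function adds its two arguments, so `accumulate()` can be used to produce the cumulative sum of a series of numbers.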

itertools_accumulate.py

``````
from itertools import *

print(list(accumulate(range(5))))
print(list(accumulate('abcde')))
``````

``````
$ python3 itertools_accumulate.py

[0, 1, 3, 6, 10]
['a', 'ab', 'abc', 'abcd', 'abcde']
``````

itertools_accumulate_custom.py

``````
from itertools import *

def f(a, b):
    print(a, b)
    return b + a + b

print(list(accumulate('abcde', f)))
``````

``````
$ python3 itertools_accumulate_custom.py

a b
bab c
cbabc d
dcbabcd e
['a', 'bab', 'cbabc', 'dcbabcd', 'edcbabcde']
``````
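Any two-argument function can take the place of addition. As a brief illustration, passing the built-in `max()` yields a running maximum, and `operator.mul` yields a running product; the input numbers are arbitrary.

```python
from itertools import accumulate
import operator

# Each output is the largest value seen so far.
running_max = list(accumulate([3, 1, 4, 1, 5, 9, 2], max))
print(running_max)  # [3, 3, 4, 4, 5, 9, 9]

# Each output is the product of all items up to that point.
running_product = list(accumulate([1, 2, 3, 4], operator.mul))
print(running_product)  # [1, 2, 6, 24]
```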

`product()` is often used to replace nested `for` loops over several sequences. It returns an iterator that produces the Cartesian product of its inputs.

itertools_product.py

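``````
from itertools import *

FACE_CARDS = ('J', 'Q', 'K', 'A')
SUITS = ('H', 'D', 'C', 'S')

DECK = list(
    product(
        chain(range(2, 11), FACE_CARDS),
        SUITS,
    )
)

for card in DECK:
    print('{:>2}{}'.format(*card), end=' ')
    if card[1] == SUITS[-1]:
        print()
``````

Each value produced by `product()` is a tuple whose members are taken from the iterables passed in as arguments, in the order they were passed. The first tuple returned contains the first value from each iterable. The last iterable passed to `product()` is processed first, followed by the next to last, and so on. As a result, the return values are ordered by the first iterable, then by the second, and so on.

``````
$ python3 itertools_product.py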

 2H  2D  2C  2S
 3H  3D  3C  3S
 4H  4D  4C  4S
 5H  5D  5C  5S
 6H  6D  6C  6S
 7H  7D  7C  7S
 8H  8D  8C  8S
 9H  9D  9C  9S
10H 10D 10C 10S
 JH  JD  JC  JS
 QH  QD  QC  QS
 KH  KD  KC  KS
 AH  AD  AC  AS
``````
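To make the correspondence with nested loops explicit, this small sketch builds the same pairs both ways and compares them. The inputs `'ab'` and `range(3)` are arbitrary.

```python
from itertools import product

# Nested loops: the inner loop (the last input) varies fastest.
nested = [(x, y) for x in 'ab' for y in range(3)]

# product() yields the same tuples in the same order.
flat = list(product('ab', range(3)))

print(flat)
# [('a', 0), ('a', 1), ('a', 2), ('b', 0), ('b', 1), ('b', 2)]
```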

itertools_product_ordering.py

``````
from itertools import *

FACE_CARDS = ('J', 'Q', 'K', 'A')
SUITS = ('H', 'D', 'C', 'S')

DECK = list(
    product(
        SUITS,
        chain(range(2, 11), FACE_CARDS),
    )
)

for card in DECK:
    print('{:>2}{}'.format(card[1], card[0]), end=' ')
    if card[1] == FACE_CARDS[-1]:
        print()
``````

``````
$ python3 itertools_product_ordering.py

 2H  3H  4H  5H  6H  7H  8H  9H 10H  JH  QH  KH  AH
 2D  3D  4D  5D  6D  7D  8D  9D 10D  JD  QD  KD  AD
 2C  3C  4C  5C  6C  7C  8C  9C 10C  JC  QC  KC  AC
 2S  3S  4S  5S  6S  7S  8S  9S 10S  JS  QS  KS  AS
``````

itertools_product_repeat.py

``````
from itertools import *

def show(iterable):
    for i, item in enumerate(iterable, 1):
        print(item, end=' ')
        if (i % 3) == 0:
            print()
    print()

print('Repeat 2:\n')
show(list(product(range(3), repeat=2)))

print('Repeat 3:\n')
show(list(product(range(3), repeat=3)))
``````

``````
$ python3 itertools_product_repeat.py

Repeat 2:

(0, 0) (0, 1) (0, 2)
(1, 0) (1, 1) (1, 2)
(2, 0) (2, 1) (2, 2)

Repeat 3:

(0, 0, 0) (0, 0, 1) (0, 0, 2)
(0, 1, 0) (0, 1, 1) (0, 1, 2)
(0, 2, 0) (0, 2, 1) (0, 2, 2)
(1, 0, 0) (1, 0, 1) (1, 0, 2)
(1, 1, 0) (1, 1, 1) (1, 1, 2)
(1, 2, 0) (1, 2, 1) (1, 2, 2)
(2, 0, 0) (2, 0, 1) (2, 0, 2)
(2, 1, 0) (2, 1, 1) (2, 1, 2)
(2, 2, 0) (2, 2, 1) (2, 2, 2)
``````

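The `permutations()` function produces the items of the input combined in all possible orderings of the given length. By default it produces full-length permutations, with the same length as the input.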

itertools_permutations.py

``````
from itertools import *

def show(iterable):
    first = None
    for i, item in enumerate(iterable, 1):
        if first != item[0]:
            if first is not None:
                print()
            first = item[0]
        print(''.join(item), end=' ')
    print()

print('All permutations:\n')
show(permutations('abcd'))

print('\nPairs:\n')
show(permutations('abcd', r=2))
``````

``````
$ python3 itertools_permutations.py

All permutations:

abcd abdc acbd acdb adbc adcb
bacd badc bcad bcda bdac bdca
cabd cadb cbad cbda cdab cdba
dabc dacb dbac dbca dcab dcba

Pairs:

ab ac ad
ba bc bd
ca cb cd
da db dc
``````
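As a quick sanity check on the number of results, an input of n distinct items has n! full-length permutations and n! / (n - r)! permutations of length r. The sketch below verifies both counts for `'abcd'`.

```python
from itertools import permutations
from math import factorial

n = len('abcd')
full = list(permutations('abcd'))
pairs = list(permutations('abcd', 2))

# 4! = 24 full-length orderings, and 4! / 2! = 12 ordered pairs.
print(len(full), len(pairs))  # 24 12
assert len(full) == factorial(n)
assert len(pairs) == factorial(n) // factorial(n - 2)
```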

itertools_combinations.py

``````
from itertools import *

def show(iterable):
    first = None
    for i, item in enumerate(iterable, 1):
        if first != item[0]:
            if first is not None:
                print()
            first = item[0]
        print(''.join(item), end=' ')
    print()

print('Unique pairs:\n')
show(combinations('abcd', r=2))
``````

Unlike with `permutations()`, the `r` argument to `combinations()` is required.

``````
$ python3 itertools_combinations.py

Unique pairs:

ab ac ad
bc bd
cd
``````
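One way to relate the two functions: for an input that is sorted and has no duplicates, the combinations are the permutations whose members appear in increasing order. The following sketch checks this for pairs from `'abcd'`.

```python
from itertools import combinations, permutations

# Keep only the ordered pairs whose members are in increasing order.
ordered = [p for p in permutations('abcd', 2) if p[0] < p[1]]
unique = list(combinations('abcd', 2))

print([''.join(pair) for pair in unique])
# ['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
assert ordered == unique
```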

itertools_combinations_with_replacement.py

``````
from itertools import *

def show(iterable):
    first = None
    for i, item in enumerate(iterable, 1):
        if first != item[0]:
            if first is not None:
                print()
            first = item[0]
        print(''.join(item), end=' ')
    print()

print('Unique pairs:\n')
show(combinations_with_replacement('abcd', r=2))
``````

``````
$ python3 itertools_combinations_with_replacement.py

Unique pairs:

aa ab ac ad
bb bc bd
cc cd
dd
``````