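004.01 Searching different Python data types

Created: 2019/12/08

Updated: None

Related software:

Win 10 Python 3.7.2

Note: All content may be quoted freely as long as the source and author are credited. If anything here is wrong or poorly worded, corrections are welcome.

Topic: 004.01 Searching different Python data types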

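While working on a data search-and-compare project recently, I found that matching against a large data set became unbearably slow. I wanted results 'immediately', but ended up waiting several hours. Python will certainly never match C or assembly, but I still had to find a way to speed things up. Below are several ways to look up 10,000 values in a data set of 10,000 values, together with the time each one took. The last method, index_list_sort(), is much faster than the others, but I still don't think it is fast enough, and this is only an integer search. What about strings? What about substrings? If you know a better method, please let me know. Thank you!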
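
On the string question: hash-based containers do not care whether the keys are integers or strings, so a dict lookup should carry over to exact string matches. Below is a minimal sketch under that assumption. The names build_index, find_all and the generated test strings are hypothetical and not part of the benchmark in this post, and the sketch does not cover substring search.

```python
import random

def build_index(items):
    # Map each item to the position of its first occurrence in items.
    index = {}
    for pos, item in enumerate(items):
        index.setdefault(item, pos)
    return index

def find_all(index, queries):
    # Return the position of each query, or -1 when it is absent.
    return [index.get(q, -1) for q in queries]

# Hypothetical data: 10,000 distinct strings in random order.
words = ['item%05d' % i for i in range(10000)]
random.Random(0).shuffle(words)

word_index = build_index(words)
positions = find_all(word_index, [words[0], words[-1], 'not-present'])
print(positions)   # [0, 9999, -1]
```

Building the index costs one pass over the data; after that each lookup is a single hash probe, regardless of how long the list is.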

Results:

0:00:04.734338 : index_sequence
0:00:01.139984 : index_list
0:00:00.330116 : index_np
0:00:00.233343 : index_np_sort
0:00:00.223401 : index_dict
0:00:00.213462 : index_set
0:00:00.007977 : index_list_sort
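
A single run timed with datetime.now() can be noisy, especially for the sub-10 ms cases. As a rough sketch only, the timing could be repeated and the best run kept, using time.perf_counter() from the standard library. The helper best_of and the stand-in workload sample_lookup are hypothetical names and are not part of the code in this post.

```python
import time

def best_of(func, repeat=5):
    # Run func several times and return the shortest wall-clock duration in seconds.
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return min(durations)

def sample_lookup():
    # Stand-in workload: 10,000 membership tests against a set.
    data = set(range(10000))
    return sum(1 for i in range(10000) if i in data)

elapsed = best_of(sample_lookup)
print(f'{elapsed:.6f} s : {sample_lookup.__name__}')
```

Taking the minimum rather than the mean reduces the influence of other processes that happen to run during one of the repetitions.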

Code:

from datetime import datetime
import numpy as np
import bisect
import random

size        = 10000
db          = random.sample(range(size), size)   # values 0..size-1 in random order
db_sort     = sorted(db)                         # sorted copy for binary search
db_set      = set(db)
db_dict     = {v: i for i, v in enumerate(db)}   # value -> position in db
db_np       = np.array(db)
value       = list(range(size))                  # the values to look up

def call(func):
    # Call function and calculate execution time, then print duration and function name
    start_time = datetime.now()
    func()
    print(datetime.now() - start_time,':',func.__name__)

def do_something():
    # Placeholder work done on every hit, so each method pays the same cost per match
    for i in range(1000):
        pass

def index_sequence():
    # Unsorted list, plain nested loops without any built-in search
    for i in range(size):
        for j in range(size):
            if db[j] == value[i]:
                index = j
                do_something()
                break

def index_list():
    # Unsorted list, use list.index()
    for i in range(size):
        try:
            index = db.index(value[i])
        except ValueError:
            # list.index() raises ValueError when the value is absent
            index = -1
        if index >= 0:
            do_something()

def index_np():
    # Unsorted NumPy array, use np.where()
    for i in range(size):
        result = np.where(db_np == value[i])
        if result[0].size != 0:
            do_something()

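db_np_sort = np.sort(db_np)     # sorted copy, built once outside the timed function

def index_np_sort():
    # Sorted NumPy array, binary search with np.searchsorted()
    for i in range(size):
        result = np.searchsorted(db_np_sort, value[i])
        # searchsorted() returns an insertion point, so confirm the value is really there
        if result < size and db_np_sort[result] == value[i]:
            do_something()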

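def index_list_sort():
    # Sorted list, binary search with the bisect module
    for i in range(size):
        index = bisect.bisect_left(db_sort, value[i])
        # bisect_left() returns an insertion point, so confirm the value is really there
        if index < size and db_sort[index] == value[i]:
            do_something()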

def index_set():
    # Set search
    for i in range(size):
        if value[i] in db_set:
            do_something()

def index_dict():
    # Dictionary search
    for i in range(size):
        try:
            index = db_dict[value[i]]
        except KeyError:
            # a missing key raises KeyError
            index = -1
        if index >= 0:
            do_something()

# Test execution time
call(index_sequence)
call(index_list)
call(index_np)
call(index_np_sort)
call(index_dict)
call(index_set)
call(index_list_sort)
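
All the functions above loop in Python and look up one value per iteration. One possible way to go further, sketched here as an assumption rather than a measured result, is to hand the whole batch of values to NumPy at once. The helper lookup_positions is a hypothetical name. It assumes a non-empty array of comparable values and returns positions in the original unsorted array, or -1 for values that are absent.

```python
import numpy as np

def lookup_positions(haystack, needles):
    # Return, for each needle, its position in haystack, or -1 if absent.
    # Assumes haystack is non-empty.
    haystack = np.asarray(haystack)
    needles = np.asarray(needles)
    order = np.argsort(haystack, kind='stable')     # positions that sort haystack
    sorted_hay = haystack[order]
    slots = np.searchsorted(sorted_hay, needles)    # leftmost insertion points
    slots = np.minimum(slots, len(sorted_hay) - 1)  # keep indices in range
    found = sorted_hay[slots] == needles            # insertion point is a real hit?
    return np.where(found, order[slots], -1)

data = np.random.RandomState(0).permutation(10000)
queries = np.array([0, 9999, 10000, -5])
positions = lookup_positions(data, queries)
print(positions)
```

The sort is done once, and the single np.searchsorted() call performs every binary search inside NumPy, so the per-value Python loop disappears.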

This work is licensed under the CC license; reposts must credit the author and link to this article.
Jason Yang