杂谈：跳表（SkipList）与Redis

Aroad 的个人博客 / 0 / 0 / 创建于 3年前 / 更新于 3年前

跳表（SkipList）与Redis

接上回说到并且简单分析了一下mysql为何会使用B+树作为索引

结尾挖了个坑问mysql为何不使用跳表
现在来补上

如有错误请指出一起交流学习

什么是跳表

简单来说跳表就是加了n层索引的联表

详细可查看跳表wiki 或其他文章

这是一个有序链表

Laravel

如果我要找15 那么需要 1->3->5->7->9->11->13->15 8次

那么事件复杂度最坏情况就为O(n)

如果我往上加一层索引存在一个down元素指向下层节点

跳表的查找

那么我们的路径就是 1-> 5 -> 9 -> 13->13 -> 15 路径如图所示

Laravel

到达13时与17对比了一下而后确认区间在13-17之间而后向下查找找到15

那这样便只找了6次

那么我们继续在一级索引上方加入一个索引

Laravel

那么查找路径便会成 1->9->9->13->13->15

Laravel

好同样也是6次=。= 因为数据量太小翻车了(懒得再画图了)

但如果是找17 / 9这些节点对比原始链表与一级索引是有提升的

而由于跳表的查找思想有些类似于二分查找所以他的时间复杂度为近似于O(logN)

至于插入/删除则同样也是O(logN)

简单来说下插入与删除

插入与删除

寻找到应该在原始链表的区间地方插入

而后通过抛硬币的形式（随机0，1）随机循环出有多少个正面（1）设为k

当然k不能大于一个值，而后则插入多少层索引

删除则只需删除所有上层的节点即可

还是不太懂可以去查一下资料此处不再赘述

好那么问题来了

为何mysql使用B+树作为索引而不使用跳表

跳表同样查找的时间复杂度为O(logN) 为何mysql使用B+树作为索引呢，而且跳表还比B+树简单易懂

仍是因为层高的问题，为了保证跳表的查询效率间隔一个便会向上建一层索引（以便尽量形成二分的情况）

那么同样2千万条数据

B+树在3-4层便可以存下，而跳表需要2^24 24层才能存下

也就是说如果选择跳表那么需要24次IO才能找到数据对比B+树的3-5次谁优谁劣显而易见

为何Redis Zset使用跳表而不使用其他结构呢

引用一下redis开发者 Antirez大佬的原话链接

There are a few reasons:

1) They are not very memory intensive. It’s up to you basically. Changing parameters about the probability of a node to have a given number of levels will make then less memory intensive than btrees.

2) A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kind of balanced trees.

3) They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.

About the Append Only durability & speed, I don’t think it is a good idea to optimize Redis at cost of more code and more complexity for a use case that IMHO should be rare for the Redis target (fsync() at every command). Almost no one is using this feature even with ACID SQL databases, as the performance hint is big anyway.

About threads: our experience shows that Redis is mostly I/O bound. I’m using threads to serve things from Virtual Memory. The long term solution to exploit all the cores, assuming your link is so fast that you can saturate a single core, is running multiple instances of Redis (no locks, almost fully scalable linearly with number of cores), and using the “Redis Cluster” solution that I plan to develop in the future.

翻译一下跳表足够简单 Antirez很喜欢很满意

当然除此之外因为内存中的IO相对于磁盘来说足够快所以相对于IO次数多并不会造成速度上过大的差距

Redis还用了什么数据结构

拓展一下随便聊聊也是顺便做个笔记

引用一下

Redis五种常见数据结构	内部使用数据结构
string	int \ embstr \ raw \| (随value的类型与长度切换)
list	ziplist(压缩列表) \ linkedlist(双向链表)
hash	ziplist \ hashable
zset	ziplist \ skiplist
set	intset \ hash