PHP 实现 Snowflake 生成分布式唯一 ID

Twitter 的 snowflake 在分布式生成唯一 UUID 应用还是蛮广泛的,基于 snowflake 的一些变种的算法网上也有不少。使用 snowflake 生成 UUID 很多都是在分布式场景下使用,我看了下网上有其中有几篇 PHP 实现的都没有考虑到线程安全。现在 PHP 有了 Swoole 的锁和协程的加持,对于我们开发线程安全和高并发模拟还是很方便的,这里用 PHP 结合 Swoole 来学习下实现最简单的 snowflake(好久没写 PHP,感觉没有 IDE 真写不了 PHP 了)。

先来看以下 snowflake 的结构:
1543222572214

生成的数值是 64 位,分成 4 个部分:

  • 第一个 bit 为符号位,最高位为 0 表示正数
  • 第二部分 41 个 bit 用于记录生成 ID 时候的时间戳,单位为毫秒,所以该部分表示的数值范围为 2^41 - 1(69 年),它是相对于某一时间的偏移量
  • 第三部分的 10 个 bit 表示工作节点的 ID,表示数值范围为 2^10 - 1,相当于支持 1024 个节点
  • 第四部分 12 个 bit 表示每个工作节点没毫秒生成的循环自增 id,最多可以生成 2^12 -1 个 id,超出归零等待下一毫秒重新自增

先贴下代码:

<?php

class Snowflake
{
    const EPOCH = 1543223810238;    // 起始时间戳,毫秒

    const SEQUENCE_BITS = 12;   //序号部分12位
    const SEQUENCE_MAX = -1 ^ (-1 << self::SEQUENCE_BITS);  // 序号最大值

    const WORKER_BITS = 10; // 节点部分10位
    const WORKER_MAX = -1 ^ (-1 << self::WORKER_BITS);  // 节点最大数值

    const TIME_SHIFT = self::WORKER_BITS + self::SEQUENCE_BITS; // 时间戳部分左偏移量
    const WORKER_SHIFT = self::SEQUENCE_BITS;   // 节点部分左偏移量

    protected $timestamp;   // 上次ID生成时间戳
    protected $workerId;    // 节点ID
    protected $sequence;    // 序号
    protected $lock;        // Swoole 互斥锁

    public function __construct($workerId)
    {
        if ($workerId < 0 || $workerId > self::WORKER_MAX) {
            trigger_error("Worker ID 超出范围");
            exit(0);
        }

        $this->timestamp = 0;
        $this->workerId = $workerId;
        $this->sequence = 0;
        $this->lock = new swoole_lock(SWOOLE_MUTEX);
    }

    /**
     * 生成ID
     * @return int
     */
    public function getId()
    {
        $this->lock->lock();    // 这里一定要记得加锁
        $now = $this->now();
        if ($this->timestamp == $now) {
            $this->sequence++;

            if ($this->sequence > self::SEQUENCE_MAX) {
                // 当前毫秒内生成的序号已经超出最大范围,等待下一毫秒重新生成
                while ($now <= $this->timestamp) {
                    $now = $this->now();
                }
            }
        } else {
            $this->sequence = 0;
        }

        $this->timestamp = $now;    // 更新ID生时间戳

        $id = (($now - self::EPOCH) << self::TIME_SHIFT) | ($this->workerId << self::WORKER_SHIFT) | $this->sequence;
        $this->lock->unlock();  //解锁

        return $id;
    }

    /**
     * 获取当前毫秒
     * @return string
     */
    public function now()
    {
        return sprintf("%.0f", microtime(true) * 1000);
    }

}

其实逻辑并不复杂,解释一下代码中的位运算:

-1 ^ (-1 << self::SEQUENCE_BITS)
就是-1的二进制表示为1的补码,其实等同于 :
2**self::SEQUENCE_BITS - 1

最后部分左移后或运算:

(($now - self::EPOCH) << self::TIME_SHIFT) | ($this->workerId << self::WORKER_SHIFT) | $this->sequence;

这里主要是对除了第一位符号位以外的三个部分进行左移相应的偏移量使其归位,并通过或运算重新整合成上面 snowflake 的结构,比如我们用 3 部分 4 位来演示一下该归并操作:

0000 0000 0010  --左移0位--> 0000 0000 0010
0000 0000 0100  --左移4位--> 0000 0100 0000 --或操作-->1000 0100 0010
0000 0000 1000  --左移8位--> 1000 0000 0000

下面借助 Swoole 的协程和 channel 来暴力测试一下,看看生成的 ID 是否会出现重复的状况:

$snowflake = new Snowflake(1);

$chan = new chan(100000);
$n = 100000;

for ($i = 0; $i < $n; $i++) {
    go(function () use ($snowflake, $chan) {
        $id = $snowflake->getId();
        $chan->push($id);
    });
}

go(function () use ($chan, $n) {
    $arr = [];
    for ($i = 0; $i < $n; $i++) {
        $id = $chan->pop();  // PHP Swoole的channel一定要写在go(func)的协程里面!?
        if (in_array($id, $arr)) {
            exit("ID 已存在");
        }
        array_push($arr, $id);
    }
});

$chan->close();

echo "ok";

跑了一下,确实不会出现重复的 ID,对了,我用 Golang 同样实现了 snowflake 并协程序方式跑了同样的测试,PHP 的执行时间是大约 12 秒左右,Golang 只需要 1 秒。文章有什么错误还请指正,谢谢。

转载请注明: 转载自Ryan 是菜鸟 | LNMP 技术栈笔记

如果觉得本篇文章对您十分有益,何不 打赏一下

谢谢打赏

本文链接地址: PHP 实现 Snowflake 生成分布式唯一 ID

php
本作品采用《CC 协议》,转载必须注明作者和本文链接
本帖由系统于 5年前 自动加精
讨论数量: 7

可以,可以。小伙子可以把 golang 的贴上来,我们看下。

5年前 评论

@Dragonbuf
package snowflake

import (
"encoding/base64"
"encoding/binary"
"errors"
"fmt"
"strconv"
"sync"
"time"
)

var (
// Epoch is set to the twitter snowflake epoch of Nov 04 2010 01:42:54 UTC
// You may customize this to set a different epoch for your application.
Epoch int64 = 1543223810238

// Number of bits to use for Node
// Remember, you have a total 22 bits to share between Node/Step
NodeBits uint8 = 10

// Number of bits to use for Step
// Remember, you have a total 22 bits to share between Node/Step
StepBits uint8 = 12

nodeMax   int64 = -1 ^ (-1 << NodeBits)
nodeMask  int64 = nodeMax << StepBits
stepMask  int64 = -1 ^ (-1 << StepBits)
timeShift uint8 = NodeBits + StepBits
nodeShift uint8 = StepBits

)

const encodeBase32Map = "ybndrfg8ejkmcpqxot1uwisza345h769"

var decodeBase32Map [256]byte

const encodeBase58Map = "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"

var decodeBase58Map [256]byte

// A JSONSyntaxError is returned from UnmarshalJSON if an invalid ID is provided.
type JSONSyntaxError struct{ original []byte }

func (j JSONSyntaxError) Error() string {
return fmt.Sprintf("invalid snowflake ID %q", string(j.original))
}

// Create a map for decoding Base58. This speeds up the process tremendously.
func init() {

for i := 0; i < len(encodeBase58Map); i++ {
    decodeBase58Map[i] = 0xFF
}

for i := 0; i < len(encodeBase58Map); i++ {
    decodeBase58Map[encodeBase58Map[i]] = byte(i)
}

for i := 0; i < len(encodeBase32Map); i++ {
    decodeBase32Map[i] = 0xFF
}

for i := 0; i < len(encodeBase32Map); i++ {
    decodeBase32Map[encodeBase32Map[i]] = byte(i)
}

}

// ErrInvalidBase58 is returned by ParseBase58 when given an invalid []byte
var ErrInvalidBase58 = errors.New("invalid base58")

// ErrInvalidBase32 is returned by ParseBase32 when given an invalid []byte
var ErrInvalidBase32 = errors.New("invalid base32")

// A Node struct holds the basic information needed for a snowflake generator
// node
type Node struct {
mu sync.Mutex
time int64
node int64
step int64
}

// An ID is a custom type used for a snowflake ID. This is used so we can
// attach methods onto the ID.
type ID int64

// NewNode returns a new snowflake node that can be used to generate snowflake
// IDs
func NewNode(node int64) (*Node, error) {

// re-calc in case custom NodeBits or StepBits were set
nodeMax = -1 ^ (-1 << NodeBits)
nodeMask = nodeMax << StepBits
stepMask = -1 ^ (-1 << StepBits)
timeShift = NodeBits + StepBits
nodeShift = StepBits

if node < 0 || node > nodeMax {
    return nil, errors.New("Node number must be between 0 and " + strconv.FormatInt(nodeMax, 10))
}

return &Node{
    time: 0,
    node: node,
    step: 0,
}, nil

}

// Generate creates and returns a unique snowflake ID
func (n *Node) Generate() ID {

n.mu.Lock()

now := time.Now().UnixNano() / 1000000

if n.time == now {
    n.step = (n.step + 1) & stepMask

    if n.step == 0 {
        for now <= n.time {
            now = time.Now().UnixNano() / 1000000
        }
    }
} else {
    n.step = 0
}

n.time = now

r := ID((now-Epoch)<<timeShift |
    (n.node << nodeShift) |
    (n.step),
)

n.mu.Unlock()
return r

}

// Int64 returns an int64 of the snowflake ID
func (f ID) Int64() int64 {
return int64(f)
}

// String returns a string of the snowflake ID
func (f ID) String() string {
return strconv.FormatInt(int64(f), 10)
}

// Base2 returns a string base2 of the snowflake ID
func (f ID) Base2() string {
return strconv.FormatInt(int64(f), 2)
}

// Base36 returns a base36 string of the snowflake ID
func (f ID) Base36() string {
return strconv.FormatInt(int64(f), 36)
}

// Base32 uses the z-base-32 character set but encodes and decodes similar
// to base58, allowing it to create an even smaller result string.
// NOTE: There are many different base32 implementations so becareful when
// doing any interoperation interop with other packages.
func (f ID) Base32() string {

if f < 32 {
    return string(encodeBase32Map[f])
}

b := make([]byte, 0, 12)
for f >= 32 {
    b = append(b, encodeBase32Map[f%32])
    f /= 32
}
b = append(b, encodeBase32Map[f])

for x, y := 0, len(b)-1; x < y; x, y = x+1, y-1 {
    b[x], b[y] = b[y], b[x]
}

return string(b)

}

// ParseBase32 parses a base32 []byte into a snowflake ID
// NOTE: There are many different base32 implementations so becareful when
// doing any interoperation interop with other packages.
func ParseBase32(b []byte) (ID, error) {

var id int64

for i := range b {
    if decodeBase32Map[b[i]] == 0xFF {
        return -1, ErrInvalidBase32
    }
    id = id*32 + int64(decodeBase32Map[b[i]])
}

return ID(id), nil

}

// Base58 returns a base58 string of the snowflake ID
func (f ID) Base58() string {

if f < 58 {
    return string(encodeBase58Map[f])
}

b := make([]byte, 0, 11)
for f >= 58 {
    b = append(b, encodeBase58Map[f%58])
    f /= 58
}
b = append(b, encodeBase58Map[f])

for x, y := 0, len(b)-1; x < y; x, y = x+1, y-1 {
    b[x], b[y] = b[y], b[x]
}

return string(b)

}

// ParseBase58 parses a base58 []byte into a snowflake ID
func ParseBase58(b []byte) (ID, error) {

var id int64

for i := range b {
    if decodeBase58Map[b[i]] == 0xFF {
        return -1, ErrInvalidBase58
    }
    id = id*58 + int64(decodeBase58Map[b[i]])
}

return ID(id), nil

}

// Base64 returns a base64 string of the snowflake ID
func (f ID) Base64() string {
return base64.StdEncoding.EncodeToString(f.Bytes())
}

// Bytes returns a byte slice of the snowflake ID
func (f ID) Bytes() []byte {
return []byte(f.String())
}

// IntBytes returns an array of bytes of the snowflake ID, encoded as a
// big endian integer.
func (f ID) IntBytes() [8]byte {
var b [8]byte
binary.BigEndian.PutUint64(b[:], uint64(f))
return b
}

// Time returns an int64 unix timestamp of the snowflake ID time
func (f ID) Time() int64 {
return (int64(f) >> timeShift) + Epoch
}

// Node returns an int64 of the snowflake ID node number
func (f ID) Node() int64 {
return int64(f) & nodeMask >> nodeShift
}

// Step returns an int64 of the snowflake step (or sequence) number
func (f ID) Step() int64 {
return int64(f) & stepMask
}

// MarshalJSON returns a json byte array string of the snowflake ID.
func (f ID) MarshalJSON() ([]byte, error) {
buff := make([]byte, 0, 22)
buff = append(buff, '"')
buff = strconv.AppendInt(buff, int64(f), 10)
buff = append(buff, '"')
return buff, nil
}

// UnmarshalJSON converts a json byte array of a snowflake ID into an ID type.
func (f *ID) UnmarshalJSON(b []byte) error {
if len(b) < 3 || b[0] != '"' || b[len(b)-1] != '"' {
return JSONSyntaxError{b}
}

i, err := strconv.ParseInt(string(b[1:len(b)-1]), 10, 64)
if err != nil {
    return err
}

*f = ID(i)
return nil

}

自己动手 丰衣足食

5年前 评论

@wujingke 不是啊,小伙子。我是想看下 差10倍的性能是怎么产生的。

5年前 评论

10倍性能差 有点皮

5年前 评论
Oraoto
 if ($this->timestamp == $now) {
            $this->sequence++;

            if ($this->sequence > self::SEQUENCE_MAX) {
                // 当前毫秒内生成的序号已经超出最大范围,等待下一毫秒重新生成
                while ($now <= $this->timestamp) {
                    $now = $this->now();
                }
            }
        } else {
            $this->sequence = 0;
        }

~sequence~达到最大时,只等待下一毫秒,没有重置为0。

twitter是这样实现的:

if (lastTimestamp == timestamp) {
      // +1, 取最后SEQUENCE_BITS位,为0说明到最大了,等下一毫秒,而且sequence也刚好重置为0
      sequence = (sequence + 1) & sequenceMask 
      if (sequence == 0) { 
        timestamp = tilNextMillis(lastTimestamp)
      }
    } else {
      sequence = 0
    }
4年前 评论

弱弱问一下

-1 ^ (-1 << self::WORKER_BITS) 和 1 << self::WORKER_BITS - 1

这两种写法应该一样吧,前面一种有啥优势吗,我不太懂这种异或写法

3年前 评论

讨论应以学习和精进为目的。请勿发布不友善或者负能量的内容,与人为善,比聪明更重要!