Syncing MySQL to Elasticsearch: a detailed walkthrough

GitHub repository


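  1. The environment needs the mysqldump tool installed in order to export the data.
  2. For Alibaba Cloud databases, set skip_master_data = true.
  3. The source code uses the update method, not the index method, to modify document data.
  4. Set a longer binlog retention on the master node. Otherwise, by the time the dump finishes, the binlog for the previously recorded sync position may already have been purged.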
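
Note 3 matters because, in the Elasticsearch bulk API, an `update` action with a `doc` body merges the given fields into the existing document, while an `index` action replaces the whole document. The sketch below only illustrates the two request shapes. The helper names `bulk_index_lines`, `bulk_update_lines`, and `to_bulk_body`, as well as the sample values, are hypothetical and are not taken from the tool's source code.

```python
import json


def bulk_index_lines(index, doc_id, source):
    """Bulk 'index' action: the stored document is replaced by `source`."""
    action = {"index": {"_index": index, "_id": doc_id}}
    return [json.dumps(action), json.dumps(source)]


def bulk_update_lines(index, doc_id, changed_fields):
    """Bulk 'update' action: `changed_fields` are merged into the stored document."""
    action = {"update": {"_index": index, "_id": doc_id}}
    return [json.dumps(action), json.dumps({"doc": changed_fields})]


def to_bulk_body(lines):
    # The bulk API expects newline-delimited JSON that ends with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample: only one field changed on the MySQL side.
    lines = bulk_update_lines("es_index_v1", "1", {"refineTimestamp": 1700000000})
    print(to_bulk_body(lines))
```

With `update`, fields that are written to the index by something other than the sync are left untouched, which is presumably why the tool prefers it for row updates.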


  • Configuration file
    # MySQL address, user and password
    # user must have replication privilege in MySQL.
    my_addr = "localhost"
    my_user = "root"
    my_pass = "root"
    my_charset = "utf8mb4"
    # Set true when elasticsearch use https
    #es_https = false
    # Elasticsearch address
    es_addr = "elasticsearch.addr"
    # Elasticsearch user and password, maybe set by shield, nginx, or x-pack
    es_user = ""
    es_pass = ""
    # Path to store sync state. It must be set in order to
    # support resuming the sync from a breakpoint.
    # TODO: support other storage, like etcd. 
    data_dir = "./postion"
    # Inner Http status address
    stat_addr = ""
    stat_path = "/metrics"
    # pseudo server id like a slave 
    server_id = 1
    # mysql or mariadb
    flavor = "mysql"
    # mysqldump execution path
    # if not set or empty, ignore mysqldump.
    mysqldump = "mysqldump"
    # if we have no privilege to use mysqldump with --master-data,
    # we must skip it.
    skip_master_data = false
    # minimal items to be inserted in one bulk
    bulk_size = 128
    # force flush the pending requests if we don't have enough items >= bulk_size
    flush_bulk_time = "200ms"
    # Ignore table without primary key
    skip_no_pk_table = true
    # MySQL data source
    [[source]]
    schema = "a"
    tables = ["b"]

    # Below is for special rule mapping
    [[rule]]
    schema = "a"
    table = "b"
    index = "es_index_v1"
    type = "_doc"
    id = ["id"]
    filter = ["id","refine_timestamp"]

    [rule.field]
    # Map column `refine_timestamp` to ES field `refineTimestamp`
    refine_timestamp = "refineTimestamp"
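
To make the rule settings concrete, here is a rough sketch of how `filter` and the field mapping shape a MySQL row into an Elasticsearch document: only the columns listed in `filter` are kept, and mapped columns are renamed. The function `row_to_doc` and the sample row are hypothetical. The real tool does this internally in Go, so treat this as an illustration only.

```python
def row_to_doc(row, filter_columns, field_mapping):
    """Build an ES document from a MySQL row (illustrative only).

    row            -- dict of column name -> value
    filter_columns -- columns to sync; an empty list means sync all columns
    field_mapping  -- dict of column name -> ES field name
    """
    doc = {}
    for column, value in row.items():
        if filter_columns and column not in filter_columns:
            continue  # column is not listed in `filter`, so it is skipped
        # Rename the column if a mapping exists; otherwise keep its name.
        doc[field_mapping.get(column, column)] = value
    return doc


if __name__ == "__main__":
    # Hypothetical row from table `b` in schema `a`.
    row = {"id": 7, "refine_timestamp": 1700000000, "title": "not synced"}
    doc = row_to_doc(
        row,
        filter_columns=["id", "refine_timestamp"],
        field_mapping={"refine_timestamp": "refineTimestamp"},
    )
    print(doc)  # {'id': 7, 'refineTimestamp': 1700000000}
```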
This work is licensed under a CC license. Reposts must credit the author and link to this article.