
DataX 3.0 on Windows: a MySQL-to-MySQL Data Migration Example



Installing DataX on Windows:

DataX on GitHub: https://github.com/alibaba/DataX

Requirements:

JDK (1.8 or later, 1.8 recommended; environment variables configured)

Python (Python 2.7.x is often recommended online, but Python 3.10 also worked fine in my own testing; environment variables configured)

Maven (3.5 or later recommended; environment variables configured)
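As a convenience, the JDK requirement can be checked programmatically. The sketch below is an assumption-laden helper (not part of DataX): it parses the banner that `java -version` prints on stderr, handling both the legacy `1.8.0_361` scheme and the modern `17.0.2` scheme.

```python
import re
import subprocess

def parse_java_version(banner: str):
    """Extract (major, minor) from a `java -version` banner line.

    Legacy banners look like `java version "1.8.0_361"` (real major is
    the second field); modern ones like `openjdk version "17.0.2"`.
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if not m:
        raise ValueError("unrecognized java version banner: %r" % banner)
    major = int(m.group(1))
    minor = int(m.group(2) or 0)
    if major == 1:  # legacy "1.x" scheme
        return (minor, 0)
    return (major, minor)

def java_ok(banner: str) -> bool:
    """DataX requires JDK 1.8 or later."""
    return parse_java_version(banner)[0] >= 8

if __name__ == "__main__":
    import shutil
    if shutil.which("java"):
        # `java -version` writes its banner to stderr, not stdout
        banner = subprocess.run(["java", "-version"],
                                capture_output=True, text=True).stderr
        print("JDK OK:", java_ok(banner))
```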

Installing DataX:

Download: https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202303/datax.tar.gz

After downloading, extract the archive to a local directory and go into its bin directory. My local extraction path is D:\datax.

Testing the DataX installation

Open a cmd window and change into DataX's bin directory. I extracted DataX to D:\datax, so the path is D:\datax\bin:

cd D:\datax\bin

First run CHCP 65001 to switch the console code page to UTF-8 and avoid garbled Chinese output, then run the self-check job:

CHCP 65001

python .\datax.py ..\job\job.json

You should see output like the following:

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2023-06-14 13:09:32.800 [main] INFO  MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2023-06-14 13:09:32.803 [main] INFO  MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2023-06-14 13:09:32.813 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-06-14 13:09:32.818 [main] INFO  Engine - the machine info  =>

	osInfo:	Oracle Corporation 1.8 25.361-b09
	jvmInfo:	Windows 10 amd64 10.0
	cpu num:	8

	totalPhysicalMemory:	-0.00G
	freePhysicalMemory:	-0.00G
	maxFileDescriptorCount:	-1
	currentOpenFileDescriptorCount:	-1

	GC Names	[PS MarkSweep, PS Scavenge]

	MEMORY_NAME                    | allocation_size                | init_size
	PS Eden Space                  | 256.00MB                       | 256.00MB
	Code Cache                     | 240.00MB                       | 2.44MB
	Compressed Class Space         | 1,024.00MB                     | 0.00MB
	PS Survivor Space              | 42.50MB                        | 42.50MB
	PS Old Gen                     | 683.00MB                       | 683.00MB
	Metaspace                      | -0.00MB                        | 0.00MB

2023-06-14 13:09:32.828 [main] INFO  Engine -
{
	"setting":{
		"speed":{
			"channel":1
		},
		"errorLimit":{
			"record":0,
			"percentage":0.02
		}
	},
	"content":[
		{
			"reader":{
				"name":"streamreader",
				"parameter":{
					"column":[
						{"value":"DataX","type":"string"},
						{"value":19890604,"type":"long"},
						{"value":"1989-06-04 00:00:00","type":"date"},
						{"value":true,"type":"bool"},
						{"value":"test","type":"bytes"}
					],
					"sliceRecordCount":100000
				}
			},
			"writer":{
				"name":"streamwriter",
				"parameter":{
					"print":false,
					"encoding":"UTF-8"
				}
			}
		}
	]
}

2023-06-14 13:09:32.846 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false
2023-06-14 13:09:32.846 [main] INFO  JobContainer - DataX jobContainer starts job.
2023-06-14 13:09:32.847 [main] INFO  JobContainer - Set jobId = 0
2023-06-14 13:09:32.863 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2023-06-14 13:09:32.864 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2023-06-14 13:09:32.864 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2023-06-14 13:09:32.864 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2023-06-14 13:09:32.865 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2023-06-14 13:09:32.865 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2023-06-14 13:09:32.865 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2023-06-14 13:09:32.889 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2023-06-14 13:09:32.892 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2023-06-14 13:09:32.893 [job-0] INFO  JobContainer - Running by standalone Mode.
2023-06-14 13:09:32.900 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-06-14 13:09:32.904 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-06-14 13:09:32.904 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2023-06-14 13:09:32.913 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-06-14 13:09:33.240 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[328]ms
2023-06-14 13:09:33.241 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-06-14 13:09:42.913 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.018s |  All Task WaitReaderTime 0.026s | Percentage 100.00%
2023-06-14 13:09:42.914 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2023-06-14 13:09:42.914 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2023-06-14 13:09:42.914 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2023-06-14 13:09:42.915 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2023-06-14 13:09:42.915 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: D:\datax\hook
2023-06-14 13:09:42.917 [job-0] INFO  JobContainer -
	 [total cpu info] =>
		averageCpu                     | maxDeltaCpu                    | minDeltaCpu
		-1.00%                         | -1.00%                         | -1.00%

	 [total gc info] =>
		 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
		 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s
		 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s

2023-06-14 13:09:42.917 [job-0] INFO  JobContainer - PerfTrace not enable!
2023-06-14 13:09:42.917 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.018s |  All Task WaitReaderTime 0.026s | Percentage 100.00%
2023-06-14 13:09:42.918 [job-0] INFO  JobContainer -
任务启动时刻                    : 2023-06-14 13:09:32
任务结束时刻                    : 2023-06-14 13:09:42
任务总计耗时                    :                 10s
任务平均流量                    :          253.91KB/s
记录写入速度                    :          10000rec/s
读出记录总数                    :              100000
读写失败总数                    :                   0

That completes the installation.
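Whether a run succeeded can also be checked programmatically from the log. This is a hedged sketch of my own (not a DataX facility): it uses a regex to pull the counts out of the final `StandAloneJobContainerCommunicator` total line, whose format is taken from the output above.

```python
import re

def parse_datax_totals(log_text: str):
    """Pull record/error counts from the last
    StandAloneJobContainerCommunicator total line of a DataX log."""
    pattern = (r"Total (\d+) records, (\d+) bytes \| "
               r"Speed [^|]+\| Error (\d+) records, (\d+) bytes")
    matches = re.findall(pattern, log_text)
    if not matches:
        raise ValueError("no DataX total line found in log")
    total, tbytes, err, ebytes = map(int, matches[-1])  # last line = final totals
    return {"records": total, "bytes": tbytes,
            "error_records": err, "error_bytes": ebytes}

def job_succeeded(log_text: str) -> bool:
    """A clean run reports zero error records."""
    return parse_datax_totals(log_text)["error_records"] == 0
```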

Data migration:

Example: migrating data from database MySQL_A to database MySQL_B.

MySQL_A and MySQL_B must already contain the same tables, with identical columns and column types. (For migrations between different database products, look up the type-mapping rules between the two systems first.)

As an aside, here is a rough mapping between MySQL and DM (Dameng) data types, for reference only:

MySQL data type     DM (Dameng) data type
----------------    ---------------------
INT                 INT
VARCHAR             VARCHAR
TEXT                CLOB
DATETIME            TIMESTAMP
DECIMAL             DECIMAL
TINYINT             SMALLINT
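The table above can be encoded as a small lookup helper. This is a hypothetical sketch of mine, not a DataX feature, and the mapping mirrors only the rows listed here; verify against the official DM migration documentation before relying on it.

```python
# Mapping taken from the reference table above; incomplete by design.
MYSQL_TO_DM = {
    "INT": "INT",
    "VARCHAR": "VARCHAR",
    "TEXT": "CLOB",
    "DATETIME": "TIMESTAMP",
    "DECIMAL": "DECIMAL",
    "TINYINT": "SMALLINT",
}

def map_mysql_type(mysql_type: str) -> str:
    """Map a MySQL column type to its DM equivalent,
    preserving any length/precision suffix, e.g. VARCHAR(255)."""
    base, sep, suffix = mysql_type.upper().partition("(")
    mapped = MYSQL_TO_DM.get(base.strip())
    if mapped is None:
        raise KeyError("no DM mapping for MySQL type %r" % mysql_type)
    return mapped + sep + suffix
```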

Back to the task at hand: both MySQL_A and MySQL_B need the following sys_role table:

CREATE TABLE `sys_role` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key, auto-increment',
  `name` varchar(255) NOT NULL COMMENT 'role name',
  `create_user_id` int(11) NOT NULL DEFAULT '0' COMMENT 'creator id',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
  `update_user_id` int(11) DEFAULT NULL COMMENT 'last editor id',
  `update_time` datetime DEFAULT NULL COMMENT 'last update time',
  `remark` varchar(255) DEFAULT NULL COMMENT 'remarks',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4;
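Since source and target must agree on columns and types, it can be worth diffing the two schemas before migrating. Below is a minimal sketch, assuming you have already fetched each side's column definitions (for example from `information_schema.columns`; the fetch itself is omitted):

```python
def diff_schemas(cols_a, cols_b):
    """Compare two {column_name: column_type} mappings and return a
    human-readable list of mismatches (empty list = schemas agree)."""
    problems = []
    for name, type_a in cols_a.items():
        if name not in cols_b:
            problems.append("missing in target: %s" % name)
        elif cols_b[name] != type_a:
            problems.append("type mismatch on %s: %s vs %s"
                            % (name, type_a, cols_b[name]))
    for name in cols_b:
        if name not in cols_a:
            problems.append("extra in target: %s" % name)
    return problems
```

To feed it, run `SELECT column_name, column_type FROM information_schema.columns WHERE table_schema = DATABASE() AND table_name = 'sys_role'` against each server and build a dict from the rows.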

Assume MySQL_A holds some existing rows that need to be migrated to MySQL_B.

Running the migration with DataX:

Go to the job folder under the DataX install directory; mine is D:\datax\job.

The folder contains a sample job file. Using it as a template, create a new job file, here named sys_role.json, with the following content:

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "column": ["*"],
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://localhost:3306/Mysql_A"],
                "table": ["sys_role"]
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "root",
            "password": "root",
            "column": ["id", "name", "create_user_id", "create_time", "update_user_id", "update_time", "remark"],
            "preSql": ["DELETE FROM sys_role"],
            "postSql": [],
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://localhost:3306/Mysql_B",
                "table": ["sys_role"]
              }
            ]
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 3
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    }
  }
}

Notes: mysqlreader reads every column ("column": ["*"]) of sys_role from the Mysql_A database; mysqlwriter writes the listed columns into sys_role in Mysql_B, with preSql clearing the target table before the load (postSql would run afterwards and is left empty here). speed.channel sets the number of concurrent channels, and errorLimit.record (0) plus errorLimit.percentage (0.02) cap the tolerated error records in absolute and percentage terms. Two details to watch: strict JSON does not allow // comments, so keep the job file comment-free, and mysqlwriter's column and postSql belong at the parameter level, not inside connection.
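One way to avoid hand-editing mistakes in job files is to generate them from code. The sketch below is my own helper, not part of DataX; the URLs, credentials, and column list are the placeholders used in this example.

```python
import json

def make_mysql_job(src_url, dst_url, table, columns,
                   username="root", password="root", channels=3):
    """Build a DataX mysqlreader -> mysqlwriter job as a plain dict,
    guaranteeing valid, comment-free JSON when dumped."""
    return {
        "job": {
            "content": [{
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": username,
                        "password": password,
                        "column": ["*"],
                        "connection": [{"jdbcUrl": [src_url], "table": [table]}],
                    },
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "username": username,
                        "password": password,
                        "column": columns,
                        # clear the target table before loading
                        "preSql": ["DELETE FROM %s" % table],
                        "connection": [{"jdbcUrl": dst_url, "table": [table]}],
                    },
                },
            }],
            "setting": {
                "speed": {"channel": channels},
                "errorLimit": {"record": 0, "percentage": 0.02},
            },
        }
    }

if __name__ == "__main__":
    job = make_mysql_job(
        "jdbc:mysql://localhost:3306/Mysql_A",
        "jdbc:mysql://localhost:3306/Mysql_B",
        "sys_role",
        ["id", "name", "create_user_id", "create_time",
         "update_user_id", "update_time", "remark"],
    )
    with open("sys_role.json", "w", encoding="utf-8") as f:
        json.dump(job, f, ensure_ascii=False, indent=2)
```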

Run the following command from the job directory to start the migration:

python ..\bin\datax.py .\sys_role.json

A successful run produces a log of the same shape as the self-check output above and ends with a closing summary like this:

任务启动时刻                    : 2023-06-14 13:09:32
任务结束时刻                    : 2023-06-14 13:09:42
任务总计耗时                    :                 10s
任务平均流量                    :          253.91KB/s
记录写入速度                    :          10000rec/s
读出记录总数                    :              100000
读写失败总数                    :                   0

(Fields, top to bottom: job start time, job end time, total elapsed time, average throughput, record write speed, total records read, total read/write failures.)

At this point the MySQL_B server holds the same data, and the migration is complete.

If an error occurs partway through, the whole batch is rolled back. So when several tables need to be synchronized, it is better to split them into separate xxx.json job files, one per table.
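Following that advice, the one-file-per-table jobs can be driven by a small wrapper. This is a hedged sketch of my own, not a DataX utility; the D:\datax paths are the ones assumed throughout this article, so adjust them to your install.

```python
import subprocess
import sys
from pathlib import Path

DATAX_PY = Path(r"D:\datax\bin\datax.py")  # adjust to your install path

def build_command(job_file):
    """Assemble the invocation DataX expects: python datax.py <job.json>."""
    return [sys.executable, str(DATAX_PY), str(job_file)]

def run_jobs(job_files):
    """Run job files one by one, stopping at the first failure so
    later tables are not half-migrated after an error."""
    for job in job_files:
        result = subprocess.run(build_command(job))
        if result.returncode != 0:
            raise RuntimeError("DataX job failed: %s" % job)

if __name__ == "__main__":
    job_dir = Path(r"D:\datax\job")
    if job_dir.is_dir():
        run_jobs(sorted(job_dir.glob("*.json")))
```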

Source: https://blog.csdn.net/qq_38164123/article/details/131205621
