    The challenge: creating one billion tables on MySQL 8.0 + ZFS

    Posted: 2019-08-24 07:00 · Editor: 老叶茶馆_

    Introduction

    Author: Alexander Rubin

    Published: 2018-10-22

    Keywords: Big Data, MySQL, MySQL 8.0, ZFS, MySQL Scalability, Scalability

    Translation team: 天一阁 (冷锋, 神谕, Judy)

    Original post:

    https://www.percona.com/blog/2018/10/22/one-billion-tables-in-mysql-8-0-with-zfs/


    The short version

    I created more than one billion InnoDB tables in MySQL 8.0 (tables, not rows) just for fun. Here is the proof:

    $ mysql -A
    Welcome to the MySQL monitor. Commands end with ; or \g.
    Your MySQL connection id is 1425329
    Server version: 8.0.12 MySQL Community Server - GPL
    Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    mysql> select count(*) from information_schema.tables;
    +------------+
    | count(*)   |
    +------------+
    | 1011570298 |
    +------------+
    1 row in set (6 hours 57 min 6.31 sec)

    Yes, it took 6 hours and 57 minutes to count them all!


    Why does anyone need one billion tables?

    In my previous blog post, I created and tested MySQL 8.0 with 40 million tables (that was a real case study). The One Billion Tables project is not a real-world scenario, however. I was challenged by the Billion Tables Project (BTP) in PostgreSQL, and decided to repeat it with MySQL, creating 1 billion InnoDB tables.

    As an aside: I think MySQL 8.0 is the first MySQL version where creating 1 billion InnoDB tables is even practically possible.

    Challenges with one billion InnoDB tables

    Disk space

    The first and one of the most important challenges is disk space. InnoDB allocates data pages on disk when creating .ibd files. Without disk-level compression we need more than 25TB of disk. The good news: we have ZFS, which provides transparent disk compression. Here is how the disk utilization looks:

    Actual data (apparent-size):

    # du -sh --apparent-size /mysqldata/
    26T /mysqldata/
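As a back-of-the-envelope check (my own arithmetic, not from the original post), spreading the 26 TiB apparent size evenly across roughly one billion tables gives the apparent cost of one empty 4K-page table:

```shell
# Rough estimate: apparent on-disk bytes per empty table,
# assuming ~1e9 tables sharing the 26 TiB apparent size above
echo $(( 26 * 1024 ** 4 / 1000000000 ))   # ~28587 bytes, i.e. roughly 28 KB per .ibd-backed table
```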

    Compressed data:

    # du -sh /mysqldata/
    2.4T /mysqldata/

    Compression ratio:

    # zfs get compression,compressratio
    ...
    mysqldata/mysql/data  compressratio  7.14x  -
    mysqldata/mysql/data  compression    gzip   inherited from mysqldata/mysql

    (It looks like the reported compression ratio is not 100% correct; we expect a ~10x compression ratio.)


    Too many tiny files

    This is usually the big issue with databases that create a file per table. With MySQL 8.0 we can create a shared (general) tablespace and "assign" tables to it. I created one tablespace per database, and created 1000 tables in each database.

    The result:

    mysql> select count(*) from information_schema.schemata;
    +----------+
    | count(*) |
    +----------+
    | 1011575  |
    +----------+
    1 row in set (1.31 sec)
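This schema count is consistent with the table count at the top of the post: about one million databases with 1000 tables each (a sanity check of my own, not from the original):

```shell
# ~1.01 million schemata x 1000 tables per schema
echo $(( 1011575 * 1000 ))   # -> 1011575000, within ~0.001% of the 1,011,570,298 tables counted
```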

    Creating tables

    Another big challenge is how to create tables fast enough so that it does not take months. I used three approaches:

    1. Disabled all possible consistency checks in MySQL, and decreased the InnoDB page size to 4K (these config options are NOT for production use)

    2. Created tables in parallel: as the mutex contention bug in MySQL 8.0 has been fixed, creating tables in parallel works fine.

    3. Used local NVMe cards on an AWS EC2 i3.8xlarge instance

    my.cnf config file (I repeat: do not use this in production):

    [mysqld]
    default-authentication-plugin = mysql_native_password
    performance_schema=0
    datadir=/mysqldata/mysql/data
    socket=/mysqldata/mysql/data/mysql.sock
    log-error = /mysqldata/mysql/log/error.log
    skip-log-bin=1
    innodb_log_group_home_dir = /mysqldata/mysql/log/
    innodb_doublewrite = 0
    innodb_checksum_algorithm=none
    innodb_log_checksums=0
    innodb_flush_log_at_trx_commit=0
    innodb_log_file_size=2G
    innodb_buffer_pool_size=100G
    innodb_page_size=4k
    innodb_flush_method=nosync
    innodb_io_capacity_max=20000
    innodb_io_capacity=5000
    innodb_buffer_pool_instances=32
    innodb_stats_persistent = 0
    tablespace_definition_cache = 524288
    schema_definition_cache = 524288
    table_definition_cache = 524288
    table_open_cache=524288
    table_open_cache_instances=32
    open-files-limit=1000000

    ZFS pool:

    # zpool status
      pool: mysqldata
     state: ONLINE
      scan: scrub repaired 0B in 1h49m with 0 errors on Sun Oct 14 02:13:17 2018
    config:
            NAME        STATE     READ WRITE CKSUM
            mysqldata   ONLINE       0     0     0
              nvme0n1   ONLINE       0     0     0
              nvme1n1   ONLINE       0     0     0
              nvme2n1   ONLINE       0     0     0
              nvme3n1   ONLINE       0     0     0
    errors: No known data errors

    A simple "deploy" script to create tables in parallel (it uses the sysbench table structure):

    #!/bin/bash
    function do_db {
            db_exist=$(mysql -A -s -Nbe "select 1 from information_schema.schemata where schema_name = '$db'")
            if [ "$db_exist" == "1" ]; then echo "Already exists: $db"; return 0; fi;
            tbspace="create database $db; use $db; CREATE TABLESPACE $db ADD DATAFILE '$db.ibd' engine=InnoDB";
            #echo "Tablespace $db.ibd created!"
            tables=""
            for i in {1..1000}
            do
                    table="CREATE TABLE sbtest$i ( id int(10) unsigned NOT NULL AUTO_INCREMENT, k int(10) unsigned NOT NULL DEFAULT '0', c varchar(120) NOT NULL DEFAULT '', pad varchar(60) NOT NULL DEFAULT '', PRIMARY KEY (id), KEY k_1 (k) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 tablespace $db;"
                    tables="$tables $table"
            done
            echo "$tbspace;$tables" | mysql
    }
    c=0
    echo "starting..."
    c=$(mysql -A -s -Nbe "select max(cast(SUBSTRING_INDEX(schema_name, '_', -1) as unsigned)) from information_schema.schemata where schema_name like 'sbtest_%'")
    for m in {1..100000}
    do
            echo "m=$m"
            for i in {1..30}
            do
                    let c=$c+1
                    echo $c
                    db="sbtest_$c"
                    do_db &
            done
            wait
    done
    How fast did we create tables? Here are some stats:

    # mysqladmin -i 10 -r ex | grep Com_create_table
    ...
    | Com_create_table | 6497 |
    | Com_create_table | 6449 |

    So we created ~650 tables per second. The counters above are deltas per 10-second interval.

    Counting the tables

    It took more than 6 hours to run "select count(*) from information_schema.tables"! Here is why:

    • MySQL 8.0 uses a new data dictionary (this is great, as it avoids creating 1 billion .frm files). Everything is stored in this file:

    # ls -lah /mysqldata/mysql/data/mysql.ibd
    -rw-r----- 1 mysql mysql 6.1T Oct 18 15:02 /mysqldata/mysql/data/mysql.ibd
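That file size implies a non-trivial amount of dictionary metadata per table; a rough per-table estimate (my own, not from the original post):

```shell
# 6.1 TiB data dictionary spread over ~1e9 tables, in bytes
bytes=$(( 61 * 1024 ** 4 / 10 ))   # 6.1 TiB expressed in bytes
echo $(( bytes / 1000000000 ))     # -> ~6707 bytes of dictionary metadata per table
```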

    • information_schema.tables is actually a view:

    mysql> show create table information_schema.tables\G
    *************************** 1. row ***************************
    View: TABLES
    Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`mysql.infoschema`@`localhost` SQL SECURITY DEFINER VIEW `information_schema`.`TABLES` AS select `cat`.`name` AS `TABLE_CATALOG`,`sch`.`name` AS `TABLE_SCHEMA`,`tbl`.`name` AS `TABLE_NAME`,`tbl`.`type` AS `TABLE_TYPE`,if((`tbl`.`type` = 'BASE TABLE'),`tbl`.`engine`,NULL) AS `ENGINE`,if((`tbl`.`type` = 'VIEW'),NULL,10) AS `VERSION`,`tbl`.`row_format` AS `ROW_FORMAT`,internal_table_rows(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`table_rows`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `TABLE_ROWS`,internal_avg_row_length(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`avg_row_length`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `AVG_ROW_LENGTH`,internal_data_length(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`data_length`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `DATA_LENGTH`,internal_max_data_length(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`max_data_length`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `MAX_DATA_LENGTH`,internal_index_length(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`index_length`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `INDEX_LENGTH`,internal_data_free(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`data_free`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `DATA_FREE`,internal_auto_increment(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`auto_increment`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0),`tbl`.`se_private_data`) AS `AUTO_INCREMENT`,`tbl`.`created` AS `CREATE_TIME`,internal_update_time(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(cast(`stat`.`update_time` as unsigned),0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `UPDATE_TIME`,internal_check_time(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(cast(`stat`.`check_time` as unsigned),0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `CHECK_TIME`,`col`.`name` AS `TABLE_COLLATION`,internal_checksum(`sch`.`name`,`tbl`.`name`,if(isnull(`tbl`.`partition_type`),`tbl`.`engine`,''),`tbl`.`se_private_id`,(`tbl`.`hidden` <> 'Visible'),`ts`.`se_private_data`,coalesce(`stat`.`checksum`,0),coalesce(cast(`stat`.`cached_time` as unsigned),0)) AS `CHECKSUM`,if((`tbl`.`type` = 'VIEW'),NULL,get_dd_create_options(`tbl`.`options`,if((ifnull(`tbl`.`partition_expression`,'NOT_PART_TBL') = 'NOT_PART_TBL'),0,1))) AS `CREATE_OPTIONS`,internal_get_comment_or_error(`sch`.`name`,`tbl`.`name`,`tbl`.`type`,`tbl`.`options`,`tbl`.`comment`) AS `TABLE_COMMENT` from (((((`mysql`.`tables` `tbl` join `mysql`.`schemata` `sch` on((`tbl`.`schema_id` = `sch`.`id`))) join `mysql`.`catalogs` `cat` on((`cat`.`id` = `sch`.`catalog_id`))) left join `mysql`.`collations` `col` on((`tbl`.`collation_id` = `col`.`id`))) left join `mysql`.`tablespaces` `ts` on((`tbl`.`tablespace_id` = `ts`.`id`))) left join `mysql`.`table_stats` `stat` on(((`tbl`.`name` = `stat`.`table_name`) and (`sch`.`name` = `stat`.`schema_name`)))) where (can_access_table(`sch`.`name`,`tbl`.`name`) and is_visible_dd_object(`tbl`.`hidden`))
    character_set_client: utf8
    collation_connection: utf8_general_ci

    And the explain plan looks like this:

    mysql> explain select count(*) from information_schema.tables \G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: cat
    partitions: NULL
    type: index
    possible_keys: PRIMARY
    key: name
    key_len: 194
    ref: NULL
    rows: 1
    filtered: 100.00
    Extra: Using index
    *************************** 2. row ***************************
    id: 1
    select_type: SIMPLE
    table: tbl
    partitions: NULL
    type: ALL
    possible_keys: schema_id
    key: NULL
    key_len: NULL
    ref: NULL
    rows: 1023387060
    filtered: 100.00
    Extra: Using where; Using join buffer (Block Nested Loop)
    *************************** 3. row ***************************
    id: 1
    select_type: SIMPLE
    table: sch
    partitions: NULL
    type: eq_ref
    possible_keys: PRIMARY,catalog_id
    key: PRIMARY
    key_len: 8
    ref: mysql.tbl.schema_id
    rows: 1
    filtered: 11.11
    Extra: Using where
    *************************** 4. row ***************************
    id: 1
    select_type: SIMPLE
    table: stat
    partitions: NULL
    type: eq_ref
    possible_keys: PRIMARY
    key: PRIMARY
    key_len: 388
    ref: mysql.sch.name,mysql.tbl.name
    rows: 1
    filtered: 100.00
    Extra: Using index
    *************************** 5. row ***************************
    id: 1
    select_type: SIMPLE
    table: ts
    partitions: NULL
    type: eq_ref
    possible_keys: PRIMARY
    key: PRIMARY
    key_len: 8
    ref: mysql.tbl.tablespace_id
    rows: 1
    filtered: 100.00
    Extra: Using index
    *************************** 6. row ***************************
    id: 1
    select_type: SIMPLE
    table: col
    partitions: NULL
    type: eq_ref
    possible_keys: PRIMARY
    key: PRIMARY
    key_len: 8
    ref: mysql.tbl.collation_id
    rows: 1
    filtered: 100.00
    Extra: Using index

    Conclusions

    1. I created more than 1 billion real InnoDB tables with indexes in MySQL 8.0, just for fun, and it worked. It took about 2 weeks to create them.

    2. MySQL 8.0 is probably the first version where it is even practically possible to create a billion InnoDB tables.

    3. ZFS compression together with NVMe cards makes it reasonably cheap to do, for example by using i3.4xlarge or i3.8xlarge instances on AWS.



