建表
创建一个新表,想让他的字符集是 gbk,怎么弄?
尝试1:
失败!原因:
set names gbk; 等价于:
set character_set_client = gbk;
set character_set_connection = gbk;
set character_set_results = gbk;
尝试2:
失败!原因:
charactor_set_server 控制的是创建的 database 字符集。
建表字符集受控于 character_set_database。
改用下面的方式就可以:控制 db 的字符集,间接实现控制表的字符集。
尝试3
成功!
原因:character_set_database 能改变当前数据库的字符集,影响建表字符集。
数据收发
下面三个变量控制连接字符集
set character_set_client = gbk;
set character_set_connection = gbk;
set character_set_results = gbk;
表示:客户端使用的字符集、server 使用的字符集、结果集使用的字符集。
如果三者都是一样,可以用 set names gbk
来简写。
疑问:character_set_client 和 character_set_results 可以不一样?
client 是utf8(表示它发来的query字符串的字符集是utf8)
result 是 gbk (表示query执行结果的字符集会转成 gbk)
client 发 utf8,收 gbk,实际有这种场景?
**回答:**有!比如你要写一个 transfer data as it is 的程序,你就不会去转换字符集。PHP 程序里,character_set_client 设置成 UTF8,去查数据库里不同表,不同表的字符集不同,character_set_results 可以设置为 NULL,这样可以把不同表的数据“原汁原味”地返回给客户端。至于外面的客户端如何用,他们自有办法(比如,原样存入文件给客户下载)。 参考这篇文章(要梯子)
操作接口
允许直接设置数字!(是 collation 的值,不是 charset!)
mysql> SET @@character_set_client = 48;
Query OK, 0 rows affected (0.01 sec)
mysql> SET @@character_set_client = 2;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like 'character_set_client';
+----------------------+--------+
| Variable_name | Value |
+----------------------+--------+
| character_set_client | latin2 |
+----------------------+--------+
1 row in set (0.03 sec)
mysql> SET @@character_set_client = 48;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like 'character_set_client';
+----------------------+--------+
| Variable_name | Value |
+----------------------+--------+
| character_set_client | latin1 |
+----------------------+--------+
1 row in set (0.00 sec)
mysql> SET @@character_set_client = 45;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like 'character_set_client';
+----------------------+---------+
| Variable_name | Value |
+----------------------+---------+
| character_set_client | utf8mb4 |
+----------------------+---------+
1 row in set (0.00 sec)
mysql> SET @@character_set_client = 90;
ERROR 1231 (42000): Variable 'character_set_client' can't be set to the value of '90'
mysql> SET @@character_set_client = 0;
ERROR 1115 (42000): Unknown character set: '0'
mysql> SHOW COLLATION;
+--------------------------+----------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+--------------------------+----------+-----+---------+----------+---------+
| big5_chinese_ci | big5 | 1 | Yes | Yes | 1 |
| big5_bin | big5 | 84 | | Yes | 1 |
| dec8_swedish_ci | dec8 | 3 | Yes | Yes | 1 |
| dec8_bin | dec8 | 69 | | Yes | 1 |
| cp850_general_ci | cp850 | 4 | Yes | Yes | 1 |
| cp850_bin | cp850 | 80 | | Yes | 1 |
| hp8_english_ci | hp8 | 6 | Yes | Yes | 1 |
| hp8_bin | hp8 | 72 | | Yes | 1 |
| koi8r_general_ci | koi8r | 7 | Yes | Yes | 1 |
| koi8r_bin | koi8r | 74 | | Yes | 1 |
| latin1_german1_ci | latin1 | 5 | | Yes | 1 |
| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 |
| latin1_danish_ci | latin1 | 15 | | Yes | 1 |
| latin1_german2_ci | latin1 | 31 | | Yes | 2 |
| latin1_bin | latin1 | 47 | | Yes | 1 |
| latin1_general_ci | latin1 | 48 | | Yes | 1 |
| latin1_general_cs | latin1 | 49 | | Yes | 1 |
| latin1_spanish_ci | latin1 | 94 | | Yes | 1 |
| latin2_czech_cs | latin2 | 2 | | Yes | 4 |
| latin2_general_ci | latin2 | 9 | Yes | Yes | 1 |
| latin2_hungarian_ci | latin2 | 21 | | Yes | 1 |
| latin2_croatian_ci | latin2 | 27 | | Yes | 1 |
| latin2_bin | latin2 | 77 | | Yes | 1 |
| swe7_swedish_ci | swe7 | 10 | Yes | Yes | 1 |
| swe7_bin | swe7 | 82 | | Yes | 1 |
| ascii_general_ci | ascii | 11 | Yes | Yes | 1 |
| ascii_bin | ascii | 65 | | Yes | 1 |
| ujis_japanese_ci | ujis | 12 | Yes | Yes | 1 |
| ujis_bin | ujis | 91 | | Yes | 1 |
| sjis_japanese_ci | sjis | 13 | Yes | Yes | 1 |
| sjis_bin | sjis | 88 | | Yes | 1 |
| hebrew_general_ci | hebrew | 16 | Yes | Yes | 1 |
| hebrew_bin | hebrew | 71 | | Yes | 1 |
| tis620_thai_ci | tis620 | 18 | Yes | Yes | 4 |
| tis620_bin | tis620 | 89 | | Yes | 1 |
| euckr_korean_ci | euckr | 19 | Yes | Yes | 1 |
| euckr_bin | euckr | 85 | | Yes | 1 |
| koi8u_general_ci | koi8u | 22 | Yes | Yes | 1 |
| koi8u_bin | koi8u | 75 | | Yes | 1 |
| gb2312_chinese_ci | gb2312 | 24 | Yes | Yes | 1 |
| gb2312_bin | gb2312 | 86 | | Yes | 1 |
| greek_general_ci | greek | 25 | Yes | Yes | 1 |
| greek_bin | greek | 70 | | Yes | 1 |
| cp1250_general_ci | cp1250 | 26 | Yes | Yes | 1 |
| cp1250_czech_cs | cp1250 | 34 | | Yes | 2 |
| cp1250_croatian_ci | cp1250 | 44 | | Yes | 1 |
| cp1250_bin | cp1250 | 66 | | Yes | 1 |
| cp1250_polish_ci | cp1250 | 99 | | Yes | 1 |
| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 |
| gbk_bin | gbk | 87 | | Yes | 1 |
| latin5_turkish_ci | latin5 | 30 | Yes | Yes | 1 |
| latin5_bin | latin5 | 78 | | Yes | 1 |
| armscii8_general_ci | armscii8 | 32 | Yes | Yes | 1 |
| armscii8_bin | armscii8 | 64 | | Yes | 1 |
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 |
| utf8_icelandic_ci | utf8 | 193 | | Yes | 8 |
| utf8_latvian_ci | utf8 | 194 | | Yes | 8 |
| utf8_romanian_ci | utf8 | 195 | | Yes | 8 |
| utf8_slovenian_ci | utf8 | 196 | | Yes | 8 |
| utf8_polish_ci | utf8 | 197 | | Yes | 8 |
| utf8_estonian_ci | utf8 | 198 | | Yes | 8 |
| utf8_spanish_ci | utf8 | 199 | | Yes | 8 |
| utf8_swedish_ci | utf8 | 200 | | Yes | 8 |
| utf8_turkish_ci | utf8 | 201 | | Yes | 8 |
| utf8_czech_ci | utf8 | 202 | | Yes | 8 |
| utf8_danish_ci | utf8 | 203 | | Yes | 8 |
| utf8_lithuanian_ci | utf8 | 204 | | Yes | 8 |
| utf8_slovak_ci | utf8 | 205 | | Yes | 8 |
| utf8_spanish2_ci | utf8 | 206 | | Yes | 8 |
| utf8_roman_ci | utf8 | 207 | | Yes | 8 |
| utf8_persian_ci | utf8 | 208 | | Yes | 8 |
| utf8_esperanto_ci | utf8 | 209 | | Yes | 8 |
| utf8_hungarian_ci | utf8 | 210 | | Yes | 8 |
| utf8_sinhala_ci | utf8 | 211 | | Yes | 8 |
| utf8_general_mysql500_ci | utf8 | 223 | | Yes | 1 |
| ucs2_general_ci | ucs2 | 35 | Yes | Yes | 1 |
| ucs2_bin | ucs2 | 90 | | Yes | 1 |
| ucs2_unicode_ci | ucs2 | 128 | | Yes | 8 |
| ucs2_icelandic_ci | ucs2 | 129 | | Yes | 8 |
| ucs2_latvian_ci | ucs2 | 130 | | Yes | 8 |
| ucs2_romanian_ci | ucs2 | 131 | | Yes | 8 |
| ucs2_slovenian_ci | ucs2 | 132 | | Yes | 8 |
| ucs2_polish_ci | ucs2 | 133 | | Yes | 8 |
| ucs2_estonian_ci | ucs2 | 134 | | Yes | 8 |
| ucs2_spanish_ci | ucs2 | 135 | | Yes | 8 |
| ucs2_swedish_ci | ucs2 | 136 | | Yes | 8 |
| ucs2_turkish_ci | ucs2 | 137 | | Yes | 8 |
| ucs2_czech_ci | ucs2 | 138 | | Yes | 8 |
| ucs2_danish_ci | ucs2 | 139 | | Yes | 8 |
| ucs2_lithuanian_ci | ucs2 | 140 | | Yes | 8 |
| ucs2_slovak_ci | ucs2 | 141 | | Yes | 8 |
| ucs2_spanish2_ci | ucs2 | 142 | | Yes | 8 |
| ucs2_roman_ci | ucs2 | 143 | | Yes | 8 |
| ucs2_persian_ci | ucs2 | 144 | | Yes | 8 |
| ucs2_esperanto_ci | ucs2 | 145 | | Yes | 8 |
| ucs2_hungarian_ci | ucs2 | 146 | | Yes | 8 |
| ucs2_sinhala_ci | ucs2 | 147 | | Yes | 8 |
| ucs2_general_mysql500_ci | ucs2 | 159 | | Yes | 1 |
| cp866_general_ci | cp866 | 36 | Yes | Yes | 1 |
| cp866_bin | cp866 | 68 | | Yes | 1 |
| keybcs2_general_ci | keybcs2 | 37 | Yes | Yes | 1 |
| keybcs2_bin | keybcs2 | 73 | | Yes | 1 |
| macce_general_ci | macce | 38 | Yes | Yes | 1 |
| macce_bin | macce | 43 | | Yes | 1 |
| macroman_general_ci | macroman | 39 | Yes | Yes | 1 |
| macroman_bin | macroman | 53 | | Yes | 1 |
| cp852_general_ci | cp852 | 40 | Yes | Yes | 1 |
| cp852_bin | cp852 | 81 | | Yes | 1 |
| latin7_estonian_cs | latin7 | 20 | | Yes | 1 |
| latin7_general_ci | latin7 | 41 | Yes | Yes | 1 |
| latin7_general_cs | latin7 | 42 | | Yes | 1 |
| latin7_bin | latin7 | 79 | | Yes | 1 |
| utf8mb4_general_ci | utf8mb4 | 45 | Yes | Yes | 1 |
| utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 |
| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 |
| utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 |
| utf8mb4_latvian_ci | utf8mb4 | 226 | | Yes | 8 |
| utf8mb4_romanian_ci | utf8mb4 | 227 | | Yes | 8 |
| utf8mb4_slovenian_ci | utf8mb4 | 228 | | Yes | 8 |
| utf8mb4_polish_ci | utf8mb4 | 229 | | Yes | 8 |
| utf8mb4_estonian_ci | utf8mb4 | 230 | | Yes | 8 |
| utf8mb4_spanish_ci | utf8mb4 | 231 | | Yes | 8 |
| utf8mb4_swedish_ci | utf8mb4 | 232 | | Yes | 8 |
| utf8mb4_turkish_ci | utf8mb4 | 233 | | Yes | 8 |
| utf8mb4_czech_ci | utf8mb4 | 234 | | Yes | 8 |
| utf8mb4_danish_ci | utf8mb4 | 235 | | Yes | 8 |
| utf8mb4_lithuanian_ci | utf8mb4 | 236 | | Yes | 8 |
| utf8mb4_slovak_ci | utf8mb4 | 237 | | Yes | 8 |
| utf8mb4_spanish2_ci | utf8mb4 | 238 | | Yes | 8 |
| utf8mb4_roman_ci | utf8mb4 | 239 | | Yes | 8 |
| utf8mb4_persian_ci | utf8mb4 | 240 | | Yes | 8 |
| utf8mb4_esperanto_ci | utf8mb4 | 241 | | Yes | 8 |
| utf8mb4_hungarian_ci | utf8mb4 | 242 | | Yes | 8 |
| utf8mb4_sinhala_ci | utf8mb4 | 243 | | Yes | 8 |
| cp1251_bulgarian_ci | cp1251 | 14 | | Yes | 1 |
| cp1251_ukrainian_ci | cp1251 | 23 | | Yes | 1 |
| cp1251_bin | cp1251 | 50 | | Yes | 1 |
| cp1251_general_ci | cp1251 | 51 | Yes | Yes | 1 |
| cp1251_general_cs | cp1251 | 52 | | Yes | 1 |
| utf16_general_ci | utf16 | 54 | Yes | Yes | 1 |
| utf16_bin | utf16 | 55 | | Yes | 1 |
| utf16_unicode_ci | utf16 | 101 | | Yes | 8 |
| utf16_icelandic_ci | utf16 | 102 | | Yes | 8 |
| utf16_latvian_ci | utf16 | 103 | | Yes | 8 |
| utf16_romanian_ci | utf16 | 104 | | Yes | 8 |
| utf16_slovenian_ci | utf16 | 105 | | Yes | 8 |
| utf16_polish_ci | utf16 | 106 | | Yes | 8 |
| utf16_estonian_ci | utf16 | 107 | | Yes | 8 |
| utf16_spanish_ci | utf16 | 108 | | Yes | 8 |
| utf16_swedish_ci | utf16 | 109 | | Yes | 8 |
| utf16_turkish_ci | utf16 | 110 | | Yes | 8 |
| utf16_czech_ci | utf16 | 111 | | Yes | 8 |
| utf16_danish_ci | utf16 | 112 | | Yes | 8 |
| utf16_lithuanian_ci | utf16 | 113 | | Yes | 8 |
| utf16_slovak_ci | utf16 | 114 | | Yes | 8 |
| utf16_spanish2_ci | utf16 | 115 | | Yes | 8 |
| utf16_roman_ci | utf16 | 116 | | Yes | 8 |
| utf16_persian_ci | utf16 | 117 | | Yes | 8 |
| utf16_esperanto_ci | utf16 | 118 | | Yes | 8 |
| utf16_hungarian_ci | utf16 | 119 | | Yes | 8 |
| utf16_sinhala_ci | utf16 | 120 | | Yes | 8 |
| cp1256_general_ci | cp1256 | 57 | Yes | Yes | 1 |
| cp1256_bin | cp1256 | 67 | | Yes | 1 |
| cp1257_lithuanian_ci | cp1257 | 29 | | Yes | 1 |
| cp1257_bin | cp1257 | 58 | | Yes | 1 |
| cp1257_general_ci | cp1257 | 59 | Yes | Yes | 1 |
| utf32_general_ci | utf32 | 60 | Yes | Yes | 1 |
| utf32_bin | utf32 | 61 | | Yes | 1 |
| utf32_unicode_ci | utf32 | 160 | | Yes | 8 |
| utf32_icelandic_ci | utf32 | 161 | | Yes | 8 |
| utf32_latvian_ci | utf32 | 162 | | Yes | 8 |
| utf32_romanian_ci | utf32 | 163 | | Yes | 8 |
| utf32_slovenian_ci | utf32 | 164 | | Yes | 8 |
| utf32_polish_ci | utf32 | 165 | | Yes | 8 |
| utf32_estonian_ci | utf32 | 166 | | Yes | 8 |
| utf32_spanish_ci | utf32 | 167 | | Yes | 8 |
| utf32_swedish_ci | utf32 | 168 | | Yes | 8 |
| utf32_turkish_ci | utf32 | 169 | | Yes | 8 |
| utf32_czech_ci | utf32 | 170 | | Yes | 8 |
| utf32_danish_ci | utf32 | 171 | | Yes | 8 |
| utf32_lithuanian_ci | utf32 | 172 | | Yes | 8 |
| utf32_slovak_ci | utf32 | 173 | | Yes | 8 |
| utf32_spanish2_ci | utf32 | 174 | | Yes | 8 |
| utf32_roman_ci | utf32 | 175 | | Yes | 8 |
| utf32_persian_ci | utf32 | 176 | | Yes | 8 |
| utf32_esperanto_ci | utf32 | 177 | | Yes | 8 |
| utf32_hungarian_ci | utf32 | 178 | | Yes | 8 |
| utf32_sinhala_ci | utf32 | 179 | | Yes | 8 |
| binary | binary | 63 | Yes | Yes | 1 |
| geostd8_general_ci | geostd8 | 92 | Yes | Yes | 1 |
| geostd8_bin | geostd8 | 93 | | Yes | 1 |
| cp932_japanese_ci | cp932 | 95 | Yes | Yes | 1 |
| cp932_bin | cp932 | 96 | | Yes | 1 |
| eucjpms_japanese_ci | eucjpms | 97 | Yes | Yes | 1 |
| eucjpms_bin | eucjpms | 98 | | Yes | 1 |
+--------------------------+----------+-----+---------+----------+---------+
197 rows in set (0.00 sec)
mysql>
mysql>
mysql> SET @@character_set_client = 3;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like 'character_set_client';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| character_set_client | dec8 |
+----------------------+-------+
1 row in set (0.01 sec)
mysql> SET @@character_set_client = 96;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like 'character_set_client';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| character_set_client | cp932 |
+----------------------+-------+
1 row in set (0.00 sec)
参考文献
讲明白了 character_set_client 和 character_set_connection 的用途:
https://stackoverflow.com/questions/16082480/what-is-the-purpose-of-character-set-connection
参考资料
character_set_server and collation_server
- The character_set_server and collation_server system variables indicate the server character set and collation. See Section 10.3.2, “Server Character Set and Collation”.
- The character_set_database and collation_database system variables indicate the character set and collation of the default database. See Section 10.3.3, “Database Character Set and Collation”.
These session system variable values are initialized at connect time, but can be changed within the session. The server takes the character_set_client system variable to be the character set in which statements are sent by the client.Some character sets cannot be used as the client character set. See Impermissible Client Character Sets.
What character set should the server translate statements to after receiving them?
- To determine this, the server uses the character_set_connection and collation_connection system variables:
- The server converts statements sent by the client from character_set_client to character_set_connection. Exception: For string literals that have an introducer such as _utf8mb4 or _latin2, the introducer determines the character set. See Section 10.3.8, “Character Set Introducers”.
- collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence (see Section 10.8.4, “Collation Coercibility in Expressions”).
What character set should the server translate query results to before shipping them back to the client?
The character_set_results system variable indicates the character set in which the server returns query results to the client. This includes result data such as column values, result metadata such as column names, and error messages.
To tell the server to perform no conversion of result sets or error messages, set character_set_results to NULL or binary:
OceanBase(admin@test)>SHOW SESSION VARIABLES LIKE 'character\_set\_%';
+--------------------------+---------+
| Variable_name | Value |
+--------------------------+---------+
| character_set_client | utf8mb4 | <=== 客户端字符集
| character_set_connection | utf8mb4 | <=== observer 收到连接上的请求后,应该把请求转换成什么字符集来处理(observer 字符集)
| character_set_results | utf8mb4 | <=== 结果应该转成什么字符集
| character_set_server | utf8mb4 | <===
| character_set_database | utf8mb4 | <===
| character_set_filesystem | binary |
| character_set_system | utf8mb4 |
+--------------------------+---------+
7 rows in set (0.00 sec)
OceanBase(admin@test)>SHOW SESSION VARIABLES LIKE 'collation\_%';
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci | <=== 连接上的排序方式
| collation_database | utf8mb4_general_ci | <=== database 排序方式?
| collation_server | utf8mb4_general_ci | <=== observer 排序方式?
+----------------------+--------------------+
3 rows in set (0.00 sec)
* mysql test --default-character-set=latin1
mysql> show variables like 'chara%';
+--------------------------+----------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.00 sec)
mysql> show variables like 'coll%';
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | latin1_swedish_ci |
| collation_database | latin1_swedish_ci |
| collation_server | utf8mb4_general_ci | <=== server 的 collation 还是很坚定的
+----------------------+--------------------+
3 rows in set (0.00 sec)
* mysql test ----default-character-set=utf8 (默认也是 utf8)
mysql> show variables like 'chara%';
+--------------------------+----------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.00 sec)
mysql> show variables like 'coll%';
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | utf8mb4_general_ci |
+----------------------+--------------------+
3 rows in set (0.00 sec)