Loading Data into Hive Partitions
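I. Enabling Chinese support in Hive partitioned tables
1. Following the steps in HIVE开启分区支持中文.txt, run the statements below in the SQL client of the Hive metastore database to change its character encoding (the statements use MySQL syntax).
Change the encoding of the hive database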
alter database hive default character set utf8;
Change the table encoding
alter table PARTITIONS default character set utf8;
alter table PARTITION_KEY_VALS default character set utf8;
alter table SDS default character set utf8;
Change the column encoding
alter table PARTITIONS modify column PART_name varchar(190) character set utf8;
alter table PARTITION_KEY_VALS modify column PART_KEY_VAL varchar(256) character set utf8;
alter table SDS modify column LOCATION varchar(4000) character set utf8;
2. Insert data
insert into table filetest.partition_student PARTITION(gender="女生") select "1500100002","吕金鹏",24,"文科六班";
II. Partitioned table operations
First, create a new database in which to practice the partitioned-table operations.
CREATE DATABASE learn2;
use learn2;
1. CREATE TABLE statement
The partition column is declared only in PARTITIONED BY; Hive rejects a table that repeats it in the regular column list.
CREATE TABLE IF NOT EXISTS learn2.partition_student
(
id STRING COMMENT "学生ID",
name STRING COMMENT "学生姓名",
age INT COMMENT "年龄"
) PARTITIONED BY (clazz STRING COMMENT "班级")
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE;
2. Inserting data into a partitioned table
1) load data local inpath "<local path>" into table <table name> PARTITION(<partition column> = <value>);
load data local inpath "/usr/local/soft/hive-3.1.2/data/文科一班.txt" into table learn2.partition_student PARTITION(clazz="文科一班");
load data local inpath "/usr/local/soft/hive-3.1.2/data/文科二班.txt" into table learn2.partition_student PARTITION(clazz="文科二班");
select * from learn2.partition_student;
2) Overwrite the data already in a partition
load data local inpath "<local path>" overwrite into table <table name> PARTITION(<partition column> = <value>);
load data local inpath "/usr/local/soft/hive-3.1.2/data/新文科一班.txt" overwrite into table learn2.partition_student PARTITION(clazz="文科一班");
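3) Upload with -put
dfs -put /usr/local/soft/hive-3.1.2/data/理科一班.txt /user/hive/warehouse/learn2.db/partition_student2/clazz=理科二班/;
Files uploaded this way bypass the metastore, so the partition stays invisible to queries until it is registered with alter table ... add partition or msck repair table. The target table learn2.partition_student2 is the external table created in section 4.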
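4) Dynamic partition inserts
set hive.exec.dynamic.partition=true; -- enable dynamic partitioning
set hive.exec.dynamic.partition.mode=nonstrict; -- set the dynamic partition mode to non-strict
set hive.exec.max.dynamic.partitions.pernode=1000; -- maximum number of dynamic partitions per node
Insert syntax:
insert into table <table name> PARTITION(<partition column>) select <query>;
Partition rule: by default the partition values are taken from the last columns of the select list, matched by position.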
Example:
CREATE TABLE IF NOT EXISTS learn2.partition_student3
(
id STRING COMMENT "学生ID",
name STRING COMMENT "学生姓名",
age INT COMMENT "年龄"
) PARTITIONED BY (clazz STRING COMMENT "班级")
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE;
insert into table learn2.partition_student3 PARTITION(clazz) select id,name,age,clazz from learn2.partition_student;
select * from learn2.partition_student3;
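The partition rule above can be illustrated outside Hive. What follows is a minimal Python sketch, not Hive's implementation: the hypothetical helper route_rows treats the trailing columns of each selected row as partition values, matched by position, and groups the remaining columns under a directory-style key. The sample rows are invented for illustration.

```python
from collections import defaultdict


def route_rows(rows, part_cols):
    """Group rows by their trailing columns, mimicking dynamic partitioning.

    rows      -- tuples in select-list order; partition values come last
    part_cols -- partition column names, in PARTITION(...) order
    (Hypothetical helper for illustration; not part of Hive.)
    """
    n = len(part_cols)
    if n == 0:
        raise ValueError("at least one partition column is required")
    partitions = defaultdict(list)
    for row in rows:
        # Split each row into data columns and trailing partition values
        data, values = row[:-n], row[-n:]
        # Build a key shaped like the partition directory, e.g. clazz=文科一班
        key = "/".join(f"{c}={v}" for c, v in zip(part_cols, values))
        partitions[key].append(data)
    return dict(partitions)


# Invented sample rows shaped like: select id, name, age, clazz
rows = [
    ("1500100001", "A", 22, "文科一班"),
    ("1500100002", "B", 24, "文科二班"),
    ("1500100003", "C", 23, "文科一班"),
]
result = route_rows(rows, ["clazz"])
for key, data in result.items():
    print(key, len(data))
```

Because matching is positional, a select list that puts clazz anywhere but last would partition on the wrong column, which is why the insert above lists clazz at the end.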
Note: if the following error message appears
Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
Fix: run the dynamic partition settings
set hive.exec.dynamic.partition=true; -- enable dynamic partitioning
set hive.exec.dynamic.partition.mode=nonstrict; -- set the dynamic partition mode to non-strict
set hive.exec.max.dynamic.partitions.pernode=1000; -- maximum number of dynamic partitions per node
3. Listing partitions
show partitions <table name>;
show partitions learn2.partition_student;
4. Dropping partitions
1) alter table <table name> drop PARTITION(<partition column> = <value>);
alter table learn2.partition_student drop PARTITION(clazz="文科二班");
To see how dropping behaves on an external table, create one and load two partitions:
CREATE EXTERNAL TABLE IF NOT EXISTS learn2.partition_student2
(
id STRING COMMENT "学生ID",
name STRING COMMENT "学生姓名",
age INT COMMENT "年龄"
) PARTITIONED BY (clazz STRING COMMENT "班级")
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE;
load data local inpath "/usr/local/soft/hive-3.1.2/data/文科一班.txt" into table learn2.partition_student2 PARTITION(clazz="文科一班");
load data local inpath "/usr/local/soft/hive-3.1.2/data/文科二班.txt" into table learn2.partition_student2 PARTITION(clazz="文科二班");
alter table learn2.partition_student2 drop PARTITION(clazz="文科二班");
show partitions learn2.partition_student2;
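Note: if the partitioned table is an external table, dropping a partition removes only the metadata in Hive; the data files remain.
2) Force-delete a partition
dfs -rm -r /user/hive/warehouse/learn2.db/partition_student2/clazz=文科二班;
After the files are deleted the partition's metadata still exists, so drop the metadata as well: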
alter table learn2.partition_student2 drop PARTITION(clazz="文科二班");
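5. Restoring a dropped partition
msck repair table <table name>;
This re-registers partitions whose directories still exist on HDFS, so it applies when only the metadata was dropped and the data files remain, as with an external table:
msck repair table learn2.partition_student2;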
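The idea behind the repair can be sketched in Python. This is only an illustration under stated assumptions, not how Hive implements it: the hypothetical helper find_unregistered_partitions compares the partition-style directories found under a table directory with a set standing in for the metastore's partition list, and reports the ones that are missing. The directory names are invented, and only single-level partitions are handled.

```python
import os
import tempfile


def find_unregistered_partitions(table_dir, registered):
    """Return partition directory names under table_dir that are absent
    from the registered set (a stand-in for the metastore's partition list).
    Hypothetical helper; single-level partitions only."""
    found = set()
    for name in os.listdir(table_dir):
        path = os.path.join(table_dir, name)
        # Partition directories are named <column>=<value>
        if os.path.isdir(path) and "=" in name:
            found.add(name)
    return sorted(found - set(registered))


# A temporary directory stands in for the table directory on HDFS
with tempfile.TemporaryDirectory() as table_dir:
    for part in ("clazz=arts1", "clazz=arts2"):
        os.mkdir(os.path.join(table_dir, part))
    registered = {"clazz=arts1"}  # clazz=arts2 had its metadata dropped
    missing = find_unregistered_partitions(table_dir, registered)
    print(missing)
```

If the directory itself has been removed, as in the forced deletion above, there is nothing left to discover and the partition cannot be restored this way.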
6. Adding partitions
1) alter table <table name> add PARTITION(<partition column> = <value>);
alter table learn2.partition_student add PARTITION(clazz="理科一班");
7. Multi-level partitions (data is inserted the same way as for single-level partitions)
1) Create the table
CREATE TABLE IF NOT EXISTS learn2.partition_student4
(
id STRING COMMENT "学生ID",
name STRING COMMENT "学生姓名",
age int COMMENT "年龄"
) PARTITIONED BY (clazz STRING COMMENT "班级",gender STRING COMMENT "性别")
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE;
2) View the table structure
desc learn2.partition_student4;
3) Load data into multi-level partitions
load data local inpath "/usr/local/soft/hive-3.1.2/data/文科一班女.txt" into table learn2.partition_student4 PARTITION(clazz="文科一班",gender="女");
load data local inpath "/usr/local/soft/hive-3.1.2/data/文科二班男.txt" into table learn2.partition_student4 PARTITION(clazz="文科二班",gender="男");
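4) List the partitions
show partitions learn2.partition_student4;
5) Characteristics:
a. When querying or operating on partitions, a partition is written as clazz=文科一班/gender=女
b. On HDFS, multi-level partitions appear as nested directories
c. Choose the partition order according to the data; where possible, put the coarser-grained level first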
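The nested layout in points a and b can be sketched as a small path-building function. This is a minimal Python illustration, assuming the table lives under the default warehouse location used elsewhere in this post; the helper partition_path is hypothetical and ignores any escaping Hive applies to special characters in partition values.

```python
import posixpath


def partition_path(table_dir, spec):
    """Join an ordered list of (column, value) pairs into a nested
    partition directory path. Hypothetical helper for illustration."""
    parts = [f"{col}={val}" for col, val in spec]
    return posixpath.join(table_dir, *parts)


# Assumed table directory, following the warehouse layout shown earlier
table_dir = "/user/hive/warehouse/learn2.db/partition_student4"
# The order of the pairs follows the PARTITIONED BY declaration
path = partition_path(table_dir, [("clazz", "文科一班"), ("gender", "女")])
print(path)
```

The first partition column becomes the outer directory, so every query that filters only on the second column still has to look inside each outer directory; this is one reason to put the coarser-grained level first.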