HIVE WORDCOUNT
目录
HIVE WORDCOUNT
一、WORDCOUNT
1.我们先创建一个新的数据库
2.创建表并插入数据
3.统计WORDCOUNT
4.UNION ALL 用法
5.WITH AS 用法
1.WORDCOUNT
1)我们先创建一个新的数据库
create database learn3;
use learn3;
2)创建表并插入数据
CREATE TABLE learn3.wordcount(
word STRING COMMENT "单词"
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
INSERT INTO TABLE learn3.wordcount (word) VALUES ("hello,word"),("hello,java"),("hive,hello");
3)统计WORDCOUNT
select split(word,",")[0] clo1
from learn3.wordcount;
select split(word,",")[1] clo2
from learn3.wordcount;
select split(word,",")[0] clo1
,split(word,",")[1] clo2
from learn3.wordcount;
select count(*) as num
,c1.clo1 as clo from
(
select split(word,",")[0] clo1
,split(word,",")[1] clo2
from learn3.wordcount
) c1 group by clo1;
select count(*) as num
,c2.clo2 as clo from
(
select split(word,",")[0] clo1
,split(word,",")[1] clo2
from learn3.wordcount
) c2 group by clo2;
select count(*) as num
,c1.clo1 as clo from
(
select split(word,",")[0] clo1
,split(word,",")[1] clo2
from learn3.wordcount
) c1 group by clo1
UNION ALL
select count(*) as num
,c2.clo2 as clo from
(
select split(word,",")[0] clo1
,split(word,",")[1] clo2
from learn3.wordcount
) c2 group by clo2;
select sum(num)
,c.clo from
(
select count(*) as num
,c1.clo1 as clo from
(
select split(word,",")[0] clo1
,split(word,",")[1] clo2
from learn3.wordcount
) c1 group by clo1
UNION ALL
select count(*) as num
,c2.clo2 as clo from
(
select split(word,",")[0] clo1
,split(word,",")[1] clo2
from learn3.wordcount
) c2 group by clo2) c group by c.clo;
2.UNION ALL 用法
在 Hive 中,`UNION ALL` 用于合并多个查询结果集,包括所有重复的行。
假设我们有两张表 `table1` 和 `table2`,它们具有相同的列结构。现在,我们想要将这两张表中的数据合并成一个结果集,而不去除任何重复的行,就可以使用 `UNION ALL`。
下面是一个简单的示例:
```sql
SELECT * FROM table1
UNION ALL
SELECT * FROM table2;
```
这个查询将会返回 `table1` 和 `table2` 中所有的行,包括重复的行。如果你只想返回不重复的行,可以使用 `UNION` 而不是 `UNION ALL`。
需要注意的是,使用 `UNION ALL` 时,两个查询的列数和数据类型必须完全相同。
3.WITH AS 用法
格式:
WITH table1 AS (
SELECT 查询语句1
)
, table2 AS (
SELECT 查询语句2
)
[INSERT INTO TABLE] SELECT
FROM
WITH split_res AS
(
select split(word,",")[0] as clo1,
split(word,",")[1] as clo2
from learn3.wordcount
)
,c1 AS
(
select clo1 as clo,count(*) as num
from split_res group by clo1
)
,c2 AS
(
select clo2 as clo,count(*) as num
from split_res group by clo2
)
select
c.clo,sum(num)
from
(
select * from c1
UNION ALL
select * from c2
) c group by clo;