ThinkChat🤖让你学习和工作更高效,注册即送10W Token,即刻开启你的AI之旅 广告
[TOC] ## 语法结构 ~~~ SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM table_reference [WHERE where_condition] [GROUP BY col_list [HAVING condition]] [CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY| ORDER BY col_list] ] [LIMIT number] ~~~ ## 案例 ### 查询时创建外部表 ~~~ create external table student_ext(Sno int,Sname string,Sex string,Sage int,Sdept string) row format delimited fields terminated by ',' location '/stu'; ~~~ ### where和分组 ~~~ //where查询 select * from student_ext where sno=95020; //分组 select sex,count(*) from student_ext group by sex; ~~~ ### 分区和排序 ~~~ //分区,排序,但是这个只有1个reduce,没意义,cluster by后面可以跟多个 select * from student_ext cluster by sex; ~~~ ### 设置reduce ~~~ //设置4个reduce //这样每个reduce自己内部会排序 hive> set mapred.reduce.tasks=4; hive> create table tt_1 as select * from student_ext cluster by sno; //查看结果,这个tt_1文件夹下面有4个文件 dfs -cat /user/hive/warehouse/db1.db/tt_1/000000_0; //这个结果和上面一样,分成4个reduce create table tt_2 as select * from student_ext distribute by sno sort by sno; //排序可以按照其他方式排序 create table tt_3 as select * from student_ext distribute by sno sort by sage; ~~~ # 排序 ## 全局排序 order By全局排序,一个reduce asc升序 desc降序 ## sortBy 每个mapreduce内部排序 对全局来说不是排序 ## DistributeBy 分区排序 类似MR中的partition,进行分区,结合sort by使用 注意:hive要求DistributeBy语句要写在sortby语句之前 ## ClusterBy 当DistributeBy和sortBy字段相同时,可以使用ClusterBy方式. **ClusterBy除了DistributeBy的功能外还有sortby功能.但是排序只能是倒排序,不能指定排序规则是ASC和DESC**