1. Purpose
When Hive's execution engine is Spark versus MR, I found that the DolphinScheduler HQL task scripts are not the same; the MR version is more concise.
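As a quick side note before comparing the two scripts, you can check which engine the current Hive session is using; the lines below are a minimal illustration (not part of the original workflow) run from the Hive CLI or beeline:

-- print the current value of the engine property
set hive.execution.engine;
-- switch the engine for the current session only (values: mr / tez / spark)
set hive.execution.engine=mr;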
2. When Hive's Execution Engine Is Spark
(1) DolphinScheduler script
#! /bin/bash
source /etc/profile

nowdate=$(date --date='0 days ago' +%Y%m%d)
yesdate=$(date -d yesterday +%Y-%m-%d)

hive -e "
use hurys_dc_dwd;

set hive.vectorized.execution.enabled=false;
set hive.auto.convert.join=false;
set mapreduce.map.memory.mb=10150;
set mapreduce.map.java.opts=-Xmx6144m;
set mapreduce.reduce.memory.mb=10150;
set mapreduce.reduce.java.opts=-Xmx8120m;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.parallel=true;
set hive.support.concurrency=false;
set mapreduce.map.memory.mb=4128;
set hive.vectorized.execution.enabled=false;

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1000;
set hive.exec.max.dynamic.partitions=1500;

insert overwrite table dwd_evaluation partition(day='$yesdate')
select device_no, cycle, lane_num, create_time, lane_no, volume, queue_len_max,
       sample_num, stop_avg, delay_avg, stop_rate, travel_dist, travel_time_avg
from hurys_dc_ods.ods_evaluation
where volume is not null and date(create_time) = '$yesdate'
group by device_no, cycle, lane_num, create_time, lane_no, volume, queue_len_max,
         sample_num, stop_avg, delay_avg, stop_rate, travel_dist, travel_time_avg
"
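Before handing the script to a DolphinScheduler shell task, it can help to sanity-check the date variables on their own; this is just an illustrative standalone snippet, not part of the original script:

# check that the partition date variables expand as expected
nowdate=$(date --date='0 days ago' +%Y%m%d)
yesdate=$(date -d yesterday +%Y-%m-%d)
echo "nowdate=$nowdate, yesdate=$yesdate"    # e.g. nowdate=20240102, yesdate=2024-01-01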
(2) Task-flow execution result
The scheduled run succeeded; it took 1m29s.
3. When Hive's Execution Engine Is MR
(1) DolphinScheduler script
#! /bin/bash
source /etc/profile

nowdate=$(date --date='0 days ago' +%Y%m%d)
yesdate=$(date -d yesterday +%Y-%m-%d)

hive -e "
use hurys_dc_dwd;

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1000;
set hive.exec.max.dynamic.partitions=1500;

insert overwrite table dwd_evaluation partition(day='$yesdate')
select device_no, cycle, lane_num, create_time, lane_no, volume, queue_len_max,
       sample_num, stop_avg, delay_avg, stop_rate, travel_dist, travel_time_avg
from hurys_dc_ods.ods_evaluation
where volume is not null and date(create_time) = '$yesdate'
group by device_no, cycle, lane_num, create_time, lane_no, volume, queue_len_max,
         sample_num, stop_avg, delay_avg, stop_rate, travel_dist, travel_time_avg
"
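After either version of the task finishes, the loaded partition can be verified with plain HiveQL; the check below is only an illustrative sketch (it assumes yesdate is defined the same way as in the scripts above):

hive -e "
use hurys_dc_dwd;
show partitions dwd_evaluation;
select count(*) from dwd_evaluation where day='$yesdate';
"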
(2) Task-flow execution result
The scheduled run succeeded; it took 1m3s.
4. Script Differences
When the execution engine is Spark, the script carries more tuning statements than the MR version, and in this workflow Spark also runs slower than MR:
set hive.vectorized.execution.enabled=false;
set hive.auto.convert.join=false;
set mapreduce.map.memory.mb=10150;
set mapreduce.map.java.opts=-Xmx6144m;
set mapreduce.reduce.memory.mb=10150;
set mapreduce.reduce.java.opts=-Xmx8120m;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.parallel=true;
set hive.support.concurrency=false;
set mapreduce.map.memory.mb=4128;
set hive.vectorized.execution.enabled=false;
When MR is the execution engine, the task-flow script must not include the tuning statements above, otherwise the task fails. For HiveSQL task flows in DolphinScheduler, MR is the recommended execution engine for Hive.
Not only does it remove the need to install Spark, the script is also simpler and the task runs faster.
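If you standardize on MR for these task flows, the engine can also be pinned per invocation from the shell script itself (instead of relying on the cluster default set via hive.execution.engine in hive-site.xml); the line below is a hedged sketch using the standard --hiveconf option of the hive CLI, not something taken from the original scripts:

# pin the MR engine for a single hive invocation via --hiveconf
hive --hiveconf hive.execution.engine=mr -e "use hurys_dc_dwd; show tables;"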