Broker Load stuck for an hour

Hi, have you deployed a Broker node? You can curl the error URL to see the detailed error message.
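A sketch of how to find that error URL (assuming the label from the post below; in Doris, the `SHOW LOAD` output includes a `URL` column pointing at the filtered-row details):

```sql
-- Look up the load job by label; the URL column holds the error detail link
SHOW LOAD WHERE LABEL = "agg_mis_r_sales_a_achment_lep_prem_index_m_202112"\G

-- Then fetch the detailed error rows from a shell, e.g.:
-- curl "<value of the URL column from the output above>"
```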

Hi, this error means there is either dirty data or the column separator is wrong. Does the target table have 3 columns?

If there is dirty data, you can set max_filter_ratio=0.x to filter out that proportion of dirty rows.
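The filter ratio goes in the PROPERTIES clause of the load statement; a minimal fragment (0.1 here is just an example value, meaning up to 10% of rows may be filtered before the job fails):

```sql
PROPERTIES
(
    "max_filter_ratio" = "0.1"   -- tolerate up to 10% filtered (dirty) rows
)
```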

For ORC files, specify the import format explicitly; the default format is CSV.

LOAD LABEL wn.agg_mis_r_sales_a_achment_lep_prem_index_m_202112
(
DATA INFILE("hdfs://hdfs01-shyp-sx-stg/user/hive/warehouse/sx_adm_safe.db/agg_mis_r_sales_a_achment_lep_prem_index_m/month=202112/data_from=PREM/margin_version=CUR/*")
INTO TABLE agg_mis_r_sales_a_achment_lep_prem_index_m
COLUMNS TERMINATED BY "\\x01"
FORMAT AS "orc"
(dozens of Hive table columns, omitted)
)
WITH BROKER 'hdfs_broker'
(
"username" = "hadoop",
"password" = "Bigdata123$"
)

First confirm that the number of columns in the table matches the number of columns in the HDFS files. If they match, try the SQL below:

LOAD LABEL wn.agg_mis_r_sales_a_achment_lep_prem_index_m_202112
(
DATA INFILE("hdfs://hdfs01-shyp-sx-stg/user/hive/warehouse/sx_adm_safe.db/agg_mis_r_sales_a_achment_lep_prem_index_m/month=202112/data_from=PREM/margin_version=CUR/*")
INTO TABLE agg_mis_r_sales_a_achment_lep_prem_index_m
COLUMNS TERMINATED BY "\\x01"
(dozens of Hive table columns, omitted)
)
WITH BROKER 'hdfs_broker'
(
    "username" = "hadoop",
    "password" = "Bigdata123$"
)
PROPERTIES
(
    "timeout" = "3600",
    "max_filter_ratio" = "0.1"
);

Try running this SQL:

LOAD LABEL wn.test_202201
(
DATA INFILE("hdfs://hdfs01-shyp-sx-stg/user/hive/warehouse/wn.db/test3/month=*/*")
INTO TABLE test
COLUMNS TERMINATED BY "\\x01"
FORMAT AS "orc"
(name, id)
COLUMNS FROM PATH AS (month)
SET
(
log = name,
id = id,
month = month
)
)
WITH BROKER 'hdfs_broker'
(
"username" = "hadoop",
"password" = "Bigdata123$"
)
PROPERTIES
(
"max_filter_ratio" = "1"
);