Hive multiple inserts: scan once, insert many times

Hive multiple inserts let you scan the source data just once and, based on different conditions, insert the scanned rows into multiple tables or into different partitions of the same table. Note that you cannot insert into the same table twice in one statement!

Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;

This syntax reduces the number of scans: issued as separate statements, the inserts would each scan the source once; written as multiple inserts, the table is scanned only once. The official documentation explains it as follows:

Multiple insert clauses (also known as Multi Table Insert) can be specified in the same query.

Multi Table Inserts minimize the number of data scans required. Hive can insert data into multiple tables by scanning the input data just once (and applying different query operators) to the input data.

(Execution plan screenshot omitted.)
Test result: against 2 billion rows, one scan inserting into six tables took only six minutes.
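As a concrete sketch (table, partition, and column names here are hypothetical), a single scan of src feeds every INSERT branch:

FROM src
INSERT OVERWRITE TABLE target_a PARTITION (dt='2018-01-01')
  SELECT id, name WHERE category = 'a'
INSERT OVERWRITE TABLE target_b PARTITION (dt='2018-01-01')
  SELECT id, name WHERE category = 'b';

The six-table load above is the same pattern with six INSERT branches, which is why it still costs only one scan.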

URL rewriting on the SSL server causes the system to redirect to HTTP after login

First, our deployment architecture: on F5 #1, virtual server VS-#1 is configured with a load-balancing pool (containing 10.1.1.1 and 10.1.1.2); VS-#1 decrypts HTTPS requests arriving on port 443 and forwards them as plain HTTP to port 80 on the pool servers.

When a user accesses the system through a browser, the request reaches the SSL server over HTTPS; the SSL server rewrites the URL, downgrades the request to HTTP, and load-balances it to the backend servers. This creates a problem: the backend Tomcat container has no idea it is supposed to be serving HTTPS, it only sees HTTP. In other words, HttpServletRequest.getScheme() returns "http" rather than "https", so the URL that Spring Security redirects to after login (the system home page) uses the http scheme instead of https.

Solution:

Tell Tomcat that it sits behind a proxy (the F5 load balancer): in the Connector element of Tomcat's server.xml, declare the proxy details, including proxyName, proxyPort, scheme, secure, and so on. Reference: http://www.thecodingforums.com/threads/load-balancing-an-https-java-web-application-in-tomcat.145712/

proxyName="www.test.com" proxyPort="443" scheme="https" secure="true"
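Putting that together, a minimal sketch of the Connector element (the host name is hypothetical; the other attributes are standard Tomcat settings):

<Connector port="80" protocol="HTTP/1.1"
           connectionTimeout="20000"
           proxyName="www.test.com"
           proxyPort="443"
           scheme="https"
           secure="true" />

With scheme="https" and secure="true", getScheme() reports https even though the connector itself receives plain HTTP, so Spring Security builds its redirect URLs with the correct protocol.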

A Java demo for Kafka on CDH after enabling Kerberos authentication

After two days of struggling, I finally got Java producer and consumer demos working on Kerberos-enabled CDH 5.13. Writing it up feels like an act of public service.

Producer code:
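A minimal sketch of such a producer; the broker address, topic name, and file paths are assumptions, not values from the actual cluster:

// A minimal sketch; broker address, topic, and paths are hypothetical.
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        // Point the JVM at the Kerberos and JAAS config files (can also be -D flags).
        System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
        System.setProperty("java.security.auth.login.config", "/etc/kafka/jaas.conf");

        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.test.com:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Kerberos (SASL/GSSAPI) settings
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.kerberos.service.name", "kafka"); // must match the broker principal

        Producer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("test-topic", "key1", "hello kerberos"));
        producer.close();
    }
}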

Consumer code:
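A matching consumer sketch under the same assumptions (note that CDH 5.13-era clients use poll(long); newer clients take a Duration):

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
        System.setProperty("java.security.auth.login.config", "/etc/kafka/jaas.conf");

        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.test.com:9092");
        props.put("group.id", "demo-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "earliest");
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.kerberos.service.name", "kafka");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("test-topic"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d, key=%s, value=%s%n",
                        record.offset(), record.key(), record.value());
            }
        }
    }
}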

Configuration file:
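The essential external file is the JAAS configuration that the Kafka client reads through java.security.auth.login.config; a sketch with a hypothetical keytab path and principal:

// jaas.conf -- keytab path and principal are hypothetical
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/etc/kafka/demo.keytab"
  principal="demo@TEST.COM";
};

Alongside it the client needs a krb5.conf describing the realm (see problems 1 and 5 below). Both paths can also be supplied on the command line as -Djava.security.auth.login.config=... and -Djava.security.krb5.conf=... instead of System.setProperty.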


The code looks very simple, but starting from zero you run into plenty of problems. Here is the record:

1. Cannot locate default realm

Solution:
The krb5.conf file was missing the default_realm setting. Adding it fixed the problem:

[libdefaults]
default_realm = TEST.COM

2. The group coordinator is not available
The consumer kept logging this error after startup. Searching around, a Hortonworks (HDP) knowledge-base article explains the cause as follows:

Reference: https://community.hortonworks.com/content/supportkb/175137/error-the-group-coordinator-is-not-available-when.html

Cause:
When using bootstrap-server parameter, the connection is through the Brokers instead of Zookeeper. The Brokers use __consumer_offsets to store information about committed offsets for each topic:partition per group of consumers (groupID). In this case, __consumer_offsets was pointing to invalid Broker IDs. Hence, the above exception was displayed.

Following that hint, I deleted the corresponding __consumer_offsets znode; with no other changes, the program started working. The command (run inside the ZooKeeper CLI):

rmr /kafka/brokers/topics/__consumer_offsets

3. During development, be sure to set the log4j level to DEBUG. Many failures are only ever printed through log4j; the program itself never throws a visible exception.
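For instance, a log4j.properties along these lines surfaces those swallowed errors (the layout is just a suggestion; it matches the timestamped lines in the log excerpt under problem 5):

log4j.rootLogger=DEBUG, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %c{1}:%L - %m%n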

4. This one is nasty. The log said: Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]

Caused by: KrbException: Identifier doesn't match expected value (906)

The error is thoroughly misleading, because everything runs fine inside the cluster; only clients outside the cluster fail. The logs show the Kerberos login itself succeeding, and the failure only happens once the client talks to Kafka.
It turned out the cause was neither a missing Kerberos installation nor missing JCE policy files: the client's hosts file simply had no entries for the servers inside the cluster. Why the error has to be this opaque, I have no idea. In any case, adding hosts entries for every server in the cluster fixed it.
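For illustration, the client's /etc/hosts needs one line per cluster node; the IPs and hostnames below are hypothetical, with KERB_SERVER reused from problem 5 below:

# /etc/hosts on the client machine (IPs and hostnames are hypothetical)
10.1.1.1  KERB_SERVER
10.1.1.2  broker1.test.com  broker1
10.1.1.3  broker2.test.com  broker2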

5. Authentication failure: Network is unreachable (connect failed)

null credentials from Ticket Cache
[Krb5LoginModule] authentication failed
Network is unreachable (connect failed)
10:17:32,300 INFO KafkaProducer:341 - [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 0 ms.
10:17:32,319 DEBUG KafkaProducer:177 - [Producer clientId=producer-1] Kafka producer has been closed
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka producer
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:441)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:297)
at cn.com.bmsoft.kafka.client.SimpleProducer.main(SimpleProducer.java:27)
… 5 more
Caused by: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Network is unreachable (connect failed)
at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:112)
at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:114)
at org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:61)
at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:86)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:398)
… 7 more
Caused by: javax.security.auth.login.LoginException: Network is unreachable (connect failed)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:808)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:52)
at org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:98)
at org.apache.kafka.common.security.authenticator.LoginManager.<init>(LoginManager.java:53)
at org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:82)
at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:103)
… 11 more
Caused by: java.net.SocketException: Network is unreachable (connect failed)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.security.krb5.internal.TCPClient.<init>(NetClient.java:63)
at sun.security.krb5.internal.NetClient.getInstance(NetClient.java:43)
at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:393)
at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:364)
at java.security.AccessController.doPrivileged(Native Method)
at sun.security.krb5.KdcComm.send(KdcComm.java:348)
at sun.security.krb5.KdcComm.sendIfPossible(KdcComm.java:253)
at sun.security.krb5.KdcComm.send(KdcComm.java:229)
at sun.security.krb5.KdcComm.send(KdcComm.java:200)
at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:316)
at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:361)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:776)
… 28 more

First I checked connectivity to port 88 on the Kerberos server and to port 9092 where Kafka listens; both were reachable, so this was not a real network failure. The problem ultimately came down to name resolution.
Solution:
In the [realms] section of krb5.conf, change the kdc and admin_server values to the hostnames defined in the /etc/hosts file.

TEST.COM = {
  kdc = 10.1.1.1
  admin_server = 10.1.1.1
}

Change this to:

TEST.COM = {
  kdc = KERB_SERVER
  admin_server = KERB_SERVER
}
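A quick way to confirm the client can now reach the KDC by name is a standalone kinit against the (hypothetical) demo keytab from the sketches above:

kinit -kt /etc/kafka/demo.keytab demo@TEST.COM

If this obtains a ticket without errors, krb5.conf and the hosts entries are consistent, and the Kafka client login should succeed as well.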