Assorted problems after enabling Kerberos on CDH!! So exhausting

After enabling Kerberos authentication on CDH, one component after another kept throwing errors, many of which we could not find anywhere online. We are not sure whether our installation was wrong or something else was going on.
Recording them here for future reference.

I. An error when creating a mapping table from Hive to HBase. The statement executed was as follows (the hbase.columns.mapping value is written on a single line, because whitespace inside it is treated as part of the column names):
CREATE EXTERNAL TABLE hbase_family_base(rk string,
id string,
agent_code string,
cust_ecif_id string,
real_name string,
gender string,
birthday string,
age string,
certi_type string,
certi_code string,
job_id string,
job_zh string,
relatives_ecif_id string,
relatives_real_name string,
relatives_gender string,
relatives_birthday string,
relatives_age string,
relatives_certi_type string,
relatives_certi_code string,
relatives_job_id string,
relatives_job_zh string,
relation string,
policy_num string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,family_base_test:id,family_base_test:agent_code,family_base_test:cust_ecif_id,family_base_test:real_name,family_base_test:gender,family_base_test:birthday,family_base_test:age,family_base_test:certi_type,family_base_test:certi_code,family_base_test:job_id,family_base_test:job_zh,family_base_test:relatives_ecif_id,family_base_test:relatives_real_name,family_base_test:relatives_gender,family_base_test:relatives_birthday,family_base_test:relatives_age,family_base_test:relatives_certi_type,family_base_test:relatives_certi_code,family_base_test:relatives_job_id,family_base_test:relatives_job_zh,family_base_test:relation,family_base_test:policy_num")
TBLPROPERTIES("hbase.table.name" = "hbase_family_base_test","hbase.mapred.output.outputtable" = "hbase_family_base_test");

The error reported was:
INFO : Completed executing command(queryId=hive_20181124160404_28d8cae7-f4c3-46bc-ad48-5882f81289c0); Time taken: 48.329 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Sat Nov 24 16:05:39 CST 2018, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68267: row 'hbase_image,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=node06,60020,1543042854540, seqNum=0

at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:320)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:247)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:62)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:327)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:302)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:167)
at org.apache.hadoop.hbase.client.ClientScanner.&lt;init&gt;(ClientScanner.java:162)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:862)
at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)
at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:366)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:421)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:431)
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:195)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:735)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:728)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:105)
at com.sun.proxy.$Proxy20.createTable(Unknown Source)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2134)
at com.sun.proxy.$Proxy20.createTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:784)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4177)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:311)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:99)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2052)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1748)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1501)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1285)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1280)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:89)
at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:301)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:314)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=68267: row 'hbase_image,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=node06,60020,1543042854540, seqNum=0
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:169)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
... 3 more
Caused by: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to node06/10.137.65.9:60020 failed on local exception: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to node06/10.137.65.9:60020 is closing. Call id=142, waitTime=1
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:289)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1273)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:34094)
at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:400)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:65)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:397)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:371)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:136)
... 4 more
Caused by: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to node06/10.137.65.9:60020 is closing. Call id=142, waitTime=1
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.cleanupCalls(RpcClientImpl.java:1085)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:864)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:581)

Solutions:
1. Deploy a Spark Gateway role on every node (I am not sure whether this one actually mattered).
2. Deploy an HBase Gateway role on every node (after deploying it, the error above was resolved! A quick sanity check is sketched below.)
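Deploying the HBase Gateway is what pushes the HBase client configuration, including the Kerberos settings, onto each node, which is presumably why it clears the error. A minimal sanity check, assuming the CDH default client-config path:

# After deploying the Gateway and redeploying client configuration,
# the Kerberos setting should be present on the node:
grep -A1 'hbase.security.authentication' /etc/hbase/conf/hbase-site.xml
# Expected value: kerberos; if the file is missing, the Gateway role
# was not deployed (or client configs were not redeployed)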

II. When running SQL with Hive on Spark, it failed with:
could not connect to thrift:bdp01:9083

We searched everywhere and could not find the cause; port checks all came back normal. Later, a test spark-submit job failed as well.
Finally we ran spark-shell, and it errored out the moment it started. We eventually confirmed that the SQLContext could not be initialized, because a Spark job initializes a SQLContext against Hive as soon as it launches.
We still do not understand what was going on; in the end it only worked after we re-enabled the Hive CLI, but that defeats the security purpose, since with Kerberos it is recommended to disable the Hive CLI. A quick isolation test is sketched below.
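Since a Spark job initializes its SQLContext against the Hive metastore at startup, spark-shell makes a convenient isolation test. A minimal sketch, where the keytab path and principal are placeholders:

# Authenticate first, then start spark-shell; if the SQLContext cannot
# be initialized, the metastore error appears immediately on startup
kinit -kt /path/to/user.keytab user@EXAMPLE.COM
spark-shell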

III. After Hive was integrated with Kerberos, several problems came up when running LOAD DATA LOCAL INPATH through beeline:
1. The first problem was that LOAD DATA LOCAL INPATH failed with: invalid path. It simply could not find the local file. What we eventually discovered:
If HiveServer2 is deployed on node01, LOAD DATA LOCAL INPATH resolves the local path on node01, not on the node where you run beeline. For example, if you put the data file on node17 and run beeline from node17 against the HiveServer2 on node01, the LOAD DATA LOCAL actually executes on node01, so naturally the data file is not found.
Solutions:
a. Install a HiveServer2 on node17 as well. Crude but effective! Though once HA is set up, this workaround no longer holds.
b. Do not use LOAD DATA LOCAL at all. Instead, first push the file from the local filesystem to an HDFS path with hdfs dfs -put, then use LOAD DATA INPATH to load it from HDFS (a sketch follows below).
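A minimal sketch of option b; the local path, HDFS path, table name, and Kerberos principal below are all placeholders:

# Stage the local file into HDFS first
hdfs dfs -mkdir -p /tmp/staging
hdfs dfs -put /data/family_base.csv /tmp/staging/
# Then load from HDFS instead of from a local path
beeline -u "jdbc:hive2://node01:10000/default;principal=hive/node01@EXAMPLE.COM" \
  -e "LOAD DATA INPATH '/tmp/staging/family_base.csv' INTO TABLE my_table;"

Note that LOAD DATA INPATH moves the HDFS file into the table's directory rather than copying it.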

2. The second problem: after solving the first one by deploying another HiveServer2, the local file could now be found, but the load failed with: permission denied. The local file was not allowed to be read.
It turned out that when beeline runs Hive on Spark, no matter which user you kinit as, the statement actually executes as the hive user, so the local data file must be readable by the hive group. Two solutions:
a. Create a dedicated OS user for loading data files and put it in the hive group:
usermod -g hive test
(-g sets hive as the user's primary group, so files the user creates afterwards are group-owned by hive)
b. Run chown -R test:hive on the data file directory. This requires a user with permission to chown, though (a sketch follows below).
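A minimal sketch of option b; the directory path is a placeholder, and the commands must run as root or another user allowed to chown:

# Hand the data directory's group over to hive...
chown -R test:hive /data/load_dir
# ...and make sure the group can actually read it (capital X adds
# execute permission on directories only, so they can be traversed)
chmod -R g+rX /data/load_dir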

IV. HDFS reporting missing blocks
After the cluster had been installed and in use for a while, we suddenly had to demonstrate how to remove a node and add it back. When the node was added back, HDFS reported missing blocks: the fraction of available blocks fell below the 99% threshold, so HDFS kept entering safe mode and the whole HDFS cluster became unusable.
a. After entering safe mode, HDFS will recover missing blocks automatically. But our cluster kept reporting errors about being unable to connect to ZooKeeper, which may have been an installation problem on our side.

b. Handle the missing blocks manually. Before doing so, be absolutely sure to confirm whether the missing blocks have replicas; if there is no replica, those blocks are truly lost, because there is no way to recover them.
First inspect the missing-block situation with: hdfs dfsadmin -report
Then, after reviewing the output, run: hdfs fsck / -delete (fsck requires a path argument; / checks the whole filesystem). A fuller sketch follows below.
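A minimal sketch of the manual cleanup; the -delete step is destructive, so only run it after confirming the blocks are genuinely unrecoverable:

# Overall health report, including the missing-block count
hdfs dfsadmin -report | grep -i missing
# List exactly which files have corrupt or missing blocks
hdfs fsck / -list-corruptfileblocks
# Remove the affected files so HDFS can leave safe mode (data loss!)
hdfs fsck / -delete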
