
Importing the Hive Source into IDEA on Windows 10 and Connecting to a Remote Server's Metastore

Windows has far more pitfalls here than Linux or macOS. Thanks to red for sharing the tips that made this write-up possible.

I. Preparation

  1. Download the Hive source

    This walkthrough uses the CDH release: hive-1.1.0-cdh5.16.2-src.tar.gz

    Download link: http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.16.2-src.tar.gz

  2. Build the Hive source

    Build with git-bash:

    mvn clean package -DskipTests=true -Phadoop-2

    ## After a long wait, output like the following means the build succeeded
    [INFO] Reactor Summary:
    [INFO]
    [INFO] Hive 1.1.0-cdh5.16.2 ............................... SUCCESS [ 3.119 s]
    [INFO] Hive Classifications ............................... SUCCESS [ 2.406 s]
    [INFO] Hive Shims Common .................................. SUCCESS [ 3.327 s]
    [INFO] Hive Shims 0.23 .................................... SUCCESS [ 3.494 s]
    [INFO] Hive Shims Scheduler ............................... SUCCESS [ 2.423 s]
    [INFO] Hive Shims ......................................... SUCCESS [ 1.463 s]
    [INFO] Hive Common ........................................ SUCCESS [ 8.382 s]
    [INFO] Hive Serde ......................................... SUCCESS [ 8.001 s]
    [INFO] Hive Metastore ..................................... SUCCESS [ 28.285 s]
    [INFO] Hive Ant Utilities ................................. SUCCESS [ 1.668 s]
    [INFO] Spark Remote Client ................................ SUCCESS [ 4.915 s]
    [INFO] Hive Query Language ................................ SUCCESS [01:36 min]
    [INFO] Hive Service ....................................... SUCCESS [ 22.921 s]
    [INFO] Hive Accumulo Handler .............................. SUCCESS [ 5.496 s]
    [INFO] Hive JDBC .......................................... SUCCESS [ 5.797 s]
    [INFO] Hive Beeline ....................................... SUCCESS [ 3.957 s]
    [INFO] Hive CLI ........................................... SUCCESS [ 4.060 s]
    [INFO] Hive Contrib ....................................... SUCCESS [ 4.321 s]
    [INFO] Hive HBase Handler ................................. SUCCESS [ 5.518 s]
    [INFO] Hive HCatalog ...................................... SUCCESS [ 1.399 s]
    [INFO] Hive HCatalog Core ................................. SUCCESS [ 5.933 s]
    [INFO] Hive HCatalog Pig Adapter .......................... SUCCESS [ 4.632 s]
    [INFO] Hive HCatalog Server Extensions .................... SUCCESS [ 4.477 s]
    [INFO] Hive HCatalog Webhcat Java Client .................. SUCCESS [ 4.903 s]
    [INFO] Hive HCatalog Webhcat .............................. SUCCESS [ 7.452 s]
    [INFO] Hive HCatalog Streaming ............................ SUCCESS [ 4.306 s]
    [INFO] Hive HWI ........................................... SUCCESS [ 3.461 s]
    [INFO] Hive ODBC .......................................... SUCCESS [ 3.061 s]
    [INFO] Hive Shims Aggregator .............................. SUCCESS [ 0.840 s]
    [INFO] Hive TestUtils ..................................... SUCCESS [ 1.077 s]
    [INFO] Hive Packaging 1.1.0-cdh5.16.2 ..................... SUCCESS [ 4.194 s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 04:22 min
    [INFO] Finished at: 2020-04-12T18:50:46+08:00
    [INFO] ------------------------------------------------------------------------
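  3. Import the source into IDEA

    Import the source into IDEA as a Maven project, wait for the dependencies to finish loading, then click Build Project to compile.

II. Modifying the source

  1. In the hive-cli module, create a resources directory under src and mark it as a resources root

  2. Copy the following configuration files from the cluster into the resources directory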

    core-site.xml
    hdfs-site.xml
    mapred-site.xml
    yarn-site.xml
    hive-site.xml

    [Notes]
    1. hive-site.xml must include the metastore settings:
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://192.168.0.50:9083</value>
    </property>
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/usr/hive/warehouse</value>
    </property>
    2. The metastore service must be running on the server:
    hive --service metastore -p 9083 &
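Before launching CliDriver, it can help to confirm that the Windows machine can actually reach the metastore port. The following is a minimal sketch, not part of Hive: the class MetastoreProbe and its methods are hypothetical names, and it only checks that a TCP connection opens, not that a Thrift metastore is answering. The default address is the hive.metastore.uris value shown above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

// Hypothetical helper, not part of Hive: checks TCP reachability only.
public class MetastoreProbe {

    /** Parses a thrift://host:port value (as used in hive.metastore.uris) into host and port. */
    static InetSocketAddress parse(String metastoreUri) {
        URI uri = URI.create(metastoreUri);
        if (uri.getHost() == null || uri.getPort() < 0) {
            throw new IllegalArgumentException("Expected thrift://host:port, got " + metastoreUri);
        }
        // createUnresolved avoids a DNS lookup while parsing
        return InetSocketAddress.createUnresolved(uri.getHost(), uri.getPort());
    }

    /** Returns true if a TCP connection to the metastore address opens within timeoutMillis. */
    static boolean isReachable(String metastoreUri, int timeoutMillis) {
        InetSocketAddress target = parse(metastoreUri);
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(target.getHostString(), target.getPort()), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unknown host: treat all as unreachable
            return false;
        }
    }

    public static void main(String[] args) {
        String uri = args.length > 0 ? args[0] : "thrift://192.168.0.50:9083";
        System.out.println(uri + " reachable: " + isReachable(uri, 3000));
    }
}
```

If this reports false, the likely suspects are a firewall, a wrong address, or the metastore service not having been started on the server.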
  3. Run CliDriver

    The console prints the following:

    WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
    hive (default)> show databases;

    However, after typing a SQL statement and pressing Enter, the console does not respond and prints nothing.
  4. Debug the source

    1. CliDriver: the loop that reads commands

    .....

    String dbSpaces = spacesForString(curDB);

    while ((line = reader.readLine(curPrompt + "> ")) != null) {
      // .....
    }

    .....

    2. ConsoleReader: the class behind reader

    class ConsoleReader {
      .....
      public String readLine(...) {
        // .....
      }
      .....
    }

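    Stepping through to the while loop, we find that after typing SQL in the console, execution never enters the loop body.
    Stepping into readLine shows that the class is ConsoleReader, which uses JLine to handle console input. From this we conjecture that ConsoleReader only suits console input on UNIX/macOS and does not work with console input on Windows.
  5. Modify the source

    Since Hive relies on JLine for console input, we switch the input handling to Scanner.
    The source is changed as follows:

    • Before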
    while ((line = reader.readLine(curPrompt + "> ")) != null) {
    // ......
    }
    • After
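    // Requires: import java.util.Scanner;
    Scanner scanner = new Scanner(System.in);

    while (true) {
      // The prompt is printed on its own line; input is typed on the next line
      System.out.println(curPrompt + "> ");
      // nextLine() never returns null (it throws at end of input), so test hasNextLine() instead
      if (!scanner.hasNextLine()) {
        break;
      }
      line = scanner.nextLine();

      // ......
    }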
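To see the loop shape in isolation, here is a self-contained sketch that reads commands with Scanner until end of input. ScannerLoopSketch and readCommands are hypothetical names, and a ByteArrayInputStream stands in for System.in so the behavior can be checked without a console. It is only an illustration of the read loop, not the CliDriver code itself.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical stand-in for the CLI's command loop; not taken from Hive.
public class ScannerLoopSketch {

    /** Reads lines until end of input and returns the non-blank ones. */
    static List<String> readCommands(InputStream in) {
        List<String> commands = new ArrayList<>();
        Scanner scanner = new Scanner(in, "UTF-8");
        // hasNextLine() returns false at end of input, so the loop ends without an exception
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            if (!line.trim().isEmpty()) {
                commands.add(line);
            }
        }
        return commands;
    }

    public static void main(String[] args) {
        InputStream fake = new ByteArrayInputStream(
                "show databases;\nshow tables;\n".getBytes(StandardCharsets.UTF_8));
        // prints [show databases;, show tables;]
        System.out.println(readCommands(fake));
    }
}
```

Checking hasNextLine() before calling nextLine() is what lets the loop stop at end of input; a null check on the result of nextLine() would never fire.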

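  6. Fixing the problem without modifying the source

    Alternatively, leave the source untouched and pass the following JVM option when launching CliDriver:

    -Djline.WindowsTerminal.directConsole=false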
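A -D option simply sets a JVM system property, so the same value can in principle be set from code. The sketch below shows how such a property is set and read; DirectConsoleFlag and directConsole are hypothetical names. It assumes the property is set before JLine first reads it, and exactly when that happens inside Hive is not verified here, so the command-line option remains the safer route.

```java
// Hypothetical illustration of setting and reading the system property; not Hive or JLine code.
public class DirectConsoleFlag {

    // Property name taken from the -D option above
    static final String KEY = "jline.WindowsTerminal.directConsole";

    /** Returns the property's boolean value, or defaultValue when the property is not set. */
    static boolean directConsole(boolean defaultValue) {
        String raw = System.getProperty(KEY);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw);
    }

    public static void main(String[] args) {
        // Same effect as -Djline.WindowsTerminal.directConsole=false,
        // assuming this runs before any code reads the property.
        System.setProperty(KEY, "false");
        System.out.println(KEY + " = " + directConsole(true));
    }
}
```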

  7. Add one parameter to hdfs-site.xml (this one took four hours of debugging...)

    <property>
      <name>dfs.client.use.datanode.hostname</name>
      <value>true</value>
    </property>

III. Testing

  1. Start CliDriver

  2. Type show databases; in the console

  3. The following log is printed

    hive (default)> 
    show databases;
    OK
    database_name
    default
    Time taken: 0.267 seconds, Fetched: 7 row(s)
  4. This shows that the change works and supports our earlier conjecture