How to install parquet-tools on Ubuntu 18.04 LTS without building from source

I've seen:

And a few more on installing thrift. I would really prefer not to build thrift and then parquet-mr from source. All I want is parquet-tools.

I'm on:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
$

Things I've tried:

  • Download source from github and apache

  • Try to build from source as described here and here. I get many different errors.

  • Build from master or build from some release tags like 1.11.x. Got various errors, e.g.

    org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on project parquet-generator: Error rendering velocity resource. at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215) ...
    Caused by: org.apache.maven.plugin.MojoExecutionException: Error rendering velocity resource. at org.apache.maven.plugin.resources.remote.ProcessRemoteResourcesMojo.processResourceBundles (ProcessRemoteResourcesMojo.java:1246) ...
    Caused by: java.lang.NullPointerException at java.util.Objects.requireNonNull (Objects.java:203) ...
  • Install thrift using sudo apt-get install thrift-compiler (which installs 0.9.x, which gives compilation errors while building parquet-mr)

    [DEBUG] (f) arguments = [-c, thrift -version | fgrep 'Thrift version 0.12.0' && exit 0; echo "================================================================================="; echo "========== [FATAL] Build is configured to require Thrift version 0.12.0 =========="; echo -n "========== Currently installed: "; thrift -version; echo "================================================================================="; exit 1]
  • Try to build thrift from source, I get some errors:

    checking whether we are cross compiling... configure: error: in `/home/kash/vm_share/thrift-0.13.0':
    configure: error: cannot run C compiled programs.
  • Looked for a pre-built thrift 0.12.0 or 0.13.0 package but couldn't find one. It looks like bionic only has 0.9.0.
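
For reference, the version gate shown in the Maven debug output above can be reproduced by hand, without starting a build. This is only a minimal sketch: `thrift_version_matches` is a made-up helper name, and it assumes `thrift -version` prints a line of the form `Thrift version X.Y.Z`, as the build's own `fgrep` check implies.

```shell
# Hypothetical helper mirroring the build's check:
#   thrift -version | fgrep 'Thrift version 0.12.0'
# $1 = required version, $2 = text printed by `thrift -version`
thrift_version_matches() {
  case "$2" in
    # The quoted part is matched literally, like fgrep would do.
    *"Thrift version $1"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (needs thrift on PATH):
#   thrift_version_matches 0.12.0 "$(thrift -version)" || echo "wrong thrift version"
```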

Please! I just want to see the meta of a parquet file on command line.

1 Answer

So I finally managed to compile from source.

TL;DR

  1. Compile thrift with --host=x86_64.
  2. Use the apache-parquet-1.11.11 tag of the parquet-mr repo instead of master.
  3. Update the thrift dependency version from 0.12.0 to 0.13.0 in parquet-mr/pom.xml and add the Maven Central repo (codehaus is dead):
+ <repository>
+ <id>mvnrepository</id>
+ <url>
+ </repository>
...
- <thrift.version>0.12.0</thrift.version>
+ <thrift.version>0.13.0</thrift.version>
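
If you'd rather not edit the file by hand, the version bump part of that diff could be scripted. A rough sketch, assuming GNU sed (for `-i`) and that the property sits on one line exactly as in the diff; `bump_thrift_version` is a hypothetical helper name, not something from parquet-mr.

```shell
# Hypothetical helper: rewrite the thrift.version property in a pom.xml in place.
# $1 = path to pom.xml, $2 = old version, $3 = new version
bump_thrift_version() {
  local pom="$1"
  local old="$2"
  local new="$3"
  local old_re
  # Escape the dots in the old version so they match literally.
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  # Assumes GNU sed; -i edits the file in place.
  sed -i "s|<thrift\.version>${old_re}</thrift\.version>|<thrift.version>${new}</thrift.version>|" "$pom"
}

# Usage, from the parquet-mr checkout:
#   bump_thrift_version pom.xml 0.12.0 0.13.0
```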

# install dependencies as described here:
# install thrift from source
wget -nv
tar xzf thrift-0.13.0.tar.gz
cd thrift-0.13.0
chmod +x ./configure
./configure --host=x86_64 --disable-libs
sudo make install
# build parquet-tools from source
git clone
cd parquet-mr
git checkout apache-parquet-1.11.11
# build only parquet-tools and its dependencies
# had to skip tests because one failed
mvn package -pl parquet-tools -am -Plocal -Dmaven.test.skip=true
# Use
java -jar parquet-tools/target/parquet-tools-*.jar --help
# Or if you're lazy like me:
alias parquet-tools="java -jar $(realpath ./parquet-tools/target/parquet-tools-*.jar)"
parquet-tools -h
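
Note that the alias resolves the jar path once, when it is defined. A small function that looks the jar up on each call is a possible alternative. This is only a sketch: `find_parquet_tools_jar` and `pq_meta` are names I made up, and it assumes the built jar offers a `meta` subcommand for printing file metadata (check the `--help` output to confirm).

```shell
# Hypothetical helper: print the first parquet-tools jar found in a build directory.
# $1 = directory to search (defaults to the Maven output dir used above)
find_parquet_tools_jar() {
  local dir="${1:-parquet-tools/target}"
  local jar
  for jar in "$dir"/parquet-tools-*.jar; do
    # An unmatched glob stays literal, so check that the file really exists.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no parquet-tools jar found in $dir" >&2
  return 1
}

# Hypothetical wrapper: show the metadata of one Parquet file.
# Assumes the jar provides a `meta` subcommand; confirm with --help.
pq_meta() {
  local jar
  jar=$(find_parquet_tools_jar) || return 1
  java -jar "$jar" meta "$1"
}

# Usage, from the parquet-mr checkout:
#   pq_meta /path/to/file.parquet
```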
