I've seen:
- Installing parquet-tools
- Cannot compile parquet-tools
- Could not transfer artifact (): Received fatal alert: protocol_version -> [Help 1]
- maven project failed to execute "maven-thrift-plugin"
- How to install libthrift-dev on Ubuntu?
And a few more on installing thrift. I would really prefer not to build thrift and then parquet-mr from source. All I want is parquet-tools.
I'm on:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
Things I've tried:
- Try to build from source as described here and here. I get many different errors.
- Build from master or build from some release tags like 1.11.x. Got various errors, e.g.
    org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on project parquet-generator: Error rendering velocity resource.
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
        ...
    Caused by: org.apache.maven.plugin.MojoExecutionException: Error rendering velocity resource.
        at org.apache.maven.plugin.resources.remote.ProcessRemoteResourcesMojo.processResourceBundles (ProcessRemoteResourcesMojo.java:1246)
        ...
    Caused by: java.lang.NullPointerException
        at java.util.Objects.requireNonNull (Objects.java:203)
        ...
- Install thrift using sudo apt-get install thrift-compiler (which installs 0.9.x, which gives compilation errors while building parquet-mr):
    [DEBUG] (f) arguments = [-c, thrift -version | fgrep 'Thrift version 0.12.0' && exit 0; echo "================================================================================="; echo "========== [FATAL] Build is configured to require Thrift version 0.12.0 =========="; echo -n "========== Currently installed: "; thrift -version; echo "================================================================================="; exit 1]
- Try to build thrift from source. I get some errors:
    checking whether we are cross compiling... configure: error: in `/home/kash/vm_share/thrift-0.13.0':
    configure: error: cannot run C compiled programs.
- Tried to look for a pre-built thrift 0.12.0 or 0.13.0 but can't find one. Looks like for bionic there is only 0.9.0.
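The thrift mismatch can be confirmed by hand before starting a Maven build. This is a minimal sketch that mirrors the fgrep check in the [DEBUG] output above. The helper name check_thrift_version is made up for illustration and is not part of parquet-mr.

```shell
#!/bin/sh
# Hypothetical helper (not part of parquet-mr): test whether a
# `thrift -version` line matches the version the build requires.
#   $1: output of `thrift -version`, e.g. "Thrift version 0.9.1"
#   $2: required version, e.g. "0.12.0"
check_thrift_version() {
    case "$1" in
        "Thrift version $2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# 0.12.0 is the version named in the build's FATAL message.
required="0.12.0"

if command -v thrift >/dev/null 2>&1; then
    installed="$(thrift -version)"
else
    installed="(thrift not found on PATH)"
fi

if check_thrift_version "$installed" "$required"; then
    echo "OK: $installed"
else
    echo "Mismatch: build requires $required, found: $installed"
fi
```

Like the build's own check, this is a plain prefix match on the version line, so it only tells you whether the installed compiler is the one the pom asks for.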
Please! I just want to see the meta of a parquet file on command line.
1 Answer
So I finally managed to compile from source.
TL;DR
- Compile thrift with --host=x86_64.
- Use the apache-parquet-1.11.11 tag on the parquet-mr repo instead of master.
- Update the thrift dependency version from 0.12.0 to 0.13.0 in parquet-mr/pom.xml and add the Maven Central repo (codehaus is dead):
+ <repository>
+ <id>mvnrepository</id>
+ <url>
+ </repository>
...
- <thrift.version>0.12.0</thrift.version>
+ <thrift.version>0.13.0</thrift.version>

# install dependencies as described here:
# install thrift from source
wget -nv
tar xzf thrift-0.13.0.tar.gz
cd thrift-0.13.0
chmod +x ./configure
./configure --host=x86_64 --disable-libs
sudo make install
# build parquet-tools from source
git clone
cd parquet-mr
git checkout apache-parquet-1.11.11
# build only parquet-tools and its dependencies
# had to skip tests because one failed
mvn package -pl parquet-tools -am -Plocal -Dmaven.test.skip=true
# Use
java -jar parquet-tools/target/parquet-tools-*.jar --help
# Or if you're lazy like me:
alias parquet-tools="java -jar $(realpath ./parquet-tools/target/parquet-tools-*.jar)"
parquet-tools -h
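
The thrift.version change from the TL;DR can be scripted instead of edited by hand. This is a rough sketch: the function name bump_thrift_version is my own, and it assumes the pom keeps the version in a single <thrift.version> property, as in the diff above. The repository entry still has to be added manually.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the <thrift.version> property in a pom.xml.
#   $1: path to the pom.xml
#   $2: new thrift version, e.g. 0.13.0
bump_thrift_version() {
    pom="$1"
    new_version="$2"
    # -i.bak edits in place and keeps the original as <pom>.bak
    sed -i.bak \
        "s|<thrift.version>[^<]*</thrift.version>|<thrift.version>${new_version}</thrift.version>|" \
        "$pom"
}

# Example (run from the directory that contains the parquet-mr checkout):
# bump_thrift_version parquet-mr/pom.xml 0.13.0
```

Run it after the git checkout step and before mvn package. The .bak copy makes it easy to diff or revert the change.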