Our servers are running Ubuntu Linux, and the binary file is a BSON dump of a large MongoDB collection. How reliable is a tool like split? Is there a faster or better way to do this?
4 Answers
split is very reliable. We use it for moving large log files around, and it worked well up to a couple of GB (though nowhere near 50 GB).
I believe split will work for your requirement; give it a try and let us know.
Split into 5GB files
split --bytes=5G inputfile
It will split the input into multiple 5 GB files named xaa, xab, xac, and so on.
Concatenate
cat x* > outfile
This reassembles the pieces into a single file on the other end.
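If you want to be sure nothing was corrupted in transit, a minimal checksum sketch, assuming sha256sum is available (it is on Ubuntu) and using the same placeholder names as above:
sha256sum inputfile    # note the hash on the source machine
cat x* > outfile       # reassemble on the destination
sha256sum outfile      # the hash should match the original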
To split, split -b.
To join, just cat.
AFAIK they are completely reliable, and I doubt there is anything more efficient.
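A minimal sketch of that round trip, with dump.bson as a placeholder filename and a self-chosen prefix:
split -b 5G dump.bson dump.bson.part.       # produces dump.bson.part.aa, .ab, ...
cat dump.bson.part.* > dump.bson.restored   # join the pieces back together
cmp dump.bson dump.bson.restored            # exits 0 (silently) if the files are identical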
split & cat are totally reliable. You can additionally compress in-line like this. Suppose your input file is dump.bson:
gzip < dump.bson | split -b 32M - dump.bson.gz.
And then reconstitute with this:
cat dump.bson.gz.* | gunzip > dump.bson
Tip: this works just as well with xz/xzdec in place of gzip/gunzip.
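For instance, the xz variant could look like this (a sketch; the dump.bson.xz. prefix is just an assumed name):
xz < dump.bson | split -b 32M - dump.bson.xz.
cat dump.bson.xz.* | unxz > dump.bson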
If you have rar installed, it has worked very well for me:
To Separate:
rar a -m0 -v5000m newfilename giantfile.foo
- a = add files to archive
- m0 = no compression
- v5000m = split into chunks of 5000 megabytes
To Reassemble:
unrar x newfilename.*
- x = extract
Benefits:
- CRC on the content of the split archive,
- split-file ordering kept automatically,
- multiple files and dirs can be included.
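Because of the CRCs, you can also test the whole volume set before extracting; a small sketch, assuming the volumes were created as above:
unrar t newfilename.*    # verifies every volume against its stored CRCs without extracting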