This is purely a derivative of Elephant Bird. If you want the original, complete version, which supports Thrift and Protocol Buffers and offers far more functionality, you should check out the real Elephant Bird.
Based on Elephant Bird Version: 1.2.2
Twitter's library of LZO, Thrift, and/or Protocol Buffer-related Hadoop InputFormats, OutputFormats, Writables, Pig LoadFuncs, Hive SerDe, HBase miscellanea, etc. The majority of these are in production at Twitter running over data every day.
This derivative is stripped down to a single PigJsonLoader that is compatible with Pig 0.8.
- git clone
- ant
- check out javadoc, etc.
Start your Pig script with:

    REGISTER lib/google-collect.jar;
    REGISTER lib/json-simple.jar;
    REGISTER lib/elephant-bird.jar;
    REGISTER lib/slf4j-log4j12-1.5.10.jar;
    REGISTER lib/slf4j-api-1.5.10.jar;
    REGISTER lib/log4j-1.2.15.jar;

    json = LOAD 'filename' USING com.twitter.elephantbird.pig.load.PigJsonLoader();
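Once the data is loaded, individual JSON fields can be pulled out of each record. The following is a minimal sketch, not a tested recipe. It assumes that PigJsonLoader behaves like the JSON loader in the original Elephant Bird and returns each input line as a single-field tuple holding a map of the top-level JSON keys. The file name `tweets.json` and the keys `user` and `text` are hypothetical, chosen only for illustration.

```pig
-- Assumes the REGISTER statements above have already been run.
-- 'tweets.json' is a hypothetical input with one JSON object per line.
json = LOAD 'tweets.json' USING com.twitter.elephantbird.pig.load.PigJsonLoader();

-- Assumption: $0 is a map of the top-level JSON keys for each record.
-- 'user' and 'text' are illustrative key names, not part of the library.
fields = FOREACH json GENERATE $0#'user' AS user, $0#'text' AS text;

-- Drop records where the 'text' key was missing.
with_text = FILTER fields BY text IS NOT NULL;

DUMP with_text;
```

Values looked up in a map with `#` come back untyped, so cast them (for example `(chararray) $0#'text'`) before passing them to functions that expect a specific type.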
- Pig 0.8
Apache licensed.
- JSON data
Bug fixes, features, and documentation improvements are welcome! Please fork the project and send me a pull request on GitHub, and I will do my best to keep up. If you make major changes, add yourself to the contributors list below.
- Kevin Weil (@kevinweil)
- Dmitriy Ryaboy (@squarecog)
- Chuang Liu (@chuangl4)
- Florian Liebert (@floliebert)
- Ning Liang (@ningliang)
- Johan Oskarsson (@skr)
- Raghu Angadi (@raghuangadi)
- Kim Vogt (@kimsterv)
- Knut O. Hellan (@knuthellan)