By krisjava
via softwarepassion.com
Published: Nov 13 2012 / 09:17
I have created my first (longer than 3 lines) Pig script and started playing with it using the Mathematica Stack Exchange data dump as an example, the reason for it is that its actually quite small compared to some other sites, especially StackOverflow (80% of all analyzed data). Because the SE data is available in the form of XML files (after unpacking) I had to create my custom Pig Data Loader (XML Loader from piggybank didn’t work for me).....
Add your comment