By mitchp
via architects.dzone.com
Published: Dec 26 2012 / 10:14
I've been experimenting with using Pig on some Fannie Mae mortgage-backed securities (MBS) data lately. While I don't mind writing MapReduce programs to process data (especially for the fairly simple tasks I'm doing now), I really do appreciate the "magic" Pig does under the blanket, you might say. If you don't know Pig, it is a member of the Hadoop ecosystem (and now a top-level Apache project at pig.apache.org): a framework for analyzing large data sets with scripts written in its Pig Latin language. In this mini-tutorial we'll see how Pig works with Hadoop and HDFS, and just how much you can accomplish with only a few lines of script. I am using Pig version 0.10.0 on Hadoop 1.1.0 (on Ubuntu 12.04, on VirtualBox 4.2.4, on Windows 7 SP1, on the third floor of a tri-level at 1728 m above sea level, but that could change -- see this story about another "PIG").