
Link Details


By pacoid
via cascading.org
Published: Aug 11 2012 / 11:37

"Cascading for the Impatient" is a series of blog posts that show how to get started using Cascading quickly, like yesterday. The intent is to provide useful code that can be applied elsewhere, while examining best practices for using the API. In addition to showing the "how", the posts discuss the "why": the engineering trade-offs involved in deploying robust MapReduce apps at scale, and test-driven development (TDD) for "Big Data". Cascading is an open source project that provides an API for MapReduce apps. Think of it as enterprise-grade workflow orchestration atop Hadoop, with bindings in Java, Scala, Clojure, Python, and Ruby. The "Impatient" series looks in detail at a program that starts out as the simplest possible Cascading app, a distributed file copy. Each additional post then makes a few changes to the code, until the result is a full TF*IDF implementation for Hadoop.