Link Details

Link 995527 thumbnail
User 284671 avatar

By shamim_ru
Submitted: Jul 10 2013 / 05:01

One of the main disadvantage of using PIG is that, Pig always raise all the data from Cassandra Storage, and after that it can filter by your choose. It's very easy to imagine how the workload will be if you have a tons of million rows in your CF. For example, in our production environment we have always more than 300 million rows, where only 20-25 millions of rows is unprocessed. When we are executing pig script, we have got more than 5000 map tasks with all the 300 millions of rows. It's time consuming and high load batch processing we always tried to avoid but in vain. It's could be very nice if we could use CQL query in pig scripts with where clause to select and filter our data. Here benefit is clear, less data will consume, less map task and a little workload.
  • 3
  • 0
  • 391
  • 78

Add your comment

Html tags not supported. Reply is editable for 5 minutes. Use [code lang="java|ruby|sql|css|xml"][/code] to post code snippets.

Voters For This Link (2)

Voters Against This Link (0)

    Apache Hadoop
    Written by: Piotr Krewski
    Featured Refcardz: Top Refcardz:
    1. Play
    2. Akka
    3. Design Patterns
    4. OO JS
    5. Cont. Delivery
    1. Play
    2. Java Performance
    3. Akka
    4. REST
    5. Java