Link Details

By nivanov
via jroller.com
Published: Dec 06 2007 / 03:17

I’ve lately had several interesting discussions on how to process massive amounts of data on the grid (specifically, with GridGain). Imagine that you have, say, 100TB of data either in files (thousands of files on NAS) or in a database (spread over dozens of instances and NAS). Let’s say you are storing textual blogs and you need to calculate a tag cloud (i.e. find the 20 most frequent tags in those blogs). What’s the best approach?
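To make the problem concrete, here is a minimal sketch of one common way to frame it: count tags per split, merge the partial counts, then take the top N. It is plain Java and does not use the GridGain API. It is not necessarily the approach the linked article recommends. The class name `TagCloudSketch`, its method names, and the representation of a blog as a list of tags are all assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; names and data layout are assumptions.
public class TagCloudSketch {

    // "Map" step: count tags within one split (e.g. the blogs stored in one file).
    // Each blog is represented simply as its list of tags.
    static Map<String, Long> countTags(List<List<String>> blogsInSplit) {
        Map<String, Long> counts = new HashMap<>();
        for (List<String> tags : blogsInSplit) {
            for (String tag : tags) {
                counts.merge(tag, 1L, Long::sum);
            }
        }
        return counts;
    }

    // "Reduce" step: merge the partial counts produced for every split.
    static Map<String, Long> mergeCounts(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((tag, n) -> total.merge(tag, n, Long::sum));
        }
        return total;
    }

    // Select the n most frequent tags; ties are broken alphabetically
    // so that the result is deterministic.
    static List<String> topTags(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

On a grid, each `countTags` call would run on a different node, ideally the one closest to the data for that split, and only the much smaller count maps would travel over the network. Note that the merge step combines full per-split counts: merging only each split's local top 20 would not be guaranteed to give the exact global top 20.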

Comments

joecoder replied:

The author discusses "affinity split", but doesn't actually describe how he increases data affinity in the scenarios he describes. He also implies that external resources like network-attached storage (NAS) will scale linearly to a large number of simultaneous users. As the number of simultaneous data accesses on the NAS increases, the performance per connection will drop, and you will not see the linear increase in data processing performance that the author promises.

It's also worth checking out other technologies that support this style of computing, for example Java Parallel Processing Framework (JPPF), GigaSpaces and Terracotta.

ronslow replied:

Wow - this technology is amazing. Absolutely zero deployment.