By nivanov
via gridgain.blogspot.com
Published: May 08 2008 / 10:48
Recently Hadoop posted a HadoopVsGridgain comparison page on their wiki. I have always been a big fan of Hadoop. Although I believe the product is very hard to use and its APIs are far from obvious, I still think they have achieved quite a lot, and the fact that Yahoo Search runs on Hadoop proves that the system works and scales quite well. However, this ridiculous "comparison" threw me off a bit, and the only reason I can think of for putting it up is that GridGain has started cutting significantly into their user base.
Tags: java, open source
Comments
fmoidu replied ago:
Hey folks at GridGain.
My basic question is this:
Given that a good way to achieve scalability seems to be putting your data next to your CPU, how does GridGain distribute data across nodes?
Let's say that I have a giant data set on a MySQL server (about 300 million rows of 7 columns). If I use GridGain, will that data set be broken into chunks, with each chunk sent to a node? Or will the entire data set be transferred as needed to the various nodes?
Nikita Ivanov replied ago:
In short: we'd recommend using something like JBoss Cache, which, when used with GridGain, will partition this dataset across the grid for in-memory caching, while GridGain provides affinity (a.k.a. data-aware routing) for MapReduce operations on that in-memory data.
In many real-life cases this setup can achieve linear scalability to 1000s of nodes.
Regards,
Nikita Ivanov.
GridGain Systems.
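To make the idea of partitioning plus data-aware routing concrete, here is a minimal plain-Java sketch. It does not use the GridGain or JBoss Cache APIs; the class `AffinitySketch` and its methods `ownerOf`, `partition`, and `mapReduceSum` are hypothetical names invented for illustration, and the hash-modulo placement rule is an assumption rather than what either product actually does. The point is only that each row lands on exactly one node, and the map step works on the rows that node already holds instead of pulling the whole data set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: these names are not GridGain or JBoss Cache APIs.
final class AffinitySketch {

    // Assumed placement rule: hash the row key and take it modulo the node count.
    static int ownerOf(long rowKey, int nodeCount) {
        return Math.floorMod(Long.hashCode(rowKey), nodeCount);
    }

    // Split the row keys into per-node chunks, so each node caches only its own share.
    static Map<Integer, List<Long>> partition(List<Long> rowKeys, int nodeCount) {
        Map<Integer, List<Long>> byNode = new TreeMap<>();
        for (long key : rowKeys) {
            byNode.computeIfAbsent(ownerOf(key, nodeCount), n -> new ArrayList<>()).add(key);
        }
        return byNode;
    }

    // Map: each node sums only its local rows. Reduce: add up the partial sums.
    // The nodes are simulated here by iterating over the partitions in one process.
    static long mapReduceSum(Map<Integer, List<Long>> byNode) {
        long total = 0;
        for (List<Long> localRows : byNode.values()) {
            long partial = 0;
            for (long value : localRows) {
                partial += value;
            }
            total += partial;
        }
        return total;
    }
}
```

In a real grid the inner loop would run on the node that owns the partition and only the partial sums would cross the network, which is why the amount of data moved stays small as nodes are added.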
yakdingo replied ago:
Have you measured "linear scalability into 1000s of nodes" while benchmarking real-life cases or is this statement an estimate based on benchmarks of smaller deployments?