Link Details

By dominik
via wiki.apache.org
Published: May 04 2008 / 05:14

Comparing GridGain to Hadoop.

Comments

Nikita Ivanov replied:

If you want to do a comparison, at least do it minimally correctly. The funniest part is the cost of GridGain :) If the author had looked at the homepage of our website, he would have seen that GridGain is free and licensed under LGPL.

At the same time, if one needs professional support, it is provided, while with Hadoop you are on your own. The rest of the analysis is of the same quality. Hadoop is geared towards large data sets (which are fully supported by GridGain too). Other than that, the amount of features and functionality is not even comparable between the two. Maybe someone can point me to where I can find some of the essentials in Hadoop, like:
- Early and late load balancing
- AOP-based grid enabling
- Direct API for MapReduce
- Partial asynchronous reduction
- Redundant mapping
- Affinity MapReduce for all major vendors
- Automatic discovery and custom topology management
- Pluggable fault-tolerance
- Checkpoints for long running jobs
- Zero deployment model with grid-aware class loading
- SPI-based integration and customization
- Distributed task sessions for connected grid tasks
- Etc.

GridGain provides all that and lots more. That is one of the reasons GridGain gets started every 60 seconds somewhere around the globe, just 8 months after the project's release. Think about this stat…

On a side note, Hadoop is a great framework whose main innovation is HDFS and it should stick to it.

Nikita Ivanov.
GridGain – Grid Computing Made Simple
http://www.gridgain.org

owen.omalley replied:

Ok, it wasn't fair to compare costs, since the listed costs for GridGain are for support. Commercial Hadoop support is also available. It should be noted that Hadoop's Apache license is much more business friendly than GridGain's LGPL, especially since the FSF's take on LGPL with Java code makes it viral for linking. Therefore, anyone linking with GridGain's libraries may be forced to open source their code. (http://www.gnu.org/licenses/lgpl-java.html)

The primary characteristic of map/reduce is to process and sort large distributed datasets reliably. A trivial test of Hadoop is sorting 1TB of data on 100 nodes, which takes roughly 3/4 of an hour. Hadoop runs production jobs on 2000 nodes, lifting hundreds of TB of data. Because of the architecture of GridGain, it can't scale to more than 1GB or so. It also doesn't provide a sort, and it severely limits scalability by having a single reducer. In most map/reduce applications, the reduce is considerably slower than the maps, so the application will be single threaded for the majority of its run.
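
To make the single-reducer point concrete, here is a minimal Python sketch of how a map/reduce shuffle can split keys across several reducers so that the reduce phase is not serialized. It is not Hadoop's or GridGain's API; the helper names (`partition`, `shuffle`, `reduce_bucket`) and the byte-sum hash are assumptions chosen for illustration only.

```python
from collections import defaultdict

def partition(key, num_reducers):
    """Pick a reducer for a key (hypothetical helper).
    A byte sum is used so the result is the same on every run."""
    return sum(key.encode("utf-8")) % num_reducers

def shuffle(mapped_pairs, num_reducers):
    """Group (key, value) pairs into one bucket per reducer."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in mapped_pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets

def reduce_bucket(bucket):
    """Sum the values of each key, emitting keys in sorted order."""
    return [(key, sum(bucket[key])) for key in sorted(bucket)]

# Output of the map phase: one (word, 1) pair per occurrence.
pairs = [("b", 1), ("a", 1), ("c", 1), ("a", 1), ("b", 1), ("a", 1)]

# Each bucket is independent, so each could be reduced on a different node.
buckets = shuffle(pairs, num_reducers=2)
results = [reduce_bucket(bucket) for bucket in buckets]
```

With one reducer, every key lands in the same bucket and the reduce runs as a single task; with more reducers, the buckets can be processed in parallel.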

Hadoop does scheduling when task slots become available and thus avoids the noise associated with GridGain's scheduling by trial and failure semantics.

Hadoop also does incremental reduction (combiners), except that it is more efficient because it runs on the nodes where the map ran.
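
As a rough illustration of the combiner idea, the Python sketch below pre-aggregates map output on the node that produced it, before anything is sent to the reducer. It is a toy word count under stated assumptions, not Hadoop's Combiner interface; the function names and the two sample input lines are made up for the example.

```python
from collections import Counter

def map_words(line):
    """Map step: emit one (word, 1) pair per word."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Combiner: sum counts locally, on the node where the map ran."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return sorted(local.items())

def reduce_counts(partials):
    """Reduce step: merge the partial counts from all nodes."""
    total = Counter()
    for pairs in partials:
        for word, count in pairs:
            total[word] += count
    return dict(total)

# One input line per (hypothetical) node.
node_inputs = ["to be or not to be", "to do is to be"]

raw = [map_words(line) for line in node_inputs]
combined = [combine(pairs) for pairs in raw]

# Number of pairs that would cross the network in each case.
shipped_without_combiner = sum(len(pairs) for pairs in raw)
shipped_with_combiner = sum(len(pairs) for pairs in combined)

totals = reduce_counts(combined)
```

The final counts are the same either way; the combiner only reduces how many pairs have to be shipped to the reducer.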

I will certainly say that GridGain's JavaDocs have a lot more fancy formatting than Hadoop's.

Dmitriy Setrakyan replied:

Owen :)

1. I really don't know what you are smoking when you say "Therefore, anyone linking with GridGain's libraries may be forced to open source their code."

To those poor and misguided commercial JBoss users: Owen just made a break-through discovery in LGPL licensing, you must quickly open source all your code or face the consequences!

2. About your claim that GridGain can only scale to 1GB.
GridGain has an option not to cache results. Take a look at @GridTaskNoResultCache annotation for more info - http://www.gridgain.com/javadoc/org/gridgain/grid/GridTaskNoResultCache.html . I assume you simply didn't know.

3. And finally, GridGain is already used to process terabytes of data. Take a look at this presentation created by one of our users: http://videos.apnicommunity.com/Video,Item,3251103276.html

Owen, it is funny that you are trying to compare Hadoop to GridGain. Hadoop is a distributed file system, GridGain is a computation grid. If you need to store huge sets of data in some proprietary file system, use Hadoop. If you are a user of a normal database or simply need a very powerful and easy-to-use compute grid, use GridGain.

Best,
Dmitriy Setrakyan
http://www.gridgain.com

owen.omalley replied:

I'm not sure what is meant by:

| Affinity MapReduce for all major vendors

but Hadoop runs on Windows, Macs, Linux, and Solaris, which seems like all of the operating systems that currently matter.

Dmitriy Setrakyan replied:

What Nikita meant is that GridGain supports "Computation to Data" affinity - basically collocating your computation with your data. We integrate with all major data grid vendors.

Take a look here for more info:
Wiki: http://www.gridgainsystems.com/wiki/display/GG15UG/GridAffinityLoadBalancingSpi
Example: http://www.gridgainsystems.com/wiki/display/GG15UG/Affinity+MapReduce+with+JBoss+Cache
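
The general idea of "computation to data" affinity can be sketched in a few lines of Python: a job is routed to whichever node already holds the data for its key, so the data does not have to move. This is only an illustration of the concept; it does not use GridGain's GridAffinityLoadBalancingSpi or any data grid API, and the node names, keys, and CRC32-based mapping are all assumptions.

```python
import zlib

# Hypothetical cluster members; a real grid would discover these.
NODES = ["node-a", "node-b", "node-c"]

def owner(key, nodes=NODES):
    """Return the node assumed to hold the data for this key.
    CRC32 is used because it is stable across processes and runs."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]

def plan_jobs(keys, nodes=NODES):
    """Assign each job to the node that owns its data."""
    plan = {node: [] for node in nodes}
    for key in keys:
        plan[owner(key, nodes)].append(key)
    return plan

# Each key stands for a job that reads the cached entry with that key.
plan = plan_jobs(["user:1", "user:2", "user:3", "user:4"])
```

The point of the design is that the scheduler and the data store agree on one key-to-node mapping, so looking up the owner is enough to place the job next to its data.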

Nikita Ivanov replied:

Owen,
Hadoop is a great project (I am saying it without any sarcasm). No need for such overreactions. As Denis pointed out in his blog, despite the fact that we both use the term MapReduce in describing our respective frameworks, we are very different in many ways. Not much reason to argue here...

I can dispute and refute *each* and *every* point you have made, as we *specifically* designed GridGain to be a generation ahead of legacy designs like Hadoop, Sun Grid Engine, etc. I will do that on my blog once I get a bit more time...

And, yes, our Javadoc is not only well formatted but is probably one of the best you can find by a long mile. And you can find the same quality of engineering throughout the entire product, from design and coding to tests and documentation.

Best,
Nikita Ivanov.

Nikita Ivanov replied:

Hey Owen,
Last comment: on licensing. Following your logic, JBoss (LGPL), MySQL (GPL) and Spring App Server (GPLv3), for example, should have been pretty much dead on arrival :) as they are "not business friendly" license-wise. 'Nuf said...

For your reference, GridGain is dual-licensed, with sources licensed under LGPL and binaries under ASL 2.0, because many Apache projects already link to GridGain.

Next time: do your homework before posting...

Nikita Ivanov.
GridGain Systems.

acmurthy replied:

Hadoop isn't just a 'filesystem', HDFS and Map-Reduce are the main components of "Hadoop".

Oh, and http://developer.yahoo.com/blogs/hadoop/2008/02/yahoo-worlds-largest-production-hadoop.html ...
