Comments
Nikita Ivanov replied:
If you want to do a comparison, at least do it minimally correctly. The funniest part is the cost of GridGain :) If the author had looked at the homepage of our website, he would have seen that GridGain is free and licensed under LGPL.
At the same time, if one needs professional support, it is provided, while with Hadoop you are on your own. The rest of the analysis is of the same quality. Hadoop is geared towards large data sets (which GridGain fully supports too). Other than that, the amount of features and functionality is not even comparable between the two. Maybe someone can point me to where I can find some of these essentials in Hadoop:
- Early and late load balancing
- AOP-based grid enabling
- Direct API for MapReduce
- Partial asynchronous reduction
- Redundant mapping
- Affinity MapReduce for all major vendors
- Automatic discovery and custom topology management
- Pluggable fault-tolerance
- Checkpoints for long running jobs
- Zero deployment model with grid-aware class loading
- SPI-based integration and customization
- Distributed task sessions for connected grid tasks
- Etc.
GridGain provides all that and lots more. That is one of the reasons GridGain gets started every 60 seconds somewhere around the globe, just 8 months after the project's release. Think about this stat…
On a side note, Hadoop is a great framework whose main innovation is HDFS and it should stick to it.
Nikita Ivanov.
GridGain – Grid Computing Made Simple
http://www.gridgain.org
owen.omalley replied:
Ok, it wasn't fair to compare costs, since the listed costs for GridGain are for support. Commercial Hadoop support is also available. It should be noted that Hadoop's Apache license is much more business-friendly than GridGain's LGPL, especially since the FSF's take on LGPL with Java code makes it viral for linking. Therefore, anyone linking with GridGain's libraries may be forced to open source their code. (http://www.gnu.org/licenses/lgpl-java.html)
The primary characteristic of map/reduce is to process and sort large distributed datasets reliably. A trivial test of Hadoop is sorting 1TB of data on 100 nodes, which takes roughly 3/4 of an hour. Hadoop runs production jobs on 2000 nodes lifting hundreds of TB of data. Because of its architecture, GridGain can't scale to more than 1GB or so. It also doesn't provide a sort, and it severely limits scalability by having a single reducer. In most map/reduce applications, the reduce is considerably slower than the maps, so the application will be single-threaded for the majority of its run.
Hadoop does scheduling when task slots become available and thus avoids the noise associated with GridGain's trial-and-failure scheduling semantics.
Hadoop also does incremental reduction (combiners), except that it is more efficient because it runs on the nodes where the map ran.
I will certainly say that GridGain's JavaDocs have a lot more fancy formatting than Hadoop's.
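To make the combiner point above concrete, here is a minimal sketch of map-side combining in a word count. It is plain Java and uses no Hadoop or GridGain classes; every name in it (CombinerSketch, map, combine, reduce) is hypothetical and chosen only for illustration, and the two input strings stand in for the local splits of two nodes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of map-side combining; no Hadoop or GridGain classes are used. */
final class CombinerSketch {

    /** Map step: emit one (word, 1) pair per word in a node's local input split. */
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleImmutableEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Combine step: pre-aggregate the pairs on the node that produced them. */
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> partial = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            partial.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return partial;
    }

    /** Reduce step: merge the partial counts received from every node. */
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        String[] splits = {"a b a a", "b b c"}; // one split per hypothetical node
        List<Map<String, Integer>> partials = new ArrayList<>();
        int mapped = 0;   // pairs produced by the map step
        int shuffled = 0; // pairs that would actually travel to the reducer
        for (String split : splits) {
            List<Map.Entry<String, Integer>> pairs = map(split);
            Map<String, Integer> partial = combine(pairs);
            mapped += pairs.size();
            shuffled += partial.size();
            partials.add(partial);
        }
        System.out.println("pairs emitted by map: " + mapped);
        System.out.println("pairs sent to reduce: " + shuffled);
        System.out.println("totals: " + reduce(partials));
    }
}
```

Because addition is associative and commutative, combining on each node before the reduce gives the same totals as reducing the raw pairs, while fewer pairs have to cross the network.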
Dmitriy Setrakyan replied:
Owen :)
1. I really don't know what you are smoking when you say "Therefore, anyone linking with GridGain's libraries may be forced to open source their code."
To those poor and misguided commercial JBoss users: Owen just made a breakthrough discovery in LGPL licensing, you must quickly open source all your code or face the consequences!
2. About your claim that GridGain can only scale to 1GB.
GridGain has an option not to cache results. Take a look at the @GridTaskNoResultCache annotation for more info - http://www.gridgain.com/javadoc/org/gridgain/grid/GridTaskNoResultCache.html . I assume you simply didn't know.
3. And finally, GridGain is already used to process terabytes of data. Take a look at this presentation created by one of our users: http://videos.apnicommunity.com/Video,Item,3251103276.html
Owen, it is funny that you are trying to compare Hadoop to GridGain. Hadoop is a distributed file system, GridGain is a computation grid. If you need to store huge sets of data in some proprietary file system - use Hadoop. If you are a user of a normal database or simply need a very powerful and easy-to-use compute grid - use GridGain.
Best,
Dmitriy Setrakyan
http://www.gridgain.com
owen.omalley replied:
I'm not sure what is meant by:
| Affinity MapReduce for all major vendors
but Hadoop runs on Windows, Macs, Linux, and Solaris, which seems like all of the operating systems that currently matter.
Dmitriy Setrakyan replied:
What Nikita meant is that GridGain supports "Computation to Data" affinity - basically collocating your computation with your data. We integrate with all major data grid vendors.
Take a look here for more info:
Wiki: http://www.gridgainsystems.com/wiki/display/GG15UG/GridAffinityLoadBalancingSpi
Example: http://www.gridgainsystems.com/wiki/display/GG15UG/Affinity+MapReduce+with+JBoss+Cache
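As a rough illustration of what "Computation to Data" affinity means, the sketch below routes a job to the node that owns the key it works on, so the job reads local data instead of fetching it remotely. It is plain Java under stated assumptions: it does not use GridAffinityLoadBalancingSpi or any other GridGain API, and all names (AffinitySketch, Node, ownerOf, squareWhereDataLives) and the hash-modulo affinity function are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of computation-to-data affinity; not GridGain's SPI. */
final class AffinitySketch {

    /** A hypothetical grid node that holds one partition of the data in memory. */
    static final class Node {
        final int id;
        final Map<String, Integer> localData = new HashMap<>();

        Node(int id) {
            this.id = id;
        }
    }

    final List<Node> nodes = new ArrayList<>();

    AffinitySketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new Node(i));
        }
    }

    /** Affinity function (assumed): the same key always maps to the same node. */
    Node ownerOf(String key) {
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }

    /** Store a value on the node that owns its key. */
    void put(String key, int value) {
        ownerOf(key).localData.put(key, value);
    }

    /** Run a job on the owning node so it reads only that node's local data. */
    int squareWhereDataLives(String key) {
        Node owner = ownerOf(key);
        int value = owner.localData.get(key); // local lookup, no remote fetch
        return value * value;
    }

    public static void main(String[] args) {
        AffinitySketch grid = new AffinitySketch(3);
        grid.put("orders", 7);
        grid.put("users", 4);
        System.out.println("orders lives on node " + grid.ownerOf("orders").id);
        System.out.println("job result: " + grid.squareWhereDataLives("orders"));
    }
}
```

The key design point is that data placement and job routing share one affinity function; a real data grid would also have to handle nodes joining and leaving, which this sketch ignores.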
Nikita Ivanov replied:
Owen,
Hadoop is a great project (I am saying it without any sarcasm). No need for such overreactions. As Denis pointed out in his blog, despite the fact that we both use the term MapReduce in describing our respective frameworks, we are very different in many ways. Not much reason to argue here...
I can dispute and refute *each* and *every* point you have made, as we *specifically* designed GridGain to be a generation ahead of legacy designs like Hadoop, Sun Grid Engine, etc. I will do that on my blog once I get a bit more time...
And, yes, our Javadoc is not only well formatted but probably one of the best you can find, by a long mile. You can find the same quality of engineering throughout the entire product: design, coding, tests, and documentation.
Best,
Nikita Ivanov.
Nikita Ivanov replied:
Hey Owen,
Last comment: on licensing. Following your logic, JBoss (LGPL), MySQL (GPL) and Spring App Server (GPLv3), for example, should have been pretty much dead on arrival :) as they are "not business friendly" license-wise. 'Nuf said...
For your reference, GridGain is dual-licensed, with sources licensed under LGPL and binaries under ASL 2.0, because many Apache projects already link to GridGain.
Next time: do your homework before posting...
Nikita Ivanov.
GridGain Systems.
acmurthy replied:
Hadoop isn't just a 'filesystem', HDFS and Map-Reduce are the main components of "Hadoop".
Oh, and http://developer.yahoo.com/blogs/hadoop/2008/02/yahoo-worlds-largest-production-hadoop.html ...