By dgoldvekht
via gridgain.blogspot.com
Published: Apr 18 2008 / 05:35
What does fail-over in a distributed grid or cluster environment really mean? In the standard notion, users expect their data or logic to fail over automatically to another available grid node when a node crashes. But is that really enough? What if, for example, a grid node is still alive but does not have the resources available to process your job? What if I/O on that node is too slow, or a database connection is not available? The outcome of a computation can also be application specific: if a computation throws an exception, then depending on application logic it may or may not be worthwhile to retry the same computation on a remote node.
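To make the distinction concrete, here is a minimal sketch of a fail-over policy that covers the three cases raised above: a crashed node, a live node that lacks resources, and an application-level exception that may or may not be worth retrying. Every name in it (`NodeCrashed`, `ResourceUnavailable`, `ApplicationError`, `decide`, `run_with_failover`) is hypothetical and chosen for illustration; this is not GridGain's API, and a real grid would also need timeouts, load awareness, and node discovery.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class Action(Enum):
    FAIL_OVER = "fail_over"  # try the job on the next node
    GIVE_UP = "give_up"      # surface the error to the caller


class NodeCrashed(Exception):
    """Hypothetical: the node died while running the job."""


class ResourceUnavailable(Exception):
    """Hypothetical: the node is alive but cannot run the job
    (slow I/O, no database connection, and so on)."""


class ApplicationError(Exception):
    """Hypothetical: the job itself failed; only the application
    knows whether a retry elsewhere could help."""

    def __init__(self, message: str, retryable: bool) -> None:
        super().__init__(message)
        self.retryable = retryable


def decide(error: Exception) -> Action:
    """Map a failure to a fail-over decision."""
    if isinstance(error, (NodeCrashed, ResourceUnavailable)):
        return Action.FAIL_OVER
    if isinstance(error, ApplicationError) and error.retryable:
        return Action.FAIL_OVER
    return Action.GIVE_UP


def run_with_failover(
    job: Callable[[str], object],
    nodes: Sequence[str],
    policy: Callable[[Exception], Action] = decide,
) -> object:
    """Run job(node) on each node in order until one succeeds."""
    last_error: Optional[Exception] = None
    for node in nodes:
        try:
            return job(node)
        except Exception as error:
            if policy(error) is Action.GIVE_UP:
                raise  # retrying elsewhere would not help
            last_error = error
    raise RuntimeError("no node could run the job") from last_error
```

Passing the policy in as a function keeps the retry decision with the application, which is the point of the questions above: the grid cannot know on its own whether a given exception is worth a second attempt.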