By gregrluck
via gregluck.com
Published: Feb 15 2011 / 14:25

Today Ehcache introduces EQL, a new query language. It looks like this:

Results results = cache.createQuery().includeKeys().addCriteria(age.eq(32).and(gender.eq("male"))).execute();

In short, it lets you further offload the database. With Ehcache now supporting up to 2TB and linear scale-out, you can handle some of the NoSQL use cases with Ehcache.
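
To make the shape of that query concrete, here is a minimal, self-contained sketch of how a fluent criteria expression such as age.eq(32).and(gender.eq("male")) can be modelled in plain Java. It is not the Ehcache API: Person, Attr, and FluentQuerySketch are hypothetical names invented for illustration, and the evaluation is assumed to be a simple linear scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical cached value; not an Ehcache type. */
final class Person {
    final String name;
    final int age;
    final String gender;

    Person(String name, int age, String gender) {
        this.name = name;
        this.age = age;
        this.gender = gender;
    }
}

/** Hypothetical typed attribute that extracts one field from a Person. */
final class Attr<T> {
    private final Function<Person, T> extractor;

    Attr(Function<Person, T> extractor) {
        this.extractor = extractor;
    }

    /** Criterion that matches when the extracted value equals the expected one. */
    Predicate<Person> eq(T expected) {
        return person -> expected.equals(extractor.apply(person));
    }
}

final class FluentQuerySketch {
    static final Attr<Integer> AGE = new Attr<>(p -> p.age);
    static final Attr<String> GENDER = new Attr<>(p -> p.gender);

    /** Returns the names of all people matching the criteria, in input order. */
    static List<String> namesMatching(List<Person> people, Predicate<Person> criteria) {
        List<String> names = new ArrayList<>();
        for (Person person : people) {
            if (criteria.test(person)) {
                names.add(person.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
            new Person("Alice", 32, "female"),
            new Person("Bob", 32, "male"),
            new Person("Carl", 45, "male"));
        // Predicate.and composes the two criteria, mirroring the query in the post.
        System.out.println(namesMatching(people, AGE.eq(32).and(GENDER.eq("male")))); // prints [Bob]
    }
}
```

Because the attribute carries its value type, something like AGE.eq("male") would be rejected by the compiler, which is the kind of safety the fluent style is meant to provide.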

Comments

Dmitriy Setrakyan replied:

-7 votes (comment hidden)
sharrissf replied:

-1 votes

Ehcache Search has order-by, and someone has already built an SQL interface on top of Ehcache Search for those who find that kind of thing useful. I voted the comment down because it is clearly an attempt by GridGain to gain attention on an Ehcache blog.

Dmitriy Setrakyan replied:

0 votes

All I wanted is for readers to see a full picture. If you find something untrue in my comment, please let me know.

As usual, Terracotta folks view any criticism as some kind of conspiracy attempt to bring them down instead of providing a good technical answer to a very directly made technical point. The question should be not "if you need SQL", but "Why EhCache provided such an awkward search syntax to begin with?"

sharrissf replied:

1 vote

I think I addressed the inaccurate parts above but here again is more detail:

I was all set to write a blog on DSL/Fluent interface vs programming with Strings but I found a really good one right here:
http://www.infoq.com/articles/internal-dsls-java
It covers in excellent detail the risks and downsides of string-based programming and even uses SQL as an example.

1) Ehcache does support order-by, contrary to your comment.
2) We do have an SQL interface built on top of our API, but we don't push it because we think that, for programming, DSLs like the one Ehcache uses or the one Hibernate uses are better (for the reasons why, read the article linked above).
3) It is true that, much like NoSQL solutions, we don't support all SQL DB constructs, joins being the most common example. So far we have found that this is a trade-off our hundreds of thousands of users are fine with, but we will be constantly monitoring for feedback.
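
The refactor-safety argument behind point 2 can be illustrated with a small sketch. The names below (StringVsTypedSketch, Employee, byName) are hypothetical and are not part of Ehcache or Hibernate; the only point is that a misspelled attribute inside a string fails at run time, while a misspelled accessor fails at compile time.

```java
import java.util.HashMap;
import java.util.Map;

final class StringVsTypedSketch {
    /** Hypothetical bean with a compiler-checked accessor. */
    static final class Employee {
        private final int age;

        Employee(int age) {
            this.age = age;
        }

        int getAge() {
            return age;
        }
    }

    /** String-based access: the attribute name is validated only when this runs. */
    static Object byName(Map<String, Object> row, String attribute) {
        if (!row.containsKey(attribute)) {
            throw new IllegalArgumentException("Unknown attribute: " + attribute);
        }
        return row.get(attribute);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("age", 32);

        System.out.println(byName(row, "age"));        // correctly spelled string works
        System.out.println(new Employee(32).getAge()); // typed access, checked by the compiler

        try {
            byName(row, "agee");                       // the typo compiles and fails only here
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        // new Employee(32).getAgee();                 // would not compile at all
    }
}
```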

Now that I'm reading this comment maybe I will write a blog on this interesting topic. Thanks for prodding me along.

Dmitriy Setrakyan replied:

0 votes

If there is SQL available for EhCache, you hide it quite well... not a single mention of it in the blog and I could not find it in Google.

Now, I can see your point about DSL, but I strongly disagree. The DSL provided by EhCache is very awkward to use and requires XML pre-configuration of standard JavaBeans, which is even more awkward.

Here is my take on it:
1. There is already a standard DSL for searching and querying - it is called SQL. Also, you don't have to write it in Java code - why not just abstract it out into a Spring XML file and then retrieve it from the code?
2. There is a point to make about Java-based DSL being refactor-safe, but in case of EhCache this is already not the case due to mandatory XML declaration of all queried Java beans.
3. SQL is way more powerful syntactically and is a lot more natural to use for querying. As you pointed out, you can do "joins", "nested queries", "function calls", etc...

To make my point stronger, here are some queries you can do with SQL in GridGain:

cache.createQuery(SQL, Person.class, "from Person, Company where Person.companyId = Company.id and lower(Company.name) = 'gridgain'");

cache.createQuery(SQL, Person.class, "from Person where age = 32 and companyId in (select distinct companyId from Company where upper(name) = 'GRIDGAIN')");


--Best

Dmitriy Setrakyan replied:

0 votes

Ignore this post... Dzone error.

AlexanderKl replied:

0 votes

"The standalone Ehcache implementation does not use indexes. ... Search operations perform in O(n) time."
So I suppose that using the search feature is worse, performance-wise, than manually iterating over all elements in the cache.
Are there any plans to implement indexing for standalone caches?

sharrissf replied:

0 votes

I wouldn't say that it is slower than iteration. It is probably about the same, though we can add tricks like fork/join and make sure we do things in the minimum number of accesses, so the dev has less to think about to reach good performance. We will likely add an indexed option at some point. That said, even in DBs, indexes only really become useful relative to their cost after tables reach a certain size.
For small and medium-sized caches, they just add memory overhead and cost on put.
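
The trade-off described here (a full scan that costs nothing to maintain versus an index that costs memory and extra work on every put) can be sketched as follows. This is an illustration under simplifying assumptions, not Ehcache's implementation: ScanVsIndexSketch and its methods are hypothetical, and the cache is reduced to a map from key to a single age attribute.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ScanVsIndexSketch {
    /** Primary store: key -> age. Stands in for the cache contents. */
    private final Map<String, Integer> store = new HashMap<>();
    /** Secondary index: age -> keys. Extra memory, maintained on every put. */
    private final Map<Integer, List<String>> ageIndex = new HashMap<>();

    void put(String key, int age) {
        Integer previous = store.put(key, age);
        if (previous != null) {
            ageIndex.get(previous).remove(key); // index upkeep: drop the stale entry
        }
        ageIndex.computeIfAbsent(age, a -> new ArrayList<>()).add(key); // index upkeep: add the new entry
    }

    /** O(n): visits every entry, but needs no extra structure. */
    List<String> findByScan(int age) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : store.entrySet()) {
            if (entry.getValue() == age) {
                keys.add(entry.getKey());
            }
        }
        Collections.sort(keys); // sorted only so both methods return comparable results
        return keys;
    }

    /** One hash lookup, paid for by the extra map and the work done in put. */
    List<String> findByIndex(int age) {
        List<String> keys = new ArrayList<>(
            ageIndex.getOrDefault(age, Collections.<String>emptyList()));
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        ScanVsIndexSketch cache = new ScanVsIndexSketch();
        cache.put("a", 32);
        cache.put("b", 40);
        cache.put("c", 32);
        System.out.println(cache.findByScan(32));  // prints [a, c]
        System.out.println(cache.findByIndex(32)); // prints [a, c]
    }
}
```

Both lookups return the same keys; which one is cheaper overall depends on cache size and on the ratio of puts to searches, which is the question the rest of the thread argues about.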

Nikita Ivanov replied:

0 votes

Seriously? Are you telling us that a full scan of even a microscopic table of 100 elements is as fast as primary key lookup? Without serious indexing any querying is downright pointless (**especially so** in distributed systems).

sharrissf replied:

0 votes

Maybe you could run some tests to validate your claims.

Also, distributed is fully indexed.

Nikita Ivanov replied:

0 votes

You lost me there. How can data be indexed when queried in a distributed fashion and not when queried locally? Furthermore, indexing is tightly coupled to the query path optimization of a particular SQL engine - I just don't see it in EhCache (although I may not be seeing something)...

sharrissf replied:

0 votes

Also, I said it wasn't worth the overhead of keeping an index and that what we have built is as fast as iterating. I did NOT say that iterating was as fast as a lookup.

Nikita Ivanov replied:

0 votes

Steven,
With all due respect - it sounds like a lame justification for not having that particular feature. Which is fine, I'm not going to rub it in. Well, yes, if you can limit your users to a cache of only a dozen or so elements - you can skip indexing as a clearly unnecessary overhead :)

gregrluck replied:

1 vote

The decision not to use SQL as the search language was mine. Why? Because Java is OO and the cache is OO. Adding a SQL layer in the middle gives you a double impedance mismatch. I worked on this API iteratively with Tim Eck for a few months. It was beta tested and kicked around. I have shown the API to a lot of people and they seem to like it.

In terms of the questions around use and usefulness, let's see what people do with it. On the index design issue, for typical standalone cache sizes it makes sense not to use one. A 100,000 entry cache will do a search standalone in 50ms. Once your data gets large and you move to a distributed cache, the search gets done on the server using indexes, and then if you scale out even more, concurrently on shards.
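
The "concurrently on shards" idea is a scatter-gather pattern: send the same criteria to every shard, let each shard search its own data, and merge the partial results. The sketch below shows only that pattern and is not how Terracotta's servers implement it; ShardedSearchSketch is a hypothetical name, each shard is assumed to be a plain in-memory list of ages, and the per-shard search is a simple filter rather than an index lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;

final class ShardedSearchSketch {
    /** Searches a single shard; here just a filter over its local values. */
    static List<Integer> searchShard(List<Integer> shard, Predicate<Integer> criteria) {
        List<Integer> hits = new ArrayList<>();
        for (Integer value : shard) {
            if (criteria.test(value)) {
                hits.add(value);
            }
        }
        return hits;
    }

    /** Scatter the query to all shards concurrently, then gather in shard order. */
    static List<Integer> search(List<List<Integer>> shards, Predicate<Integer> criteria) {
        List<CompletableFuture<List<Integer>>> pending = new ArrayList<>();
        for (List<Integer> shard : shards) {
            pending.add(CompletableFuture.supplyAsync(() -> searchShard(shard, criteria)));
        }
        List<Integer> merged = new ArrayList<>();
        for (CompletableFuture<List<Integer>> future : pending) {
            merged.addAll(future.join()); // waits for that shard's partial result
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> shards = Arrays.asList(
            Arrays.asList(32, 45),
            Arrays.asList(32, 18),
            Arrays.asList(70));
        System.out.println(search(shards, age -> age == 32)); // prints [32, 32]
    }
}
```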

Dmitriy Setrakyan replied:

0 votes

Hm... We have been through this OO impedance argument with the JDO fiasco and polymorphic queries. I don't think this argument flies anymore. Whether you use SQL or not, you still treat a Java Bean as a single queryable record. However, you will never be able to approach the richness or efficiency of SQL with any custom Java-based full-scan filtering... and if you ever do, then why bother... just use SQL :)

sbtourist replied:

1 vote

I don't really want to defend anyone here, but that was a very bad marketing pitch by the GridGain guys.

Obviously, no product is perfect and I (as anyone with just a little experience in distributed systems) could raise several doubts about GridGain SQL implementation, such as how you guys deal with network partitions when performing joins, how you provide fail-recovery with no durable storage and blah blah blah.

The point is, I personally could do that, because I'm not a competitor.
But you guys are: if you want to ask competitors technical questions, do that politely, possibly as a comment on technical posts and without mentioning your own product; otherwise it only sounds like a marketing pitch, which is quite unprofessional.

Cheers,

Sergio B.

Nikita Ivanov replied:

0 votes

Sergio,
If you want a marketing pitch - leave your phone number and I'll call :)

But that was a technical discussion between two (somewhat rarely) competing vendors about one particular feature. One vendor has this feature, another doesn't and justifies it by claiming "no one needs it".

Btw, other vendors that "mistakenly" implemented it are Infinispan and GigaSpaces. What a bunch of idiots they are... :) What's funny is that I can guarantee that the next version of EHCache will have that feature - and we'll see another set of justifications.

Seriously, can you guys take any non-PC conversation at all?

edyavno replied:

1 vote

This is a few days old, but I wanted to comment anyway.

I personally prefer this "tighter" fluent API approach to an incomplete SQL implementation for object querying (see OQL fiasco) - and yes, I'm aware that GridGain is using H2 underneath. Calling it "border-line unusable" while plugging your product quite simply reeks of arrogance.

And obviously there are quite a few people who disagree with ".. you will never be able to approach the richness or efficiency of SQL .." as attested by the numerous alternative data stores (Mongo, Redis, etc) almost none of which use SQL, but rather some form of DSL instead. In Nikita's words "What a bunch of idiots they are", right?

There's got to be a better way of making an argument about SQL vs a custom DSL approach ...

- Ed Y.

Dmitriy Setrakyan replied:

0 votes

Now it's more than several days old, but I will still reply briefly. First of all, GridGain supports complete SQL, so let's get that out of the way.

Secondly, what EhCache provided is not DSL, but a collection of Java-based filters. If you want to see what a real DSL should look like, you should give Scala a try. Java API-based approach is way too rigid for my taste, and that's why any SQL-based or, in MongoDB case, custom DSL-based approach will always be richer and more complete.

--Best

edyavno replied:

0 votes

Dmitriy, you seem to be contradicting yourself - first saying that this is not a DSL, but then saying it's too rigid for your taste.
The fact is that it is a Java-based DSL that uses a fluent API approach. Within pure Java it is a perfectly valid approach, and there are a number of other projects that use this type of DSL and are quite successful - e.g. JMock.

I'm familiar with what "a real DSL" should or may look like in Scala, or my favorite Groovy, however, in many environments these are still not an option.

In general, your response seems to continue the pattern of being more inflammatory than constructive. If you could find a way to avoid that, this entire thread would probably have had a quite different and more positive tone.
