NoSQL Zone is brought to you in partnership with:

Mark is a graph advocate and field engineer for Neo Technology, the company behind the Neo4j graph database. As a field engineer, Mark helps customers embrace graph data and Neo4j building sophisticated solutions to challenging data problems. When he's not with customers Mark is a developer on Neo4j and writes his experiences of being a graphista on a popular blog at http://markhneedham.com/blog. He tweets at @markhneedham. Mark is a DZone MVB and is not an employee of DZone and has posted 553 posts at DZone. You can read more from them at their website. View Full User Profile

Neo4j/Cypher: Returning a Row with Zero Count When No Relationship Exists

04.30.2013
| 1971 views |
  • submit to reddit

I’ve been trying to see if I can match some of the football stats that OptaJoe posts on twitter and one that I was looking at yesterday was around the number of red cards different teams have received.

1 – Sunderland have picked up their first PL red card of the season. The only team without one now are Man Utd. Angels.

To refresh this is the sub graph that we’ll need to look at to work it out:

Sent off

I started off with the following query which traverses out from each match, finds the players who were sent off in the match and then groups the sendings off by the team they were playing for:

START game = node:matches('match_id:*')
MATCH game<-[:sent_off_in]-player-[:played]->likeThis-[:in]->game, 
      likeThis-[:for]->team
RETURN team.name, COUNT(game) AS redCards
ORDER BY redCards
LIMIT 5

When we run this we get the following results:

+------------------------------+
| team.name         | redCards |
+------------------------------+
| "Sunderland"      | 1        |
| "West Ham United" | 1        |
| "Norwich City"    | 1        |
| "Reading"         | 1        |
| "Liverpool"       | 2        |
+------------------------------+
5 rows

The problem we have here is that it hasn’t returned Manchester United because they haven’t yet received any red cards and therefore none of their players match the ‘sent_off_in’ relationship.

I ran into something similar in a post I wrote about a month ago where I was working out which day of the week players scored on.

The first step towards getting Manchester United to return with a count of 0 is to make the ‘sent_off_in’ relationship optional.

However, that on its own that isn’t enough because it now returns a count of all the player performances for each team:

START game = node:matches('match_id:*')
MATCH game<-[?:sent_off_in]-player-[:played]->likeThis-[:in]->game, 
      likeThis-[:for]->team
RETURN team.name, COUNT(game) AS redCards
ORDER BY redCards ASC
LIMIT 5

+-----------------------------+
| team.name        | redCards |
+-----------------------------+
| "Chelsea"        | 448      |
| "Wigan Athletic" | 459      |
| "Fulham"         | 460      |
| "Liverpool"      | 466      |
| "Everton"        | 467      |
+-----------------------------+
5 rows

Instead what we need to do is collect up all the ‘sent_off_in’ relationships and sum them up.

We can use the COLLECT function to do that and the neat thing about COLLECT is that it doesn’t bother collecting the empty relationships so we end up with exactly what we need:

START game = node:matches('match_id:*')
MATCH game<-[r?:sent_off_in]-player-[:played]->likeThis-[:in]->game, 
      likeThis-[:for]->team
RETURN team.name, COLLECT(r) AS redCards
LIMIT 5

+-----------------------------------------------------------------------------------------------------+
| team.name          | redCards                                                                       |
+-----------------------------------------------------------------------------------------------------+
| "Wigan Athletic"   | [:sent_off_in[26443] {},:sent_off_in[37785] {}]                                |
| "Everton"          | [:sent_off_in[6795] {minute:61},:sent_off_in[21735] {},:sent_off_in[34594] {}] |
| "Newcastle United" | [:sent_off_in[434] {minute:75},:sent_off_in[32389] {},:sent_off_in[34915] {}]  |
| "Southampton"      | [:sent_off_in[49393] {minute:70},:sent_off_in[49392] {minute:82}]              |
| "West Ham United"  | [:sent_off_in[21734] {minute:67}]                                              |
+-----------------------------------------------------------------------------------------------------+
5 rows

We then just need to call the LENGTH function to work out how many red cards there are in each collection and then we’re done:

START game = node:matches('match_id:*')
MATCH game<-[r?:sent_off_in]-player-[:played]->likeThis-[:in]->game, 
      likeThis-[:for]->team
RETURN team.name, LENGTH(COLLECT(r)) AS redCards
ORDER BY redCards
LIMIT 5

+--------------------------------+
| team.name           | redCards |
+--------------------------------+
| "Manchester United" | 0        |
| "West Ham United"   | 1        |
| "Sunderland"        | 1        |
| "Norwich City"      | 1        |
| "Reading"           | 1        |
+--------------------------------+
5 rows




Published at DZone with permission of Mark Needham, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)