NoSQL Zone is brought to you in partnership with:

Max De Marzi, is a seasoned web developer. He started building websites in 1996 and has worked with Ruby on Rails since 2006. The web forced Max to wear many hats and master a wide range of technologies. He can be a system admin, database developer, graphic designer, back-end engineer and data scientist in the course of one afternoon. Max is a graph database enthusiast. He built the Neography Ruby Gem, a rest api wrapper to the Neo4j Graph Database. He is addicted to learning new things, loves a challenge and finding pragmatic solutions. Max is very easy to work with, focuses under pressure and has the patience of a rock. Max is a DZone MVB and is not an employee of DZone and has posted 60 posts at DZone. You can read more from them at their website. View Full User Profile

Permission Resolution with Neo4j - Part 1

03.19.2013
| 2530 views |
  • submit to reddit

i_can_haz_permissions

People produce a lot of content. Messages, text files, spreadsheets, presentations, reports, financials, etc, the list goes on. Usually organizations want to have a repository of all this content centralized somewhere (just in case a laptop breaks, gets lost or stolen for example). This leads to some kind of grouping and permission structure. You don’t want employees seeing each other’s HR records, unless they work for HR, same for Payroll, or unreleased quarterly numbers, etc. As this data grows it no longer becomes easy to simply navigate and a search engine is required to make sense of it all.

But what if your search engine returns 1000 results for a query and the user doing the search is supposed to only have access to see 4 things? How do you handle this? Check the user permissions on each file realtime? Slow. Pre-calculate all document permissions for a user on login? Slow and what if new documents are created or permissions change between logins? Does the system scale at 1M documents, 10M documents, 100M documents?

In the Neo4j data modeling examples there is a solution using a graph.

ACL

Users have direct permissions to documents (or folders), or via groups. Folders can have many child documents, which can have many child documents and “turtles all the way down”.

Having them in a graph makes it easy to:

  • Find all files in the directory structure
  • What files are owned by whom?
  • Who has access to a File?
  • … and so on

So back to our question. What if the search engine returned 1000 records and we needed to see which of these the user has access to. How can we do this in the graph?

Not by starting from the User and traversing to all the possible documents the user has access to (since this could take a very long time in a 100M document graph). Instead we will start with the documents and see if they have a direct permission relationship to the user or the groups the user belongs to. We do this for all the documents, and for those which we don’t have a direct relationship, we move up to the parent folder and try it again. We’ll be smart and merge any documents who share the same parent to reduce the number of checks. Doing it this way, we will traverse thousands of relationships (instead of millions) depending on how deep the document structure goes.

I’m going to walk-through the code using the Neo4j Core API inside an unmanaged extension so it can be accessed via REST and I’ll show you how I wrote a couple of performance tests (one random and one static) as well as how I generated our test data. This project is open source and available on github.

Let’s dig in. We need to create a test for our unmanaged extension, but we can’t test without some sample data, so let’s start by creating an ImpermanentGraphDatabase in MyServiceTest.java:

private ImpermanentGraphDatabase db;
private MyService service;
private ObjectMapper objectMapper = new ObjectMapper();
private static final RelationshipType KNOWS = DynamicRelationshipType.withName("KNOWS");
private static final RelationshipType SECURITY = DynamicRelationshipType.withName("SECURITY");
private static final RelationshipType IS_MEMBER_OF = DynamicRelationshipType.withName("IS_MEMBER_OF");
private static final RelationshipType HAS_CHILD_CONTENT = DynamicRelationshipType.withName("HAS_CHILD_CONTENT");

Before our tests are run, we want to populate our graph, and then tear it down:

@Before
public void setUp() {
    db = new ImpermanentGraphDatabase();
    populateDb(db);
    service = new MyService();
}
 
@After
public void tearDown() throws Exception {
    db.shutdown();
}

We’ll create a few users, groups, documents, and then link them to each other:

private void populateDb(GraphDatabaseService db) {
    Transaction tx = db.beginTx();
    try
    {
        Node personA = createPerson(db, "A");
        Node personB = createPerson(db, "B");
        Node personC = createPerson(db, "C");
        Node personD = createPerson(db, "D");
        Node doc1 = createDocument(db, "DOC1");
        Node doc2 = createDocument(db, "DOC2");
        Node doc3 = createDocument(db, "DOC3");
        Node doc4 = createDocument(db, "DOC4");
        Node doc5 = createDocument(db, "DOC5");
        Node doc6 = createDocument(db, "DOC6");
        Node doc7 = createDocument(db, "DOC7");
        Node g1 = createGroup(db, "G1");
        Node g2 = createGroup(db, "G2");
 
        personA.createRelationshipTo(personB, KNOWS);
        personB.createRelationshipTo(personC, KNOWS);
        personC.createRelationshipTo(personD, KNOWS);
 
        personA.createRelationshipTo(g1, IS_MEMBER_OF);
        personB.createRelationshipTo(g2, IS_MEMBER_OF);
 
        doc1.createRelationshipTo(doc4, HAS_CHILD_CONTENT);
        doc2.createRelationshipTo(doc5, HAS_CHILD_CONTENT);
        doc5.createRelationshipTo(doc7, HAS_CHILD_CONTENT);
 
        Relationship secA1 = createPermission(personA, doc1, "R");
        Relationship secA3 = createPermission(personA, doc3, "RW");
        Relationship secB4 = createPermission(personB, doc4, "R");
        Relationship secG2 = createPermission(g1, doc2, "R");
        Relationship secG6 = createPermission(g2, doc6, "R");
 
        tx.success();
    }
    finally
    {
        tx.finish();
    }
}

The CreatePerson, CreateDocument and CreateGroup methods look like:

private Node createPerson(GraphDatabaseService db, String uid) {
        Index<Node> people = db.index().forNodes("Users");
        Node node = db.createNode();
        node.setProperty("unique_id", uid);
        people.add(node, "unique_id", uid);
        return node;
    }

The CreatePermission method is a little bit different since we’re creating a relationship.

private Relationship createPermission(Node person, Node doc, String permission) {
    Relationship sec = person.createRelationshipTo(doc, SECURITY);
    sec.setProperty("flags", permission);
    return sec;
}

Finally we’ll write a couple of tests that once we have a “permissions” method should pass:

@Test
public void shouldRespondToPermissions() throws BadInputException, IOException {
    String ids = "A,DOC1 DOC2 DOC3 DOC4 DOC5 DOC6 DOC7";
    Response response =  service.permissions(ids, db);
    List list = objectMapper.readValue((String) response.getEntity(), List.class);
    assertEquals(new HashSet<String>(Arrays.asList("DOC1", "DOC2", "DOC3", "DOC4", "DOC5", "DOC7")), new HashSet<String>(list));
}
 
@Test
public void shouldRespondToPermissions2() throws BadInputException, IOException {
    String ids = "B,DOC1 DOC2 DOC3 DOC4 DOC5 DOC6 DOC7";
    Response response =  service.permissions(ids, db);
    List list = objectMapper.readValue((String) response.getEntity(), List.class);
    assertEquals(new HashSet<String>(Arrays.asList("DOC4", "DOC6")), new HashSet<String>(list));
}

From here we move on to the actual graph algorithm in MyService.java:

First we define our Relationship Types:

private ObjectMapper objectMapper = new ObjectMapper();
private static enum RelTypes implements RelationshipType
{
    IS_MEMBER_OF,
    SECURITY,
    HAS_CHILD_CONTENT
}

Our permissions method takes a string parameter that contains a user id a comma, and a space separated list of document ids. We take the input and initialize some variables that will hold our data as we traverse the graph.

@POST
@Path("/permissions")
public Response permissions(String body, @Context GraphDatabaseService db) throws IOException {
    String[] splits = body.split(",");
    PermissionRequest ids = new PermissionRequest(splits[0], splits[1]);
    Set<String> documents = new HashSet<String>();
    Set<Node> documentNodes = new HashSet<Node>();
    List<Node> groupNodes = new ArrayList<Node>();
    Set<Node> parentNodes = new HashSet<Node>();
    HashMap<Node, ArrayList<Node>> foldersAndDocuments = new HashMap<Node, ArrayList<Node>>();

We need to find the user, and all the documents passed in as search hits from our search engine:

IndexHits<Node> uid = db.index().forNodes("Users").get("unique_id", ids.userAccountUid);
IndexHits<Node> docids = db.index().forNodes("Documents").query("unique_id:(" + ids.documentUids + ")");
try
{
    for ( Node node : docids )
    {
        documentNodes.add(node);
    }
}
finally
{
    docids.close();
}

We will also need all the groups the user belongs:

if ( uid.size() > 0 && documentNodes.size() > 0)
    {
        Node user = uid.getSingle();
        for ( Relationship relationship : user.getRelationships(
                RelTypes.IS_MEMBER_OF, Direction.OUTGOING ) )
        {
            groupNodes.add(relationship.getEndNode());
        }

We are going to iterate through these document nodes. In the loop, we check to see if the user node has a direct relationship to the document and if so add it (and any possible children documents to a “document” list which we will return at the end.

Iterator listIterator ;
do {
    listIterator = documentNodes.iterator();
    Node document = (Node) listIterator.next();
    listIterator.remove();
 
    //Check against user
    Node found = getAllowed(document, user);
    if (found != null) {
        if (foldersAndDocuments.get(found) != null) {
            for(Node docs : foldersAndDocuments.get(found)) {
                documents.add(docs.getProperty("unique_id").toString());
            }
        } else {
            documents.add(found.getProperty("unique_id").toString());
        }
    }

If the user doesn’t have a direct connection to the document, maybe one of the groups they belong to does:

//Check against user Groups
for (Node group : groupNodes){
    found = getAllowed(document, group);
    if (found != null) {
        if (foldersAndDocuments.get(found) != null) {
            for(Node docs : foldersAndDocuments.get(found)) {
                documents.add(docs.getProperty("unique_id").toString());
            }
        } else {
            documents.add(found.getProperty("unique_id").toString());
        }
    }
}

If we had no luck setup to try the parent node of this document:

// Did not find a security relationship, go up the folder chain
Relationship parentRelationship = document.getSingleRelationship(RelTypes.HAS_CHILD_CONTENT,Direction.INCOMING);
if (parentRelationship != null){
    Node parent = parentRelationship.getStartNode();
    ArrayList<Node> myDocs = foldersAndDocuments.get(document);
    if(myDocs == null) myDocs = new ArrayList<Node>();
 
    ArrayList<Node> existingDocs = foldersAndDocuments.get(parent);
    if(existingDocs == null) existingDocs = new ArrayList<Node>();
 
    for (Node myDoc:myDocs) {
        existingDocs.add(myDoc);
    }
    if (myDocs.isEmpty()) existingDocs.add(document);
    foldersAndDocuments.put(parent, existingDocs);
    parentNodes.add(parent);
}

If we get to the end of our documents, we switch to the parents of our documents and iterate through those until there are no more parents (reached the root folder of our document tree).

    if(listIterator.hasNext() == false){
        documentNodes.clear();
 
        for( Node parentNode : parentNodes){
            documentNodes.add(parentNode);
        }
 
        parentNodes.clear();
        listIterator = documentNodes.iterator();
    }
 
} while (listIterator.hasNext());

Unless we ran into an error, we return the document list:

    } else {documents.add("Error: User or Documents not found");}
 
    uid.close();
 
    return Response.ok().entity(objectMapper.writeValueAsString(documents)).build();
}

The getAllowed method just checks to see if a security relationship with the property flag containing “R” is found:

private Node getAllowed(Node from, Node to){
        ConnectedResult connectedResult = isConnected(from, to, Direction.INCOMING, RelTypes.SECURITY);
        if (connectedResult.isConnected){
            if (connectedResult.connectedRelationship.getProperty("flags").toString().contains("R")) {
                return connectedResult.connectedRelationship.getEndNode();
            }
        }
        return null;
    }

The isConnected method lets us know if two nodes are related to each other by some directed relationship:

private ConnectedResult isConnected(Node from, Node to, Direction dir, RelationshipType type) {
        for (Relationship r : from.getRelationships(dir, type)) {
            if (r.getOtherNode(from).equals(to)) return new ConnectedResult(true, r);
        }
        return new ConnectedResult(false, null);
    }

Now when we run our tests:

passing tests

But how fast is it… and does it scale? To find out we’ll need to create a larger graph and a set of performance tests.
Stay tuned for part two.



Published at DZone with permission of Max De Marzi, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)