Supercharging Neo4j with Vecto

Mansour Alawi
23 Dec 2022

At Xpress AI, we have been building the flexible, low-code, open-source Xircuits library for a while now. It has a growing community and gets a lot of positive feedback. In this post we’ll show how to use it together with Vecto and Neo4j.

We recently introduced Vecto, a hardware-accelerated and fully dynamic vector search service that allows anyone to benefit from the latest AI models (like CLIP and S-BERT) for similarity search. It takes just a few lines of code to ingest your data into a vector space and run dynamic similarity search queries against it.

In graph databases, relationships make it possible to build associations and form theories around your data. The benefit is proportional to the number and diversity of the connections between the nodes. Machine learning approaches such as Random Forest and Logistic Regression can be used to enhance graph connectivity. However, these models require data and must be trained from scratch.

In this blog post we introduce a new approach to building a graph: predicting relationships in Neo4j from the unstructured data available in each node’s properties. Using state-of-the-art transformer models, we predict new relationships between these nodes in a way that is generalizable and scales well, all without needing any training data.

Preparation

We will be using the publicly available Amazon E-commerce Dataset in this example. This dataset consists of more than 700k listed products. We have pre-processed the dataset into a CSV file to make it easy to insert into Neo4j.

Fig 1: Pre-processed Amazon E-commerce dataset

Adding the Data to Neo4j using Xircuits

With the help of the new Neo4j component library in Xircuits, we can add these products to a Neo4j database. We will be using the product_type column as a node label and the image_url, country, full_description, and item_name columns for the node properties.

Use the Neo4jSession component to connect to Neo4j. You can link your Neo4j URL, database name and password to their respective in-ports. To read the dataset CSV file, simply add a string node with the file name and link it to the correct in-port on the ReadCSV component. Finally, use the CreateNode component to add the data as graph nodes to the Neo4j database.
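For readers who like to see the idea in code, here is a minimal Python sketch of how one CSV row could be turned into a Cypher CREATE statement. It is not the implementation of the Xircuits CreateNode component; the helper name row_to_create_statement and the sample row are made up for illustration. Cypher does not allow node labels to be passed as query parameters, so the sketch validates the product_type value before placing it in the query text, while the properties travel as a parameter map.

```python
import csv
import io

# Columns used as node properties, as described in the post.
PROPERTY_COLUMNS = ["image_url", "country", "full_description", "item_name"]


def row_to_create_statement(row):
    """Return a (query, params) pair that would create one product node.

    Hypothetical helper for illustration only.
    """
    label = (row.get("product_type") or "").strip()
    # Labels cannot be query parameters in Cypher, so only accept
    # simple alphanumeric/underscore values before interpolating.
    if not label.replace("_", "").isalnum():
        raise ValueError(f"unsupported label: {label!r}")
    query = f"CREATE (n:`{label}` $props)"
    props = {column: (row.get(column) or "") for column in PROPERTY_COLUMNS}
    return query, {"props": props}


# A made-up row standing in for the pre-processed dataset.
SAMPLE_CSV = (
    "product_type,item_name,country,image_url,full_description\n"
    "CHAIR,Example Chair,US,https://example.com/chair.jpg,A sturdy wooden chair\n"
)

statements = [
    row_to_create_statement(row)
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV))
]
```

Each (query, params) pair could then be handed to a Neo4j client of your choice; the Xircuits components take care of that step for you.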

Fig 2: Creating a Neo4j session, reading the CSV file and creating Neo4j nodes in Xircuits

We chose to add 4,000 products to our Neo4j database. You can see that the products are only clustered based on the node label product_type, without any relationship edges between the individual nodes.

Fig 3: Neo4j Graph of the Products

This data has no links we can use to add edges to the graph. However, we can use Vecto to predict relationships between the nodes, so we can start doing market analysis or product recommendations and take advantage of Neo4j right away.

Ingesting the Neo4j Graph into Vecto

The graph nodes contain a full_description property that has details about each product. We can use those product details in Vecto to create a dense vector, or embedding, of each node based on the Sentence-BERT transformer model. You can then store those embeddings in a vector space with a reference to the node.

Each node in the graph has a unique ID assigned to it by Neo4j, and we can use this as the reference when ingesting the data into Vecto. You can use the ExportGraphAsCSV component from the Neo4j components library to export the Neo4j graph as a CSV file called products_nodes.csv.

Fig 4: Export Neo4j Graph as CSV File

As you can see in the exported CSV file below, each Neo4j node is represented by a single row with an “_id” column that refers to the node’s unique ID in the Neo4j graph.

Now let’s create a vector space in Vecto. First, enter your login credentials on the Vecto login page. (If you don’t have yours yet, request a demo here.)

After logging in, click on the Vector Spaces tab to create a new vector space and give it a name like vecto_neo4j. Choose S-BERT as the vector space’s model and click Create Vector Space. Note down your vector space ID for the authentication step later. Next, create an authentication token for your new vector space from the Tokens tab. Click Create New Token (call it something like vecto_neo4j_token) and select the vector space you created above as the vector space. Finally, click Create Token. Save this vector space token as well.

Fig 5: Vecto Login and Create a Vector Space.

In Xircuits, find the Vecto components library and use the VectoLogin component to access the vector space. Set the Vecto API endpoint to the cloud Vecto URL “https://api.vecto.ai/api/v0”, and connect the vector space ID and token you saved above to the appropriate component ports.

Set the Vecto GetDataFromCSV component to read the products_nodes.csv file that was exported earlier from Neo4j. Select the full_description column as the data to ingest and the _id column as the metadata.
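Conceptually, this step splits the exported CSV into two parallel lists: the text to embed and the node ID that identifies it. The Python sketch below illustrates that idea with the standard csv module. It is not the GetDataFromCSV component’s code; the function name load_data_and_metadata and the sample rows are hypothetical, and skipping rows with an empty description is our own assumption.

```python
import csv
import io


def load_data_and_metadata(csv_file,
                           data_column="full_description",
                           metadata_column="_id"):
    """Split CSV rows into parallel data and metadata lists.

    Hypothetical helper for illustration only.
    """
    data, metadata = [], []
    for row in csv.DictReader(csv_file):
        text = (row.get(data_column) or "").strip()
        if not text:
            # Assumption: a node without a description has nothing to embed.
            continue
        data.append(text)
        metadata.append(row[metadata_column])
    return data, metadata


# Made-up rows standing in for products_nodes.csv.
SAMPLE_EXPORT = (
    "_id,item_name,full_description\n"
    "0,Example Chair,A sturdy wooden chair\n"
    "1,Example Lamp,\n"
    "2,Example Sofa,A three-seat fabric sofa\n"
)

data, metadata = load_data_and_metadata(io.StringIO(SAMPLE_EXPORT))
```

Keeping the two lists aligned by position is what lets every embedding be traced back to its Neo4j node later.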

Now you are all set to start ingesting the graph nodes’ details into Vecto. Drag an IngestData component onto the canvas and set batch_size to 64, are_data_images to False and delete_ingested to True. The delete_ingested option clears any old embeddings in the vector space. The data and metadata in-ports will be linked directly to the GetDataFromCSV component’s out-ports if you drag out from the next triangle. (A very handy shortcut.)
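To give a feel for what batch_size means, here is a small Python sketch that slices the data and metadata lists into chunks of 64 items, the way a batched ingest would send them. This only illustrates batching in general; the batched helper is hypothetical and the actual upload logic inside the IngestData component is not shown here.

```python
def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Made-up inputs: 150 descriptions with their node IDs.
data = [f"description {i}" for i in range(150)]
metadata = [str(i) for i in range(150)]

# Pair up data and metadata batches so they stay aligned.
batch_sizes = [
    len(data_batch)
    for data_batch, metadata_batch in zip(batched(data, 64),
                                          batched(metadata, 64))
]
```

With 150 items and a batch size of 64, the last batch is simply smaller than the others.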

Fig 6: Vecto Ingest Components in Xircuits

Predicting New Relationships in Neo4j using Vecto

By calculating the dot product between a single node’s embedding and every other node’s embedding, we get a similarity score for each pair of nodes. Ranking by that score gives the most similar nodes to each particular node, and you can create an edge to every node whose score is above a certain threshold.
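The ranking idea can be sketched in a few lines of plain Python. This is an illustration of dot-product similarity, not Vecto’s internals. We assume the embeddings are normalized to unit length, so the dot product behaves like cosine similarity; the tiny three-dimensional vectors and the function names are made up for the example.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_neighbors(query_id, embeddings):
    """Return (node_id, score) pairs for all other nodes, best first."""
    query = embeddings[query_id]
    scores = [
        (node_id, dot(query, vector))
        for node_id, vector in embeddings.items()
        if node_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy embeddings keyed by node ID (real S-BERT vectors are much longer).
embeddings = {
    "0": normalize([1.0, 0.0, 0.0]),
    "1": normalize([0.9, 0.1, 0.0]),
    "2": normalize([0.0, 1.0, 0.0]),
}

ranked = rank_neighbors("0", embeddings)
```

Here node "1" points in almost the same direction as node "0" and ranks first, while node "2" is orthogonal to it and scores close to zero.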

This might sound a bit complex, but don’t worry. This kind of logic is all handled by a single Xircuits component named VectoPredictNeo4jRelationships. All you need to do is connect its data and metadata in-ports to the out-ports of the GetDataFromCSV component used earlier. Then set the similarity threshold to a value between 0 and 1.0 (we used 0.75), the number of similar nodes to consider (topk), and the relationship name in the Neo4j database (relationship_type).
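As a rough picture of how the three settings interact, the Python sketch below keeps at most topk neighbors whose score reaches the threshold and builds a Cypher MERGE statement for the resulting edges. It is our own illustration and not the code of VectoPredictNeo4jRelationships. The function names, the SIMILAR_TO relationship name, the score property on the edge, the inclusive threshold comparison and the scores themselves are all assumptions made for the example. We also assume the exported _id values are the integer IDs Neo4j assigned to the nodes.

```python
def select_edges(source_id, ranked, threshold=0.75, topk=5):
    """Keep at most topk best neighbors whose score is >= threshold.

    `ranked` is a list of (node_id, score) pairs sorted best first.
    """
    return [
        (source_id, node_id, score)
        for node_id, score in ranked[:topk]
        if score >= threshold
    ]


def build_relationship_query(relationship_type):
    """Return a Cypher statement that links two nodes by internal ID."""
    # Relationship types cannot be query parameters in Cypher,
    # so validate the name before placing it in the query text.
    if not relationship_type.replace("_", "").isalnum():
        raise ValueError(f"unsupported type: {relationship_type!r}")
    return (
        "MATCH (a), (b) WHERE id(a) = $source AND id(b) = $target "
        f"MERGE (a)-[r:`{relationship_type}`]->(b) SET r.score = $score"
    )


# Made-up similarity scores for node 1, sorted best first.
ranked = [("7", 0.91), ("3", 0.80), ("5", 0.78), ("9", 0.40)]

edges = select_edges("1", ranked, threshold=0.75, topk=2)
query = build_relationship_query("SIMILAR_TO")
params = [
    {"source": int(source), "target": int(target), "score": score}
    for source, target, score in edges
]
```

A lower threshold or a larger topk yields a denser graph, while stricter values keep only the strongest links. Using MERGE rather than CREATE means re-running the statement for the same pair does not duplicate the edge.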

Fig 7: Vecto Predict Neo4j Relationships Component in Xircuits

Now the Neo4j graph has the Vecto-predicted relationships between its nodes! Super easy!

Fig 8: Neo4j Graph of the Products after Adding the Vecto Predicted Relationships

Conclusion

As you can see above, predicting relationships between Neo4j nodes with Vecto is one of the many ways you can apply vector search in your applications. Combining Xircuits, Neo4j and Vecto provides a fast and interactive way to build data processing workflows like this one.

We are happy to have you around and encourage everyone to have a look at the Vecto website and the Xircuits website, and to come hang out with us on the Xpress AI Discord server to tell us about what you are working on.