Boosting Similarity Search With Real-time Stream Processing

Session details

Status:

Declined

Speaker(s):

Fawaz Ghali (Hazelcast)

Experience level:

Intermediate

Session Track:

Open Source Best Practices

Session Type:

Standard

The goal of similarity search and vector databases is to find results similar to a search query over unstructured data, such as text, images, and videos. The unstructured data is first vectorized and then stored in vector format. There are publicly available tools for creating vectors from unstructured data; similarly, there are vector databases for storing those vectors and performing similarity searches on them. This matters because of the rising popularity of Large Language Models (LLMs), which are often combined with vector databases. Here, we present a hybrid approach that takes the strengths of vector databases and boosts them with traditional search and filtering techniques based on real-time stream processing. Vector databases are good for building high-performance vector search applications. Stream processing, on the other hand, can provide fast, real-time storage for structured data (filters, tags, and contextual data). In this work, we add context and memory to vector databases so that you can ingest, enrich, predict, and act on your data in a simple but efficient way. In this talk, we'll focus on how real-time compute APIs help leverage the processing capabilities of a distributed cluster, so you aren't leaving large potential performance gains on the table. The combination of real-time storage and compute provides a unique synergy that enables applications to address real-time use cases at any scale.
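
The hybrid idea described above can be illustrated with a minimal, in-memory sketch. It is not the speaker's implementation and does not use any Hazelcast or vector-database API: the names `Item`, `HybridIndex`, `apply_event`, and the `in_stock` tag are all hypothetical, chosen only for illustration. The sketch assumes that structured tags arrive as a stream of update events, and that a query first filters on those tags and then ranks the remaining items by cosine similarity.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Item:
    """One stored record: an embedding plus structured tags (hypothetical schema)."""
    item_id: str
    vector: list[float]
    tags: dict[str, str] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class HybridIndex:
    """Toy stand-in for a vector store combined with a real-time tag store."""

    def __init__(self) -> None:
        self.items: dict[str, Item] = {}

    def upsert(self, item: Item) -> None:
        self.items[item.item_id] = item

    def apply_event(self, item_id: str, key: str, value: str) -> None:
        """Apply one tag update, as a stream processor might on each incoming event."""
        if item_id in self.items:
            self.items[item_id].tags[key] = value

    def search(self, query: list[float], filters: dict[str, str], k: int = 3) -> list[str]:
        """Filter on structured tags first, then rank the survivors by similarity."""
        candidates = [
            item for item in self.items.values()
            if all(item.tags.get(name) == wanted for name, wanted in filters.items())
        ]
        candidates.sort(key=lambda item: cosine(query, item.vector), reverse=True)
        return [item.item_id for item in candidates[:k]]


index = HybridIndex()
index.upsert(Item("a", [1.0, 0.0], {"in_stock": "yes"}))
index.upsert(Item("b", [0.9, 0.1], {"in_stock": "yes"}))
index.upsert(Item("c", [0.0, 1.0], {"in_stock": "yes"}))

before = index.search([1.0, 0.0], {"in_stock": "yes"})
index.apply_event("a", "in_stock", "no")  # a real-time update changes the context
after = index.search([1.0, 0.0], {"in_stock": "yes"})
```

In this toy run, item "a" is the closest match before the update, and drops out of the results once the event marks it as out of stock, leaving "b" as the top hit. A production system would replace the linear scan with an approximate nearest-neighbor index and the dictionary with a distributed store.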

Objective of the presentation:

How to boost similarity search with stream processing.

Attendee pre-requisites - If none, enter "N/A":