DISTANCE

The SQL DISTANCE function in Polypheny compares two arrays using a specified metric. It is the basis for k-nearest-neighbour (kNN) search, in which entries are ranked by their distance to a given query vector.

Function Syntax

The general syntax for the DISTANCE function is as follows:

DISTANCE(<target array>, <array to compare with>, <metric> [, <weights>])
  • <target array>: The array from your table you want to compare with another array.
  • <array to compare with>: The array you are comparing the target array with.
  • <metric>: The metric used to calculate the distance, passed as a string literal. This can be one of the following: 'L1', 'L2', 'L2 squared', 'Cosine', 'ChiSquared'.
  • <weights> (optional): An array of weights for weighted distance calculations.
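
To make the metric names concrete, the following Python sketch computes each one using its common textbook definition. It is an illustration only: the function names are hypothetical, and the exact formulas Polypheny applies (in particular for 'Cosine', assumed here to be 1 minus the cosine similarity, and for 'ChiSquared', assumed here to be the sum of (x - y)^2 / (x + y)) are assumptions rather than confirmed behaviour.

```python
import math

# Textbook definitions, assumed for illustration; Polypheny's exact
# formulas may differ in detail.

def l1(a, b):
    # Manhattan distance: sum of absolute component differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_squared(a, b):
    # Squared Euclidean distance: avoids the square root.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def l2(a, b):
    # Euclidean distance.
    return math.sqrt(l2_squared(a, b))

def cosine(a, b):
    # Assumed form: 1 - cosine similarity (0 for parallel vectors).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def chi_squared(a, b):
    # Assumed form: sum of (x - y)^2 / (x + y), skipping zero denominators.
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y != 0)

a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
print(l1(a, b), l2(a, b), l2_squared(a, b))
```

Note that 'L2' and 'L2 squared' always produce the same ordering of results, so the squared variant is a cheaper choice when only the ranking matters.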

Examples

Suppose you have a table ProductVectors with a column ProductFeatures that stores arrays representing product features. You can use the DISTANCE function to find the products whose features are most similar to a given feature vector:

SELECT id, DISTANCE(ProductFeatures, ARRAY[...], 'L2') AS dist
FROM ProductVectors
ORDER BY dist ASC
LIMIT 5;

This query returns the IDs and distances of the five products whose feature vectors have the smallest L2 distance to the given array, ordered from nearest to farthest.
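
Conceptually, the query above is a brute-force kNN search: compute a distance for every row, sort, and keep the first five. The Python sketch below mirrors that logic outside the database. The sample rows and the helper name nearest are hypothetical, and the sketch says nothing about how Polypheny actually executes the query.

```python
import math

def l2(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(rows, query, k=5):
    # rows: iterable of (id, vector) pairs, like (id, ProductFeatures).
    scored = [(row_id, l2(vec, query)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1])  # ORDER BY dist ASC
    return scored[:k]                      # LIMIT k

# Hypothetical stand-in for the ProductVectors table.
product_vectors = [
    (1, [0.0, 0.0]),
    (2, [1.0, 1.0]),
    (3, [3.0, 4.0]),
    (4, [0.5, 0.0]),
    (5, [10.0, 10.0]),
    (6, [2.0, 0.0]),
]

result = nearest(product_vectors, [0.0, 0.0], k=5)
for row_id, dist in result:
    print(row_id, dist)
```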

To use the function with weights, you could write:

SELECT id, DISTANCE(ProductFeatures, ARRAY[...], 'L2', ARRAY[...]) AS dist
FROM ProductVectors
ORDER BY dist ASC
LIMIT 5;
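
Weights let some dimensions count more than others. The Python sketch below shows one common way to define a weighted L2 distance, where each weight scales the squared difference of its component. This formula is an assumption made for illustration; how Polypheny applies the weights array is not specified here, and the function name weighted_l2 is hypothetical.

```python
import math

def weighted_l2(a, b, weights):
    # Assumed definition: sqrt(sum(w_i * (a_i - b_i)^2)).
    # Polypheny may apply weights differently.
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Uniform weights reproduce the plain L2 distance.
print(weighted_l2([0.0, 0.0], [3.0, 4.0], [1.0, 1.0]))
# A zero weight makes the second dimension irrelevant.
print(weighted_l2([0.0, 0.0], [3.0, 4.0], [1.0, 0.0]))
```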

Utility

The DISTANCE function is useful whenever you need to measure how similar or different data points are in a multi-dimensional space. Typical scenarios include:

  • Recommendation Systems: You can use the DISTANCE function to recommend similar products or services to users based on their past behaviors or preferences.
  • Clustering: The function can be useful in clustering analysis where you want to group similar data points together.
  • Anomaly Detection: The DISTANCE function can help identify outliers, that is, data points that deviate significantly from the rest of your dataset.

Remember that the right choice of distance metric depends on the nature of your data and the specific requirements of your use case.
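
As a small illustration of why the metric matters, the Python sketch below ranks the same two candidate vectors against one query under L2 and under a cosine distance (assumed here to be 1 minus the cosine similarity). The vectors and names are hypothetical. L2 prefers the candidate that is close in absolute terms, while cosine prefers the candidate that points in the same direction regardless of magnitude.

```python
import math

def l2(a, b):
    # Euclidean distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    # Assumed form: 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 1.0]
candidates = {
    "same_direction_far": [10.0, 10.0],    # parallel to the query, large magnitude
    "close_other_direction": [1.0, 0.0],   # near the query, different direction
}

# Pick the nearest candidate under each metric.
nearest_l2 = min(candidates, key=lambda name: l2(candidates[name], query))
nearest_cosine = min(candidates, key=lambda name: cosine(candidates[name], query))

print("L2 nearest:", nearest_l2)
print("Cosine nearest:", nearest_cosine)
```

If magnitude carries meaning in your data (for example, raw counts), a metric such as L1 or L2 is usually the more natural fit; if only the direction of the vector matters (for example, normalised embeddings), a cosine-style metric tends to be more appropriate.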

© Polypheny GmbH. All Rights Reserved.