Similarity Search for Adaptive Ellipsoid Queries Using Spatial Transformation. Yasushi Sakurai (NTT

86K downloads 86K Views 544KB Size

Spatial Variation in Search Engine Queries - PowerPoint PPT Presentation

Spatial Variation in Search Engine Queries. Lars Backstrom, Jon Kleinberg, Ravi Kumar and Jasmine Novak.

Seeds for Similarity Search - PowerPoint PPT Presentation

Seeds for Similarity Search. Presentation by: Anastasia Fedynak. Homology Search. Homology search

Similarity transformation - PowerPoint PPT Presentation

Similarity transformation. same system as(#). Controllability:. Example:. Controller Canonical

Access Structures for Angular Similarity Queries - PowerPoint PPT Presentation

Access Structures for Angular Similarity Queries. Tan Apaydin and Hakan Ferhatosmanoglu IEEE TRANSACTIONS

Continuously Adaptive Continuous Queries - PowerPoint PPT Presentation

Continuously Adaptive Continuous Queries. Sam Madden Mehul Shah DB Lunch, 11/9/01. Basic Architecture.

The Spatial Skyline Queries - PowerPoint PPT Presentation

The Spatial Skyline Queries. Mehdi Sharifzadeh and Cyrus Shahabi University of Southern California

Fast Similarity Search for Learned Metrics - PowerPoint PPT Presentation

Fast Similarity Search for Learned Metrics. Prateek Jain, Brian Kulis, and Kristen Grauman Department

A Metric Cache for Similarity Search - PowerPoint PPT Presentation

A Metric Cache for Similarity Search. fabrizio falchi claudio lucchese salvatore orlando fausto rabitti

Similarity Search for Web Services - PowerPoint PPT Presentation

Similarity Search for Web Services. Xin (Luna) Dong , Alon Halevy, Jayant Madhavan, Ema Nemes, Jun

Database Similarity Search - PowerPoint PPT Presentation

Database Similarity Search. Why do we care to align sequences?. Sequences that are similar probably

Yasushi Sakurai(NTT Cyber Space Laboratories)

Masatoshi Yoshikawa(Nara Institute of Science and Technology)

Ryoji Kataoka(NTT Cyber Space Laboratories)

Shunsuke Uemura(Nara Institute of Science and Technology)

Outline- Introduction
- STT (spatial transformation technique)
- Definition of spatial transformation
- Spatial transformation of rectangles
- Search algorithm
- MSTT (multiple STT)
- Index structure construction
- Query processing
- Dissimilarity of matrices
- Performance test
- Conclusion

- Ellipsoid query
- Search processing is performed by using quadratic form distance functions
- Distance of pand q for a query matrix M :
- represents correlations between dimensions

quadratic form

Ellipsoids

(Not necessarily aligned

to the coordinate axis)

Euclidean

circles for isosurfaces

weighted Euclidean

iso-oriented ellipsoids

Dim. 1 2 3 d

p

Euclidean distance

q

color histograms

p

Quadratic form distance

q

color histograms

Introduction- An application of a quadratic form distance function
- represent the similarity between colors i and j

- Spatial indices
- e.g. R-tree family (R*-tree, X-tree, SR-tree, A-tree)
- Based on the Euclidean distance function

Cannot be applied to ellipsoid queries

- Efficient search methods for user-adaptive ellipsoid queries
- Query matrix Mis variable

q

Related Work : Seidl and Kriegel, VLDB97- Search method based on the steepest descent method
- Works on spatial indices of R-tree family
- Calculates the exact distance of a query point and an MBR in an index structure
- …but requires high CPU cost which exceeds disk access cost

R1

p

Moves p’ toward p

iteratively

M

p’

CPU timeO(w d2)

w…number of iterations

d…dimensionality

M

M

q

q

Related Work : Ankerst et al., VLDB98- Technique that uses the MBB and MBS distance functions to reduce CPU time
- MBB and MBS distance functions

MBB(M)

MBS(M)

Related Work : Ankerst et al., VLDB98- Approximationtechnique by using the MBB and MBS distance functions
- approximation distance ： uses either MBB or MBS distance for better approximation quality
- Calculates the exact distances only if data objects or MBRs cannot be filtered by their approximation distances
- Saves CPU time by reducing the number of exact distance calculations
- …but cannot reduce the number of exact distance calculations if its approximation quality is low

- STT (Spatial Transformation Technique)
- Ellipsoid queries incura high CPU cost
- The efficiency depends on approximation quality
- STT efficiently processes ellipsoid queries because of high approximation quality
- MSTT (Multiple Spatial Transformation Technique)
- Does not use only the Euclidean distance function to make index structures
- Ellipsoid queries give various distance functions
- In MSTT, various index structures are created; the search algorithm utilizes a structure well suited to a query matrix

- Introduction
- STT (spatial transformation technique)
- Definition of spatial transformation
- Spatial transformation of rectangles
- Search algorithm
- MSTT (multiple STT)
- Index structure construction
- Query processing
- Dissimilarity of matrices
- Performance test
- Conclusion

P’

Spatial Transformation Technique (STT)- High approximation quality
- STT consumes less CPU time
- Spatial transformation
- MBRs in a quadratic form distance space are transformed into rectangles in the Euclidean distance space

S

S’

R

P

q (2, 2)

O

Spatial Transformation- Definition of spatial transformation
- p : a point in the quadratic form distance space S
- p’: a point in the Euclidean distance space S’
- The distance between q and p in S is equal to the distance betweenp’and O in S’
- Spatial transformation of p intop’

S

S’

p (4, 2)

q (2, 2)

p’ (-2, 1)

O

Spatial Transformation- Definition of spatial transformation
- dM2(p, q) : the distance of p and q in S
- EM: the eigenvector of M, LM: the eigenvalues of M
- Spatial transformation of p intop’

P’

Approximation Rectangles1. P in Sis transformed into P’ in S’

The calculation of distance between the originand polygons in high-dimensional spaces incurs a high CPU cost

2. P’is approximated by R

3. d2(R, O) is used instead of d2M(P, q)

low CPU cost

S

S’

pd

pb

R

pb’

rb

pc’

P

q (2, 2)

pd’

pa

pc

ra

pa’

O

Approximation Rectangles1. Calculates

pa : lower endpoint of the major diagonal of P

2. Creates the two matrices from the components aij of AM

- Calculates the approximation rectangleR of P’

li : the edge length of P for the i-thdimension

4.Rcan be used for search since Rtotally contains P’, that is

Search Algorithm1. Calculates the transformation matrix of M

2. Searches for similarity objects by using an index

[ Data nodes ]

- Calculates dMBB-MBS(M)(p, q)

S

q

p

Search Algorithm1. Calculates the transformation matrix of M

2. Searches for similarity objects by using an index

[ Data nodes ]

- Calculates dMBB-MBS(M)(p, q)

S

q

p

Search Algorithm1. Calculates the transformation matrix of M

2. Searches for similarity objects by using an index

[ Data nodes ]

- Calculates dMBB-MBS(M)(p, q)
- Calculates dM(P, q) if dMBB-MBS(M)(p, q) d(M)(k-NN, q)

S

q

p

Search Algorithm1. Calculates the transformation matrix of M

2. Searches for similarity objects by using an index

[ Directory nodes ]

- Calculates dMBB-MBS(M)(P, q)

S

q

P

Search Algorithm1. Calculates the transformation matrix of M

2. Searches for similarity objects by using an index

[ Directory nodes ]

- Calculates dMBB-MBS(M)(P, q)

S

q

P

P’

Search Algorithm1. Calculates the transformation matrix of M

2. Searches for similarity objects by using an index

[ Directory nodes ]

- Calculates dMBB-MBS(M)(P, q)
- Calculates d(R, O) if dMBB-MBS(M)(P, q)d(M)(k-NN, q)

S’

R

O

Search Algorithm1. Calculates the transformation matrix of M

2. Searches for similarity objects by using an index

[ Directory nodes ]

- Calculates dMBB-MBS(M)(P, q)
- Calculates d(R, O) if dMBB-MBS(M)(P, q)d(M)(k-NN, q)
- Calculates dM(P, q) if d(R, O)d(M)(k-NN, q)

S

q

P

Outline- Introduction
- STT (spatial transformation technique)
- Definition of spatial transformation
- Spatial transformation of rectangles
- Search algorithm
- MSTT (multiple STT)
- Index structure construction
- Query processing
- Dissimilarity of matrices
- Performance test
- Conclusion

- Node access problem
- If a query matrix is NOT similar to the unit matrix, it causes a large number of node accesses
- Index structures are constructed by the Euclidean distance function
- Constructs various index structures by using quadratic form distance functions
- Chooses a structure that gives sufficient search performance in query processing
- Reduces both CPU time and number of page accesses for ellipsoid queries

- Similarity of matrices
- High search performance can be expected when the query matrix and the matrix of selected index are similar.

Indices

based on Xi

X1

Xj

Xe

Matrices Xi

M

Basic Idea- Similarity of matrices
- High search performance can be expected when the query matrix and the matrix of selected index are similar.

query (q, M)

Indices

based on Xi

X1

Xsimilar

Xe

Matrices Xi

M

M’

Basic Idea- Similarity of matrices
- High search performance can be expected when the query matrix and the matrix of selected index are similar.

query (q, M)

Xsimilar

Indexing and Retrieval Mechanism- Index structure construction
- C : the matrix for constructing the index IC
- Transformation matrix
- All data points in a data set are transformed
- IC is constructed using transformed data points

- Query processing
- 1. Calculates the transformed query point
- 2. Calculates the query matrix
- 3. Performs search processing by using IC , M’, q’
- The query of M can be processed by using IC

- Flatness of a query matrix
- The variance s2M of the eigenvalues of M is called the flatness of M:

: the i-th dimensional eigenvalue

: the average of the eigenvalues of M

- The flatness of the unit matrix is 0 (search of the Euclidean space).

- Dissimilarity of M and IC
- MSTT employs s2M’ as the measure of the dissimilarity between M and IC
- s2M’ : the flatness of M’
- The effectiveness of Ic relative to M improves as s2M’ decreases

- Introduction
- STT (spatial transformation technique)
- Definition of spatial transformation
- Spatial transformation of rectangles
- Search algorithm
- MSTT (multiple STT)
- Index structure construction
- Query processing
- Dissimilarity of matrices
- Performance test
- Conclusion

- Data sets: real data set (rgb histogram of images)
- Data size: 100,000
- Dimensionality: 8 and 27
- Page size: 8 KB
- 20-nearest neighbor queries
- Evaluation is based on the average for 100 query points
- Index structure : A-tree (Sakurai et al., VLDB2000)
- CPU: SUN UltraSPARC-II 450MHz

- Query matrices for experiments
- [HSE+95] : the components of M

a : positive constant,

dw(ci ,cj ): the weighted Euclidean distance

between the color ci and cj,

w=(wr , wg , wb ) : the weightings of the red, green

and blue components in RGB color space

- a=10, wg=wb=1
- wr was varied from 1 to 1,000
- The flatness of M increases as wr becomes large

- Comparison of STT and MBB-MBS (8D)
- Both methods require the same number of page accesses since they utilize exact distance functions
- Low CPU cost : STT increases approximation quality, and reduces the number of exact calculations
- The effectiveness of STT increases with matrix flatness

CPU time (d = 8)

Number of page accesses (d = 8)

Performance of STTCPU time (d = 27)

Number of page accesses (d = 27)

- Comparison of STT and MBB-MBS (27D)
- The effectiveness of STT increases as either dimensionality or matrix flatness grows
- STT achieves a 74% reduction in CPU cost for high dimensionality and matrix flatness

- Three structures
- structure constructed by the unit matrix (Unit)
- structure constructed by the matrix wr=10
- structure constructed by the matrix wr=1000
- Performance of MSTT
- Dissimilarity : the cost of search using a structure chosen by the dissimilarity function
- Dissimilarity is not optimal, but provides good performance

CPU time (d = 8)

Number of page accesses (d = 8)

Conclusions- Search methods for user-adaptive ellipsoid queries
- STT (Spatial Transformation Technique)
- Spatial transformation ： MBRs in the quadratic form distance space are transformed into rectangles in the Euclidean distance space
- STT performs ellipsoid queries efficiently even when dimensionality or matrix flatness is high
- MSTT (Multiple Spatial Transformation Technique)
- MSTT creates various index structures; the search algorithm utilizes a structure well suited to a query matrix
- MSTT reduces both CPU time and the number of page accesses

- Eigenvalues of a query matrix
- Dimensions corresponding to small eigenvalues contribute less to approximation quality
- These dimensions are eliminated to save on CPU costs
- Calculation time for the spatial transformation of rectangles is reduced to n/d

n: the number of dimensions used

The effect of D.R. grows

as matrix flatness increases

Performance of STT (2)d = 8

d = 27

Rate of filtered exact calculations

- Percentage of filtered exact distance calculations
- The efficiency of MBB-MBS decreases as matrix flatness grows
- STT effectively filters exact distance calculations for all queries

- Low search cost
- Compared with the structure by the Euclidean distance function, MSTT reduces both CPU time and the number of page accesses
- MSTT constructs various structures
- Dissimilarity function chooses structures well suited to the query matrix.

CPU time (d = 27)

Number of page accesses (d = 27)

*When life gives you a hundred reasons to cry, show life that you have a thousand reasons to smile*

© Copyright 2015 - 2019 SLIDESILO.COM - All rights reserved.