Range Search

CS240E: Data Structures and Data Management (Enriched)

David Duan, 2019 Winter

Partition Trees

A partition tree is a tree with $n$ leaves where

Each leaf node corresponds to an actual item, and
Each internal node corresponds to a region.

Quadtrees

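Given $n$ points $S = \{(x_0, y_0), (x_1, y_1), \ldots, (x_{n-1}, y_{n-1})\}$, all contained in a bounding box $R = [0, 2^M) \times [0, 2^M)$ ¹, we build the quadtree on $S$ as follows:

The root $r$ of the quadtree $T$ corresponds to the region $R$.
If $R$ contains $0$ or $1$ point, then the root $r$ is a leaf that stores the point.
Otherwise, we partition $R$ into four equal quadrants $R_{NE}$, $R_{NW}$, $R_{SW}$, $R_{SE}$ and build the subtrees $T_{NE}$, $T_{NW}$, $T_{SW}$, $T_{SE}$; the root $r$ of $T$ has these four subtrees as children, where each $T_i$ is associated with the region $R_i$.
We recursively repeat this process at each subtree. Note that each recursive call ends when the region has only one or zero points inside.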

Example

Given the points on the left, we recursively partition the square into smaller regions until each region contains at most one point, as in the right picture.

The computer stores the quadtree as shown on the left, but we could ignore the empty regions/nodes and instead label the edges.

Dictionary Operations

Search: Analogous to BSTs and tries.

Insert: search for the new point (to find the region where it belongs), then split that region (repeatedly if necessary) until the new point is in a region of its own.

Range Search

Let $T$ be a node of the quadtree, $R_T$ the region associated with $T$, and $A$ be our query rectangle. We can classify nodes in the quadtree as follows:

\begin{align*} \color{green}{T} \color{black} &\iff (R_T \cap A \ne \varnothing) \land (R_T \not\subseteq A) \\ \color{blue}{T} \color{black} &\iff R_T \cap A = \varnothing \\ \color{magenta}{T} \color{black} &\iff R_T \subseteq A \\ \end{align*}

Green $T$: $R_T$ partially overlaps $A$; we need to further explore.
Blue $T$: $R_T$ is disjoint from $A$; we know it cannot contain any point we want.
Pink $T$: $R_T$ is contained in $A$; every point in $R_T$ is in $A$.

We have four cases to deal with:

\begin{array}{ll} &QTreeRangeSearch(T, A) \\ &T: \text{The root of a quadtree;} \quad A: \text{Query rectangle} \\ \\ &1. \quad \text{Let $R_T$ be the square associated with $T$} & \text{Start with the entire search space} \\ &2. \quad \text{if $(R_T\subseteq A)$ then report all points in $T$; return} &\text{Report all points when region is pink}\\ &3. \quad \text{if $(R_T \cap A = \varnothing)$ then return} &\text{Discard all points when region is blue}\\ &4. \quad \text{if $(T$ stores a single point $p)$ then} &\text{Base case for green nodes: explicitly check}\\ &5. \qquad \text{if $p \in A$: return $p$} \\ &6. \qquad \text{else return} \\ &7. \quad \text{for each child $v$ of $T$ do} & \text{Recursion: check each child for green nodes} \\ &8. \qquad QTreeRangeSearch(v, A) \end{array}

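Base Case I: $R_T \subseteq A$. Every point stored in $T$ lies in $R_T$, so every point of $T$ lies in $A$. We report all points in $T$ and return.
Base Case II: $R_T \cap A = \varnothing$. Every point stored in $T$ lies in $R_T$, so no point of $T$ lies in $A$. We discard them (by doing nothing) and return.
Base Case III: $T$ is a leaf, i.e., $|T| = 1$. The single point $p$ of $T$ lies in $R_T$, so we explicitly check whether $p \in A$. We report it if yes / discard it if no and return.
Recursion: $T$ is an internal node whose region $R_T$ partially overlaps $A$. We call $RangeSearch$ on each child (i.e., search each region).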
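The four cases can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the course's implementation: points are distinct integer pairs inside $[0, 2^M) \times [0, 2^M)$, the query rectangle is closed, and the names `QuadNode`, `build`, `report_all`, and `range_search` are hypothetical.

```python
class QuadNode:
    """Node for the square [x, x + size) x [y, y + size)."""
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.point = None    # set on a leaf that stores one point
        self.children = []   # four subtrees on an internal node

def build(points, x, y, size):
    # Assumes distinct integer points and size a power of 2.
    node = QuadNode(x, y, size)
    if len(points) <= 1:
        node.point = points[0] if points else None
        return node
    half = size // 2
    for cx, cy in ((x, y), (x + half, y), (x, y + half), (x + half, y + half)):
        inside = [p for p in points
                  if cx <= p[0] < cx + half and cy <= p[1] < cy + half]
        node.children.append(build(inside, cx, cy, half))
    return node

def report_all(node, out):
    if node.point is not None:
        out.append(node.point)
    for child in node.children:
        report_all(child, out)

def range_search(node, a, out):
    """a = (x_lo, y_lo, x_hi, y_hi) is a closed query rectangle."""
    ax_lo, ay_lo, ax_hi, ay_hi = a
    # Largest integer coordinates inside the half-open region of this node.
    rx_hi, ry_hi = node.x + node.size - 1, node.y + node.size - 1
    if ax_lo <= node.x and rx_hi <= ax_hi and ay_lo <= node.y and ry_hi <= ay_hi:
        report_all(node, out)            # pink: region inside A
        return
    if rx_hi < ax_lo or ax_hi < node.x or ry_hi < ay_lo or ay_hi < node.y:
        return                           # blue: region disjoint from A
    if not node.children:                # green leaf: test the point itself
        p = node.point
        if p is not None and ax_lo <= p[0] <= ax_hi and ay_lo <= p[1] <= ay_hi:
            out.append(p)
        return
    for child in node.children:          # green internal node: recurse
        range_search(child, a, out)

pts = [(1, 1), (2, 6), (5, 5), (6, 2), (7, 7)]
root = build(pts, 0, 0, 8)
found = []
range_search(root, (2, 2, 6, 6), found)
print(sorted(found))  # [(2, 6), (5, 5), (6, 2)]
```

Empty leaves are kept in this sketch, matching the remark that the computer stores the full quadtree.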

Height Analysis

Define $\beta(S)$, the spread factor of the point set $S$, as:

\beta(S) = \frac{\text{side length of $R$}}{\text{min distance between two points in $S$}}.

Then the height of a quadtree is bounded by

h \in \Theta(\log \beta(S)).

Note that if the minimum distance between two points is small relative to the side length of $R$, the height can be very large, regardless of $n$.
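As a small illustration of the definition, the spread factor can be computed by brute force. This is only a sketch: `spread_factor` is a hypothetical helper, and the $O(n^2)$ pairwise scan is chosen for clarity rather than speed.

```python
import math
from itertools import combinations

def spread_factor(points, side):
    """beta(S) = side length of R / minimum distance between two points."""
    d_min = min(math.dist(p, q) for p, q in combinations(points, 2))
    return side / d_min

# Three points in an 8 x 8 box; the closest pair is at distance 1.
beta = spread_factor([(0, 0), (1, 0), (7, 7)], side=8)
print(beta, math.log2(beta))  # 8.0 3.0
```

Here $(0,0)$ and $(1,0)$ are only separated once the squares have side length $1$, that is, after three halvings of the $8 \times 8$ box.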

Remarks

In 3D, we have oct-trees; in 1D, we get a trie! ²

kd-Trees

Instead of blindly cutting regions into the same size (Trie), we now split the points so that the set of points is cut in roughly half (BST).

At each node we split the points with a line, alternating between splits on the $x$- and $y$-coordinates. We repeat this procedure until every point is in its own region.

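To build a kd-tree with an initial split by $x$ on a point set $S$ of size $n$:

If $|S| \leq 1$, create a leaf and return.
Otherwise, find the median $x$-coordinate $X:=QuickSelect(S, \lfloor n/2\rfloor)$.
Partition $S$ by $x$-coordinate into $S_{x < X}$ and $S_{x \geq X}$.
Recursively build a kd-tree with an initial split by $y$ on $S_{x < X}$ (the left child).
Recursively build a kd-tree with an initial split by $y$ on $S_{x \geq X}$ (the right child).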

Just like quadtrees can be seen as 2D tries, kd-trees can be seen as 2D BST or decision trees.
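The construction can be sketched as follows. The sketch assumes that no two points share an $x$- or a $y$-coordinate (otherwise the recursion may not terminate), and it uses `sorted` in place of $QuickSelect$ for brevity, which costs $O(n \log n)$ per level instead of $O(n)$ expected. The names `KdNode`, `build_kd`, and `height` are hypothetical.

```python
class KdNode:
    def __init__(self, point=None, axis=None, split=None, left=None, right=None):
        self.point = point    # set on leaves only
        self.axis = axis      # 0: split on x, 1: split on y (internal nodes)
        self.split = split    # the median coordinate X (or Y)
        self.left = left      # points with coordinate <  split
        self.right = right    # points with coordinate >= split

def build_kd(points, axis=0):
    if len(points) <= 1:
        return KdNode(point=points[0] if points else None)
    # sorted() stands in for QuickSelect(S, floor(n/2)).
    split = sorted(p[axis] for p in points)[len(points) // 2]
    left = [p for p in points if p[axis] < split]
    right = [p for p in points if p[axis] >= split]
    return KdNode(axis=axis, split=split,
                  left=build_kd(left, 1 - axis),    # alternate the coordinate
                  right=build_kd(right, 1 - axis))

def height(node):
    if node is None or node.axis is None:   # leaf
        return 0
    return 1 + max(height(node.left), height(node.right))

pts = [(1, 5), (2, 3), (3, 8), (4, 1), (5, 7), (6, 2), (7, 6), (8, 4)]
root = build_kd(pts)
print(root.split, height(root))  # 5 3
```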

Example

Height Analysis

Since we split at the $(\lfloor n/2\rfloor+1)$-th smallest $x$- or $y$-coordinate, there are at most $\lfloor n/2\rfloor$ points in the bottom/left region, but we don't have a clear bound on the right/top side as we can't guarantee how many are sitting on the median line. For example, consider the following extreme case:

\begin{matrix} \circ & & & & & \\ \circ & & & & & \\ \circ & & & & & \\ \circ & & & & & \\ \circ & & & & & \\ \circ & \circ & \circ & \circ & \circ & \circ\\ \end{matrix}

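Here the median line in either direction passes through $6$ elements, and all of them go to the right/top side; the problem size doesn't decrease as we go deeper into the tree. This shows us that, when many points share coordinates, kd-trees can have even worse performance than quadtrees in theory.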

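If we assume that no two points share an $x$- or $y$-coordinate, then every split puts $\lfloor n/2 \rfloor$ points on one side and $\lceil n/2 \rceil$ on the other, so the height is $O(\log n)$.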

Dictionary Operations

Search: Analogous to BSTs and tries.

Insert: search for the point and insert it as a new leaf. To keep the height at $O(\log n)$, we could track our height and rebuild a subtree after insertion if needed, as in scapegoat trees.

Complexity

Build:

Finding the median with $QuickSelect$ takes $O(n)$ expected time.
Partitioning the points around the median takes $O(n)$.
$T(n) = O(n) + 2T(n/2) \implies T(n) \in O(n \log n)$ expected.

Search: $O(h) = O(\log n)$, given the input is nice (no shared coordinates).

Insert: $O(\log n)$.

Range Search

The $RangeSearch$ procedure for kd-trees is almost identical to the one for quadtrees:

\begin{array}{ll} &kdTree-RangeSearch(T, R, A) \\ &T: \text{The root of a kd-tree}; \quad R: \text{Region associated with $T$;} &A: \text{Query rectangle} \\ \\ &1. \quad \text{$R$ is the region associated with $T$} &\text{Initially the entire search space} \\ &2. \quad \text{if $(R\subseteq A)$ then report all points in $T$; return} &\text{Report all points for a pink region}\\ &3. \quad \text{if $(R \cap A = \varnothing)$ then return} &\text{Discard all points for a blue region} \\ &4. \quad \text{if $(T$ stores a single point $p)$ then} &\text{Base case for a green region}\\ &5. \qquad \text{if $p \in A$: report $p$; return} \\ &6. \qquad \text{else return} \\ &7. \quad \text{if $T$ splits on ``$x < X$'' then} & \text{$T$ is split by $x$-coordinate here}\\ &8. \qquad R_l \leftarrow R \cap \{(x,y): x < X\} \\ &9. \qquad R_r \leftarrow R \cap \{(x,y): x \geq X\} \\ &10. \quad\;\; kdTree-RangeSearch(T.left, R_l, A) &\text{Recursion: explicitly test each child}\\ &11. \quad\;\; kdTree-RangeSearch(T.right, R_r, A) &\text{Recursion: explicitly test each child}\\ &12. \;\; \text{else} & \text{$T$ is split by $y$-coordinate here}\\ &13. \quad\;\; R_l \leftarrow R \cap \{(x,y): y < Y\} \\ &14. \quad\;\; R_r \leftarrow R \cap \{(x,y): y \geq Y\} \\ &15. \quad\;\; kdTree-RangeSearch(T.left, R_l, A) &\text{Recursion: explicitly test each child}\\ &16. \quad\;\; kdTree-RangeSearch(T.right, R_r, A) &\text{Recursion: explicitly test each child}\\ \end{array}
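A minimal Python sketch of this procedure is given below. It makes several assumptions that are not in the notes: nodes are plain tuples built by the hypothetical helpers `leaf` and `split`, a region is a pair of half-open intervals (one per axis), the query is a pair of closed intervals, and the small tree is built by hand.

```python
import math

def leaf(p):
    return ("leaf", p)

def split(axis, value, left, right):
    # left holds points with p[axis] < value, right those with p[axis] >= value
    return ("split", axis, value, left, right)

def report_all(t, out):
    if t[0] == "leaf":
        out.append(t[1])
    else:
        report_all(t[3], out)
        report_all(t[4], out)

def kd_range_search(t, r, a, out):
    """r[i] = (lo, hi) is half-open [lo, hi); a[i] = (lo, hi) is closed."""
    if all(a[i][0] <= r[i][0] and r[i][1] <= a[i][1] for i in (0, 1)):
        report_all(t, out)                      # pink: R inside A
        return
    if any(r[i][1] <= a[i][0] or a[i][1] < r[i][0] for i in (0, 1)):
        return                                  # blue: R disjoint from A
    if t[0] == "leaf":                          # green leaf: test the point
        p = t[1]
        if all(a[i][0] <= p[i] <= a[i][1] for i in (0, 1)):
            out.append(p)
        return
    _, axis, value, left, right = t
    r_left, r_right = list(r), list(r)
    r_left[axis] = (r[axis][0], value)          # R_l: coordinate <  value
    r_right[axis] = (value, r[axis][1])         # R_r: coordinate >= value
    kd_range_search(left, r_left, a, out)
    kd_range_search(right, r_right, a, out)

# Hand-built kd-tree on the points (2,1), (3,6), (7,2), (6,8).
tree = split(0, 6,
             split(1, 6, leaf((2, 1)), leaf((3, 6))),
             split(1, 8, leaf((7, 2)), leaf((6, 8))))
whole_plane = [(-math.inf, math.inf), (-math.inf, math.inf)]
found = []
kd_range_search(tree, whole_plane, [(1, 6.5), (0, 7)], found)
print(found)  # [(2, 1), (3, 6)]
```

Passing the region down as a parameter, as the pseudocode does, avoids storing a region in every node.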

Range Search Analysis

Recall color coding from before:

Green $T$: $R_T$ partially overlaps $A$; we need to further explore.
Blue $T$: $R_T$ is disjoint from $A$; we know it cannot contain any point we want.
Pink $T$: $R_T$ is contained in $A$; every point in $R_T$ is in $A$.

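Claim: there are $O(\sqrt n)$ green nodes, i.e., $O(\sqrt n)$ nodes whose region $R$ partially overlaps $A$.

A green region is crossed by one of the four sides of $A$, so it suffices to bound the regions crossed by a single vertical (or horizontal) segment. Two levels below a node with $n$ points there are four regions with about $n/4$ points each, and a vertical segment crosses at most $2$ of them; on the way we also count $2$ nodes (one splitting on $x < X$ and one on $y < Y$).

Let $Q(n)$ be the maximum number of nodes in a kd-tree with $n$ points whose region intersects a vertical segment. We have the following recursion: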

Q(n) = 2 + 2 Q(n/4) \in \Theta(\sqrt n).

Hence range search takes $O(s + \sqrt n)$ time, where $s$ is the size of output.
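The recurrence can be checked numerically. The sketch below assumes the base case $Q(1) = 1$, which the notes do not specify; unrolling then gives $Q(4^k) = 2(2^k - 1) + 2^k = 3\sqrt n - 2$ for $n = 4^k$, which is $\Theta(\sqrt n)$.

```python
import math

def q(n):
    """Q(n) = 2 + 2 Q(n/4), with the assumed base case Q(1) = 1."""
    return 1 if n <= 1 else 2 + 2 * q(n // 4)

for k in range(6):
    n = 4 ** k
    # Closed form for powers of 4 under the assumed base case.
    assert q(n) == 3 * math.isqrt(n) - 2
    print(n, q(n))
```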

Higher Dimensions

kd-trees generalize to $d$-dimensional space:

At the root the point set is partitioned based on the first coordinate,
At the second layer the partition is based on the second coordinate,
...
At layer $d-1$ the partition is based on the last coordinate,
At layer $d$ we start all over again, until each point has its own region.

Complexity (assuming $o(n)$ points share coordinates and $d$ is a constant):

Space: $O(n)$.
Construction: $O(n \log n)$.
Range query: $O(s + n^{1-1/d})$.

Range Trees

We now present a new data structure which provides much faster range query operation but requires more than linear space.

(Balanced) BST Range Search

Strategy

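Search for the left boundary $k_1$ in $T$; this gives a path $P_1$.
Search for the right boundary $k_2$ in $T$; this gives a path $P_2$.
This partitions the nodes of $T$ into three groups:
1. Boundary nodes: nodes on $P_1$ or $P_2$; check each boundary node to see if it is in the search range.
2. Outside nodes: nodes to the left of $P_1$ (hanging off where $P_1$ goes right) or to the right of $P_2$ (hanging off where $P_2$ goes left).
3. Nodes whose $x$ coordinates are definitely in the range; the "inside nodes". The topmost inside nodes are the allocation nodes.
Report all descendants of allocation nodes; test each boundary node and report it if in range.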

Example

The search paths are $P_1 : 52 \to 35 \to 15 \to 27 \to 29$ and $P_2 : 52 \to 35 \to 42 \to 46$.

Pseudocode

The following procedure returns all keys of $T$ that lie in the range $[k_1, k_2]$ in sorted order.

\begin{array}{ll} &BSTRangeSearch(T, k_1, k_2) \\ &\text{$T$: root of a BST; $\quad k_1, k_2$: search keys} \\ &\text{Returns keys in $T$ that are in range $[k_1, k_2]$ (in sorted order)} \\ &1. \quad \text{if $T = null$ then return} & \text{Base case: Empty} \\ &2. \quad \text{if $k_1 \leq key(T) \leq k_2$ then } & \text{If current node is between two bounds}\\ &3. \qquad L \leftarrow BSTRangeSearch(T.left, k_1, k_2) &\text{Search left subtree}\\ &4. \qquad R \leftarrow BSTRangeSearch(T.right, k_1, k_2) &\text{Search right subtree}\\ &5. \qquad \text{return $L \cup \{key(T)\} \cup R$}&\text{Return above results plus current key} \\ &6. \quad \text{if $key(T) < k_1$ then} &\text{If current node $<$ lower bound}\\ &7. \qquad \text{return $BSTRangeSearch(T.right, k_1, k_2)$} &\text{Only search the right subtree} \\ &8. \quad \text{if $key(T) > k_2$ then} &\text{If current node $>$ upper bound}\\ &9. \qquad \text{return $BSTRangeSearch(T.left, k_1, k_2)$} &\text{Only search the left subtree} \end{array}
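The pseudocode translates almost line by line into Python. In this sketch the `Node` class, the `insert` helper, the insertion order, and the query bounds $28$ and $43$ are assumptions chosen so that the two search paths match the paths $P_1$ and $P_2$ of the example; the tree is a plain (unbalanced) BST.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(t, key):
    """Plain BST insertion, used only to build a small example tree."""
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert(t.left, key)
    else:
        t.right = insert(t.right, key)
    return t

def bst_range_search(t, k1, k2):
    """Return the keys of t in [k1, k2], in sorted order."""
    if t is None:
        return []
    if t.key < k1:                      # everything to the left is too small
        return bst_range_search(t.right, k1, k2)
    if t.key > k2:                      # everything to the right is too large
        return bst_range_search(t.left, k1, k2)
    return (bst_range_search(t.left, k1, k2)
            + [t.key]
            + bst_range_search(t.right, k1, k2))

root = None
for key in (52, 35, 74, 15, 42, 27, 46, 29):   # hypothetical insertion order
    root = insert(root, key)
print(bst_range_search(root, 28, 43))  # [29, 35, 42]
```

Concatenating lists copies them, so this sketch is not quite $O(\log n + s)$; appending to a shared output list would avoid the extra copying.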

Analysis

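Assume the BST is balanced. Then $|P_1| = |P_2| = O(\log n)$, so finding the two paths takes $O(\log n)$ time and there are $O(\log n)$ boundary nodes. Every allocation node $v$ is a child of a node on $P_1$ or $P_2$ (the right child where $P_1$ goes left at the parent of $v$, or the left child where $P_2$ goes right at the parent of $v$), so there are $O(\log n)$ allocation nodes, and we find them while walking down $P_1$ and $P_2$ in $O(\log n)$ time. Reporting the $s$ keys below the allocation nodes takes $O(s)$. In total, range search takes $O(\log n + s)$ time.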

2D Range Trees

Given $n$ points $P= \{(x_0, y_0), (x_1, y_1), \ldots, (x_{n-1}, y_{n-1})\}$, a range tree is a tree of trees (a multi-level data structure), where

Primary: a balanced BST on $x$, i.e., with the $x$-coordinates as keys.
Auxiliary: balanced BSTs on $y$, i.e., with the $y$-coordinates as keys.

Every node in the primary data structure has an associated data structure, which contains the same set of points as the subtree rooted at this node (the node and its descendants), but potentially in a different order.

Space

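The primary tree has height $O(\log n)$, so each point has $O(\log n)$ ancestors and is stored in $O(\log n)$ associated trees. With $n$ points, the total space is $O(n \log n)$.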

Dictionary Operations

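$Search((x,y))$: search for $x$ in the primary tree $T$, which takes $O(\log n)$ time; at the node with key $x$, compare the stored $y$ with the query $y$. In total this takes $O(\log n)$ time.

$Insert((x,y))$: insert the point by $x$ into the primary tree $T$ in $O(\log n)$ time. The new node has $O(\log n)$ ancestors, and we insert the point by $y$ into the associated tree of each of them, for a total of $O(\log^2 n)$.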

No Rotation in Primary Tree

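We don't want to rotate the primary tree, because a rotation changes the descendant sets of the rotated nodes and so invalidates their associated trees. Instead we use scapegoat trees, so that:

The primary tree keeps $O(\log n)$ height,
When we rebuild a subtree of the primary tree, we also rebuild all associated trees of that subtree,
Insertion can still be done in $O(\log^2 n)$ (amortized).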

Range Search

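Perform a range search on $x$ in the primary tree $T$ to find the boundary and allocation nodes: $O(\log n)$.
Test each boundary node explicitly: $O(\log n)$.
For each allocation node, perform a range search on $y$ in its associated tree: there are $O(\log n)$ allocation nodes and each search visits $O(\log n)$ nodes plus the reported ones, so we visit $O(\log^2 n)$ nodes.

The total time is $O(\log^2n + s)$, where $s$ is the output size.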
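The query can be sketched as follows. To keep the example short, this sketch departs from the notes in two labeled ways: the tree is static (built once from points sorted by $x$, no insertion), and each associated structure is a list sorted by $y$ searched with `bisect` rather than a balanced BST. It also assumes distinct $x$-coordinates. The names `RTNode` and `range_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right

class RTNode:
    def __init__(self, pts_by_x):
        mid = len(pts_by_x) // 2
        self.point = pts_by_x[mid]
        self.left = RTNode(pts_by_x[:mid]) if mid > 0 else None
        self.right = (RTNode(pts_by_x[mid + 1:])
                      if mid + 1 < len(pts_by_x) else None)
        # Associated structure: all points of this subtree, sorted by y.
        self.by_y = sorted(pts_by_x, key=lambda p: p[1])
        self.ys = [p[1] for p in self.by_y]

    def report_y(self, y1, y2, out):
        lo, hi = bisect_left(self.ys, y1), bisect_right(self.ys, y2)
        out.extend(self.by_y[lo:hi])

def range_query(root, x1, x2, y1, y2):
    out = []
    v = root                             # find where P1 and P2 diverge
    while v is not None and not (x1 <= v.point[0] <= x2):
        v = v.left if v.point[0] > x2 else v.right
    if v is None:
        return out
    if y1 <= v.point[1] <= y2:           # boundary node on both paths
        out.append(v.point)
    u = v.left                           # follow P1 towards x1
    while u is not None:
        if u.point[0] >= x1:
            if y1 <= u.point[1] <= y2:   # boundary node: explicit test
                out.append(u.point)
            if u.right is not None:      # allocation node: search on y only
                u.right.report_y(y1, y2, out)
            u = u.left
        else:
            u = u.right
    u = v.right                          # follow P2 towards x2
    while u is not None:
        if u.point[0] <= x2:
            if y1 <= u.point[1] <= y2:
                out.append(u.point)
            if u.left is not None:
                u.left.report_y(y1, y2, out)
            u = u.right
        else:
            u = u.left
    return out

pts = [(1, 5), (2, 3), (3, 8), (4, 1), (5, 7), (6, 2), (7, 6), (8, 4)]
root = RTNode(sorted(pts))
print(sorted(range_query(root, 2, 7, 2, 7)))
# [(2, 3), (5, 7), (6, 2), (7, 6)]
```

Every level of the primary tree stores each point once in some `by_y` list, which mirrors the $O(n \log n)$ space bound.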

Higher Dimension

Range trees generalize to $d$-dimensional space, with a time-space trade-off compared to kd-trees:

| | Range Trees | kd-Trees |
|---|---|---|
| Space | $O(n\log ^{d-1} n)$ | $O(n)$ |
| Construction | $O(n \log^{d-1} n)$ | $O(n \log n)$ |
| RangeQuery | $O(s + \log^d n)$ | $O(s + n^{1-1/d})$ |

Three-Sided Range Query

A three-sided range query asks for all points in $[x',x''] \times [y',\infty)$.

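Idea I: Use associated data structures.

Primary: scapegoat tree on the $x$-coordinates.
Associated: for each node, a max-heap on the $y$-coordinates of the points in its subtree (a heap can report all $s$ keys that are at least $y'$ in $O(1+s)$ time).
Query:
- Range search on $x$ in the primary tree to find the boundary and allocation nodes: $O(\log n)$
- Test each boundary node explicitly: $O(\log n)$
- For each of the $O(\log n)$ allocation nodes $v$, report the $s_v$ keys that are at least $y'$ in its heap: $\sum_v O(1 + s_v) = O(\log n + s)$
- Total: $O(\log n + s)$, where $s$ is size of output.
Disadvantage: wastes space, since each point is stored in the heap of every ancestor.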
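The $O(1+s)$ heap query used in Idea I can be sketched as follows. Python's `heapq` only provides a min-heap, so this sketch stores negated keys to simulate a max-heap; the function name `report_at_least` is hypothetical.

```python
import heapq

def report_at_least(heap, y_min, i=0, out=None):
    """Report all keys >= y_min from an array max-heap stored as negated keys.

    We only descend below a node whose key is >= y_min, so we visit the s
    reported nodes plus at most two failing children each: O(1 + s) nodes.
    """
    if out is None:
        out = []
    if i < len(heap) and -heap[i] >= y_min:
        out.append(-heap[i])
        report_at_least(heap, y_min, 2 * i + 1, out)   # left child
        report_at_least(heap, y_min, 2 * i + 2, out)   # right child
    return out

ys = [5, 3, 8, 1, 7, 2, 6, 4]
heap = [-y for y in ys]       # negate: the largest y becomes the smallest entry
heapq.heapify(heap)
print(sorted(report_at_least(heap, 6)))  # [6, 7, 8]
```

The keys come out in heap order rather than sorted order, which is fine for reporting.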

Idea II: Use treaps, which combine a BST (on $x$) with a heap (on $y$).

Only structure: treap.
Query:
- $BSTRangeSearch(x', x'')$ to get boundary and allocation nodes.
- Boundary nodes: explicitly test each.
- Allocation nodes: search in each heap.
Disadvantage: We cannot bound the height for treaps, since the priorities are the given $y$-coordinates rather than random.

Idea III: $x$-coordinate used for split.

No detail for this one.

1 Note the bounding box is closed on the left and open on the right, see remark below; ideally, we want the box width to be a power of $2$ so partition is easier.

2 Closed on the left and open on the right allows $0$ to be in the trie but not $2^M$.