External Memory

CS240E: Data Structure and Data management (Enriched)

David Duan, 2019 Winter, Prof. Biedl

External MemoryCS240E: Data Structure and Data management (Enriched)David Duan, 2019 Winter, Prof. BiedlExternal Memory ModelMemory HierarchiesEMMExternal SortingSortingMulti-Way MergeExternal MergesortLower Bound for External SortingExternal DictionariesMulti-Way Search Tree(2,4) TreePropertiesSearchInsert(a,b) TreePropertiesAnalysis(2,4) Tree \iff Red Black TreeB-TreeB+-TreeMotivation: Storage DS vs. Decision DSB+-Tree: The Decision-Variant of a B-TreeLemma on B^+-Tree SizeWhy Do We Do This?Extendible HashingMotivationTries of BlocksOperationsSearch(k)Insert(k)Discussion

External Memory Model

Memory Hierarchies

Recall from CS251, computers have a hierarchy of different kinds of memories, which vary in terms of their size and distance from the CPU. Ordering from the fastest and smallest to the slowest and largest, we have registers, cache, internal memory, and external memory.

In most applications, only two levels really matter -- the one that can hold all the data items in our problem and the level just below that one. Bringing data items in and out of the higher memory that can hold all items will typically be the computational bottleneck in this case.

In this module, we adapt our algorithm to take the memory hierarchy into account, avoiding transfers as much as possible.


External Sorting

David: This section is not being tested on the final exam.


Given an array of numbers, we want to put them into sorted order. Assume is huge and is stored in blocks in external memory.

Recall in the RAM model, heapsort gives the optimal time () and space () performance. However, accessing a single location in external memory automatically loads a whole block, so accessing at indices that are far apart is very costly, typically one block transfer per array access.

Mergesort, on the other hand, adapts well to an array stored in external memory. It can be made even more effectively using -way merge, i.e., merging sorted runs into one sorted run.

Multi-Way Merge


  1. The standard merge (used in mergesort) uses .
  2. We could use for internal memory mergesort, but the extra time for means the overall runtime is no better.
External Mergesort

Let be the number of elements in the array stored in external memory, be the size of internal memory, and be the size of block, i.e., how many elements can we transfer from external to internal memory using one block transfer.

Step I. Create sorted runs of length .

Step II. Merge the first sorted runs using -way-merge.

Step III. Keep merging the next runs to reduce the number of runs by factor of .

Thus, the total number of block transfers is , as we need to perform rounds of -way-merge and each round of merging requires block transfers (both loading and write back).

Lower Bound for External Sorting

Theorem Sorting elements stored in external memory requires

block transfers.

The ratio is the number of external memory blocks that can fit into internal memory. This theorem is saying that the best performance we can achieve for the sorting problem is equivalent to the work of scannign through the input set (which takes transfers) at least a logarithmic number of times, where the base of this logarithm is the number of blocks that can fit into internal memory.

As a remark, -way mergesort with is optimal, because we are sorting as many (smaller) sorted runs as possible.

External Dictionaries

Multi-Way Search Tree

David: 3.1 is merely a motivation for the following data structures.

Let be a node of an ordered tree. We say that is a -node if has children. We define a multi-way search tree to be an ordered tree that has the following properties:

  1. Each internal node of has at least two children. That is, each internal node is a -node, where .
  2. Each internal node of stores a collection of itesm of the form , where is a key and is an element.
  3. Each -node of , with children , stores items , where .
  4. Define and . For each item stored at a node in the subtree of rooted at , , we have .

If we think of the set of keys stored at as including the speical keys and , then a key stored in the subtree of rooted at a child node must be "in between" two keys stored at . This simple viewpoint gives rise to the rule that a node with children stores regular keys, and it also forms the basis of the algorithm for searching in a multi-way search tree.


Let denote keys and denote pointers to subtrees.


In using a multi-way search tree in practice, we desire that it be balanced, i.e., have logarithmic height. We can maintain balance in a tree by maintaining two simple properties:

  1. Size property: every node has at most four children.

    1. -node: KVP & subtrees (possibly empty).
    2. -node: KVP & subtrees (possibly empty).
    3. -node: KVP & subtrees (possibly empty).
  2. Depth property: all the external nodes have the same depth.

    1. All empty subtrees are at the same level.

Enforcing the size property for trees keeps the size of the nodes in the multi-way search tree constant, for it allows us to represent the dictionary stored at each internal node using a constant-sized array. The depth property, on the other hand, maintains the balance in a tree, by forcing it to be a bounded depth structure.


Search is very similar to BST:


As in BST-Insert, we first find where the node should be added. However, the depth property does not allow the tree to grow downward, so we add it to the list of KVPs stored inside the current node (always a leaf) and perform "fix-ups" if necessary.

For example, if we insert


The Tree is a specific type of tree.

An tree satisfies:

  1. Each node has at least subtrees, unless it is the root, which has at least subtrees.
  2. Each node has at most subtrees.
  3. If a node has subtrees, then it stores KVPs.
  4. Empty subtrees are at the same level.
  5. The keys in the node are between the keys in the corresponding subtrees.

Additionally, we set the rule that , so operations for trees will work just like for trees, after re-defining underflow/overflow to consider above constraints.


We now analyze the runtime for general trees.




Suppose we have an tree of height . Note tha we don't count empty subtrees as a level. We want to build an tree with as few nodes as possible but still reach height to form a relationship between , the number of KVPs, and , the height of trees.

At layer 0, we have one node (root) and at least 1 KVPs (recall root is special).

At layer 1, we have at least two nodes (root is special where it can have two subtrees only), each node stores at least KVPs, so overall this layer has KVPs.

At layer 2, we have at least nodes (each node at layer has subtrees at least), each node stores at least KVPs, so overall this layer has KVPs.

At layer 2, we have at least nodes (each node at layer 2 has subtrees at least), each node stores at least KVPs, so overall this layer has KVPs.

Continue this process, at layer , we have at least nodes, each node stores at least KVPs, so overall this layer has KVPs.

Sum up the number of KVPs, we get the number of KVPs given height is at least

Rearranging the equation, the height given of KVPs is at most

Since and the bottom is a constant, we get .

Keep in mind that we have the assumption wehre and , because otherwise the insertion split wouldn't work.


Therefore, runtime for operations is worst case! The number of block transfer is also .

Tree Red Black Tree

David: You are expected to know how the conversion works but not RBT insertion.

Note that we have another data structure with worst-case performance besides AVL tree. We now show (using Red Black Tree) that trees perform better than AVL trees.

A brief introduction on Red Black Tree

In general, red-black-trees provide faster insertion and removal operations than AVL trees as fewer rotations are done due to relatively relaxed balancing.


We have seen the number of block transfers for an tree is . To minimize this, we want to maximize our base . This is the central motivation for -trees: we want to fit as many KVPs as possible into one block: a -tree of order is a --tree, where

B-Trees are basically useless in internal memory (no better than AVL or trees), but are great when used in external memory (e.g., database-related applications), as it minimizes the number of block transfers: recall each operation is done in block transfers, then gives huge saving of block transfers as it minimizes height.

Two Futher Issues

  1. What if we don't know , the block size? Then we couldn't calculuate explicitly! Solution: we could use Cache Oblivious Trees (not tested).
  2. What if the values are much bigger than the keys? For example, keys being Facebook ID and values being all the stuff our dearest Mark Zuckerberg has on us? Solution: Take a look at B+ trees.
Motivation: Storage DS vs. Decision DS

There are two variants of tree-structures:

  1. Storage-variant (Heap, BST): Every node stores a KVP.
  2. Decision-variant (Trie, kd-Tree): All KVPs at leaves; internal nodes / edges store only decisions to guide search.

For storage-variant, there usually exists an equivalent decision variant. In internal memory this wastes space (usually twice as many nodes), but we could take advantage of this in external memory.

B+-Tree: The Decision-Variant of a B-Tree

B-tree of order 5:

B+-tree of order 5:

Lemma on B^+-Tree Size

Claim. A B+ tree stores keys, aassuming .

Proof. The bottom layer stores keys; one layer above stores keys; the layer above stores keys... Thus the total number of keys is

Why Do We Do This?

Assume keys are usually fairly small, while values potentially huge. Recall in B-tree, the order is chosen such that a node fits into a block. In a B+-tree, internal nodes can store many more keys than KVPs! Thus the height will be much smaller, so we need fewer block transfer.

Extendible Hashing


We store integers in external memory. The regular hashing techniques we had before uses

But! This is amortized! Occationally you need to rehash/rebuild the entire data structure!

Rehashiing takes between and block transfers, which is too slow to our standard.

Tries of Blocks

Assume we store integers (hash values) in . Note that we can control the outputs of hash values by choosing appropriate hash function(s).


We define

In the example above, the global depth is and the top leaf has local depth . The key observation is that the local depth for a block equals the number of mask of that block, i.e., the first few bits of the block that distinguishes itself, or the bits on the search path from root to that block (in the sense of trie search).

  1. Compute the hash value.
  2. Convert to a bitstring .
  3. Trace in trie of blocks to find .
  4. The leaf you reached contains a reference to the block with in it. Load that block and search for .

Thus, the number of block transfers for search is .

  1. Do as above and load the block that should have .
  2. If the block has space, add to it and return.
  3. Otherwise, split the block and expand the trie until fits into the block.

Example: Insert(10110):


[On slides; TBC]

The number of block transfers could be bit but in practice usually 2 to 5:

  1. We usually do block transfer only and we never need a full rehash.
  2. Note that we assumed the entire trie of directory fits into internal memory.
  3. We may have empty or underfull blocks, so a bit of space wasted is inevitable.

To further save space, we could expand the trie so that all leaves have the global depth. Note that multiple leaves may point to the same block. Next, we store only the leaves in an array with size where is the global depth of the trie.

Try to think of doing insertion in a trie (instead of the optimized array thingy) when doing extendible hashing. This is probably easier for your brain.