Skip to content

segger.data.ndtree

The ndtree module provides a specialized data structure for spatially partitioning multi-dimensional data. This module implements an NDTree (N-dimensional tree) that efficiently splits spatial data into balanced regions for parallel processing.

_ndtree

NDTree

NDTree is a data structure for recursively splitting multi-dimensional data into smaller regions until each leaf node contains less than or equal to a specified number of points. It stores these regions in a balanced binary tree.

Attributes:

Name Type Description
data

The input data to be partitioned.

n

The maximum number of points allowed in a leaf node.

idx

The indices of the input data points.

boxes

A list to store the bounding boxes (as shapely polygons) of each region in the tree.

rect

The bounding box of the entire input data space.

tree

The root of the NDTree.

Source code in src/segger/data/_ndtree.py
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
class NDTree:
    """NDTree is a data structure for recursively splitting multi-dimensional data
    into smaller regions until each leaf node contains less than or equal to a
    specified number of points. It stores these regions in a balanced binary
    tree.

    Attributes:
        data: The input data to be partitioned.
        n: The maximum number of points allowed in a leaf node.
        idx: The indices of the input data points.
        boxes: A list to store the bounding boxes (as shapely polygons) of each region
            in the tree.
        rect: The bounding box of the entire input data space.
        tree: The root of the NDTree.
    """

    def __init__(self, data, n):
        """Initialize the NDTree with the given data and maximum points per leaf
        node.

        Args:
            data: The input data to be partitioned.
            n: The maximum number of points allowed in a leaf node.
        """
        self.data = np.asarray(data)
        self.n = n
        self.idx = np.arange(data.shape[0])
        self.boxes = []
        self.rect = Rectangle(data.min(0), data.max(0))
        self.tree = innernode(self.n, self.idx, self.rect, self)

__init__

__init__(data, n)

Initialize the NDTree with the given data and maximum points per leaf node.

Parameters:

Name Type Description Default
data

The input data to be partitioned.

required
n

The maximum number of points allowed in a leaf node.

required
Source code in src/segger/data/_ndtree.py
23
24
25
26
27
28
29
30
31
32
33
34
35
36
def __init__(self, data, n):
    """Initialize the NDTree with the given data and maximum points per leaf
    node.

    Args:
        data: The input data to be partitioned.
        n: The maximum number of points allowed in a leaf node.
    """
    self.data = np.asarray(data)
    self.n = n
    self.idx = np.arange(data.shape[0])
    self.boxes = []
    self.rect = Rectangle(data.min(0), data.max(0))
    self.tree = innernode(self.n, self.idx, self.rect, self)

innernode

Represent a node in the NDTree. Each node either stores a bounding box for the data it contains (leaf nodes) or splits the data into two child nodes.

Attributes:

Name Type Description
n

The maximum number of points allowed in a leaf node for this subtree.

idx

The indices of the data points in this node.

tree

The reference to the main NDTree that holds the data and bounding boxes.

rect

The bounding box of the data points in this node.

split_dim

The dimension along which the node splits the data.

split_point

The value along the split dimension used to divide the data.

less

The child node containing data points less than or equal to the split point.

greater

The child node containing data points greater than the split point.

Source code in src/segger/data/_ndtree.py
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
class innernode:
    """Represent a node in the NDTree. Each node either stores a bounding box for
    the data it contains (leaf nodes) or splits the data into two child nodes.

    Attributes:
        n: The maximum number of points allowed in a leaf node for this subtree.
        idx: The indices of the data points in this node.
        tree: The reference to the main NDTree that holds the data and bounding boxes.
        rect: The bounding box of the data points in this node.
        split_dim: The dimension along which the node splits the data.
        split_point: The value along the split dimension used to divide the data.
        less: The child node containing data points less than or equal to the split
            point.
        greater: The child node containing data points greater than the split point.
    """

    def __init__(self, n, idx, rect, tree):
        """Initialize the innernode and split the data if necessary.

        Args:
            n: The maximum number of points allowed in a leaf node.
            idx: The indices of the data points in this node.
            rect: The bounding box of the data points in this node.
            tree: The reference to the main NDTree.
        """
        self.n = n
        self.idx = idx
        self.tree = tree
        self.rect = rect
        if not n == 1:
            self.split()
        else:
            box = shapely.box(*self.rect.mins, *self.rect.maxes)
            self.tree.boxes.append(box)

    def split(self):
        """Recursively split the node's data into two child nodes along the
        dimension with the largest spread.
        """
        less = math.floor(self.n // 2)
        greater = self.n - less
        data = self.tree.data[self.idx]
        self.split_dim = np.argmax(self.rect.maxes - self.rect.mins)
        data = data[:, self.split_dim]
        self.split_point = np.quantile(data, less / (less + greater))
        mask = data <= self.split_point
        less_rect, greater_rect = self.rect.split(self.split_dim, self.split_point)
        self.less = innernode(less, self.idx[mask], less_rect, self.tree)
        self.greater = innernode(greater, self.idx[~mask], greater_rect, self.tree)

__init__

__init__(n, idx, rect, tree)

Initialize the innernode and split the data if necessary.

Parameters:

Name Type Description Default
n

The maximum number of points allowed in a leaf node.

required
idx

The indices of the data points in this node.

required
rect

The bounding box of the data points in this node.

required
tree

The reference to the main NDTree.

required
Source code in src/segger/data/_ndtree.py
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
def __init__(self, n, idx, rect, tree):
    """Initialize the innernode and split the data if necessary.

    Args:
        n: The maximum number of points allowed in a leaf node.
        idx: The indices of the data points in this node.
        rect: The bounding box of the data points in this node.
        tree: The reference to the main NDTree.
    """
    self.n = n
    self.idx = idx
    self.tree = tree
    self.rect = rect
    if not n == 1:
        self.split()
    else:
        box = shapely.box(*self.rect.mins, *self.rect.maxes)
        self.tree.boxes.append(box)

split

split()

Recursively split the node's data into two child nodes along the dimension with the largest spread.

Source code in src/segger/data/_ndtree.py
74
75
76
77
78
79
80
81
82
83
84
85
86
87
def split(self):
    """Recursively split the node's data into two child nodes along the
    dimension with the largest spread.
    """
    less = math.floor(self.n // 2)
    greater = self.n - less
    data = self.tree.data[self.idx]
    self.split_dim = np.argmax(self.rect.maxes - self.rect.mins)
    data = data[:, self.split_dim]
    self.split_point = np.quantile(data, less / (less + greater))
    mask = data <= self.split_point
    less_rect, greater_rect = self.rect.split(self.split_dim, self.split_point)
    self.less = innernode(less, self.idx[mask], less_rect, self.tree)
    self.greater = innernode(greater, self.idx[~mask], greater_rect, self.tree)