GeographicLib 2.1.2
Finding nearest neighbors
Back to Geodesics on an ellipsoid of revolution. Forward to Geodesics on a triaxial ellipsoid. Up to Contents.

The problem of finding the maritime boundary defined by the "median line" is discussed in Section 14 of

Figure 14 shows the median line which is equidistant from Britain and mainland Europe. Determining the median line involves finding, for any given P, the closest points on the coast of Britain and on the coast of mainland Europe. The operation of finding the closest in a set of points is usually referred to as the nearest neighbor problem and the NearestNeighbor class implements an efficient algorithm for solving it.

The NearestNeighbor class implements nearest-neighbor calculations using the vantage-point tree described by

Given a set of points x, y, z, …, in some space and a distance function d satisfying the metric conditions,

\[ \begin{align} d(x,y) &\ge 0,\\ d(x,y) &= 0, \ \text{iff $x = y$},\\ d(x,y) &= d(y,x),\\ d(x,z) &\le d(x,y) + d(y,z), \end{align} \]

the vantage-point (VP) tree provides an efficient way of determining nearest neighbors. The geodesic distance (implemented by the Geodesic class) satisfies these metric conditions, while the great ellipse distance and the rhumb line distance do not (they do not satisfy the last condition, the triangle inequality). Typically the cost of constructing a VP tree of N points is N log N, while the cost of a query is log N. Thus a VP tree should be used in situations where N is large and at least log N queries are to be made. The condition, N is large, should be taken to mean that \( N \gg 2^D \), where D is the dimensionality of the space.

  • This implementation includes Yianilos' upper and lower bounds for the inside and outside sets. This helps limit the number of searches (compared to just using the median distance).
  • Rather than do a depth-first or breath-first search on the tree, the nodes to be processed are put on a priority queue with the nodes most likely to contain close points processed first. Frequently, this allows nodes lower down on the priority queue to be skipped.
  • This technique also allows non-exhaustive searchs to be performed (to answer questions such as "are there any points within 1km of the query point?). - When building the tree, the first vantage point is (arbitrarily) chosen as the middle element of the set. Thereafter, the points furthest from the parent vantage point in both the inside and outside sets are selected as the children's vantage points. This information is already available from the computation of the upper and lower bounds of the children. This choice seems to lead to a reasonably optimized tree. - The leaf nodes can contain a bucket of points (instead of just a vantage point). - Coincident points are allowed in the set; these are treated as distinct points. The figure below shows the construction of the VP tree for the points making up the coastlines of Britain and Ireland (about 5000 points shown in blue). The set of points is recursively split into 2 equal "inside" and "outside" subsets based on the distance from a "vantage point". The boundaries between the inside and outside sets are shown as green circular arcs (arcs of geodesic circles). At each stage, the newly added vantage points are shown as red dots and the vantage points for the next stage are shown as red plus signs. The data is shown in the Cassini-Soldner projection with a central meridian of 5°W. @image html vptree.gif "Vantage-point tree"
Back to Geodesics on an ellipsoid of revolution. Forward to Geodesics on a triaxial ellipsoid. Up to Contents.