GeographicLib 2.1.2
GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t > Class Template Reference

Nearest-neighbor calculations. More...

#include <GeographicLib/NearestNeighbor.hpp>

Public Member Functions

 NearestNeighbor ()
 
 NearestNeighbor (const std::vector< pos_t > &pts, const distfun_t &dist, int bucket=4)
 
void Initialize (const std::vector< pos_t > &pts, const distfun_t &dist, int bucket=4)
 
dist_t Search (const std::vector< pos_t > &pts, const distfun_t &dist, const pos_t &query, std::vector< int > &ind, int k=1, dist_t maxdist=std::numeric_limits< dist_t >::max(), dist_t mindist=-1, bool exhaustive=true, dist_t tol=0) const
 
int NumPoints () const
 
void Save (std::ostream &os, bool bin=true) const
 
void Load (std::istream &is, bool bin=true)
 
void swap (NearestNeighbor &t)
 
void Statistics (int &setupcost, int &numsearches, int &searchcost, int &mincost, int &maxcost, double &mean, double &sd) const
 
void ResetStatistics () const
 

Friends

std::ostream & operator<< (std::ostream &os, const NearestNeighbor &t)
 
std::istream & operator>> (std::istream &is, NearestNeighbor &t)
 

Detailed Description

template<typename dist_t, typename pos_t, class distfun_t>
class GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >

Nearest-neighbor calculations.

This class solves the nearest-neighbor problm using a vantage-point tree as described in Finding nearest neighbors.

This class is templated so that it can handle arbitrary metric spaces as follows:

Template Parameters
dist_tthe type used for measuring distances; it can be a real or signed integer type; in typical geodetic applications, dist_t might be double.
pos_tthe type for specifying the positions of points; geodetic application might bundle the latitude and longitude into a std::pair<dist_t, dist_t>.
distfun_tthe type of a function object which takes takes two positions (of type pos_t) and returns the distance (of type dist_t); in geodetic applications, this might be a class which is constructed with a Geodesic object and which implements a member function with a signature dist_t operator() (const pos_t&, const pos_t&) const, which returns the geodesic distance between two points.
Note
The distance measure must satisfy the triangle inequality, \( d(a,c) \le d(a,b) + d(b,c) \) for all points a, b, c. The geodesic distance (given by Geodesic::Inverse) does, while the great ellipse distance and the rhumb line distance do not. If you use the ordinary Euclidean distance, i.e., \( \sqrt{(x_a-x_b)^2 + (y_a-y_b)^2} \) for two dimensions, don't be tempted to leave out the square root in the interests of "efficiency"; the squared distance does not satisfy the triangle inequality!
This is a "header-only" implementation and, as such, depends in a minimal way on the rest of GeographicLib (the only dependency is through the use of GeographicLib::GeographicErr for handling and run-time exceptions). Therefore, it is easy to extract this class from the rest of GeographicLib and use it as a stand-alone facility.

The dist_t type must support numeric_limits queries (specifically: is_signed, is_integer, max(), digits).

The NearestNeighbor object is constructed with a vector of points (type pos_t) and a distance function (type distfun_t). However the object does not store the points. When querying the object with Search(), it's necessary to supply the same vector of points and the same distance function.

There's no capability in this implementation to add or remove points from the set. Instead Initialize() should be called to re-initialize the object with the modified vector of points.

Because of the overhead in constructing a NearestNeighbor object for a large set of points, functions Save() and Load() are provided to save the object to an external file. operator<<(), operator>>() and Boost serialization can also be used to save and restore a NearestNeighbor object. This is illustrated in the example.

Example of use:

// Example of using the GeographicLib::NearestNeighbor class. WARNING: this
// creates a file, pointset.xml or pointset.txt, in the current directory.
// Read lat/lon locations from locations.txt and lat/lon queries from
// queries.txt. For each query print to standard output: the index for the
// closest location and the distance to it. Print statistics to standard error
// at the end.
#include <iostream>
#include <exception>
#include <vector>
#include <fstream>
#include <string>
#if !defined(GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION)
#define GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION 0
#endif
#if GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION
// If Boost serialization is available, use it.
#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>
#endif
using namespace std;
using namespace GeographicLib;
// A structure to hold a geographic coordinate.
struct pos {
double _lat, _lon;
pos(double lat = 0, double lon = 0) : _lat(lat), _lon(lon) {}
};
// A class to compute the distance between 2 positions.
class DistanceCalculator {
private:
Geodesic _geod;
public:
explicit DistanceCalculator(const Geodesic& geod) : _geod(geod) {}
double operator() (const pos& a, const pos& b) const {
double d;
_geod.Inverse(a._lat, a._lon, b._lat, b._lon, d);
if ( !(d >= 0) )
// Catch illegal positions which result in d = NaN
throw GeographicErr("distance doesn't satisfy d >= 0");
return d;
}
};
int main() {
try {
// Read in locations
vector<pos> locs;
double lat, lon;
string sa, sb;
{
ifstream is("locations.txt");
if (!is.good())
throw GeographicErr("locations.txt not readable");
while (is >> sa >> sb) {
DMS::DecodeLatLon(sa, sb, lat, lon);
locs.push_back(pos(lat, lon));
}
if (locs.size() == 0)
throw GeographicErr("need at least one location");
}
// Define a distance function object
DistanceCalculator distance(Geodesic::WGS84());
// Create NearestNeighbor object
{
// Used saved object if it is available
#if GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION
ifstream is("pointset.xml");
if (is.good()) {
boost::archive::xml_iarchive ia(is);
ia >> BOOST_SERIALIZATION_NVP(pointset);
}
#else
ifstream is("pointset.txt");
if (is.good())
is >> pointset;
#endif
}
// Is the saved pointset up-to-date?
if (pointset.NumPoints() != int(locs.size())) {
// else initialize it
pointset.Initialize(locs, distance);
// and save it
#if GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION
ofstream os("pointset.xml");
if (!os.good())
throw GeographicErr("cannot write to pointset.xml");
boost::archive::xml_oarchive oa(os);
oa << BOOST_SERIALIZATION_NVP(pointset);
#else
ofstream os("pointset.txt");
if (!os.good())
throw GeographicErr("cannot write to pointset.txt");
os << pointset << "\n";
#endif
}
ifstream is("queries.txt");
double d;
int count = 0;
vector<int> k;
while (is >> sa >> sb) {
++count;
DMS::DecodeLatLon(sa, sb, lat, lon);
d = pointset.Search(locs, distance, pos(lat, lon), k);
if (k.size() != 1)
throw GeographicErr("unexpected number of results");
cout << k[0] << " " << d << "\n";
}
int setupcost, numsearches, searchcost, mincost, maxcost;
double mean, sd;
pointset.Statistics(setupcost, numsearches, searchcost,
mincost, maxcost, mean, sd);
int
totcost = setupcost + searchcost,
exhaustivecost = count * pointset.NumPoints();
cerr
<< "Number of distance calculations = " << totcost << "\n"
<< "With an exhaustive search = " << exhaustivecost << "\n"
<< "Ratio = " << double(totcost) / exhaustivecost << "\n"
<< "Efficiency improvement = "
<< 100 * (1 - double(totcost) / exhaustivecost) << "%\n";
}
catch (const exception& e) {
cerr << "Caught exception: " << e.what() << "\n";
return 1;
}
}
int main(int argc, const char *const argv[])
Definition: CartConvert.cpp:29
Header for GeographicLib::DMS class.
Header for GeographicLib::Geodesic class.
Header for GeographicLib::NearestNeighbor class.
static void DecodeLatLon(const std::string &dmsa, const std::string &dmsb, real &lat, real &lon, bool longfirst=false)
Definition: DMS.cpp:368
Geodesic calculations
Definition: Geodesic.hpp:172
static const Geodesic & WGS84()
Definition: Geodesic.cpp:89
Math::real Inverse(real lat1, real lon1, real lat2, real lon2, real &s12, real &azi1, real &azi2, real &m12, real &M12, real &M21, real &S12) const
Definition: Geodesic.hpp:680
Exception handling for GeographicLib.
Definition: Constants.hpp:316
Nearest-neighbor calculations.
void Statistics(int &setupcost, int &numsearches, int &searchcost, int &mincost, int &maxcost, double &mean, double &sd) const
dist_t Search(const std::vector< pos_t > &pts, const distfun_t &dist, const pos_t &query, std::vector< int > &ind, int k=1, dist_t maxdist=std::numeric_limits< dist_t >::max(), dist_t mindist=-1, bool exhaustive=true, dist_t tol=0) const
void Initialize(const std::vector< pos_t > &pts, const distfun_t &dist, int bucket=4)
Namespace for GeographicLib.
Definition: Accumulator.cpp:12

Definition at line 103 of file NearestNeighbor.hpp.

Constructor & Destructor Documentation

◆ NearestNeighbor() [1/2]

template<typename dist_t , typename pos_t , class distfun_t >
GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::NearestNeighbor ( )
inline

Default constructor for NearestNeighbor.

This is equivalent to specifying an empty set of points.

Definition at line 118 of file NearestNeighbor.hpp.

◆ NearestNeighbor() [2/2]

template<typename dist_t , typename pos_t , class distfun_t >
GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::NearestNeighbor ( const std::vector< pos_t > &  pts,
const distfun_t &  dist,
int  bucket = 4 
)
inline

Constructor for NearestNeighbor.

Parameters
[in]ptsa vector of points to include in the set.
[in]distthe distance function object.
[in]bucketthe size of the buckets at the leaf nodes; this must lie in [0, 2 + 4*sizeof(dist_t)/sizeof(int)] (default 4).
Exceptions
GeographicErrif the value of bucket is out of bounds or the size of pts is too big for an int.
std::bad_allocif memory for the tree can't be allocated.

pts may contain coincident points (i.e., the distance between them vanishes); these are treated as distinct.

The choice of bucket is a tradeoff between space and efficiency. A larger bucket decreases the size of the NearestNeighbor object which scales as pts.size() / max(1, bucket) and reduces the number of distance calculations to construct the object by log2(bucket) * pts.size(). However each search then requires about bucket additional distance calculations.

Warning
The distances computed by dist must satisfy the standard metric conditions. If not, the results are undefined. Neither the data in pts nor the query points should contain NaNs or infinities because such data violates the metric conditions.
The same arguments pts and dist must be provided to the Search() function.

Definition at line 149 of file NearestNeighbor.hpp.

References GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::Initialize().

Member Function Documentation

◆ Initialize()

template<typename dist_t , typename pos_t , class distfun_t >
void GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::Initialize ( const std::vector< pos_t > &  pts,
const distfun_t &  dist,
int  bucket = 4 
)
inline

Initialize or re-initialize NearestNeighbor.

Parameters
[in]ptsa vector of points to include in the tree.
[in]distthe distance function object.
[in]bucketthe size of the buckets at the leaf nodes; this must lie in [0, 2 + 4*sizeof(dist_t)/sizeof(int)] (default 4).
Exceptions
GeographicErrif the value of bucket is out of bounds or the size of pts is too big for an int.
std::bad_allocif memory for the tree can't be allocated.

See also the documentation on the constructor.

If an exception is thrown, the state of the NearestNeighbor is unchanged.

Definition at line 170 of file NearestNeighbor.hpp.

Referenced by GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::NearestNeighbor().

◆ Search()

template<typename dist_t , typename pos_t , class distfun_t >
dist_t GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::Search ( const std::vector< pos_t > &  pts,
const distfun_t &  dist,
const pos_t &  query,
std::vector< int > &  ind,
int  k = 1,
dist_t  maxdist = std::numeric_limits<dist_t>::max(),
dist_t  mindist = -1,
bool  exhaustive = true,
dist_t  tol = 0 
) const
inline

Search the NearestNeighbor.

Parameters
[in]ptsthe vector of points used for initialization.
[in]distthe distance function object used for initialization.
[in]querythe query point.
[out]inda vector of indices to the closest points found.
[in]kthe number of points to search for (default = 1).
[in]maxdistonly return points with distances of maxdist or less from query (default is the maximum dist_t).
[in]mindistonly return points with distances of more than mindist from query (default = −1).
[in]exhaustivewhether to do an exhaustive search (default true).
[in]tolthe tolerance on the results (default 0).
Returns
the distance to the closest point found (−1 if no points are found).
Exceptions
GeographicErrif pts has a different size from that used to construct the object.

The indices returned in ind are sorted by distance from query (closest first).

The simplest invocation is with just the 4 non-optional arguments. This returns the closest distance and the index to the closest point in ind0. If there are several points equally close, then ind0 gives the index of an arbirary one of them. If there's no closest point (because the set of points is empty), then ind is empty and −1 is returned.

With exhaustive = true and tol = 0 (their default values), this finds the indices of k closest neighbors to query whose distances to query are in (mindist, maxdist]. If mindist and maxdist have their default values, then these bounds have no effect. If query is one of the points in the tree, then set mindist = 0 to prevent this point (and other coincident points) from being returned.

If exhaustive = false, exit as soon as k results satisfying the distance criteria are found. If less than k results are returned then the search was exhaustive even if exhaustive = false.

If tol is positive, do an approximate search; in this case the results are to be interpreted as follows: if the k'th distance is dk, then all results with distances less than or equal dktol are correct; all others are suspect — there may be other closer results with distances greater or equal to dktol. If less than k results are found, then the search is exact.

mindist should be used to exclude a "small" neighborhood of the query point (relative to the average spacing of the data). If mindist is large, the efficiency of the search deteriorates.

Note
Only the shortest distance is returned (as as the function value). The distances to other points (indexed by indj for j > 0) can be found by invoking dist again.
Warning
The arguments pts and dist must be identical to those used to initialize the NearestNeighbor; if not, this function will return some meaningless result (however, if the size of pts is wrong, this function throw an exception).
The query point cannot be a NaN or infinite because then the metric conditions are violated.

Definition at line 258 of file NearestNeighbor.hpp.

◆ NumPoints()

template<typename dist_t , typename pos_t , class distfun_t >
int GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::NumPoints ( ) const
inline
Returns
the total number of points in the set.

Definition at line 355 of file NearestNeighbor.hpp.

◆ Save()

template<typename dist_t , typename pos_t , class distfun_t >
void GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::Save ( std::ostream &  os,
bool  bin = true 
) const
inline

Write the object to an I/O stream.

Parameters
[in,out]osthe stream to write to.
[in]binif true (the default) save in binary mode.
Exceptions
std::bad_allocif memory for the string representation of the object can't be allocated.

The counters tracking the statistics of searches are not saved; however the initializtion cost is saved. The format of the binary saves is not portable.

Note
Boost serialization can also be used to save and restore a NearestNeighbor object. This requires that the GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION macro be defined.

Definition at line 374 of file NearestNeighbor.hpp.

◆ Load()

template<typename dist_t , typename pos_t , class distfun_t >
void GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::Load ( std::istream &  is,
bool  bin = true 
)
inline

Read the object from an I/O stream.

Parameters
[in,out]isthe stream to read from
[in]binif true (the default) load in binary mode.
Exceptions
GeographicErrif the state read from is is illegal.
std::bad_allocif memory for the tree can't be allocated.

The counters tracking the statistics of searches are reset by this operation. Binary data must have been saved on a machine with the same architecture. If an exception is thrown, the state of the NearestNeighbor is unchanged.

Note
Boost serialization can also be used to save and restore a NearestNeighbor object. This requires that the GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION macro be defined.
Warning
The same arguments pts and dist used for initialization must be provided to the Search() function.

Definition at line 452 of file NearestNeighbor.hpp.

◆ swap()

template<typename dist_t , typename pos_t , class distfun_t >
void GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::swap ( NearestNeighbor< dist_t, pos_t, distfun_t > &  t)
inline

Swap with another NearestNeighbor object.

Parameters
[in,out]tthe NearestNeighbor object to swap with.

Definition at line 560 of file NearestNeighbor.hpp.

References std::swap().

Referenced by std::swap().

◆ Statistics()

template<typename dist_t , typename pos_t , class distfun_t >
void GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::Statistics ( int &  setupcost,
int &  numsearches,
int &  searchcost,
int &  mincost,
int &  maxcost,
double &  mean,
double &  sd 
) const
inline

The accumulated statistics on the searches so far.

Parameters
[out]setupcostthe cost of initializing the NearestNeighbor.
[out]numsearchesthe number of calls to Search().
[out]searchcostthe total cost of the calls to Search().
[out]mincostthe minimum cost of a Search().
[out]maxcostthe maximum cost of a Search().
[out]meanthe mean cost of a Search().
[out]sdthe standard deviation in the cost of a Search().

Here "cost" measures the number of distance calculations needed. Note that the accumulation of statistics is not thread safe.

Definition at line 587 of file NearestNeighbor.hpp.

◆ ResetStatistics()

template<typename dist_t , typename pos_t , class distfun_t >
void GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::ResetStatistics ( ) const
inline

Reset the counters for the accumulated statistics on the searches so far.

Definition at line 599 of file NearestNeighbor.hpp.

Friends And Related Function Documentation

◆ operator<<

template<typename dist_t , typename pos_t , class distfun_t >
std::ostream & operator<< ( std::ostream &  os,
const NearestNeighbor< dist_t, pos_t, distfun_t > &  t 
)
friend

Write the object to stream os as text.

Parameters
[in,out]osthe output stream.
[in]tthe NearestNeighbor object to be saved.
Exceptions
std::bad_allocif memory for the string representation of the object can't be allocated.

Definition at line 541 of file NearestNeighbor.hpp.

◆ operator>>

template<typename dist_t , typename pos_t , class distfun_t >
std::istream & operator>> ( std::istream &  is,
NearestNeighbor< dist_t, pos_t, distfun_t > &  t 
)
friend

Read the object from stream is as text.

Parameters
[in,out]isthe input stream.
[out]tthe NearestNeighbor object to be loaded.
Exceptions
GeographicErrif the state read from is is illegal.
std::bad_allocif memory for the tree can't be allocated.

Definition at line 552 of file NearestNeighbor.hpp.


The documentation for this class was generated from the following file: