-
Notifications
You must be signed in to change notification settings - Fork 38
Tutorial Using Edge Based Techniques Part 2
Under Construction: More content coming soon.
In Part I of this series of wiki articles, we described how to use Tinfour's pinwheel iterator to identify the immediate neighbors of a vertex in a Delaunay triangulation. In this second part of the Tutorial Using Edge-Based Techniques we will look at how we can use that concept to implement an application that detects anomalous sample points within a data survey.
The demonstration application we present here is based on the algorithms and ideas described in the Tinfour Project Notes webpage Using the Delaunay triangulation to detect anomalies in unstructured data. That article offered a real-world example that detected problematic temperature reports in a set of weather observations collected for a single hour in March of 2019. At that time, there was a significant error in data reported by a weather station located on an oil-drilling platform in the Gulf of Mexico. The article describes how errors and other anomalies could be detected by comparing data for one vertex to values obtained for its neighbors. In general, Tinfour's Project Notes center on concepts that could be applied using any software library that supported the Delaunay triangulation while these wiki pages focus on concrete implementations using the Tinfour API. So in these notes, we will look at the specific code techniques we used to obtain the results in the Project Notes.
In the aviation and weather communities, real-time meteorological data is collected from a global network of weather stations which report observations once an hour. These reports are issued in text-based messages using a standard format known as the Meterological Aerodrome Report, or METAR. For this tutorial, we downloaded historical METARs from the U.S. National Center for Atmospheric Research (NCAR). While the processing of the raw data is outside the scope of this article, the data that we use in the example code below consists of a Java array of objects of type Metar with the following layout:
class Metar {
String stationID; // Four uppercase letters or numerical digits
double latitude; // in degrees
double longitude; // in range -180 up to, but not including, 180
int temperature; // degrees C
};
METAR messages are always identified by a four-character station ID. For weather stations associated with airports, the station ID will be the International Civil Aviation Organization (ICAO) code for that airport (i.e. KLAX for Los Angeles International, RKSI for Seoul, Korea, etc.). The figure below depicts temperature data for the anomalous data reported by station KHHV in 2019. KHHV is an automated reportng station that operates on an oil drilling platform located in the Gulf of Mexico. It is located over an underwater feature known as the Alaminos Canyon. The Delaunay triangulation shown in the image used METAR locations as vertices. The color-coded temperature data was interpolated from the Delaunay triangulation using Tinfour's Natural Neighbor Interpolator.

One thing to note in the image above is that the Delaunay captures weather-reporting stations in all directions from KHHV. This feature will be important in our discussion below.
Now lets turn our attention to implementation details. Although the following description focuses on the process of identifying anomalous METAR samples, the ideas apply to a broad range of data types.
- Populate a list of vertices
- The vertex index provides a way of relating the vertex to the source observation (source METAR)
- The geographic coordinates from the source data become the Cartesian coordinates for the vertex.
- Use the list of vertices to populate a Delaunay triangulation (an instance of Tinfour's IncrementalTin class).
- Test each vertex in the Delaunay for consistency with its neighbors
- Loop through all edges in the triangulation
- Test for the first vertex in the edge (because a vertex is usually part of multiple edges, vertices are marked as "processed" as the loop progresses) so that the code does not try to process them more than once.
- Use the Tinfour pinwheel() iterator to inspect the neighbors for potential conflicts.
- Keep a record of "potential-conflict score for future evaluation.
- At the completion of the loop, inspect conflict scores to see which vertices have unusually large conflicts with neighbors. These vertices are considered anomalies.
The Tinfour API is designed to process vertices distributed over a 2D coordinate plane. The Tinfour Vertex class includes elements x and y which usually provide Cartesian coordinates. So to process METARs, which are obtained from points located on the surface of the Earth, we need some method for projecting geographic coordinates (latitude and longitude) to a Cartesian form. This kind of coordinate transformation is a major topic in Geographic Information Systems (GIS) and cartographic applications in general. For our purposes, we will just use a simplified version of the Mercator projection which is commonly used in web-based map applications (Mapquest, Google Maps, Carto, Mapbox, etc.).
The Mercator projection has the advantage that it can be used in interactive map applications to allow a user to continuously pan East and West without interruption. That feature makes the projection especially attractive when looking at low to medium latitudes (when cartographers speak of low and medium latitudes, they are typically referring to the magnitude of the latitude, rather than whether it is North or South). However, for higher latitudes, those larger than 60 degrees North or South, the Mercator introduces so much distortion of scale that it becomes problematic. So, if we were concerned about weather observations for higher latitudes, we would need to implement special handling for those stations. For purposes of this example code, we will ignore that consideration and focus on the problem at hand.
The following Java method maps a latitude and longitude to a point on the Cartesian plan. The coordinates are scaled based on the radius of the Earth in kilometers. Although it provides limited accuracy, it is sufficient for our purposes:
static Point2D geo2Merc(double lat, double lon) {
double x = Math.toRadians(lon)* EARTH_RADIUS_KM;
double phi = Math.toRadians(lat);
double y = Math.log(Math.tan(Math.PI / 4 + phi / 2)) * EARTH_RADIUS_KM;
return new Point2D.Double(x, y);
}
For the METAR data set, longitudes are expressed in the range -180 <= longitude < 180 with negative values being to the West and positive values to the East. Zero longitude is given at the Prime Meridian which is located based on Greenwich, England.
Longitude presents a special problem when connecting vertices in a Delaunay triangulation. If we have a weather station located at 1 degree West, or -1.0 degrees, and another located 2 degrees of longitude to its east, the second station will be assigned the longitude value +1.0. For longitudes near zero degrees (the Prime Meridian), simple arithmetic suffices. But if we have a weather station located at 179 degreees East, and another located 2 degrees further to its east, the second station will be assigned a longitude value of -179. This potentially problematic behavior occurs because there is a "seam" in the longitude coordinate system at 180 degrees East/West (the general location of the International Date Line).
When constructing the 2D Delaunay triangulation for METAR locations on the surface of the Earth, we need somew ay to ensure that each METAR vertex has a link to stations across across all directions from its position. Without special handling, vertices located near the International Date Line may be missing to close neighbors located on the other side of the 180 degree transition. There are different ways to do so. For this particular example, we will exploit the fact that angle measurements are cyclical. Angles 360 degrees apart are equivalent. An angle of negative 90 degrees identifies the same location as an angle of positive 240. So when we construct vertices for stations located near the International Date Line, we can build an equivalent supplemental vertex based on a longitude 360 degrees different from the primary METAR location.
The following code uses the array of Metar objects to populate a list of vertices for constructing a Delaunay triangulation. Vertices are marked as either ORIGINAL or SUPPLEMENTAL based on whether they are located at the source longitude or at the equivalent coordinate. The index element of each vertex is populated with the array index of the METAR from which it was created. This index allows the example code to trace a vertex back to the Metar from which it was derived. Later on, when we process the vertices to test whether their associated METAR is an anomaly, the code will use the auxilliary code (AUX_ORIGINAL or AUX_SUPPLEMENT) ensure that it only processes the original vertices. So while some of the supplemental vertices at the far ends of the longitude range may lack a complete set of neighbors, it will not matter because they will not be processed.
List<Vertex> vertices = new ArrayList<>();
for (int iMetar = 0; iMetar < metars.length; iMetar++) {
// when the vertex is instantiated, it is assigned the index iMetar
// which allows the application to map a vertex back to its source METAR.
Metar metar = metars[iMetar];
Point2D p = geo2Merc(metar.latitude, metar.longitude); // coordinate projection
Vertex v = new Vertex(p.getX(), p.getY(), metar.temperature, iMetar);
v.setAuxiliaryIndex(AUX_ORIGINAL);
vertices.add(v);
// The METAR longitudes are in the range -180 <= lon < 180.
// we wish to duplicate some of them to ensure that the
// triangulation spans the International Date Line.
if (Math.abs(metar.longitude) > 90) {
double lon = metar.longitude;
if (lon < 0) {
lon += 360;
} else {
lon -= 360;
}
p = geo2Merc(metar.latitude, lon);
v = new Vertex(p.getX(), p.getY(), metar.temperature, iMetar);
v.setAuxiliaryIndex(AUX_SUPPLEMENT);
vertices.add(v);
}
}
IncrementalTin tin = new IncrementalTin(1.0);
tin.add(vertices, null);
The Delaunay-based anomaly detection algorithm looks for cases where the value of a particular sample is not supported by those of its neighbors. In practice, the way we define not supported depends on the nature of the phenomenon that we are measuring. But for many data sets, we can use a simple rate of change criterion. We assume that there is a limit to how fast the value of a data element is allowed to change. Any sample that represents too large a change with regard to its distance from its neighbors is treated as an anomaly.
We define the following rules for testing to see if the METAR associated with a vertex is an anomaly:
- The sample represents a local extremum (either a minimum or a maximum).
- The magnitude of the average slope across all neighbors exceeds a reasonable threshold.
One consideration in applying these rules is that we may not know what constitutes a reasonable threshold for an average rate of change until after we have processed an entire data set. So the code that follows maintains an array of all conflict scores and then uses it to evaluate which scores represent significant deviations from the general population.
In the code below, we use Tinfour's edgesAndDuals() iterator to loop through all edges in the triangulation. The code extracts the starting vertex from each edge (vertex A) and tests to see if it is an original vertex. If it is, the vertex is checked to see whether it might carry an anomalous value. The METAR associated with a vertex is obtained by using the Tinfour Vertex class' getIndex() method to obtain an array index into the metars array.
In the case of METAR data, the rate of change for each neighbor is simply the difference in temperature divided by the geographic distance between the two METAR sites. Rather than using a simple average across neighbors, the logic below uses a weighted average. To get weighting factors for each neighbor, we use a second IncrementalTin instance to build a Delaunay triangulation consisting of just the neighbors. In the code, this second triangulation is named helperTIN. An associated Natural Neighbor Interpolator instance, helperNNI is used to compute the weights.
Tinfour includes a utility class named BasicSamplesTabulator that is useful for keeping track of simple statistics such as percentile values. As scores are computed, they are recorded by the tabulator so that code will be able to establish thresholds for detecting anomalies later in the process.
IncrementalTin helperTIN = new IncrementalTin(1.0);
NaturalNeighborInterpolator helperNNI = new NaturalNeighborInterpolator(helperTIN);
BasicSamplesTabulator tabulator = new BasicSamplesTabulator();
float[] conflictScores = new float[metars.length];
for (IQuadEdge e : tin.edgesAndDuals()) {
Vertex A = e.getA();
// if the vertex is a supplement, or if it is already processed, skip it
if (A.getAuxiliaryIndex() != AUX_ORIGINAL) {
continue;
}
A.setAuxiliaryIndex(AUX_PROCESSED); // marked as "already processed"
Metar m0 = metars[A.getIndex()];
List<Vertex> neighbors = new ArrayList<>();
boolean problematic = false;
for (IQuadEdge p : e.pinwheel()) {
Vertex B = p.getB();
if (B == null) {
// should happen only at very high latitudes
problematic = true;
break;
}
neighbors.add(B);
}
if (problematic) {
continue;
}
// use the Natural Neighbor class to get weights for each neighbor
// The elements contents will include all neighbors collected above
// but may not give them in the same order as the list. Note that
// the sum of weights always equals 1.0
helperTIN.clear(); // eliminate any previously stored information
helperTIN.add(neighbors, null);
helperNNI.resetForChangeToTin();
var elements = helperNNI.getNaturalNeighborElements(A.getX(), A.getY());
Vertex[] vTest = elements.getNaturalNeighbors();
double[] weights = elements.getSibsonCoordinates();
int nPos = 0;
int nNeg = 0;
double sumWeightedSlopes = 0;
for (int i = 0; i < vTest.length; i++) {
Metar m1 = metars[vTest[i].getIndex()];
double s = GCDistance.distanceM(
m0.latitude, m0.longitude,
m1.latitude, m1.longitude) / 1000.0;
double deltaT = m0.temperature - m1.temperature;
if (deltaT >= 0) {
nPos++;
} else {
nNeg++;
}
double slope = deltaT / s;
sumWeightedSlopes += slope * weights[i];
}
if (problematic) {
continue;
}
tabulator.recordSample(sumWeightedSlopes);
if (nPos > 0 && nNeg > 0) {
// the vertex is not a local extremum, skip it.
continue;
}
conflictScores[A.getIndex()] = (float) sumWeightedSlopes;
}
The conflict scores computed above are signed quantities. We could have used absolute values for that purpose, but we thought that we might see slightly different behaviors for locally high and locally low temperatures. We obtain threshold parameters by asking the BasicSamplesTabulator instance for the 99.9th percentile scores (locally high sample valuess) and the 0.1th percentile scores (locally low sample values).
double percentile0 = 0.1;
double percentile1 = 99.9;
double threshold0 = tabulator.getValueForPercentile(percentile0);
double threshold1 = tabulator.getValueForPercentile(percentile1);
System.out.println("\nLocal extrema with possible anomalous values");
System.out.format(" Lower threshold %5.2f percentile value: %7.4f \u00B0C/km%n",
percentile0, threshold0);
System.out.format(" Upper threshold %5.2f percentile value: %7.4f \u00B0C/km%n",
percentile1, threshold1);
System.out.println(" Station \u00B0C \u00B0C/km");
for (int i = 0; i < metars.length; i++) {
if (conflictScores[i] < threshold0 || conflictScores[i] > threshold1) {
System.out.format(" %4s %4.0f %7.4f%n",
metars[i].stationID, metars[i].temperature, conflictScores[i]);
}
}
The results are shown below (we added the place names for the stations as a reference):
Local extrema with possible anomalous values
Lower threshold 0.10 percentile value: -0.4271 °C/km
Upper threshold 99.90 percentile value: 0.4383 °C/km
Station °C °C/km
KMWN -11 -0.4642 Mount Washington,NH, USA
LIMH -10 -0.4308 Terminillo Mount, Italy
LIRK -2 -1.0194 Mt. Pian Rosa, Italy
KHHV 51 0.5344 Alaminos Canyon, Gulf of Mexico
KNKX 29 1.9457 Miramar NAS, CA, USA
KOWD 6 0.4449 Norwood, MA, USA