Skip to content

Tutorial Using Edge Based Techniques Part 2

gwlucastrig edited this page Feb 22, 2026 · 35 revisions

Under Construction: More content coming soon.

Introduction to Part 2

In Part I of this series of wiki articles, we described how to use Tinfour's pinwheel iterator to identify the immediate neighbors of a vertex in a Delaunay triangulation. In this second part of the Tutorial Using Edge-Based Techniques we will look at how we can use that concept to implement an application that detects anomalous sample points within a data survey.

The demonstration application we present here is based on the algorithms and ideas described in the Tinfour Project Notes webpage Using the Delaunay triangulation to detect anomalies in unstructured data. That article offered a real-world example that detected problematic temperature reports in a set of weather observations collected for a single hour in March of 2019. At that time, there was a significant error in data reported by a weather station located on an oil-drilling platform in the Gulf of Mexico. The article describes how errors and other anomalies could be detected by comparing data for one vertex to values obtained for its neighbors. In general, Tinfour's Project Notes center on concepts that could be applied using any software library that supported the Delaunay triangulation while these wiki pages focus on concrete implementations using the Tinfour API. So in these notes, we will look at the specific code techniques we used to obtain the results in the Project Notes.

The Test Data: Meterological Aerodrome Reports (METARs)

In the aviation and weather communities, real-time meteorological data is collected from a global network of weather stations which report observations once an hour. These reports are issued in text-based messages using a standard format known as the Meterological Aerodrome Report, or METAR. For this tutorial, we downloaded historical METARs from the U.S. National Center for Atmospheric Research (NCAR). While the processing of the raw text messages is outside the scope of this article, the data that we use in the example code below consists of a Java array of objects of type Metar with the following layout:

    class Metar {
        String stationID;    // Four uppercase letters or numerical digits
        double latitude;     // in degrees
        double longitude;    // in range -180 up to, but not including, 180
        int    temperature;  // degrees C
    };

METAR messages are always identified by a four-character station ID. For weather stations associated with airports, the station ID will be the International Civil Aviation Organization (ICAO) code for that airport (i.e. KLAX for Los Angeles International, RKSI for Seoul, Korea, etc.). The figure below depicts temperature data for the anomalous data reported by station KHHV in 2019. KHHV is an automated reportng station that operates on an oil drilling platform located in the Gulf of Mexico. It is located over an underwater feature known as the Alaminos Canyon. The Delaunay triangulation shown in the image used METAR locations as vertices. The color-coded temperature data was interpolated from the Delaunay triangulation using Tinfour's Natural Neighbor Interpolator.

One thing to note in the image above is that the Delaunay captures weather-reporting stations in all directions from KHHV. This feature will be important in our discussion below.

The Implementation

Now lets turn our attention to implementation details. Although the following description focuses on the process of identifying anomalous METAR samples, the ideas apply to a broad range of data types.

  1. Populate a list of vertices
    • The vertex index provides a way of relating the vertex to the source observation (source METAR)
    • The geographic coordinates from the source data become the Cartesian coordinates for the vertex.
  2. Use the list of vertices to populate a Delaunay triangulation (an instance of Tinfour's IncrementalTin class).
  3. Test each vertex in the Delaunay for consistency with its neighbors
    • Loop through all edges in the triangulation
    • Test for the first vertex in the edge (because a vertex is usually part of multiple edges, vertices are marked as "processed" as the loop progresses) so that the code does not try to process them more than once.
    • Use the Tinfour pinwheel() iterator to inspect the neighbors for potential conflicts.
    • Keep a record of "potential-conflict score for future evaluation.
  4. At the completion of the loop, inspect conflict scores to see which vertices have unusually large conflicts with neighbors. These vertices are considered anomalies.

Working with Geographic Coordinates

The Tinfour API is designed to process vertices distributed over a 2D coordinate plane. The Tinfour Vertex class includes elements x and y which usually provide Cartesian coordinates. So to process METARs, which are obtained from points located on the surface of the Earth, we need some method for projecting geographic coordinates (latitude and longitude) to a Cartesian form. This kind of coordinate transformation is a major topic in Geographic Information Systems (GIS) and cartographic applications in general. For our purposes, we will just use a simplified version of the Mercator projection which is commonly used in web-based map applications (Mapquest, Google Maps, Carto, Mapbox, etc.).

The Mercator projection has the advantage that it can be used in interactive map applications to allow a user to continuously pan East and West without interruption. That feature makes the projection especially attractive when looking at low to medium latitudes (when cartographers speak of low and medium latitudes, they are typically referring to the magnitude of the latitude, rather than whether it is North or South). However, for higher latitudes, those larger than 60 degrees North or South, the Mercator introduces so much distortion of scale that it becomes problematic. So, if we were concerned about weather observations for higher latitudes, we would need to implement special handling for those stations. For purposes of this example code, we will ignore that consideration and focus on the problem at hand.

The following Java method maps a latitude and longitude to a point on the Cartesian plan. The coordinates are scaled based on the radius of the Earth in kilometers. Although it provides limited accuracy, it is sufficient for our purposes:

  static Point2D geo2Merc(double lat, double lon) {
    double x   = Math.toRadians(lon)* EARTH_RADIUS_KM;
    double phi = Math.toRadians(lat);
    double y   = Math.log(Math.tan(Math.PI / 4 + phi / 2)) * EARTH_RADIUS_KM;
    return new Point2D.Double(x, y);
  }

For the METAR data set, longitudes are expressed in the range -180 <= longitude < 180 with negative values being to the West and positive values to the East. Zero longitude is given at the Prime Meridian which is located based on Greenwich, England.

Coordinate Discontinuity near the International Date Line

Longitude presents a special problem when connecting vertices in a Delaunay triangulation. By convention, longitudes in the Eastern Hemisphere are assigned positive coordinates and longitudes in the Western Hemisphere are assigned negative coordinates. Longitude 1°E is assigned a numeric value of 1. Longitude 1°W is assigned a numeric value of -1. And, if we start at a weather station located at 1 degree West, or -1.0, and move in a positive direction to a second station at 1 degree East, the second station will be assigned the numeric value +1.0. The stations are separated by two degrees of longitude and their numeric coordinates are separated by a difference of 2.0. Simple arithmetic is sufficient when comparing coordinates for longitudes near zero degrees (i.e. near the Prime Meridian).

Things work a bit differently if we start from a location in the Eastern Hemisphere near the International Date Line, and move a bit to the east (in a positive direction). We end up in the Western Hemisphere. Again, longitude coordinates in the Western Hemisphere are assigned negative values. And while locations with a fixed latitude and longitudes 179°E and 179°W may be located relatively close on the surface of the Earth, their corresponding numeric coordinates will be quite far apart: +179 and -179. This behavior complicates Delaunay vertex placement when projecting Earth locations to the 2D coordinate plane because it introduces an artificial discontinuity to the surface.

When constructing the 2D Delaunay triangulation for METAR locations on the surface of the Earth, we need some way to ensure that each METAR vertex has a link to stations across all directions from its position. Without special handling, vertices located near the International Date Line may be missing to close neighbors located on the other side of the 180 degree transition. There are different ways to do so. For this particular example, we will exploit the fact that angle measurements are cyclical. Angles 360 degrees apart are equivalent. An angle of negative 90 degrees identifies the same location as an angle of positive 240. So when we construct vertices for stations located near the International Date Line, we can build an equivalent supplemental vertex based on a longitude 360 degrees different from the primary METAR location.

The following code uses the array of Metar objects to populate a list of vertices for constructing a Delaunay triangulation. Vertices are marked as either ORIGINAL or SUPPLEMENTAL based on whether they are located at the source longitude or at the equivalent coordinate. The index element of each vertex is populated with the array index of the METAR from which it was created. This index allows the example code to trace a vertex back to the Metar from which it was derived. Later on, when we process the vertices to test whether their associated METAR is an anomaly, the code will use the auxilliary code (AUX_ORIGINAL or AUX_SUPPLEMENT) ensure that it only processes the original vertices. So while some of the supplemental vertices at the far ends of the longitude range may lack a complete set of neighbors, it will not matter because they will not be processed.

Code for Populating the Delaunay Triangulation

    static final int AUX_ORIGINAL   = 1;
    static final int AUX_SUPPLEMENT = 2;
    static final int AUX_PROCESSED  = 3;
  
    List<Vertex> vertices = new ArrayList<>();
    for (int iMetar = 0; iMetar < metars.length; iMetar++) {
      // when the vertex is instantiated, it is assigned the index iMetar
      // which allows the application to map a vertex back to its source METAR.
      Metar metar = metars[iMetar];
      Point2D p = geo2Merc(metar.latitude, metar.longitude); // coordinate projection
      Vertex v = new Vertex(p.getX(), p.getY(), metar.temperature, iMetar);
      v.setAuxiliaryIndex(AUX_ORIGINAL);
      vertices.add(v);

      // The METAR longitudes are in the range -180 <= lon < 180.
      // we wish to duplicate some of them to ensure that the
      // triangulation spans the International Date Line.
      if (Math.abs(metar.longitude) > 90) {
        double lon = metar.longitude;
        if (lon < 0) {
          lon += 360;
        } else {
          lon -= 360;
        }
        p = geo2Merc(metar.latitude, lon);
        v = new Vertex(p.getX(), p.getY(), metar.temperature, iMetar);
        v.setAuxiliaryIndex(AUX_SUPPLEMENT);
        vertices.add(v);
      }
    }

    IncrementalTin tin = new IncrementalTin(1.0);
    tin.add(vertices, null);

Processing the Vertices

The Delaunay-based anomaly detection algorithm looks for cases where the value of a particular sample is not supported by those of its neighbors. In practice, the way we define not supported depends on the nature of the phenomenon that we are measuring. But for many data sets, we can use a simple rate of change criterion. We assume that there is a limit to how fast the value of a data element is allowed to change. Any sample that represents too large a change with regard to its distance from its neighbors is treated as an anomaly.

We define the following rules for testing to see if the METAR associated with a vertex is an anomaly:

  • The sample represents a local extremum (either a minimum or a maximum).
  • The magnitude of the average slope across all neighbors exceeds a reasonable threshold.

One consideration in applying these rules is that we may not know what constitutes a reasonable threshold for an average rate of change until after we have processed an entire data set. So the code that follows maintains an array of all conflict scores and then uses it to evaluate which scores represent significant deviations from the general population.

In the code below, we use Tinfour's edgesAndDuals() iterator to loop through all edges in the triangulation. The code extracts the starting vertex from each edge (vertex A) and tests to see if it is an original vertex. If it is, the vertex is checked to see whether it might carry an anomalous value. The METAR associated with a vertex is obtained by using the Tinfour Vertex class' getIndex() method to obtain an array index into the metars array.

Computing Conflict-Score Rates of Change and Weighted Averages

In the case of METAR data, the rate of change for each neighbor is simply the difference in temperature divided by the geographic distance between the two METAR sites. Rather than using a simple average across neighbors, the logic below uses a weighted average. To get weighting factors for each neighbor, we use a second IncrementalTin instance to build a Delaunay triangulation consisting of just the neighbors. In the code, this second triangulation is named helperTIN. An associated Natural Neighbor Interpolator instance, helperNNI is used to compute the weights.

Tinfour includes a utility class named BasicSamplesTabulator that is useful for keeping track of simple statistics such as percentile values. As scores are computed, they are recorded by the tabulator so that code will be able to establish thresholds for detecting anomalies later in the process.

The Code for Collecting Conflict Scores

    IncrementalTin helperTIN = new IncrementalTin(1.0);
    NaturalNeighborInterpolator helperNNI = new NaturalNeighborInterpolator(helperTIN);
    BasicSamplesTabulator tabulator = new BasicSamplesTabulator();
    float[] conflictScores = new float[metars.length]; 

    for (IQuadEdge e : tin.edgesAndDuals()) {
      Vertex A = e.getA();
      // if the vertex is a supplement, or if it is already processed, skip it
      if (A.getAuxiliaryIndex() != AUX_ORIGINAL) {
        continue;
      }
      A.setAuxiliaryIndex(AUX_PROCESSED);  // marked as "already processed"
      Metar m0 = metars[A.getIndex()];

      List<Vertex> neighbors = new ArrayList<>();
      boolean problematic = false;
      for (IQuadEdge p : e.pinwheel()) {
        Vertex B = p.getB();
        if (B == null) {
          // should happen only at very high latitudes
          problematic = true;
          break;
        }
        neighbors.add(B);
      }
      if (problematic) {
        continue;
      }

      // use the Natural Neighbor class to get weights for each neighbor
      // The elements contents will include all neighbors collected above
      // but may not give them in the same order as the list.  Note that
      // the sum of weights always equals 1.0

      helperTIN.clear(); // eliminate any previously stored information
      helperTIN.add(neighbors, null);
      helperNNI.resetForChangeToTin();
      var elements = helperNNI.getNaturalNeighborElements(A.getX(), A.getY());
      Vertex[] vTest = elements.getNaturalNeighbors();
      double[] weights = elements.getSibsonCoordinates();

      int nPos = 0;
      int nNeg = 0;
      double sumWeightedSlopes = 0;
      for (int i = 0; i < vTest.length; i++) {
        Metar m1 = metars[vTest[i].getIndex()];
        double s = GCDistance.distanceM(
          m0.latitude, m0.longitude,
          m1.latitude, m1.longitude) / 1000.0;
        double deltaT = m0.temperature - m1.temperature;
        if (deltaT >= 0) {
          nPos++;
        } else {
          nNeg++;
        }
        double slope = deltaT / s;
        sumWeightedSlopes += slope * weights[i];
      }
      if (problematic) {
        continue;
      }
      tabulator.recordSample(sumWeightedSlopes);
      if (nPos > 0 && nNeg > 0) {
        // the vertex is not a local extremum, skip it.
        continue;
      }
      conflictScores[A.getIndex()] = (float) sumWeightedSlopes;
    }

Obtaining Conflict Thresholds and Finding Anomalies

The conflict scores computed above are signed quantities. We could have used absolute values for that purpose, but we thought that we might see slightly different behaviors for locally high and locally low temperatures. We obtain threshold parameters by asking the BasicSamplesTabulator instance for the 99.9th percentile scores (locally high sample valuess) and the 0.1th percentile scores (locally low sample values).

    double percentile0 = 0.1;
    double percentile1 = 99.9;
    double threshold0 = tabulator.getValueForPercentile(percentile0);
    double threshold1 = tabulator.getValueForPercentile(percentile1);

    System.out.println("\nLocal extrema with possible anomalous values");
    System.out.format(" Lower threshold %5.2f percentile value: %7.4f \u00B0C/km%n",
      percentile0, threshold0);
    System.out.format(" Upper threshold %5.2f percentile value: %7.4f \u00B0C/km%n",
      percentile1, threshold1);

    System.out.println(" Station  \u00B0C     \u00B0C/km");
    for (int i = 0; i < metars.length; i++) {
      if (conflictScores[i] < threshold0 || conflictScores[i] > threshold1) {
        System.out.format("  %4s  %4.0f    %7.4f%n",
          metars[i].stationID, metars[i].temperature, conflictScores[i]);
      }
    }

The Results

The results are shown below (we added the place names for the stations as a reference):

Local extrema with possible anomalous values
 Lower threshold  0.10 percentile value: -0.4271 °C/km
 Upper threshold 99.90 percentile value:  0.4383 °C/km
 Station  °C     °C/km

  KMWN   -11    -0.4642    Mount Washington,NH, USA
  LIMH   -10    -0.4308    Terminillo Mount, Italy
  LIRK    -2    -1.0194    Mt. Pian Rosa, Italy
  KHHV    51     0.5344    Alaminos Canyon, Gulf of Mexico
  KNKX    29     1.9457    Miramar NAS, CA, USA
  KOWD     6     0.4449    Norwood, MA, USA

Clone this wiki locally