
Classification

Avgoustinos Vouros edited this page Jul 25, 2017 · 19 revisions

The classification process requires a default segmentation and its default labels to be selected.

Contents

  1. Classification Overview
  2. The Classification Panel
  3. Advanced Classification

Classification Overview

The classification process described in the publication of Gehring, Tiago V., et al. was based on finding an 'optimal' classifier that would successfully classify the segments. A 10-fold cross-validation was run on classifiers with different numbers of clusters, and classification performance was assessed using three variables: the classification error, the percentage of the trajectories that was classified (coverage) and the percentage of undefined segments. The last variable was of minor importance because, when coverage was high and error was low, the undefined segments were not taken into account.

The new classification is based on classification boosting with majority voting, meaning that various classifiers are generated and work together (form an ensemble) to complete the classification. This approach is used because some real-world problems are so complex that a single classification algorithm cannot achieve high performance. In addition, assigning trajectory segments to behavioural strategies can be subjective and is prone to human error because the labelling is manual. Multiple classification solutions can therefore be used to create a more robust classification and to provide a degree of confidence when drawing conclusions about the dataset under investigation.

Only strong classifiers are considered throughout the classification, and their quality is assessed by the cross-validation error: if the error is equal to or greater than 25% (default), the classifier is considered weak. Furthermore, apart from forming an ensemble, the classification results of the individual classifiers are used to support the classification outcome of the ensemble.
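The strong/weak rule above can be sketched in a few lines of Python. This is only an illustration: the error values are made up, and the function name `strong_classifiers` is hypothetical rather than part of the tool.

```python
# Hypothetical cross-validation errors keyed by number of clusters (made-up values).
cv_errors = {10: 0.31, 12: 0.25, 14: 0.18, 16: 0.24}

# Default from the text: an error of 25% or more marks a classifier as weak.
WEAK_THRESHOLD = 0.25


def strong_classifiers(errors, threshold=WEAK_THRESHOLD):
    """Return the cluster counts whose cross-validation error is below the threshold."""
    return [k for k, err in sorted(errors.items()) if err < threshold]


print(strong_classifiers(cv_errors))  # [14, 16]
```

Note that the classifier with an error of exactly 25% is discarded, matching the "equal to or greater than" wording.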

The Classification Panel

[Figure: classification overview]

  • Default: If a segmentation and its labels are selected, the default classification automatically runs the Labelling Quality process for numbers of clusters starting at the number of different labels the user has provided and increasing in steps of 2 up to 100, in order to detect and generate strong classifiers. Only classifiers with a validation error lower than 25% are generated and used to form an ensemble. Inside the ensemble a simple majority voting takes place: for each segment, votes are collected on which class the segment belongs to, and the strategy with the most votes wins. In case of a tie the segment is marked as unclassified.

  • Advanced: The advanced classification allows users to generate their own classifiers, merge a subset of them or create multiple ensembles. It also allows the majority voting to be customised by specifying a threshold. For more information, refer to Advanced Classification.
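As a rough illustration of the default mode, the sketch below lists the cluster counts that would be tried and applies a simple majority vote to one segment. The function names are hypothetical, and treating the upper bound of 100 as inclusive is an assumption; the tool's actual implementation may differ.

```python
from collections import Counter


def default_cluster_counts(n_labels, upper=100, step=2):
    """Cluster counts tried by the default mode: n_labels, n_labels + 2, ... up to `upper`.

    Assumption: the upper bound is inclusive.
    """
    return list(range(n_labels, upper + 1, step))


def simple_majority(votes):
    """Return the strategy with the most votes, or 'unclassified' on a tie."""
    ranked = Counter(votes).most_common()
    if not ranked:
        return "unclassified"
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "unclassified"  # two strategies share the top vote count
    return ranked[0][0]


print(default_cluster_counts(8)[:4])  # [8, 10, 12, 14]
print(simple_majority(["Thigmotaxis", "Thigmotaxis", "Incursion"]))  # Thigmotaxis
print(simple_majority(["Thigmotaxis", "Incursion"]))  # unclassified
```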

Advanced Classification

The advanced classification opens another window, which is split into two parts.

The classifiers panel is used to specify which classifiers, and how many, will be created for a given segmentation and its labels.

[Figure: generate classifiers]

  • Segmentation and Labels: One segmentation and its corresponding labels need to be chosen from the two drop-down boxes.

  • Cluster: Each classifier is described by its number of clusters. The user can enter specific numbers of clusters, and one classifier will be created for each. If two numbers are separated by ' : ', for example 15:17, then 3 classifiers will be created with 15, 16 and 17 clusters. Leaving this field empty will result in the generation of 30 random classifiers.

  • Generate Classifiers: Pressing this button generates the classifiers according to the options specified.
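The Cluster field syntax can be illustrated with a small parser. This is a sketch under several assumptions not stated in the text: individual numbers are separated by spaces or commas, and the 30 "random classifiers" are modelled as 30 cluster counts drawn uniformly from a hypothetical range of 2 to 100. The name `parse_cluster_field` is invented for the example.

```python
import random
import re


def parse_cluster_field(text, n_random=30, low=2, high=100, seed=None):
    """Turn the Cluster field into a list of cluster counts, one per classifier."""
    # Allow spaces around the range separator, e.g. "15 : 17".
    text = re.sub(r"\s*:\s*", ":", text.strip())
    if not text:
        # Empty field: 30 random classifiers (the range low..high is an assumption).
        rng = random.Random(seed)
        return [rng.randint(low, high) for _ in range(n_random)]
    counts = []
    for token in re.split(r"[,\s]+", text):
        if ":" in token:
            first, last = (int(part) for part in token.split(":"))
            counts.extend(range(first, last + 1))  # inclusive range
        else:
            counts.append(int(token))
    return counts


print(parse_cluster_field("15:17"))  # [15, 16, 17]
print(parse_cluster_field("10, 15 : 17"))  # [10, 15, 16, 17]
```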

The merging panel is used to set up the merging procedure. Once the classifiers have been generated, the box on the left side of the window lists the available classifier pools (each segmentation has its own classifiers).

[Figure: merge classifiers]

  • Classifiers per group: Specifies how many classifiers will be used from the pool for the final classification of the segments.

  • Iterations: The classifiers used are selected randomly, so the final classification process can be run multiple times, each time with a different sample of classifiers drawn from the selected pool. This field defines how many times the final classification process runs.

  • Merging Rule: Currently only one rule is available, majority voting. By pressing the Rule Options button the user can set a threshold for the winning strategy. For example, if for a specific segment the Thigmotaxis strategy has 4 votes out of 10 and the Incursion strategy has 5 out of 10, Incursion wins; with a threshold of 80 (meaning 80%), however, there is no winning strategy and the segment is marked as undefined. In case of a draw the segment is also marked as undefined.

  • Merge: Pressing this button executes the merging procedure with the defined settings.
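The merging settings can be sketched as follows, reproducing the Thigmotaxis/Incursion example. The function names are hypothetical. Two details are assumptions: classifiers are sampled without replacement within an iteration, and a strategy wins when its vote share is at least the threshold. The label "Other" stands in for the tenth vote, which the example leaves unspecified.

```python
import random
from collections import Counter


def sample_groups(pool, per_group, iterations, seed=None):
    """Draw one random group of classifiers per iteration (assumed without replacement)."""
    rng = random.Random(seed)
    return [rng.sample(pool, per_group) for _ in range(iterations)]


def threshold_vote(votes, threshold_pct=0.0):
    """Majority vote for one segment; 'undefined' on a draw or when below the threshold."""
    if not votes:
        return "undefined"
    ranked = Counter(votes).most_common()
    winner, top = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top:
        return "undefined"  # draw between the leading strategies
    if 100.0 * top / len(votes) < threshold_pct:
        return "undefined"  # winner's share of the votes is below the threshold
    return winner


# 4 votes for Thigmotaxis, 5 for Incursion, 1 for a placeholder label.
votes = ["Thigmotaxis"] * 4 + ["Incursion"] * 5 + ["Other"]
print(threshold_vote(votes))      # Incursion
print(threshold_vote(votes, 80))  # undefined
```

With no threshold, Incursion wins on 5 of 10 votes; at 80% its 50% share is too low, so the segment is undefined.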

As an advanced merging technique, multiple classifier pools can be selected from the list. In this case an equal number of classifiers is taken from each pool and the majority rule is applied only to the matched segments. Each classifier pool applies to a different segmentation, but when two segmentations have the same segment length and different overlap, some segments are the same in both.
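Why two segmentations share some segments can be seen with a little arithmetic. The sketch below is a simplification that assumes segments are cut at fixed offsets along a path measured in arbitrary integer units; the numbers and the function name `segment_starts` are made up for illustration and do not describe the tool's actual segmentation code.

```python
def segment_starts(path_length, seg_length, overlap):
    """Start offsets of fixed-length segments cut with the given fractional overlap."""
    step = round(seg_length * (1 - overlap))  # distance between consecutive starts
    return list(range(0, path_length - seg_length + 1, step))


# Same segment length, different overlap (hypothetical values).
starts_a = segment_starts(path_length=600, seg_length=200, overlap=0.5)
starts_b = segment_starts(path_length=600, seg_length=200, overlap=0.75)

# Segments with the same start (and the same length) exist in both segmentations.
matched = sorted(set(starts_a) & set(starts_b))

print(starts_a)  # [0, 100, 200, 300, 400]
print(starts_b)  # [0, 50, 100, 150, 200, 250, 300, 350, 400]
print(matched)   # [0, 100, 200, 300, 400]
```

In this sketch only the segments listed in `matched` would receive votes from both pools.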

When the classification is finished the user can close this window and return to the main menu.

Similarity Check

Similarity check can be used to compare the classification results from two different classifier pools. The comparison can be performed between 'files', which applies to two classifiers or two merged classifiers (the product of performing majority voting), or between 'folders', which applies to two classifier pools or two merged classifier pools (the product of performing majority voting more than once).

[Figure: similarity check]

Upon pressing the Refresh button, the table shows, for each classifier or classification, how many segments have been detected for each strategy and the difference between the two. A large difference between two merged classifications may reveal that a specific classifier pool is unable to distinguish between certain classes (usually because not enough labels have been provided), which in turn may make the results inconsistent.
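The kind of table described here can be approximated with a per-strategy count. This is a minimal sketch, assuming each classification is available as a list with one label per segment; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def strategy_differences(labels_a, labels_b):
    """Per strategy: (count in A, count in B, absolute difference)."""
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    strategies = sorted(set(count_a) | set(count_b))
    return {s: (count_a[s], count_b[s], abs(count_a[s] - count_b[s])) for s in strategies}


# Made-up classification results for the same four segments.
result_a = ["Thigmotaxis", "Thigmotaxis", "Incursion", "undefined"]
result_b = ["Thigmotaxis", "Incursion", "Incursion", "Incursion"]

for strategy, row in strategy_differences(result_a, result_b).items():
    print(strategy, row)
# Incursion (1, 3, 2)
# Thigmotaxis (2, 1, 1)
# undefined (1, 0, 1)
```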
