
Commit b2041ea

wanguoweiJackFu123 authored and committed
update inter-tnt doc
Change-Id: I62c7bd3a9ca7781de780dbe99b71d55a94aaded9
1 parent 0dc8e3c commit b2041ea

File tree: 1 file changed (+22, -24 lines)

# Inter-TNT (Jointly VectorNet-TNT-Interaction) Evaluator

The prediction module comprises four main functionalities: Container, Scenario, Evaluator, and Predictor.

An Evaluator predicts trajectories and speeds for the obstacles surrounding the autonomous vehicle. Specifically, an evaluator scores a path (lane sequence) with a probability using the corresponding model stored in prediction/data/.

In Apollo 7.0, a new model named Inter-TNT is introduced to generate short-term trajectories. This model applies VectorNet as the encoder and TNT as the decoder, and the latest planning trajectory of the autonomous vehicle is used to interact with surrounding obstacles. Compared with the semantic-map-based prediction model released in Apollo 6.0, performance improves by more than 20% in terms of minADE and minFDE, and inference time is reduced from 15 ms to 10 ms.

![Diagram](images/interaction_model_fig_1.png)

Please refer to the [interaction filter](https://github.com/ApolloAuto/apollo/tree/mast
```cpp
void AssignInteractiveTag();
```
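As a rough illustration of what an interactive filter does, here is a minimal Python sketch. The function name, data layout, and the 10 m threshold are illustrative assumptions, not Apollo's actual C++ implementation:

```python
# Hypothetical sketch of a distance-based interactive filter.
# Apollo's real AssignInteractiveTag() uses richer rules; the threshold
# and data layout here are assumptions for illustration only.

def assign_interactive_tags(obstacle_positions, planning_trajectory, threshold=10.0):
    """Tag an obstacle as interactive if it comes within `threshold`
    meters of any point on the AV's planning trajectory."""
    tags = []
    for ox, oy in obstacle_positions:
        d_min = min(((ox - px) ** 2 + (oy - py) ** 2) ** 0.5
                    for px, py in planning_trajectory)
        tags.append(d_min < threshold)
    return tags

plan = [(0.0, 0.0), (0.0, 5.0), (0.0, 10.0)]
obstacles = [(2.0, 5.0), (50.0, 50.0)]   # one nearby, one far away
tags = assign_interactive_tags(obstacles, plan)
```

Only the nearby obstacle would be tagged as interactive in this toy setup.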

# Network Architecture
The network architecture of the proposed Inter-TNT is illustrated below. The network is composed of three modules: a vectorized encoder, a target-driven decoder, and an interaction module. The vectorized trajectories of obstacles and the autonomous vehicle (AV), along with HD maps, are first fed into the vectorized encoder to extract features. The target-driven decoder takes the extracted features as input and generates multi-modal trajectories for each obstacle. The main contribution of the proposed network is an interaction mechanism that measures the interaction between obstacles and the AV by re-weighting the confidences of the multi-modal trajectories.
![Diagram](images/VectorNet-TNT-Interaction.png)

## Encoder
The encoder is based on [VectorNet](https://arxiv.org/abs/2005.04259).

### Representation
The trajectories of the AV and all obstacles are represented as polylines in the form of sequential coordinate points. Each vector of a polyline contains a start point, an end point, the obstacle length, and some other attributes. All points are transformed into the AV coordinate frame, with North as the y-axis and (0, 0) as the AV position at time 0.
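The representation above can be sketched with numpy as follows. This is a minimal illustration: the exact attribute layout and the heading convention are assumptions, not Apollo's actual feature schema.

```python
import numpy as np

def trajectory_to_vectors(points, obstacle_length, polyline_id):
    """Turn a sequence of (x, y) points into VectorNet-style vectors:
    each vector stores [start_x, start_y, end_x, end_y, length, id].
    (The attribute layout is an illustrative assumption.)"""
    pts = np.asarray(points, dtype=float)
    starts, ends = pts[:-1], pts[1:]
    n = len(starts)
    attrs = np.column_stack([np.full(n, obstacle_length),
                             np.full(n, polyline_id)])
    return np.hstack([starts, ends, attrs])

def to_av_frame(points, av_position, av_heading):
    """Rotate/translate world points into the AV frame, placing the AV
    at (0, 0) at time 0; heading handling here is a simplification."""
    pts = np.asarray(points, dtype=float) - np.asarray(av_position, dtype=float)
    c, s = np.cos(-av_heading), np.sin(-av_heading)
    rot = np.array([[c, -s], [s, c]])
    return pts @ rot.T

traj = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
vectors = trajectory_to_vectors(
    to_av_frame(traj, av_position=(1.0, 1.0), av_heading=0.0),
    obstacle_length=4.5, polyline_id=0)
```

Three points yield two connected vectors, each carrying the polyline's shared attributes.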

After that, map elements are extracted from the HD map files. Since lanes, roads, junctions, and crosswalks are represented as point sequences in the HD map, they are conveniently processed as polylines as well.

### VectorNet
The polyline features are first extracted by a subgraph network and then fed into a global graph network (a graph neural network) to encode contextual information.
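A minimal numpy sketch of these two stages follows. The weights are random and untrained, and the real VectorNet stacks several subgraph layers with feature concatenation; this only shows the subgraph-then-global-graph data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def subgraph(vectors, w):
    """Per-polyline subgraph: an MLP applied to each vector followed by
    max-pooling, yielding one feature per polyline (simplified to a
    single layer here)."""
    h = relu(vectors @ w)          # (num_vectors, hidden)
    return h.max(axis=0)           # (hidden,)

def global_graph(polyline_feats):
    """Global interaction graph: one self-attention layer over the
    polyline features."""
    q = k = v = polyline_feats
    scores = q @ k.T / np.sqrt(q.shape[1])
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v

# Three toy polylines of 6-d vectors, sharing untrained MLP weights.
w = rng.normal(size=(6, 8))
polylines = [rng.normal(size=(4, 6)),
             rng.normal(size=(5, 6)),
             rng.normal(size=(3, 6))]
feats = np.stack([subgraph(p, w) for p in polylines])   # (3, 8)
context = global_graph(feats)                           # (3, 8)
```

Each polyline, regardless of how many vectors it contains, is pooled to one fixed-size feature before the global stage mixes information across polylines.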

## Decoder
Our decoder implementation mainly follows the [TNT](https://arxiv.org/abs/2008.08294) paper, which proceeds in three steps; for more details, please refer to the original paper.

### Target Prediction
For each obstacle, N candidate points around the AV are uniformly sampled, and M of them are selected as target points. These target points are regarded as the potential endpoints of the predicted trajectories.
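The sampling-and-selection step can be sketched as below. The grid size, spacing, and the toy distance-based score are illustrative assumptions; in the real model a network predicts the target scores.

```python
import numpy as np

def sample_targets(center, n_per_axis=10, spacing=2.0):
    """Uniformly grid-sample N = n_per_axis**2 candidate target points
    around `center` (grid shape and spacing are assumptions)."""
    offs = (np.arange(n_per_axis) - (n_per_axis - 1) / 2) * spacing
    xs, ys = np.meshgrid(offs, offs)
    return np.stack([xs.ravel() + center[0], ys.ravel() + center[1]], axis=1)

def select_targets(candidates, scores, m=6):
    """Keep the M highest-scoring candidates as target points."""
    top = np.argsort(scores)[::-1][:m]
    return candidates[top]

candidates = sample_targets(center=(0.0, 0.0))   # N = 100 candidates
scores = -np.linalg.norm(candidates, axis=1)     # toy score: prefer nearby points
targets = select_targets(candidates, scores, m=6)
```

With the toy score, the six candidates closest to the center are kept as target points.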

### Motion Estimation
After the target points are selected, M trajectories are generated for each obstacle, one per target point, with the obstacle's corresponding feature from the encoder as input.
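As a stand-in for the learned regressor, the step can be sketched as follows. In the real model a network conditioned on the encoder feature regresses the waypoints; here linear interpolation to each target is used purely for illustration.

```python
import numpy as np

def estimate_motion(start, target, horizon=30):
    """Toy motion estimation: linearly interpolate `horizon` waypoints
    from the current position to a target point. The real decoder
    regresses these waypoints with a network."""
    start, target = np.asarray(start, float), np.asarray(target, float)
    alphas = np.linspace(0.0, 1.0, horizon + 1)[1:, None]
    return start + alphas * (target - start)

# One trajectory per selected target point (M = 2 here).
trajs = np.stack([estimate_motion((0.0, 0.0), t)
                  for t in [(3.0, 9.0), (-3.0, 9.0)]])
```

Each trajectory ends exactly at its target point, matching the target-driven formulation.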

### Scoring and Selection
Finally, a scoring and selection step assigns a likelihood score to each of the M trajectories of every obstacle and selects the final set of predicted trajectories according to these scores.
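A minimal sketch of this step: softmax the raw scores into likelihoods and keep the top K. (The paper's selection also suppresses near-duplicate trajectories; that part is omitted here, and the logits are made-up numbers.)

```python
import numpy as np

def score_and_select(trajectories, logits, k=3):
    """Convert raw logits to likelihood scores with a softmax and keep
    the top-K trajectories."""
    logits = np.asarray(logits, float)
    exp = np.exp(logits - logits.max())
    scores = exp / exp.sum()
    order = np.argsort(scores)[::-1][:k]
    return [trajectories[i] for i in order], scores[order]

trajs = [f"traj_{i}" for i in range(6)]          # M = 6 candidate trajectories
selected, scores = score_and_select(
    trajs, logits=[0.1, 2.0, -1.0, 0.5, 1.5, 0.0], k=3)
```

The returned trajectories are ordered by decreasing likelihood.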

## Interaction with Planning Trajectory
After the TNT decoder, K predicted trajectories are generated for each obstacle. To measure the interaction between the AV and the obstacles, we compute the position and velocity differences between the latest planning trajectory and the predicted obstacle trajectories, and use the resulting costs to re-weight the trajectory confidences. Note that the same costs can also be computed between the ground-truth obstacle trajectory and the AV planning trajectory, yielding the true costs; this is how the loss for this step is calculated.
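The cost computation and re-weighting can be sketched as below. The fixed cost weights and the exponential re-weighting form are illustrative assumptions; in Inter-TNT the weights are produced by the network.

```python
import numpy as np

def interaction_costs(pred_trajs, pred_vels, plan_traj, plan_vel, weights):
    """Per-trajectory position cost (cost1) and velocity cost (cost2)
    against the latest planning trajectory, combined with given weights
    (constants here; learned in the real model)."""
    pos_cost = np.linalg.norm(pred_trajs - plan_traj, axis=-1).mean(axis=1)
    vel_cost = np.abs(pred_vels - plan_vel).mean(axis=1)
    return weights[0] * pos_cost + weights[1] * vel_cost

def reweight_confidences(confidences, costs):
    """Lower the confidence of trajectories that conflict more with the
    AV plan, then renormalize."""
    scaled = confidences * np.exp(-costs)
    return scaled / scaled.sum()

# Planning trajectory: 10 points going straight along +y at 1 m/s.
plan = np.tile(np.array([[0.0, 0.0]]), (10, 1)) + np.arange(10)[:, None] * [0.0, 1.0]
preds = np.stack([plan + [2.0, 0.0], plan + [8.0, 0.0]])  # K = 2 trajectories
vels = np.array([[1.0] * 10, [1.0] * 10])
costs = interaction_costs(preds, vels, plan, 1.0, weights=(1.0, 0.5))
conf = reweight_confidences(np.array([0.5, 0.5]), costs)
```

The trajectory farther from the AV plan ends up with a lower confidence after re-weighting.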

# References
1. Gao, Jiyang, et al. "VectorNet: Encoding HD maps and agent dynamics from vectorized representation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
2. Zhao, Hang, et al. "TNT: Target-driven trajectory prediction." arXiv preprint arXiv:2008.08294 (2020).
