Multivariate2D3D/Guidelines.Rmd at master · rfrelat/Multivariate2D3D · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
---
title: 'Introduction to multivariate analysis:  from 2D PCA to 3D PTA'
author: "Romain Frelat"
date: '20 october 2016 (last updates : 14 december 2020)'
output:
  html_document: null
  pdf_document: default
---

## Objectives:
At the end of this tutorial, you should be able to:

*	Run a principal component analysis (PCA) on a matrix (2D)
* Interpret the principal components (PC)
* Run a principal tensor analysis (PTA) on a array (3D)
* Interpret the principal tensors (PT)
*	Run a clustering analysis with Hierarchical Clustering
* **Understand what is a multivariate analysis, and when it can be useful**

This tutorial was initially designed as a half-day workshop for the statistical roundtable at the Leibniz center for tropical marine ecology (ZMT), Bremen, Germany. It is now the companion tutorial of an article published in PlosOne :
Frelat R, Lindegren M, Dencker TS, Floeter J, Fock HO, Sguotti C, Stäbler M, Otto SA and Möllmann C (2017). *Community ecology in 3D: Tensor decomposition reveals spatio-temporal dynamics of large ecological communities*. PLoS ONE, 12(11): e0188205. https://doi.org/10.1371/journal.pone.0188205


**The tutorial has been updated on 14th December 2020 with R 4.0.2, ade4 v1.7-16 and PTAk v1.3-34.**


## A. Getting ready:

### Get ready:

1. Download the zip file *Multivariate2D3D.zip* from [https://github.com/rfrelat/Multivariate2D3D/](https://github.com/rfrelat/Multivariate2D3D/raw/master/Multivariate2D3D.zip))
2. Unzip the archive in a new folder. The zip file contain the data (*IBTS_Tensor.Rdata*), the R-script (*script_Multivariate2D3D.R*) and the present document as a pdf
3. Open the R script *script_Multivariate2D3D.R* with your favourite R editor (RStudio is recommended)
4. Set the working directory (Session > Set Working Directory) to the directory where the script and the data are located.

### Load the package and needed functions
During this tutorial, we will use two R-packages *ade4* and *PTAk*.
```{r, message=FALSE, fig.path='figures/'}
library(ade4)
library(PTAk)
```

If you have an error message, check if the packages are installed correctly. If not, use the command: `install.packages(c("ade4", "PTAk"))`.


### Load the data set
The data is saved as a *Rdata* file, that can be loaded with the function `load()`.
```{r, comment ="", fig.path='figures/'}
load("IBTS_Tensor.Rdata")
dim(IBTS_tensor)
```
The variable called `IBTS_tensor` was loaded. It is an array with three dimension: 65 fish species in the first dimension, 31 years in the second dimension, and 7 roundfish areas (RA) in the third dimension.
To see the names of the dimension, you can type:
```{r, results='hide', fig.path='figures/'}
dimnames(IBTS_tensor)
```

***

![Description of the data set](figures/IBTSTensor.png)


Abundance data comes from the ICES DAtabase for TRAwl Surveys ([DATRAS]( http://datras.ices.dk/Home/Default.aspx)). The North Sea International Bottom Trawl Survey (NS-IBTS) is an international effort to sampled the demersal fish communities in the North Sea annually and consistently with a standard otter trawl net (*chalut Grande Ouverture Verticale*, GOV) hauled over the seabed for 30 min. The data is openly available online and the Catch per Unit Effort (CPUE) per length class and per area was downloaded for the first quarter of the period 1985 to 2015 for the roundfish area (RA) 1 to 7. Pre-processing included cleaning miss-indentified species, removing pelagic and the rare species, and transforming abundance values into a three dimensional array.

***

### Understanding the variables
While loading the data, we saw two different types of variables, quite unusual in R: `array` and `list`.

#### Array
The object `IBTS_Tensor` is an `array`. Array is a generalization of matrix, with more than 2 dimensions. It can only contain numbers. The dimension of the array is given by the function `dim()`, and the different elements are accessed with `[ ]`, similar to a `matrix` or a `data.frame`.
```{r, comment ="", fig.path='figures/'}
dim(IBTS_tensor)
IBTS_tensor[18,14,6] #Select one element, e.g. Abundance of Cod, in 1998, in RA6
IBTS_tensor[18,14,] #Select one vector, e.g. abundance of Cod in 1998
```
```{r, comment ="", eval=FALSE}
IBTS_tensor[18,,] #Select one matrix, e.g. abundance of Cod
```
```{r, comment ="", echo=FALSE}
head(IBTS_tensor[18,,])
```

#### List
The names of the dimensions of `IBTS_Tensor` are stored in a list. List can contain all kind of elements, without restriction on length or type (it can include elements of different lengths made of characters and numbers). The number of elements is given by function is `length()`, and the different elements are accessed by `[[ ]]`.

```{r, comment ="", fig.path='figures/'}
names_tensor <- dimnames(IBTS_tensor) # the list of names is stored in a new variable
length(names_tensor) #there are three elements in the list, one element for each dimension
names_tensor[[2]] #show the second element of the list, the names of the second dimension
names_tensor[[1]][18]#show the 18th element of the first element of the list
```

### Your turn:
1. What is the index (position in the array) of hake (*Merluccius merluccius*) in the data set ?
2. What is the abundance of hake (Merluccius merluccius) in 1988 in RA 1 ?
3. What is the abundance of hake between 2010 and 2015 in RA 1?
4. Can you show the evolution of hake abundance between 1985 and 2015 in RA 1?

```{r, dpi=150, echo = FALSE, collapse=TRUE, comment ="", fig.path='figures/', fig.width=5, fig.height=2.8}
par(mar=c(4, 4, 3, 1), las=1)
plot(names_tensor[[2]], IBTS_tensor[33,,1], type="l",
     xlab="Time", ylab="CPUE (n/h)", main="Abundance of hake in RA1")
```

#### Solution
```{r, eval = FALSE, collapse=TRUE, comment ="", fig.path='figures/'}
#1
which(names_tensor[[1]]=="Merluccius merluccius") # 33
#2
which(names_tensor[[2]]==1988) # 4
IBTS_tensor[33,4,1] # 0.83743
#3
which(names_tensor[[2]]%in%c(2010:2015)) #26 27 28 29 30 31
IBTS_tensor[33,26:31,1]
#4
plot(names_tensor[[2]], IBTS_tensor[33,,1], type="l",
     xlab="Time", ylab="CPUE (n/h)", main="Abundance of hake in RA1")
```


##B. Two dimensions: Principal Component Analysis

![Traditional multivariate analysis on matrix](figures/2DPCA_illu.png)

### B1. Preparing the data set

#### From 3D to 2D
The tensor is flattened into a 2D matrix. This section will study the spatial distribution of fishes in the North Sea. The abundance of fishes will be average over the period 1985-2015, losing the temporal information from the original data set.

A new matrix `IBTS_space` is created with the function `apply()`. It contains, in row,  the average abundances for the 7 RA, and in columns, the 65 species.
```{r, collapse=TRUE, comment ="", fig.path='figures/'}
IBTS_space <- apply(IBTS_tensor,c(3,1),mean)
dim(IBTS_space) #the new matrix has 7 rows (areas) and 65 columns (species)
```

#### Checking the distribution of the data
Principal component analysis (PCA), and other multivariate analysis in general, are sensible to outliers. So beforehand, the skewness of the data has to be checked, and if too skewed, it is recommended to log (or square root) transform the data.

```{r, collapse=TRUE, fig.show='hide', comment ="", fig.path='figures/'}
#boxplot() is used to look at the distribution of the data
boxplot(as.vector(IBTS_space), main="raw CPUE")
#The CPUE is very skewed, with few high outliers
#So data should be log transformed
IBTS_logspace <- log(IBTS_space+1)
#The new distribution of the log transformed CPUE
boxplot(as.vector(IBTS_logspace), main="log CPUE")
```

```{r, echo=FALSE, fig.path='figures/', fig.height=3, fig.width=6}
par(mfrow=c(1,2), mar=c(3,3,3,1))
boxplot(as.vector(IBTS_space), main="raw CPUE")
boxplot(as.vector(IBTS_logspace), main="log CPUE")
```

#### Scaling the data
The function used to run PCA normalized (i.e. center and scale) the variables by default. It is important to keep that step in mind. An illustration of the normalisation is given in the figures below.
```{r, collapse=TRUE, echo=FALSE, comment ="", fig.path='figures/', fig.height=4.5, fig.width=8}
par(mfrow=c(2,1), mar=c(2,4,3,1))
boxplot(IBTS_logspace, main="Abundance", ylab="CPUE in log",
        xaxt="n")
mtext("Species", side = 1, line = 0)
boxplot(scale(IBTS_logspace), main="Anomaly (= abundance normalized per species)",
        ylab="Anomaly", xaxt="n")
mtext("Species", side = 1, line = 0)
```

### B2. Principal Component Analysis

#### Run a PCA and choose the correct number of PC.
The PCA is run with the function `dudi.pca`. The data is normalized with the options `scale` and `center` set to `TRUE`. The function is interactive, showing you a scree plot: the variance explained by the successive Principal Components (PC)

***

PCA algorithm create many Principal Component (PC), but not all PCs perform well in simplifying the information. To choose the correct number of PCs, there are many conflicting methods. The one I recommend is the scree test (Cattell, 1966): based on the graphical detection of a bend in the distribution of the successive variance explained. As a comparison, a PCA run on random data will not find *strong* PC, i.e. PCs can not reduce the dimensionality of the data. On the contrary, in real world data, usually there is a bend in the successive variance explained by PC. Before the bend, the PCs reduce well the dimensionality of the data and should be kept (i.e. there is a significant pattern); after the bend, PC should be discarded (i.e. it is only noise). **All PC kept in the analysis should be interpreted.**

![Logic of the scree test to select the number of principal components](figures/SelectPC.png){width=45%}

***

```{r, eval=FALSE, fig.path='figures/'}
pca_space <- dudi.pca(IBTS_logspace, scale = TRUE, center = TRUE)
```

**Select the number of axes: **

```{r, echo=FALSE, comment ="", fig.height=2, fig.width=4, fig.path='figures/'}
pca_space <- dudi.pca(IBTS_logspace, scannf = FALSE, nf=2)
par(mar=c(0,4,0,0))
barplot(pca_space$eig/sum(pca_space$eig), ylab="Percentage of variance", xlab="PC")
mtext("PC", side = 1, line=1)
```

In our case, a bend can be seen on the scree plot after the second PC. So, **please type 2 and press Enter**.

The function `inertia.dudi` show how much variance is explained by the successive PC.
```{r, comment ="", fig.path='figures/'}
#To see how much variance the axes explain:
inertia.dudi(pca_space)$tot
```
In our case, the first PC explains 41% of the variance, the two first PC together explain 62% of the variance.

#### Interpretation of the PC.
The variable `pca_space`, result of the function `dudi.pca()`, contains two important data.frames:

- `co` with the scores of columns on the two PCs
- `li` with the scores of rows on the two PCs

We can see the projection in a table directly by typing the name of these objects preceded by `$` , or by using the graphical function `s.label()`.
```{r, comment ="", fig.height=3.5, fig.path='figures/'}
pca_space$li # or pca_space$co
par(mfrow=c(1,2), mar=c(0,0,0,0))
#Show the weight of the variables:
s.label(pca_space$li, xax=1, yax=2)
s.label(pca_space$co, xax=1, yax=2, clabel = 0.4)
```

The first PC (x-axis) make the difference between the northern NS (RA 1, 2 and 3 have negative weights, projected on the left side) and the southern NS (RA 5 and 6 have positive weights, projected on the right side).
The second PC (y-axis) make the difference between the South-Eastern NS (RA 7 have negative weight, projected on the lower side) and the rest of northern or western NS (RA 1 and 5 have positive weights, projected on the upper side).
The species projection (right side) allow us to see differences between species, but the high number of species makes it difficult to characterize the species. One solution is to group these species together, i.e. simplify the number of species into a smaller number of groups, and then characterize these new groups.

One important remark: **the sign of PCs is not informative**, we could multiply the PC scores by (-1) and get the same interpretation

### B3. Clustering species from their spatial distribution
Clustering is a subject by itself, here we will only see quickly one of its most famous method: hierarchical clustering. It works in 4 steps:

1. Compute the distances between each objects.
2. Build a tree according to a given joining criteria
3. Choose the number of cluster depending on the topology of the dendogram,
4. Create the clusters and interpret them

```{r, comment ="", fig.path='figures/', fig.height=5, fig.width=6}
#1. Compute the distance between species
dist_species <- dist(pca_space$co, method = "euclidean")

#2. Build a tree with Ward method
den <- hclust(dist_species,method = "ward.D2")

#3. Plot the dendogram
par(mar=c(2,3,3,1))
plot(den, hang=-1, ax = T, ann=T, xlab="", sub="", cex=0.6)

#Choosing the number of cluster
nclust <- 5

#Visualize the cutting
rect.hclust(den, k=nclust, border="dimgrey")
```

There are different linkage criteria (criterion used to group two objects). The most common ones are: *Single* (minimum distance between elements of each cluster), *Complete* (maximum distance between elements of each cluster), *Average* (mean distance between elements of each cluster, also called UPGMA) and *Ward* (decrease in variance for the cluster being merged). *Ward* linkage (Ward, 1963) is known to be more suitable for spherical data, and in most of the case, gives interesting results. I invite you to try other linkage criteria by changing the parameter `method = ` in the function `hclust`. Do you find the same clusters?

In the graph above, the question is where to put a horizontal line on the dendogram to create the clusters. The number of clusters should not be too sensitive of the height of the line. In our case, 5 clusters seem appropriate. We can now visualize how the hierarchical clustering grouped the species in 5 clusters on the two first PC.

```{r, comment ="", fig.path='figures/'}
#4. Create the clusters
clust_space <- as.factor(cutree(den, k=nclust))

#Visualize the cluster in the PC axis
s.class(pca_space$co,fac=clust_space, col=rainbow(nclust),xax=1,yax=2)
```

These clusters should be interpreted with the previous interpretation of the PC. For example,

* Cluster 1 (in red) has high value in PC1, and above average value in PC2. So it groups species that lives in the southern North Sea (high PC1), with a preference in the south-western side (high PC2).
* Cluster 2  (in yellow in the graph above) groups species mainly located in entrance to the Skagerrak, RA7 (low PC2) but that can spread in the north (low PC1). This cluster is heterogeneous, with the largest ellipse in the figure above.
* Cluster 3 (in green) groups species mainly located in southern NS (high PC1), with a preference on its eastern side (low PC2).

### Your turn:
1. How would you interpret the clusters 4 and 5 ?
2. How many species are located mainly in the south-west of the North Sea (i.e. grouped in cluster 1) ?
3. In which cluster is grouped Saithe (*Pollachius virens*)?

#### Solution
```{r, comment ="", fig.path='figures/'}
#1.
```
Cluster 4 (in blue) groups species spread exclusively in the northern NS (high PC1 and high PC2). Cluster 5 (in purple) groups species that are spread either in the northern extremity (RA1) or the southwestern community (RA5) but not in RA 7 (high PC2)

```{r, comment ="", fig.path='figures/'}
#2.
table(clust_space) #There are 14 species in cluster 1
#3.
clust_space[names_tensor[[1]]=="Pollachius virens"] #Saithe is in cluster 4
```


## C. Three dimension: Tensor Decomposition

![Tensor decomposition on a 3D array](figures/3DPTA_illu.png)


### C1. Preparing the data set

#### Checking the distribution of the data
Like with PCA, we have to check the skewness of the data and log transform it if it is highly skewed.

```{r, collapse=TRUE, fig.show='hide', comment ="", fig.path='figures/'}
#boxplot() is used to look at the distribution
boxplot(IBTS_tensor, main="raw CPUE")
#Data should be log transformed
IBTS_logtensor <- log(IBTS_tensor+1)
#The new distribution of the log tranformed CPUE
boxplot(IBTS_logtensor, main="log CPUE")
```

```{r, echo=FALSE, fig.path='figures/', fig.height=3, fig.width=6}
par(mfrow=c(1,2), mar=c(3,3,3,1))
boxplot(as.vector(IBTS_tensor), main="raw CPUE")
boxplot(as.vector(IBTS_logtensor), main="log CPUE")
```

#### Scaling the data
Contrary to PCA, the normalization of a tensor is not straightforward, and have to be done manually before running a Principal Tensor Analysis (PTA). Here, we decided to normalize the values per species and saved them in a new tensor `IBTS_logscale`.

```{r, collapse=TRUE, comment ="", fig.path='figures/'}
#Scaling per species
#Create a new empty array
IBTS_logscale <- array(0,dim=dim(IBTS_tensor))
#Loop scanning each species
for (i in 1:dim(IBTS_tensor)[1]){
  #Calculating the mean and sd of the log CPUE for species i
  ma<-mean(IBTS_logtensor[i,,])
  sa<-sd(IBTS_logtensor[i,,])
  #Saving the anomaly in the array
  IBTS_logscale[i,,]<-(IBTS_logtensor[i,,]-ma)/sa
}
#Copy the labels to the new array
dimnames(IBTS_logscale)<-dimnames(IBTS_tensor)
```

### C2. Principal Tensor Analysis
#### Run a PTA
The PTA is run with the function `PTA3`. The number of principal tensor is indicated by `nbPT` and `nbPT2`. One chooses the number of principal tensors at each *level* of analysis by `nbPT`, the last level (2-modes analysis) is fixed by `nbPT2`.
The Principal Tensor Analysis computed three main principal tensor and their two mode associated principal tensors.
```{r, comment ="", fig.path='figures/'}
pta <- PTA3(IBTS_logscale, nbPT = 3, nbPT2 = 3, minpct = 0.1)
summary.PTAk(pta,testvar = 0)
```

#### Choosing the number of PT.

To select the significant PT, we build the scree plot from the global variance explained.
```{r, comment ="", fig.path='figures/'}
#Create the scree plot
out <- !substr(pta[[3]]$vsnam, 1, 1) == "*"
gct <- (pta[[3]]$pct*pta[[3]]$ssX/pta[[3]]$ssX[1])[out]
barplot(sort(gct, decreasing = TRUE), xlab="PT",
        ylab="Percentage of variance")
```

A bend can be seen after 4 PT, so we will select the 4 PT with the best explaining power. Numerically, it corresponds to the PT 1, 6, 7 and 11 of our object `pta`

#### Interpretation of PT.
The plotting function per default allow to use the argument `mod` to select which dimension to plot, `nb1` and `nb2` to select which PT will be shown on x-axis and y-axis.
For example, we plot the time and space components (the second and third dimension of the array, so `mod=c(2,3)`) projected on vs111 and vs222 (respectively the elements 1 and 11 of `pta`, so `nb1 = 1, nb2 = 11`):
```{r, comment ="", fig.width=8, fig.path='figures/'}
par(mfrow=c(1,2))
plot(pta, mod=c(2,3), nb1 = 1, nb2 = 11, xpd=NA, lengthlabels = 4)
plot(pta, mod=1, nb1 = 1, nb2 = 11, lengthlabels = 3)
```

We can see from the plot above that vs111 characterize the space (green crosses representing RA are spread over vs111) and vs222 characterize the time (red triangles representing years are spread over vs222).
vs111 makes the difference between the northern NS (RA 1, 2 and 3 have negative weights, projected on the left side) and the southern NS (RA 5 and 6 have positive weights, projected on the right side).
vs222 shows a temporal trend and makes the difference between the period 1985-1998 (years before 1998 have negative weights, projected on the lower side) and the recent period 2003-2015 (years after 2003 have positive weights, projected on the upper side).
The species projection (right side, created with `mod=1`) allow us to see the species plotted on these two PT. For example, we can see Cod (abbreviated *Gad*) in the upper part: it is the species with the strongest decrease in abundance (the highest point in vs222), and doesn't have clear spatial north-south pattern. However, the high number of species makes it difficult to characterize each species (suggesting the need for clustering).

Similar plot can be done for the two other significant PT

```{r, comment ="", fig.width=8, fig.height=6,fig.path='figures/'}
par(mfrow=c(2,2))
plot(pta, mod=c(2,3), nb1 = 1, nb2 = 6, xpd=NA, lengthlabels = 4)
plot(pta, mod=1, nb1 = 1, nb2 = 6, xpd=NA, lengthlabels = 4)
plot(pta, mod=c(2,3), nb1 = 1, nb2 = 7, xpd=NA, lengthlabels = 4)
plot(pta, mod=1, nb1 = 1, nb2 = 7, xpd=NA, lengthlabels = 4)
```

31vs111 are temporal mode PT associated with vs111, meaning that vs111 and 31vs111 share the same temporal component. This feature can be seen with the straight line representing the projection of the years on these PT.

Another way to represent the spatiotemporal variations of the PT is to use a 2D representation, with x-axis being the time and y-axis being the space. The graphical functions are out of scope of this tutorial but presented here to help the interpretation.

![Spatio-temporal pattern summarized by the successive PTs ](figures/PTA_Inter.png)


### C3. Clustering
We use the same approach as before, but with the species projected on the 4 PT.
```{r, comment ="", fig.path='figures/'}
#Create the matrix with the projection of species on the 4 PT
keep <- c(1, 6, 7, 11) # PT that are kept in the analysis
coo <- t(pta[[1]]$v[c(keep),])
labkeep <- paste0(pta[[3]]$vsnam[keep], " - ", round((100 * (pta[[3]]$d[keep])^2)/pta[[3]]$ssX[1],1), "%")

#1. Compute the distance between species
dist1 <- dist(coo, method = "euclidean")

#2. Build a tree with Ward linkage
den <- hclust(dist1,method = "ward.D2")

#3. Plot the dendogram
par(mar=c(1,3,1,1))
plot(den, hang=-1, ax = T, ann=F, xlab="", sub="",labels = FALSE)

#Choose the number of clusters
nclust <- 6

#Visualize the cutting
rect.hclust(den, k=nclust, border=rainbow(nclust)[c(6,5,2,4,3,1)])
```

The dendogram suggests to create 6 clusters. It is now important to interpret the clusters by projecting them on the PT.

```{r, comment ="", fig.width=9, fig.height=4, fig.path='figures/'}
#4. Create the clusters
clust_3D <- as.factor(cutree(den, k=nclust))

#Visualize them
par(mfrow=c(1,3))
s.class(coo, fac = clust_3D, xax=1, yax=2, col=rainbow(nclust), clabel = 2)
text(min(coo[,1])-0.03,0, labkeep[2], srt=90, xpd=NA, cex=1.5)
text(0,max(coo[,2])+0.02, labkeep[1], xpd=NA, cex=1.5)
s.class(coo, fac = clust_3D, xax=1, yax=3, col=rainbow(nclust), clabel = 2)
text(min(coo[,1])-0.03,0, labkeep[3], srt=90, xpd=NA, cex=1.5)
text(0,max(coo[,3])+0.02, labkeep[1], xpd=NA, cex=1.5)
s.class(coo, fac = clust_3D, xax=1, yax=4, col=rainbow(nclust), clabel = 2)
text(min(coo[,1])-0.03,0, labkeep[4], srt=90, xpd=NA, cex=1.5)
text(0,max(coo[,4])+0.02, labkeep[1], xpd=NA, cex=1.5)
```

* Cluster 1 is the southern community with high abundance in RA5 and no temporal pattern.
* Cluster 2 is the community with decreasing abundance and no clear spatial pattern.
* Cluster 3 is the South-East community with small increase in abondance
* Cluster 4 is the northern community with high abundance in RA1 and no temporal trend.
* Cluster 5 is the North-West community with a small increase in abondance
* Cluster 6 is the increasing community, mainly pronounced in RA 1, 3 and 5

![Spatio-temporal dynamics of the 6 sub-communities](figures/PTA_GroupInter.png)

## Summary

Multivariate analysis are methods to simplify complex and multidimensional data set. The dimensions are simplified with the objective of keeping most of the variance. In this process, the information and the noise are separated; the main patterns being revealed by the principal components. Multivariate analysis are data-mining tools, in other words they are not predictive or mechanistic tools but data-driven. They can help to visualize and characterize what information is hidden in large data set.

![](figures/Summary.png){width=90%}


## References

Cattell, R. B. (1966). *The scree test for the number of factors.* Multivariate behavioral research,1(2), 245-276.

Cichocki, A., Mandic, D., De Lathauwer, L., Zhou, G., Zhao, Q., Caiafa, C., & Phan, H. A. (2015). *Tensor decompositions for signal processing applications: From two-way to multiway component analysis.* IEEE Signal Processing Magazine, 32(2), 145-163.

Frelat R, Lindegren M, Dencker TS, Floeter J, Fock HO, Sguotti C, Stäbler M, Otto SA and Möllmann C (2017). *Community ecology in 3D: Tensor decomposition reveals spatio-temporal dynamics of large ecological communities*. PLoS ONE 12(11): e0188205.

Leibovici, D. G. (2010). *Spatio-temporal multiway decompositions using principal tensor analysis on k-modes: The R package PTAk*. Journal of Statistical Software, 34(10), 1-34.

Ward Jr, J. H. (1963). *Hierarchical grouping to optimize an objective function*. Journal of the American statistical association, 58(301): 236-244.