Aviso: Se está a ler esta mensagem, provavelmente, o browser que utiliza não é compatível com os "standards" recomendados pela W3C. Sugerimos vivamente que actualize o seu browser para ter uma melhor experiência de utilização deste "website". Mais informações em webstandards.org.

Warning: If you are reading this message, probably, your browser is not compliant with the standards recommended by the W3C. We suggest that you upgrade your browser to enjoy a better user experience of this website. More informations on webstandards.org.

Script rmarkdown (Link)

---
title: "Processing big geographic data sets with R"
subtitle: "Working example: forest/urban interface"
author: "Manuel Campagnolo (ISA/ULisboa)"
date: "September 12-14, 2018"
output:
  html_document:
    toc: true # table of content true
    toc_depth: 3  # upto three depths of headings (specified by #, ## and ###)
    number_sections: true  ## if you want number sections at each table header
    #theme: united  # many options for theme, this one is my favorite.
    #highlight: tango  # specifies the syntax highlighting style
---



```{r setup, include=FALSE, purl=FALSE}
options(width=80)
knitr::opts_chunk$set(comment = "", warning = FALSE, message = FALSE, echo = TRUE, tidy = TRUE, size="small", cache=FALSE, eval=TRUE)
```


# Packages
```{r}
library(sf) # new package for geographic vector datasets (replaces sp)
library(mapview) #mapView
library(RANN) # nn2
library(tidyverse) #  contains dplyr, magrittr
```


<!--
Comments:

input<-'big-data-read-files.rmd';
outputR <- paste(tools::file_path_sans_ext(input), 'R', sep = '.');
outputhtml <- paste(tools::file_path_sans_ext(input), 'html', sep = '.');

to run from command line:
> rmarkdown::render('big-data-read-files.rmd',output_file='big-data-read-files.html')

to create r code and comment everything else:
> knitr::purl(input,outputR,documentation=2,quiet=T)

To interrupt processing use the following chunck:
```{r, echo=FALSE}
#knitr::knit_exit()
```
-->

```{r, echo=FALSE}
setwd("\\\\pinea\\mlc\\Aulas\\CURSOS_R\\isa-setembro-2018-grandes-cdg\\aulas")
library(knitr)
```


# The COS2015 urban/forest direct interface problem and data

With this example we review some GIS spatial analysis tools that are available through package *sf*. Then, we explore some alternatives that use distances between geographic points to obtain a much faster solution to the problem.

The problem at hand is to determine the so-called *direct interface* between urban and flammable forested areas. The input is a land cover vector map for Portugal (COS2015). Flammable areas are the one labelled essentially as "floresta", "matos" e "vegetacao esparsa", and the urban areas are the ones labelled as "tecido urbano contínuo", tecido urbano descontínuo" e "Indústria comércio e equipamentos gerais". The *direct interface* are the spatial lines where those two types of land cover come together. Supposing the the features match spatially, this is just the *intersection* between flammable and urban areas.

We first read the data using *sf_read* from package *sf* (we call call this using *sf::st_read*) to read both *shapefiles* of flammable areas and urban areas for the whole Continental Portugal according to COS2015. Since both data sets were extracted from the same COS2015 land cover map for Continental Portugal, they have the same coordinate reference system.

```{r}
flam<-sf::st_read(dsn=file.path(getwd(),"datasets","Flam2015v4.shp"))
urb<-sf::st_read(dsn=file.path(getwd(),"datasets","urb12015v4.shp"))
```

It is relevant to understand what is the geometry of the spatial information we are going to use. One simple way of doing this is to list the geometry types that are present in each data set. For this data sets, one can conclude that there is only one type of geometry, which is "POLYGON". 

```{r, echo=TRUE}
table(st_geometry_type(flam))
table(st_geometry_type(urb))
```

"POLYGON" geometries differ from "MULTIPOLYGON" in the sense that each geometric object has only one "part". Nonetheless, a "POLYGON" can have "holes" in his interior.

# Use *st_intersection* for a small region of Portugal

Since the full data sets are quite large, let's first look at a subset of the data. In order to do this we consider just one region in Portugal and select the features that intersect that region. It is a good practice to re-project one of the data sets to the coordinate reference system (CRS) of the other data set to guarantee that both data sets are expressed on the same CRS.

Then, we re-project *caop* to the CRS of *flam*, and we select a particular administrative unit (Guimarães)

```{r}
caop<-sf::st_read(dsn=file.path(getwd(),"datasets","Cont_AAD_CAOP2015.shp"))
caop.proj<-sf::st_transform(caop, crs=sf::st_crs(flam))
regiao<-sf::st_union(caop.proj[caop.proj$Concelho=="GUIMARÃES",])
```

The next step is to select just the flammable and urban areas over the region. Below we perform a *selection by location*, and we select the features that *intersect* the region. In order to do this, we use *sf* method *sf_intersects* (which is distinct from *sf_intersection* that creates new geometries). The output of *sf_intersects* is a logical matrix that indicates which features from each of the inputs intersects spatially. Since *regiao* has only one feature, the matrix has just one column of *TRUE* and *FALSE*. Then, to select the features from *flam* and *urb* that intersect *regiao*, we use vector of *TRUE* and *FALSE* to select the rows of the *sf* data.frames. The selected features are called *flams* and *urbs*.

```{r}
its<-st_intersects(flam,regiao,sparse=FALSE)
flams<-flam[its[,1],]
its<-st_intersects(urb,regiao,sparse=FALSE)
urbs<-urb[its[,1],]
```

Let's see where those features occur with *mapview*.

```{r}
mapview(flams,col.regions="green")+mapview(urbs,col.regions="gray")
```

Next we compute the spatial intersection between *flams* and *urbs* which returns the *direct interface* we were interested in. This is very easy since there is a *sf* method that does it, which is *st_intersection*.


```{r, eval=FALSE}
system.time({int<-st_intersection(flams,urbs)}) # 90.74  for regiao==Guimarães
mapview(int,color="red")+mapview(flams,col.regions="green")+mapview(urbs,col.regions="gray")
table(st_geometry_type(int))
int[st_geometry_type(int)=="MULTILINESTRING",]
```




Note that the new data set *int* includes different kinds of geometry. for subsequent processing, one could just consider the feature with one particular kind of geometry.
```{r, eval=FALSE}
table(st_geometry_type(int))
int[st_geometry_type(int)=="MULTILINESTRING",]
```

# An alternative and much faster approach using distances

As we saw above, the computation time for one region is already sizable, and it would be very lengthy to process the whole Portugal with *st_intersection*. Next, we explore one alternative, which will lead to a much faster algorithm for the problem at hand.

It is easy to see that the *direct interface* is determined by the vertices of flammable and urban that have the same location.

The key aspect of the approach below is that it uses just the vertices (points) instead of the full polygons that define flammable and urban areas. Then, it relies on function *RANN::nn2* which is extremely efficient to compute distances between points.

The sequence of steps is the following:
1. Extract vertices from *flam* and *urb*;
2. Determine the vertices that have same location (up to a tolerance that we can choose);
3. From those vertices, built a new features of geometry *LINESTRING* that connect the vertices that were identified.

```{r}
xyflam<-st_coordinates(flams)
xyurb<-st_coordinates(urbs)
head(xyflam,3)
```

Note that *xyflam* is a matrix, where the first two columns (named *X* and *Y*) are the coordinates of the vertices. The tow last columns *L1* and *L2* (*L* for level) indicate the level of the feature those points came from. Each feature can have more than one ring: the 1st ring determines the exterior of the feature, and the following rings, if any, determine the "holes" in the feature. For each vertex that was extracted from *flams*, *L2* indicates the index of the feature, and *L1* indicates the ring index*.

```{r}
nrow(flams) # number of features
range(xyflam[,"L2"]) # L2 indicates the index of the feature
```

We are going to choose one of the data sets (from *flams* and *urbs*) to be the reference for the following steps. The choice is arbitrary but it will be relevant to keep note of the indices of the vertices of the reference data set.

```{r}
xyurb<-cbind(idx=1:nrow(xyurb),xyurb) # idx keeps the index of each original urb vertex
head(xyurb,3)
```

The main function (*nn2*) comes next. Given two sets of points, it determines what are the *k* closest points of the second set. The output is a list with two components:
1. (*$nn.idx*) a matrix with the indices of the *k* closest points;
2. (*$nn.dists*) a matrix with the distances to the *k* closest points;

We  apply *nn2*  and use *k=1* (we just need the closet point). The output *knn$nn.ids* has the same number of rows than *xyurb* (the matrix for the *query* argument, which is our reference data set). When one point doesn't have any neighbor within distance determined by *radius*, the index in *knn$nn.idx* is 0. This gives an easy criterium to remove all vertices or *xyurb* which are not near a vertex of *xyflam*.

```{r}
tol<-3 # tolerance
knn<-nn2(data=xyflam[,c("X","Y")],query=xyurb[,c("X","Y")],k=1,searchtype = "radius",radius=tol)
tail(knn$nn.idx,3)
xyint<-xyurb[knn$nn.idx>0,]
```

Let's look at the result: this should be a set of vertices of *urb* that are near the vertices of *flamm* and that indicate where the direct interface is.

```{r}
sfint<-st_as_sf(as.data.frame(xyint), coords = c("X","Y") , crs = st_crs(urb)$proj4string)
mapview(sfint,color="red")+mapview(flams,col.regions="green")+mapview(urbs,col.regions="gray")
```

This could be the answer to the problem if one would just need to determine the locations of interest or even compute some statistics on the direct interface. However, one could want to obtain a spatial object of type  *LINESTRING* that could be used to display the result and, for instance, to compute the total length of the direct interface

In order to do this, we need to group all  vertices from the same feature, same ring, and same succession of vertices (segment), and then create a *LINESTRING* object from each of the groups.

The information about the feature and ring is available in *sfint*. The only piece of information that is missing is the one that allow us to break up the succession of vertices in segments, since the interface can be made of several segments of the same ring, in the same feature. In order to determine the segments, we use column *idx* of *xyurb* we defined earlier, and we use the fact that one segment is composed of successive vertices, i.e., *idx* increases 1 from one vertex to the next. Below, *jump* is *TRUE* if the increase is larger than 1, which indicates that one should "jump" to the following segment. Using *jump* one can create a new attribute of *sfint* that stores the index of the *segment*.

```{r}
jump <- diff(sfint$idx)>1 # indicates if the vertex index "jumps" more than 1 position
sfint<-cbind(sfint,segment=cumsum(c(0,jump))) # adds columns lineidx
```

The next command uses *magrittr*'s *pipe* operator. This is a convenient way of applying several functions consecutively: the function after *%>%* uses as its first argument the object before  *%>%*.

```{r}
fastint<-sfint %>% group_by(L2,L1,segment) %>% summarize(do_union=FALSE) %>% st_cast("LINESTRING")
mapview(sfint,color="blue")+mapview(fastint,color="red")+mapview(flams,col.regions="green")+mapview(urbs,col.regions="gray")
```

Function *group by* groups vertices according to the values of *L2*, *L1* and *segment*. Function *summarise* is described as <<In case one or more of the arguments (expressions) in the summarise call creates a geometry list-column, the first of these will be the (active) geometry of the returned object. If this is not the case, a geometry column is created, depending on the value of do_union>>. With *summarise* one can also create  new variables that are computed for each group (e.g. the mean of some variable within the group). A simple way of creating a feature class of type "LINESTRING" from points is using function *st_cast*. Option *do_union=FALSE* is necessary to preserve the order of the vertices before the cast to *LINESTRING*.

Finally, one can apply the algorithm to the whole Continental Portugal as below, which should take approximately 11 minutes to run.

```{r, eval=FALSE}
system.time({
xyflam<-st_coordinates(flam)
xyurb<-st_coordinates(urb)
xyurb<-cbind(idx=1:nrow(xyurb),xyurb)
tol<-3 # tolerance to consider vertices coincident
knn<-nn2(data=xyflam[,c("X","Y")],query=xyurb[,c("X","Y")],k=1,searchtype = "radius",radius=tol)
xyint<-xyurb[knn$nn.idx>0,]
sfint<-st_as_sf(as.data.frame(xyint), coords = c("X","Y") , crs = st_crs(urb)$proj4string)
jump <- diff(sfint$idx)>1
sfint<-cbind(sfint,segment=cumsum(c(0,jump)))
fastint<-sfint %>% group_by(L2,L1,lineidx) %>% summarize(do_union=FALSE) %>% st_cast("LINESTRING")
})
```