proofed and refined advanced concepts guide

Atma Mani · Atma Mani · commit 2cc3e630b440 · 2018-08-28T16:58:54.000-07:00
diff --git a/guide/05-working-with-the-spatially-enabled-dataframe/spatially-enabled-dataframe-advanced-topics.ipynb b/guide/05-working-with-the-spatially-enabled-dataframe/spatially-enabled-dataframe-advanced-topics.ipynb
@@ -4,15 +4,16 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Advanced Topics\n",
+    "# Spatially Enabled DataFrames - Advanced Topics\n",
     "\n",
     "The information in this section provides a brief introduction to advanced topics with the `Spatially Enabled DataFrame` structure.  \n",
     "\n",
     "One of the most important tasks for software applications is to quickly retrieve and process information. Enterprise systems, whether storing GIS information or not, all utilize the concept of indexing to allow for quick searching through large data stores to locate and select specific information for subsequent processing. \n",
     "\n",
-    "This document will outline how row and column indexing work in the Spatial Dataframe and also demonstrate building a spatial index on dataframe geometries to allow for quick searching, accessing, and processing. The document will also demonstrate spatial joins to combine dataframes.\n",
+    "This document will outline how row and column indexing work in Spatially Enabled DataFrames and also demonstrate building a spatial index on DataFrame geometries to allow for quick searching, accessing, and processing. The document will also demonstrate spatial joins to combine DataFrames.\n",
     "\n",
-    " * [Dataframe Index](#Dataframe-Index)\n",
+    " * [DataFrame Index](#DataFrame-Index)\n",
+    "  * [Slicing DataFrames](#Slicing-DataFrames)\n",
     " * [Spatial Index](#Spatial-Index)\n",
     " * [Intersection with the Spatial Index](#Intersection-with-the-Spatial-Index)\n",
     " * [Spatial Joins](#Spatial-Joins)\n",
@@ -23,10 +24,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Dataframe Index\n",
-    "As mentioned in the [Introduction to the spatial dataframe guide](../introduction-to-the-spatial-dataframe), the Pandas [dataframe structure](https://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe) underlies the ArcGIS API for Python Spatial Dataframe. Pandas dataframes are analagous to spreadsheets. They have a row axis and a column axis. Each of these axes are indexed and labeled for quick and easy identification, data alignment, and retrieval and updating of data subsets.\n",
+    "## DataFrame Index\n",
+    "As mentioned in the [Introduction to the Spatially Enabled DataFrame guide](../introduction-to-the-spatially-enabled-dataframe), the Pandas [DataFrame structure](https://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe) underlies the ArcGIS API for Python's Spatially Enabled DataFrame. Pandas DataFrames are analogous to spreadsheets. They have a row axis and a column axis. Each of these axes is indexed and labeled for quick and easy identification, data alignment, and retrieval and updating of data subsets.\n",
     "\n",
-    "Lets explore the axis labels and indices and how they allow for data exploraation:"
+    "Let's explore the axis labels and indices and how they allow for data exploration:"
    ]
   },
   {
@@ -43,7 +44,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "When working with an ArcGIS Online feature layer, the [`query()`](https://esri.github.io/arcgis-python-api/apidoc/html/arcgis.features.toc.html#arcgis.features.FeatureLayer.query) method returns a `FeatureSet` object which has a `df` method to instantiate a Spatial Dataframe."
+    "When working with an ArcGIS Online feature layer, the [`query()`](https://esri.github.io/arcgis-python-api/apidoc/html/arcgis.features.toc.html#arcgis.features.FeatureLayer.query) method returns a `FeatureSet` object which has an `sdf` property that returns a Spatially Enabled DataFrame."
    ]
   },
   {
@@ -551,6 +552,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Slicing DataFrames\n",
     "We can access rows, columns and subsets of rows and columns using Python slicing:"
    ]
   },
@@ -774,9 +776,9 @@
    "metadata": {},
    "source": [
     "## Spatial Index\n",
-    "In addition to row and column indices to search a dataframe, we can use a spatial indexes quickly access information based on its location and relationship with other features. They are based on the concept of a minimum bounding rectangle - the smallest rectangle that contains an entire geometric shape. Each of these rectangles are then grouped into `leaf` nodes representing a single shape and `node` structures containing groups of shapes according to whatever algorithm the different types of spatial indexing use. Querying these rectangles requires magnitudes fewer computer resources for accessing and processing geometries relative to accessing the entire feature array of coordinate pairs that compose a shape. Access to points, complex lines and irregularly-shaped polygons becomes much quicker and easier through different flavors of spatial indexing.\n",
+    "In addition to row and column indices to search a DataFrame, we can use spatial indexes to quickly access information based on its location and relationship with other features. They are based on the concept of a **minimum bounding rectangle** - the smallest rectangle that contains an entire geometric shape. Each of these rectangles is then grouped into `leaf` nodes representing a single shape and `node` structures containing groups of shapes according to whatever algorithm the different types of spatial indexing use. Querying these rectangles requires orders of magnitude fewer compute resources for accessing and processing geometries relative to accessing the entire feature array of coordinate pairs that compose a shape. Access to points, complex lines and irregularly-shaped polygons becomes much quicker and easier through different flavors of spatial indexing.\n",
     "\n",
-    "The Spatial DataFrame uses an implementation of spatial indexing known as [QuadTree indexing](https://en.wikipedia.org/wiki/Quadtree), which searches nodes when determining locations, relationships and attributes of specific features. QuadTree indexes are the default spatial index, but the SEDF also supports r-tree implementations.  In the [**Dataframe Index**](#Datframe-index) section of this notebook, the USA Major Cities feature layer was queried and the `df` method was called on the results to create a data frame. The [`sindex`](https://esri.github.io/arcgis-python-api/apidoc/html/arcgis.features.html?highlight=style#arcgis.features.SpatialDataFrame.sindex) method on the `df` creates a quad tree index:"
+    "The Spatially Enabled DataFrame uses an implementation of spatial indexing known as [QuadTree indexing](https://en.wikipedia.org/wiki/Quadtree), which searches nodes when determining locations, relationships and attributes of specific features. `QuadTree` indexes are the default spatial index, but the SEDF also supports `r-tree` implementations. In the [**DataFrame Index**](#DataFrame-Index) section of this notebook, the USA Major Cities feature layer was queried and the `sdf` property was accessed on the results to create a DataFrame. The [`sindex`](https://esri.github.io/arcgis-python-api/apidoc/html/arcgis.features.toc.html#arcgis.features.GeoAccessor.sindex) method on the DataFrame creates a QuadTree index:"
    ]
   },
   {
@@ -792,7 +794,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Let's visually inspect the external frame of the Quadtree index. We'll then plot the spatial dataframe to ensure the spatial index encompasses all our features:"
+    "Let's visually inspect the external frame of the QuadTree index. We'll then plot the spatial dataframe to ensure the spatial index encompasses all our features:"
    ]
   },
   {
@@ -861,7 +863,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Let's use the feature we drew above to define a spatial reference variable for use throughout the rest of this guide."
+    "Let's use the feature we drew earlier to define a spatial reference variable for use throughout the rest of this guide."
    ]
   },
   {
@@ -1477,11 +1479,11 @@
     "sym_poly = {\n",
     "  \"type\": \"esriSFS\",\n",
     "  \"style\": \"esriSFSSolid\",\n",
-    "  \"color\": [0,0,0,0],\n",
+    "  \"color\": [0,0,0,0],  # hollow, no fill\n",
     "    \"outline\": {\n",
     "     \"type\": \"esriSLS\",\n",
     "     \"style\": \"esriSLSSolid\",\n",
-    "     \"color\": [255,0,0,255],\n",
+    "     \"color\": [255,0,0,255],  # red border\n",
     "     \"width\": 3}\n",
     "}\n",
     "\n",
@@ -1512,11 +1514,11 @@
     "sym_poly_aoi = {\n",
     "  \"type\": \"esriSFS\",\n",
     "  \"style\": \"esriSFSSolid\",\n",
-    "  \"color\": [0,0,0,0],\n",
+    "  \"color\": [0,0,0,0],  # hollow, no fill\n",
     "    \"outline\": {\n",
     "     \"type\": \"esriSLS\",\n",
     "     \"style\": \"esriSLSSolid\",\n",
-    "     \"color\": [0,255,0,255],\n",
+    "     \"color\": [0,255,0,255],   # green border\n",
     "     \"width\": 3}\n",
     "}\n",
     "\n",
@@ -1783,10 +1785,19 @@
     "df.iloc[index_of_features]"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let us plot the intersecting features on a map:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 27,
-   "metadata": {},
+   "metadata": {
+    "scrolled": true
+   },
    "outputs": [
     {
      "data": {
@@ -1842,7 +1853,7 @@
     "pt_sym = {\n",
     "    \"type\": \"esriSMS\",\n",
     "    \"style\": \"esriSMSDiamond\",\n",
-    "    \"color\": [255,140,0,255],        \n",
+    "    \"color\": [255,140,0,255],  # dark orange\n",
     "    \"size\": 8,\n",
     "    \"angle\": 0,\n",
     "    \"xoffset\": 0,\n",
@@ -1856,18 +1867,25 @@
     "    m2.draw(shape = df.iloc[pt_index]['SHAPE'], symbol = pt_sym)  "
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Thus we were able to use the spatial indexes to query features that fall within an extent."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Spatial Joins\n",
-    "Dataframes are table-like structures comprised of rows and columns. In relational database, SQL `joins` are fundamental operations that combine columns from one or more tables using values that are common to each. They occur in almost all database queries.\n",
+    "DataFrames are table-like structures comprised of rows and columns. In relational databases, SQL `joins` are fundamental operations that combine columns from one or more tables using values that are common to each. They occur in almost all database queries.\n",
     "\n",
     "A Spatial join is a table operation that affixes data from one feature layer’s attribute table to another based on a spatial relationship. The spatial join involves matching rows from the Join Features (data frame1) to the Target Features (data frame2) based on their spatial relationship.\n",
     "\n",
-    "Let's look at how joins work with dataframes by using subsets of our original dataframe and the pandas `merge` fucntionality. We'll then move onto examining a spatial join to combine features from one dataframe with another based on a common attribute value.\n",
+    "Let's look at how joins work with DataFrames by using subsets of our original DataFrame and the pandas `merge` functionality. We'll then move on to examining a spatial join to combine features from one DataFrame with another based on a common attribute value.\n",
     "\n",
-    "Query the dataframe to extract 3 attribute columns of information from 2 states, Ohio and Michigan:"
+    "Query the DataFrame to extract 3 attribute columns of information from 2 states, Ohio and Michigan:"
    ]
   },
   {
@@ -2507,7 +2525,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Notice how all the rows from the left dataframe appear in the result with all the attribute columns and values appended from the right dataframe where the column value of NAME matched. The `POP2010` attribute from the left dataframe is combined with all the attributes from the right dataframe."
+    "Notice how all the rows from the left `DataFrame` appear in the result with all the attribute columns and values appended from the right `DataFrame` where the column value of `NAME` matched. The `POP2010` attribute from the left `DataFrame` is combined with all the attributes from the right `DataFrame`."
    ]
   },
   {
@@ -2879,18 +2897,18 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The rows where the on parameter value is the same in both tables have all attributes from both dataframes in the result. The rows from the first dataframe that do not have a matching `NAME` value in the second dataframe have values filled in with NaN values."
+    "The rows where the `on` parameter value is the same in both tables have all attributes from both DataFrames in the result. The rows from the first DataFrame that do not have a matching `NAME` value in the second DataFrame have the missing attributes filled in with `NaN` values."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "A spatial join works similarly on matching attribute values.\n",
+    "A spatial join works similarly. However, instead of joining on an attribute field (as you did earlier), you will join based on the spatial relationship between the records in the two tables.\n",
     "\n",
     "#### Example: Merging State Statistics Information with Cities\n",
     "\n",
-    "The goal is to get Wyoming's city locations and census data joined with Wymoing's state census data.\n",
+    "The goal is to get Wyoming's city locations and census data joined with Wyoming's state census data.\n",
     "> If you do not have access to the `ArcPy` site-package from the Python interpreter used to execute the following cells, you must authenticate to an ArcGIS Online Organization or ArcGIS Enterprise portal."
    ]
   },
@@ -2948,7 +2966,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We will use python's list comprehensions to create lists of the attribute columns in the dataframe, then print out the lists to see the names of all the attribute columns."
+    "We will use Python's list comprehensions to create lists of the attribute columns in the DataFrame, then print out the lists to see the names of all the attribute columns."
    ]
   },
   {
@@ -3036,7 +3054,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Create a dataframe for the cities in Wyoming:"
+    "Create a DataFrame for the cities in Wyoming:"
    ]
   },
   {
@@ -3703,6 +3721,13 @@
     "sdf2"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Notice that you retain the geometry type of your left DataFrame (points, in this case); however, you get all the attributes from both the left and right DataFrames. Let us plot the results of the spatial join on a map:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 47,
@@ -3755,6 +3780,14 @@
     "for idx, row in sdf2.iterrows():\n",
     "    m3.draw(row['SHAPE'], symbol=pt_sym)"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Conclusion\n",
+    "Spatially Enabled DataFrames give you powerful data analysis and data wrangling capabilities. In addition to performing SQL-like operations on attribute data, you can perform geographic queries. This guide demonstrated some of these advanced capabilities of the SEDF."
+   ]
   }
  ],
  "metadata": {
@@ -3773,7 +3806,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.5"
+   "version": "3.6.6"
   }
  },
  "nbformat": 4,