LAB 04: GENERATE! > TUTORIAL
Through this exercise, you will learn several key geoprocessing and analysis tools in QGIS: integrating tabular data into our scene, followed by the calculation of a distance matrix to explore the spatial relationships between pairs of locations. The generated outputs are as much analytical as they are speculative.
Premise and Objectives
After completing this exercise, you should be able to:
- add tabular data containing geographic coordinates to the scene
- reproject a shapefile
- compute a distance matrix between two sets of locations
- create a distance raster using the interpolation method
You will then post your work-in-progress to Are.na.
Prep
If you’re working in the SSDLAB virtual environment, copy the data repository for this exercise from the shared drive (G:) to your working location (U:). If not, download the data using this link. In this exercise, we will use two new datasets:
- Nonindigenous Sightings of Dreissena polymorpha (zebra mussel)
- EPA’s Toxic Release Inventory (Plastics and Rubber Facilities)
Adding x and y coordinate data as a layer
In addition to geospatial datasets, such as shapefiles or GeoTIFFs, you can also add tabular data that contains the geographic location (in the form of x and y coordinates) to your map:
- Launch QGIS and open the Data Source Manager (located on the main toolbar)
- Open the Delimited text tab
- Click the Browse (…) button, navigate to ../00_Data/NAS, and select NAS_Dreissena-polymorpha_1986-2023.csv
Several fields will automatically populate. Confirm the following:
- File Format is set to CSV (comma-separated values)
- Geometry Definition is set to Point Coordinates; the X field is set to Longitude, and the Y field is set to Latitude
- Geometry CRS is set to EPSG:4326 - WGS 84
Click Add and close the dialogue box. Save your map project.
Important Note: The CRS units must correspond to the x and y units in the tabular data. If the table contains latitude and longitude information in degrees, always set the Geometry CRS to EPSG:4326 - WGS 84 (unless specified otherwise in the metadata).
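To make the idea of "a table that becomes a point layer" concrete, here is a minimal sketch in plain Python. It is not how QGIS reads the file: only the Longitude and Latitude column names come from this tutorial, while the sample rows and the `read_points` helper are hypothetical. Rows with missing or out-of-range coordinates are skipped, which is also the kind of problem to look for if some records fail to appear on your map.

```python
import csv
import io

# Hypothetical sample rows; only the Longitude/Latitude column names
# are taken from the tutorial.
SAMPLE_CSV = """Specimen,Longitude,Latitude
A,-83.05,42.33
B,-87.62,41.88
C,,43.10
"""

def read_points(text, x_field="Longitude", y_field="Latitude"):
    """Return (x, y) tuples in degrees, skipping unusable rows."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            x = float(row[x_field])
            y = float(row[y_field])
        except (KeyError, TypeError, ValueError):
            continue  # missing column or empty/non-numeric value
        # Longitude/latitude in EPSG:4326 must fall inside these ranges.
        if -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0:
            points.append((x, y))
    return points

points = read_points(SAMPLE_CSV)
print(points)
```

Note that x is the longitude and y is the latitude; swapping the two is a common reason for points landing in the wrong place.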
Right-click on the point layer you just added and export features as NAS_zebra-mussel_wgs84.shp. Make sure the shapefile CRS is set to EPSG:4326 - WGS 84.
Reprojecting a vector layer
When using GIS tools or methods involving distance measurements, you should ensure that all your data layers are in the same projected coordinate system using linear units such as meters or feet. The best practice is to also use the same CRS as the Project CRS. In this exercise, we will use NAD 83 / Conus Albers (EPSG:5070).
- To reproject the zebra mussels layer to NAD 83 / Conus Albers, click through Vector > Data Management Tools > Reproject Layer… in the main menu. Set the input layer as NAS_zebra-mussel_wgs84.shp, and Target CRS as EPSG:5070 - NAD83 / Conus Albers.
- Save the reprojected layer as NAS_zebra-mussel_nad83-conus-albers.shp.
- Add the EPA plastics facilities (EPA_TRI_Plastics_nad83-conus-albers.shp) and the Great Lakes shorelines (main_lakes.shp) shapefiles to your scene. Open the Layer Properties window for both layers and inspect their CRSs. You'll notice that the shorelines shapefile is unprojected. Use the Reproject Layer function again, and set its CRS to NAD 83 / Conus Albers. Save the new layer as GL_shorelines_nad83-conus-albers.shp.
- Set the project CRS to NAD 83 / Conus Albers and save your map project.
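To see why distance tools need a projected CRS with linear units, consider how much ground one degree of longitude covers. The short sketch below uses a spherical Earth with a mean radius of 6,371 km, which is an assumption made for illustration (EPSG:5070 itself is based on an ellipsoid, and this is not a reprojection). It shows that a degree is not a fixed length: it shrinks with the cosine of the latitude.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation

def meters_per_degree_longitude(latitude_deg):
    """Ground length of one degree of longitude at the given latitude."""
    return (
        math.radians(1.0)
        * EARTH_RADIUS_M
        * math.cos(math.radians(latitude_deg))
    )

# Compare the equator with two latitudes in the Great Lakes region.
for lat in (0, 42, 48):
    km = meters_per_degree_longitude(lat) / 1000
    print(f"latitude {lat:>2} deg: {km:.1f} km per degree of longitude")
```

One degree of longitude is roughly 111 km at the equator and noticeably shorter at Great Lakes latitudes, so a distance measured in degrees has no consistent meaning across the map. Meters in a projected CRS do.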
Your map project should look something like this:
Distance From Point to Point
A distance matrix is a two-dimensional array containing distance measurements between two sets of locations. In QGIS, the Distance to nearest hub (points) tool takes two point layers and computes the distance between point features taken as the origin and their closest point from the destination features (hubs). We can use it to answer questions such as:
- Which plastics facilities are closest to the Great Lakes?
- Which sites of zebra mussels sightings are nearest (or farthest) from the plastics facilities?
To answer the first question, we need to generate a series of points along the coastlines (the tool only takes point features as inputs). The points will serve as a proxy for coastlines:
- Add the Processing Toolbox Panel to your workspace (View > Panels > Processing Toolbox Panel)
- Find the Points along geometry tool under Vector geometry (start typing “Points along geometry” in the search box)
- Set the Input layer to GL_shorelines_nad83-conus-albers.shp
- Set the distance to 2 kilometers (this will generate a point every 2 km along the coast)
- Save the Interpolated points as GL_shorelines_points.shp.
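The idea behind the steps above can be sketched in a few lines of Python: walk along a line and drop a point at a fixed spacing. This is an illustrative sketch, not the QGIS implementation; the `points_along` function and the sample coordinates are hypothetical, and the line is assumed to be in a projected CRS so that the spacing is in meters. For a polygon such as the shorelines layer, the same walk would follow the polygon's boundary.

```python
import math

def points_along(line, spacing):
    """Return points every `spacing` units along a polyline.

    `line` is a list of (x, y) vertices. The first vertex is always
    included, and points are then placed at spacing, 2*spacing, ...
    measured along the line.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    points = [line[0]]
    next_at = spacing   # distance along the line of the next point
    travelled = 0.0     # distance along the line at the segment start
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        while seg_len > 0 and next_at <= travelled + seg_len:
            t = (next_at - travelled) / seg_len  # fraction of segment
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_at += spacing
        travelled += seg_len
    return points

# Hypothetical 8 km "coastline" in meters, sampled every 2 km.
coast = [(0.0, 0.0), (5000.0, 0.0), (5000.0, 3000.0)]
shore_points = points_along(coast, 2000.0)
print(len(shore_points), shore_points)
```

A smaller spacing follows the coastline more closely but produces more points, which makes every later distance computation slower.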
You should see something like this:
Open and inspect the Attribute Tables of the input shorelines layer and the resulting points layer. You will notice that the attributes from the single polygon representing shorelines are copied to each of the interpolated points we just generated, and there is no unique identifier that would distinguish points from one another. Although not always required, it’s generally considered good practice to have at minimum a unique Feature ID (FID) for each feature in a feature class.
To do that, we will create a new field in the shoreline points Attribute Table, and assign a row number to it as a unique identifier.
In the shoreline points Attribute Table:
- Click on the Open Field Calculator button (fourth from right in the toolbar)
- In the Field Calculator dialogue box, check Create a new Field
- Set the Output field name to ID or FID (for feature classes, it’s a convention to name identifiers as “FID”)
- Double-click on “row_number” (you should see @row_number appear in the Expression Box)
- Click OK (This will automatically activate edit mode)
- Click Save edits and toggle editing mode off
Now we’re able to compute distances between our sets of point features:
- In the Processing Toolbox, search for Distance to Nearest Hub (Points)
- Set the Source points layer to EPA_TRI_Plastics_nad83-conus-albers
- Set the Destination hubs layer to GL_shorelines_points.shp
- Set the Hub layer name attribute to FID
- Set the Measurement unit to Meters or Kilometers
- Save the output Distance matrix as EPA_TRI_DistanceToLakes.shp
- Run the process
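The matching performed by the steps above can be sketched as a nearest-neighbor search: for every source point, find the hub with the smallest distance and record its identifier and that distance. The sketch below is a simplified illustration, not the QGIS algorithm. It assumes planar straight-line distances in a projected CRS, and the `distance_to_nearest_hub` function, the facility names, and the coordinates are all hypothetical.

```python
import math

def distance_to_nearest_hub(sources, hubs):
    """Match each source point with its closest hub.

    sources: {name: (x, y)}, hubs: {fid: (x, y)}, coordinates in meters.
    Returns {name: (hub_fid, distance_m)}, mirroring the HubName and
    HubDist fields of the output layer.
    """
    if not hubs:
        raise ValueError("at least one hub is required")
    result = {}
    for name, (sx, sy) in sources.items():
        result[name] = min(
            ((fid, math.hypot(sx - hx, sy - hy))
             for fid, (hx, hy) in hubs.items()),
            key=lambda pair: pair[1],  # compare by distance only
        )
    return result

# Hypothetical facilities and shoreline points (FID -> coordinates).
facilities = {"facility_a": (0.0, 0.0), "facility_b": (10_000.0, 0.0)}
shore_points = {1: (3_000.0, 4_000.0), 2: (10_000.0, 2_000.0), 3: (20_000.0, 0.0)}

nearest = distance_to_nearest_hub(facilities, shore_points)
for name, (fid, dist) in nearest.items():
    print(f"{name}: nearest shoreline point FID={fid}, {dist:.0f} m")
```

This also shows why the unique FID matters: without it, the output could not tell you which shoreline point each facility was matched with.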
Open the Attribute Table of the resulting layer. We have generated two new fields in the Attribute Table where each EPA plastics facility is matched with its nearest coastline point, and QGIS has computed the distance between them in meters (the HubName field is the FID number of the nearest point).
On your own:
- Create a "water bodies" point feature class that includes the Great Lakes shorelines and the nearby rivers. Then compute a new distance matrix of the EPA plastics facilities in relation to the water bodies, including rivers. (Hint: you will have to use selection methods to create a subset of the ne_rivers_north_america.shp features, create points along geometry, and then merge those points with the lakes' shorelines points that you already have. We covered multiple selection methods in the Lab 02 tutorial and the Merge Vector Layers tool in the appendix.)
- Compute a distance matrix between the NAS_zebra-mussel and the EPA_TRI_Plastics data layers.
Distance Raster (Interpolation)
Interpolation methods use points with known values to estimate values at other, unknown points. We can use them to predict unknown values for any geographic point data, such as elevation, distance, chemical concentrations, noise levels, etc.
One method to do so is the Inverse Distance Weighting (IDW) interpolation technique, also known as the weighted average interpolator. It takes an input array with scattered data values for every point and outputs a grid geometry in the form of a distance raster. Distance rasters are powerful analytical tools, but even more powerful visual tools.
Distance matrices that you just created are ideal inputs for IDW interpolation:
- In the main menu, navigate to Raster > Analysis > Grid (Inverse Distance to a Power)…
- Set the Point layer to EPA_TRI_DistanceToLakes.shp
- Set the Z value from field to HubDist
- Run the operation
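The weighted-average idea behind IDW can be sketched as follows: each grid cell receives the average of the known values, with every sample weighted by 1 / distance^power, so that nearby samples count more. This is a minimal illustration under stated assumptions and not the tool's actual implementation: the `idw` and `idw_grid` functions, the sample values, and the choice of power = 2 are hypothetical, and all samples are used for every cell with no search radius or smoothing.

```python
import math

def idw(x, y, samples, power=2.0):
    """Estimate a value at (x, y) from samples [(sx, sy, value), ...]."""
    if not samples:
        raise ValueError("at least one sample is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exactly on a sample: use its value directly
        w = 1.0 / d ** power  # closer samples get larger weights
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total

def idw_grid(samples, xmin, ymin, cell, cols, rows, power=2.0):
    """Evaluate idw() at every cell center; row 0 is the top row."""
    return [
        [idw(xmin + (c + 0.5) * cell,
             ymin + (rows - r - 0.5) * cell,
             samples, power)
         for c in range(cols)]
        for r in range(rows)
    ]

# Hypothetical samples: (x, y, HubDist-like value).
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 30.0)]
print(idw(5.0, 0.0, samples))   # halfway between the two samples
grid = idw_grid(samples, 0.0, 0.0, 5.0, cols=2, rows=1)
print(grid)
```

A higher power makes the surface hug each sample more tightly, while a lower power produces a smoother, more averaged raster.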
Finally, save the resulting raster as EPA_distToLakes_grid.tif and adjust the symbology.
On your own: Use this same workflow to generate new interpolation rasters for the distance between EPA sites and Zebra Mussels (or any other point feature class of interest).