Quickstart using Python scripting
Importing the modules
Provided that r.learn.ml2 is installed as a GRASS GIS addon, the Python modules can be imported directly:
# import the addon's modules
from pygrassml import RasterStack
The RasterStack class
Initialization
The main class in r.learn.ml2 is RasterStack. A RasterStack can be initialized from a list of GRASS GIS raster maps:
stack = RasterStack(rasters=["lsat7_2002_10", "lsat7_2002_20", "lsat7_2002_30", "lsat7_2002_40"])
Alternatively, it can be initialized from a GRASS imagery group:
stack = RasterStack(group="landsat_2002")
Indexing of RasterStack objects
Individual rasters within a RasterStack can be accessed in several ways:
stack.names # returns names of rasters
# methods that return RasterRow objects
stack.lsat7_2002_10 # use attribute name directly
stack.iloc[0] # access by integer index
stack.iloc[0:2] # access using slices
stack.loc["lsat7_2002_10"] # access using a label, or list of labels
# methods that always return a new RasterStack object
stack["lsat7_2002_10"]
Individual rasters within the RasterStack can be set using:
from grass.pygrass.raster import RasterRow
# set layers using a single index
stack.iloc[0] = RasterRow("lsat7_2002_61")
# set layers using multiple indexes
stack.iloc[[0, 1]] = [RasterRow("lsat7_2002_70"), RasterRow("lsat7_2002_80")]
# set layers using a slice of indexes
stack.iloc[0:2] = [RasterRow("lsat7_2002_70"), RasterRow("lsat7_2002_80")]
# set layers using a single label
stack.loc["lsat7_2002_10"] = RasterRow("lsat7_2002_61")
# set layers using multiple labels
stack.loc[["lsat7_2002_10", "lsat7_2002_20"]] = [RasterRow("lsat7_2002_61"), RasterRow("lsat7_2002_62")]
Viewing data with a RasterStack
Quick views of the values of the rasters within a RasterStack object can be generated by:
stack = RasterStack(rasters=["lsat7_2002_10", "lsat7_2002_20", "lsat7_2002_30", "lsat7_2002_40"])
# view data from the first 10 rows
stack.head()
# view data from the last 10 rows
stack.tail()
# convert raster to pandas dataframe
stack.to_pandas()
Reading array data from a RasterStack
Data from a RasterStack can be read into a 3D numpy array using the read method.
The data is returned as a masked array in which the GRASS GIS null values of each raster are masked.
# read all data (obeying the computational window settings)
stack.read()
# read a single row
stack.read(row=1)
# read a set of rows in a contiguous interval (start, end)
stack.read(rows=(1, 10))
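To show what working with a masked 3D array involves, the following sketch builds one by hand with NumPy instead of calling read. The axis order (rasters, rows, columns) and the use of NaN to stand in for GRASS null cells are assumptions made for this illustration, not statements about the RasterStack API.

```python
import numpy as np

# Hypothetical stand-in for the array returned by a read call:
# 2 rasters, 3 rows, 4 columns, with NaN marking null cells.
data = np.array(
    [
        [[1.0, 2.0, np.nan, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, np.nan, 11.0, 12.0]],
        [[10.0, 20.0, 30.0, 40.0],
         [50.0, np.nan, 70.0, 80.0],
         [90.0, 100.0, 110.0, 120.0]],
    ]
)

# Mask the null cells so that they are skipped in calculations.
masked = np.ma.masked_invalid(data)

# Flatten each raster to one row of cells, keeping the mask.
flat = masked.reshape(masked.shape[0], -1)

# Per-raster mean and number of valid cells, ignoring masked values.
band_means = flat.mean(axis=1)
valid_counts = flat.count(axis=1)
```

Masked cells are left out of both the mean and the count, so statistics computed this way are not distorted by null values.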
Extracting data from a RasterStack
Pixel values in the RasterStack can be spatially queried using either another raster containing labelled pixels, via the extract_pixels method, or a GRASS GIS vector map containing point geometries, via the extract_points method.
Either method can return the extracted data as three numpy arrays, or as a pandas dataframe.
When extracting data using another raster map, X will be a 2D numpy array containing the data extracted from the RasterStack (one row per labelled pixel, one column per raster), y will be a 1D numpy array containing the values of the pixels in the labelled pixels map, and cat is the index value of the pixels.
# extract data using another raster
X, y, cat = stack.extract_pixels(rast_name="labelled_pixels")
# extract data using another raster, returning the GRASS raster categories
# instead of integer values
X, y, cat = stack.extract_pixels(rast_name="labelled_pixels", use_cats=True)
# return data as a pandas dataframe
df = stack.extract_pixels(rast_name="labelled_pixels", as_df=True)
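The idea behind extracting data with a labelled raster can be sketched with plain NumPy. This is only an illustration, not the addon's implementation: the function name extract_labelled, the use of NaN for unlabelled cells, and the flat pixel index are all assumptions. X is arranged here as a 2D array (pixels by rasters), which is the layout scikit-learn estimators expect.

```python
import numpy as np

def extract_labelled(stack_arr, labels):
    """Collect raster values at labelled pixels (hypothetical helper).

    stack_arr: float array of shape (n_rasters, n_rows, n_cols).
    labels: float array of shape (n_rows, n_cols), NaN where unlabelled.
    Returns X (n_samples, n_rasters), y (n_samples,), and the flat
    index of each sampled pixel.
    """
    rows, cols = np.nonzero(~np.isnan(labels))
    X = stack_arr[:, rows, cols].T  # one row per labelled pixel
    y = labels[rows, cols]
    index = np.ravel_multi_index((rows, cols), labels.shape)
    return X, y, index

# Two 3x3 rasters and a label raster with two labelled pixels.
stack_arr = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
labels = np.full((3, 3), np.nan)
labels[0, 1] = 1.0
labels[2, 2] = 2.0

X, y, index = extract_labelled(stack_arr, labels)
```

Each row of X holds the values of all rasters at one labelled pixel, and the matching entry of y holds that pixel's label.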
When extracting data using a vector map, the fields parameter refers to the name of an attribute, or of several attributes, in the vect_name map to be returned with the extracted raster data. If several attributes are used, y will be a 2D numpy array.
# basic use: return a single attribute, or several attributes
X, y, cat = stack.extract_points(vect_name="points_map", fields="slope")
X, y, cat = stack.extract_points(vect_name="points_map", fields=["slope", "aspect"])
# as pandas
df = stack.extract_points(vect_name="points_map", fields="slope", as_df=True)
By default, rows containing null values in any of the rasters are removed. This can be disabled by using na_rm=False:
df = stack.extract_points(vect_name="points_map", fields="slope", as_df=True, na_rm=False)
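If null rows are kept at extraction time, they usually have to be dropped before fitting a model. A minimal NumPy sketch of that filtering follows; the helper name drop_null_rows and the use of NaN for null values are assumptions for illustration, not part of the addon.

```python
import numpy as np

def drop_null_rows(X, y, cat):
    """Keep only samples with no NaN in any raster column of X."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep], cat[keep]

# Three samples from two rasters; the second sample has a null value.
X = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, 6.0]])
y = np.array([10, 20, 30])
cat = np.array([1, 2, 3])

X_clean, y_clean, cat_clean = drop_null_rows(X, y, cat)
```

The same boolean mask is applied to X, y and cat so that the three arrays stay aligned. It also works unchanged when y is a 2D array of several attributes.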
Applying a machine learning model to data within a RasterStack
Any scikit-learn compatible model that has a predict method can be applied to the data within a RasterStack. The following provides a brief example within the nc_spm_08 sample GRASS location:
from grass.pygrass.modules.shortcuts import raster as r
# generate some training data from another land use map
r.random(input="landclass96", npoints=1000, raster="training_pixels")
# create a stack of landsat data
stack = RasterStack(
    rasters=[
        "lsat7_2002_10",
        "lsat7_2002_20",
        "lsat7_2002_30",
        "lsat7_2002_40",
        "lsat7_2002_50",
        "lsat7_2002_70",
    ]
)
# extract training data
X, y, cat = stack.extract_pixels(rast_name="training_pixels")
# fit a ml model
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
rf.fit(X, y)
# apply the fitted model to the RasterStack; the result is returned as a RasterStack
preds = stack.predict(
    estimator=rf,  # fitted model
    output="rf_classification",  # name of output GRASS raster
    height=25,  # number of rows to predict in each chunk
    overwrite=False
)
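# predict class probabilities; a separate output name is used here so that it
# does not collide with the classification raster created above
probs = stack.predict_proba(
    estimator=rf,
    output="rf_probabilities",
    height=25,
    overwrite=False
)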
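The height argument above limits how many rows are processed at once. The general idea of chunked prediction can be sketched with NumPy and any object that has a predict method. The names ThresholdClassifier and predict_in_chunks are hypothetical, the array stands in for raster data, and this is not the addon's implementation.

```python
import numpy as np

class ThresholdClassifier:
    """Hypothetical stand-in for a fitted scikit-learn style estimator."""

    def predict(self, X):
        # Class 1 where the first feature exceeds 5, otherwise class 0.
        return (X[:, 0] > 5).astype(int)

def predict_in_chunks(estimator, stack_arr, height):
    """Apply estimator.predict to blocks of `height` rows at a time."""
    n_rasters, n_rows, n_cols = stack_arr.shape
    result = np.empty((n_rows, n_cols), dtype=int)
    for start in range(0, n_rows, height):
        stop = min(start + height, n_rows)
        block = stack_arr[:, start:stop, :]
        # Reshape (rasters, rows, cols) into (pixels, rasters).
        X = block.reshape(n_rasters, -1).T
        chunk_pred = estimator.predict(X)
        result[start:stop, :] = chunk_pred.reshape(stop - start, n_cols)
    return result

# Two 4x3 rasters, predicted three rows at a time.
stack_arr = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
preds = predict_in_chunks(ThresholdClassifier(), stack_arr, height=3)
```

Because each pixel is predicted independently, the chunk height affects memory use but not the result.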
Multi-target prediction is also supported for scikit-learn models that accept multiple targets.