Quickstart using Python scripting

Importing the modules

Provided that r.learn.ml2 is installed as a GRASS GIS addon, its Python modules can be imported directly:

# import the addon's modules
from pygrassml import RasterStack

The RasterStack class

Initialization

The main class in r.learn.ml2 is RasterStack. A RasterStack can be created from a list of GRASS GIS raster maps:

stack = RasterStack(rasters=["lsat7_2002_10", "lsat7_2002_20", "lsat7_2002_30", "lsat7_2002_40"])

Alternatively, it can be created from a GRASS GIS imagery group:

stack = RasterStack(group="landsat_2002")

Indexing of RasterStack objects

Individual rasters within a RasterStack can be accessed using several methods:

stack.names  # returns names of rasters

# methods that return RasterRow objects
stack.lsat7_2002_10  # use attribute name directly

stack.iloc[0]  # access by integer index

stack.iloc[0:2]  # access using slices

stack.loc["lsat7_2002_10"]  # access using a label, or list of labels

# methods that always return a new RasterStack object
stack["lsat7_2002_10"]

Individual rasters within the RasterStack can be replaced using:

from grass.pygrass.raster import RasterRow

# set layers using a single index
stack.iloc[0] = RasterRow("lsat7_2002_61") 

# set layers using a multiple indexes
stack.iloc[[0, 1]] = [RasterRow("lsat7_2002_70"), RasterRow("lsat7_2002_80")]

# set layers using a slice of indexes
stack.iloc[0:2] = [RasterRow("lsat7_2002_70"), RasterRow("lsat7_2002_80")]

# set layers using a single label
stack.loc["lsat7_2002_10"] = RasterRow("lsat7_2002_61")

# set layers using multiple labels
stack.loc[["lsat7_2002_10", "lsat7_2002_20"]] = [RasterRow("lsat7_2002_61"), RasterRow("lsat7_2002_62")]

Viewing data with a RasterStack

Quick previews of the raster values within a RasterStack can be generated with:

stack = RasterStack(rasters=["lsat7_2002_10", "lsat7_2002_20", "lsat7_2002_30", "lsat7_2002_40"])

# view data from the first 10 rows
stack.head()

# view data from the last 10 rows
stack.tail()

# convert the RasterStack to a pandas DataFrame
stack.to_pandas()
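Conceptually, head and tail show the first or last rows of every raster, and the DataFrame conversion flattens the cells into one row per cell and one column per raster. A minimal NumPy sketch of those reshaping steps, assuming the stack is held in memory as a (bands, rows, cols) array; the function names are hypothetical and the real methods may differ in detail:

```python
import numpy as np


def head_rows(stack_array, n=10):
    """First n raster rows of every band; stack_array has shape (bands, rows, cols)."""
    return stack_array[:, :n, :]


def tail_rows(stack_array, n=10):
    """Last n raster rows of every band."""
    return stack_array[:, -n:, :]


def to_table(stack_array):
    """Flatten to a 2D table: one row per cell, one column per band."""
    bands = stack_array.shape[0]
    return stack_array.reshape(bands, -1).T


arr = np.arange(2 * 12 * 3).reshape(2, 12, 3)  # 2 bands, 12 rows, 3 columns
print(head_rows(arr).shape)  # (2, 10, 3)
print(to_table(arr).shape)   # (36, 2)
```

Passing the output of `to_table` to `pandas.DataFrame` with the raster names as `columns` would give a DataFrame of the same layout.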

Reading array data from a RasterStack

Data from a RasterStack can be read into a 3D numpy array using the read method. The data are returned as a numpy masked array in which the GRASS GIS null cells of each raster are masked.

# read all data (obeying the computational window settings)
stack.read()

# read a single row
stack.read(row=1)

# read a set of rows in a contiguous interval (start, end)
stack.read(rows=(1, 10))
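The masking step can be sketched with `numpy.ma`. This assumes, purely for illustration, that null cells arrive as NaN; GRASS integer rasters encode nulls differently, and this sketch does not handle that case. `stack_to_masked` is a hypothetical helper, not part of r.learn.ml2.

```python
import numpy as np


def stack_to_masked(bands):
    """Stack equally sized 2D bands into a (bands, rows, cols) masked array.

    Assumes nulls are stored as NaN, so np.ma.masked_invalid masks them.
    """
    return np.ma.masked_invalid(np.stack(bands).astype(float))


band1 = np.array([[1.0, 2.0], [np.nan, 4.0]])
band2 = np.array([[5.0, np.nan], [7.0, 8.0]])
arr = stack_to_masked([band1, band2])

single_row = arr[:, 1, :]   # all bands for raster row 1
row_block = arr[:, 0:1, :]  # a contiguous block of rows (half-open slice here)
print(arr[0].sum())         # masked cells are ignored: 7.0
```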

Extracting data from a RasterStack

Pixel values in the RasterStack can be spatially queried using either another raster containing labelled pixels, via the extract_pixels method, or a GRASS GIS vector map containing point geometries, via the extract_points method. Both methods can return the extracted data either as three numpy arrays or as a pandas DataFrame.

When extracting data using another raster map, X is a 2D numpy array containing the values extracted from the RasterStack (one row per labelled pixel, one column per raster, as expected by scikit-learn), y is a 1D numpy array containing the values of the labelled pixels, and cat contains the index of each extracted pixel.

# extract data using another raster
X, y, cat = stack.extract_pixels(rast_name="labelled_pixels")

# extract data using another raster, returning the GRASS raster categories
# instead of integer values
X, y, cat = stack.extract_pixels(rast_name="labelled_pixels", use_cats=True)

# return data as a pandas dataframe
df = stack.extract_pixels(rast_name="labelled_pixels", as_df=True)

When extracting data using a vector map, the field parameter gives the name of an attribute, or a list of attributes, in the vect_name map to be returned with the extracted raster data. If several attributes are used, y will be a 2D numpy array.

# basic use
X, y, cat = stack.extract_points(vect_name="points_map", field="slope")
X, y, cat = stack.extract_points(vect_name="points_map"), field=["slope", "aspect"]

# as pandas
df = stack.extract_points(vect_name="points_map", field="slope", as_df=True)

By default, rows containing null values in any of the rasters are removed. This can be disabled by using na_rm=False:

df = stack.extract_points(vect_name="points_map", field="slope", as_df=True, na_rm=True)

Applying a machine learning model to data within a RasterStack

Any scikit-learn compatible model that has a predict method can be applied to the data within a RasterStack. The following provides a brief example within the nc_spm_08 sample GRASS location:

from grass.pygrass.modules.shortcuts import raster as r

# generate some training data from another land use map
r.random(input="landclass96", npoints=1000, raster="training_pixels")

# create a stack of landsat data
stack = RasterStack(
    rasters=[
        "lsat7_2002_10", 
        "lsat7_2002_20", 
        "lsat7_2002_30", 
        "lsat7_2002_40", 
        "lsat7_2002_50", 
        "lsat_2002_70"
    ]
)

# extract training data
X, y, cat = stack.extract_pixels(rast_name="training_pixels")

# fit a ml model
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier()
rf.fit(X, y)

# apply the fitted model to the RasterStack; the result is returned as a RasterStack
preds = stack.predict(
    estimator=rf,                # fitted model
    output="rf_classification",  # name of output GRASS raster
    height=25,                   # number of rows to predict in chunks
    overwrite=False
)

probs = stack.predict_proba(
    estimator=rf,
    output="rf_probabilities",  # must differ from the classification output name
    height=25,
    overwrite=False
)
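The `height` argument indicates that prediction is made on blocks of rows rather than on the whole raster at once, presumably to limit memory use. A minimal NumPy sketch of that pattern, assuming the data are already in a (bands, rows, cols) masked array; `predict_in_chunks` and the toy `MeanThreshold` estimator are hypothetical, and writing the result to a GRASS raster is omitted.

```python
import numpy as np


class MeanThreshold:
    """Hypothetical stand-in estimator with a scikit-learn style predict()."""

    def predict(self, X):
        return (X.mean(axis=1) > 5).astype(int)


def predict_in_chunks(estimator, stack_array, height):
    """Apply estimator.predict to blocks of `height` rows.

    Cells that are null (masked) in any band stay masked in the output.
    """
    bands, nrows, ncols = stack_array.shape
    out = np.ma.masked_all((nrows, ncols), dtype=float)
    for start in range(0, nrows, height):
        block = stack_array[:, start:start + height, :]
        n = block.shape[1]                     # last block may be shorter
        X = block.reshape(bands, -1).T         # (n_cells, bands)
        valid = ~np.ma.getmaskarray(X).any(axis=1)
        preds = np.ma.masked_all(X.shape[0], dtype=float)
        if valid.any():
            preds[valid] = estimator.predict(np.ma.getdata(X)[valid])
        out[start:start + n, :] = preds.reshape(n, ncols)
    return out


data = np.array([[[1.0, 9.0], [2.0, 10.0]],
                 [[1.0, 9.0], [np.nan, 10.0]]])
result = predict_in_chunks(MeanThreshold(), np.ma.masked_invalid(data), height=1)
```

For `predict_proba`, a scikit-learn estimator returns one column per class, so the same loop would produce one output layer per class instead of a single layer.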

Multi-target prediction is also supported for scikit-learn models that accept multiple target variables.
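A multi-target scikit-learn model returns a 2D array with one column per target, so the flattened predictions have to be turned back into one grid per target. A hypothetical sketch of that reshaping step, assuming cells were flattened in row-major order and that each target is stored as its own output layer:

```python
import numpy as np


def predictions_to_layers(preds, nrows, ncols):
    """Reshape (n_cells, n_targets) predictions into (n_targets, rows, cols)."""
    preds = np.asarray(preds)
    if preds.ndim == 1:              # single target: treat as one column
        preds = preds[:, np.newaxis]
    return preds.T.reshape(preds.shape[1], nrows, ncols)


preds = np.arange(12).reshape(6, 2)  # 6 cells (2x3 grid), 2 targets
layers = predictions_to_layers(preds, nrows=2, ncols=3)
print(layers.shape)  # (2, 2, 3)
```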