Stream Extraction Algorithm¶
Overview¶
The stream extraction algorithm converts flow accumulation data into vector stream networks. Cells exceeding a threshold are classified as streams, then vectorized into polylines with junction points at confluences and sources.
Core Concept¶
Stream extraction is a three-step process:
- Classification: Apply threshold to identify stream cells
- Node Detection: Find topologically significant points (sources, confluences)
- Vectorization: Trace stream segments between nodes
Stream Classification¶
A cell belongs to the stream network if its flow accumulation meets or exceeds the threshold:
where \(T\) is the accumulation threshold parameter.
Threshold selection:
- Lower thresholds extract more tributaries (denser network)
- Higher thresholds extract only main channels (sparser network)
- Common heuristics: \(T = 100\) to \(T = 1000\) cells, or based on contributing area
Node Detection¶
Nodes are topologically significant points in the stream network:
Source nodes: Stream cells with no upstream stream neighbors
Confluence nodes: Stream cells with multiple upstream stream neighbors
where \(\text{upstream}(n, c)\) is true if cell \(n\) flows directly to cell \(c\).
Upstream Neighbor Detection¶
A neighbor \(n\) is upstream of cell \(c\) if \(n\)'s flow direction points to \(c\):
def is_upstream(neighbor, cell, fdr):
"""Check if neighbor flows into cell."""
downstream_of_neighbor = get_downstream(neighbor, fdr)
return downstream_of_neighbor == cell
Algorithm¶
def find_nodes(stream_raster, fdr):
nodes = []
for cell in all_cells():
if not stream_raster[cell]:
continue
# Count upstream stream neighbors
upstream_count = 0
for neighbor in neighbors_8(cell):
if stream_raster[neighbor] and is_upstream(neighbor, cell, fdr):
upstream_count += 1
if upstream_count == 0: # Source
nodes.append(cell)
elif upstream_count > 1: # Confluence
nodes.append(cell)
return nodes
Stream Tracing¶
Each stream segment connects two nodes. Starting from each node, trace downstream until reaching another node or the domain boundary:
def trace_stream(start_node, fdr, stream_raster, node_set):
path = [start_node]
current = start_node
while True:
downstream = get_downstream(current, fdr)
if downstream is None:
# Reached boundary
break
if not stream_raster[downstream]:
# Left stream network
break
path.append(downstream)
if downstream in node_set and downstream != start_node:
# Reached another node
break
current = downstream
return path
Coordinate Conversion¶
Raster cell indices are converted to geographic coordinates using the geotransform:
where: - \((x_0, y_0)\) is the origin (top-left corner) - \((\Delta x, \Delta y)\) are pixel dimensions - \((r, c)\) are row and column indices - The \(+0.5\) offset places coordinates at cell centers
Output Structure¶
The algorithm produces two vector layers:
Streams Layer (LineString)¶
Each feature represents a stream segment between nodes:
- Geometry: LineString with vertices at cell centers
- Attributes: Feature ID, optional length
Junctions Layer (Point)¶
Each feature represents a topologically significant point:
- Sources: Headwater points (upstream extent)
- Confluences: Where tributaries join
- Outlets: Where streams exit the domain
Tiled Processing¶
For large rasters, streams are extracted per-tile and then merged:
Per-Tile Extraction¶
- Extract streams and junctions within tile
- Track tile coordinates for global positioning
- Record stream endpoints for cross-tile merging
Cross-Tile Merging¶
When streams cross tile boundaries, they are split into separate features. Post-processing merges these:
- Identify edge junctions: Junctions with exactly 2 stream endpoints
- Classify merge type: Determine endpoint orientations
- Merge geometries: Concatenate coordinates in correct order
Merge Types¶
For two stream segments meeting at a junction:
| Type | Description | Action |
|---|---|---|
| 0 | downstream \(\to\) upstream | Append second to first |
| 1 | upstream \(\to\) downstream | Prepend second to first |
| 2 | upstream \(\gets\) upstream | Reverse one, concatenate |
| 3 | downstream \(\gets\) downstream | Reverse one, concatenate |
def merge_streams(stream1, stream2, merge_type):
if merge_type == 0:
return stream1.coords + stream2.coords[1:]
elif merge_type == 1:
return stream2.coords + stream1.coords[1:]
elif merge_type == 2:
return list(reversed(stream1.coords)) + stream2.coords[1:]
elif merge_type == 3:
return stream1.coords + list(reversed(stream2.coords))[1:]
Spatial Hashing¶
To efficiently find matching endpoints across tiles, a spatial hash is used:
where \(\delta\) is the grid cell size and \(P\) is a large prime.
Complexity¶
Let \(n\) be total cells, \(s\) be stream cells, and \(j\) be junction count.
| Operation | Time Complexity |
|---|---|
| Classification | \(O(n)\) |
| Node detection | \(O(s)\) |
| Stream tracing | \(O(s)\) |
| Coordinate conversion | \(O(s)\) |
| Total | \(O(n)\) |
For tiled processing with merging:
| Operation | Time Complexity |
|---|---|
| Per-tile extraction | \(O(n)\) total |
| Edge junction finding | \(O(j)\) |
| Stream merging | \(O(j \cdot l)\) |
where \(l\) is average stream segment length.
Network Properties¶
The extracted stream network forms a directed tree (or forest):
- Connectivity: All stream cells are connected
- Hierarchy: Flow direction defines parent-child relationships
- Single outlet: Each connected component has one outlet (unless truncated by boundary)
See Also¶
- Flow Accumulation - Computing accumulation for thresholding
- Basin Delineation - Using stream junctions as drainage points