PhD Research | Remote Sensing | Machine Learning

Why Irrigation Mapping is Complex

Discovering the challenges of detecting smallholder irrigation in diverse African landscapes using satellite data.

The Challenge

Smallholder irrigation in sub-Saharan Africa is vital for food security and rural livelihoods, yet nobody knows exactly where it is. Government statistics are incomplete, and traditional surveys are expensive and slow.

Satellite imagery offers hope—we can see the Earth at high resolution from space. But detecting scattered, small-scale irrigation plots in complex, diverse landscapes is surprisingly difficult.

Why is it hard?

Scale: Plots are often tiny (< 1 hectare), requiring very high-resolution imagery
Heterogeneity: Mixed cropland, natural vegetation, and water create confusion
Temporal dynamics: Irrigation patterns shift with seasons and cropping cycles
Data scarcity: Limited training data in most African regions
Methodological choices: Different algorithms, temporal windows, and parameters yield different results
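To make the scale problem concrete, here is a small arithmetic sketch of how many pixels cover a plot at different image resolutions. The 0.5-hectare plot and the 10 m, 30 m, and 250 m pixel sizes are illustrative assumptions (roughly the nominal resolutions of Sentinel-2, Landsat, and MODIS); the text above does not say which sensors were used.

```python
def pixels_per_plot(plot_area_ha: float, pixel_size_m: float) -> float:
    """Number of pixel-equivalents that fit inside a plot of the given area."""
    plot_area_m2 = plot_area_ha * 10_000  # 1 hectare = 10,000 square metres
    return plot_area_m2 / (pixel_size_m ** 2)


# Hypothetical 0.5 ha smallholder plot at three assumed pixel sizes
for pixel_size_m in (10, 30, 250):
    n_pixels = pixels_per_plot(0.5, pixel_size_m)
    print(f"{pixel_size_m:>3} m pixels: {n_pixels:.2f} pixels per 0.5 ha plot")
```

At coarse resolution a plot occupies only a fraction of one pixel, so its signal is mixed with whatever surrounds it.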

This raises a critical question: How do we know if our satellite-based maps are actually detecting real irrigation?

Study Areas: Research was conducted in Mozambique, in Manica and Gaza provinces (Manica, Xai-Xai, Chokwe).

Temporal Strategy Matters

One of the first choices in satellite-based irrigation mapping is how long a period to analyze at once. This is called the "composite length."

The Trade-off

Short composites (2-3 months): Capture rapid changes and can resolve scattered plots in detail. But they are noisier: clouds, shadows, and sensor artifacts create false detections.

Long composites (6-12 months): Smooth out noise and produce cleaner signals. But they average out the details, so dispersed irrigation may disappear into the seasonal background.

What This Means

There is no perfect answer. Your choice of temporal window directly affects which irrigated areas you detect and which you miss.
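The dilution side of this trade-off can be sketched with a toy time series. The example below assumes a median is used as the compositing statistic and uses a made-up monthly vegetation-index series for one pixel; neither the statistic nor the values come from the study.

```python
from statistics import median


def median_composites(monthly_values, window_months):
    """Split a monthly series into consecutive windows and take each window's median."""
    return [
        median(monthly_values[start:start + window_months])
        for start in range(0, len(monthly_values), window_months)
    ]


# Hypothetical 12-month vegetation-index series for one pixel:
# a baseline of 0.2 with a two-month irrigated crop peak.
ndvi = [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.7, 0.8, 0.2, 0.2, 0.2, 0.2]

for window in (2, 3, 6, 12):
    print(f"{window:>2}-month composites:", median_composites(ndvi, window))
```

In this toy case the short irrigation peak is still visible in the 2- and 3-month composites but vanishes in the 6- and 12-month ones. The sketch does not model the noise that short windows let through.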

Interactive (Composite Length): Switch between 2-, 3-, 6-, and 12-month composites, with an optional agreement map. Shorter windows reveal scattered plots but include noise. Longer windows are cleaner but may miss dispersed irrigation.

Interactive (Algorithm Choice): Compare four algorithms (RF, SVM, Neural Network, KNN) on the same satellite data. In the agreement map, dark blue marks high agreement and lighter shades mark disagreement zones (high uncertainty).

Algorithm Choice Changes Everything

Once you have satellite data, the next question is: which machine learning algorithm should you use to classify irrigation?

The Variety

There are dozens of algorithms—Random Forest, Support Vector Machines (SVM), Neural Networks, K-Nearest Neighbors, and many more. Each has different strengths and weaknesses.

Random Forest is fast and robust. SVM can perform well with limited training data. Neural Networks can capture complex patterns. KNN is simple but can overfit, especially with a small number of neighbors. There's no "one true algorithm."

The Problem

On the same satellite data, different algorithms produce different maps. In areas where they disagree, we don't know which is right without going to the field to check.
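One simple way to quantify how much two maps differ is the fraction of pixels on which they disagree. The sketch below assumes each map is a flattened list of binary labels (1 = irrigated) over the same pixels; the label values are invented for illustration and are not outputs of the study's models.

```python
from itertools import combinations


def disagreement_rate(map_a, map_b):
    """Fraction of pixels where two binary maps (1 = irrigated) differ."""
    if len(map_a) != len(map_b):
        raise ValueError("maps must cover the same pixels")
    differing = sum(a != b for a, b in zip(map_a, map_b))
    return differing / len(map_a)


# Hypothetical classifications of the same 8 pixels by four algorithms
maps = {
    "RF":  [1, 1, 0, 0, 1, 0, 0, 1],
    "SVM": [1, 0, 0, 0, 1, 0, 1, 1],
    "NN":  [1, 1, 0, 1, 1, 0, 0, 1],
    "KNN": [1, 1, 0, 0, 0, 0, 0, 1],
}

for (name_a, map_a), (name_b, map_b) in combinations(maps.items(), 2):
    print(f"{name_a} vs {name_b}: {disagreement_rate(map_a, map_b):.3f}")
```

A disagreement rate says how different two maps are, not which one is right. That still requires reference data.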

Interactive (Training Data Size): Compare 1% vs 100% training data across algorithms, with irrigation detections shaded from low to high agreement. Accuracy plateaus after ~10% of data. Different algorithms respond differently to data scarcity.

Training Data: How Much is Enough?

Machine learning algorithms need examples—labeled data showing "this is irrigation" and "this is not". In remote, data-poor regions, collecting training data is expensive and time-consuming.

The Question

How much training data do you really need? Is it worth the effort to collect comprehensive training data, or can you get good results with minimal effort?

The Answer (Spoiler)

There's a sweet spot. Accuracy improves dramatically as the training set grows from 1% to ~10% of the available data, then plateaus. After that, you get diminishing returns for your effort.

But where this sweet spot falls depends on your landscape complexity and algorithm choice.
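A plateau like this can be located from a learning curve by looking for the point where the next increment of training data stops paying off. The sketch below is one possible rule, not the study's method: the `min_gain` threshold is an assumption, and the accuracy values are placeholders chosen only to have the shape described above.

```python
def plateau_fraction(curve, min_gain=0.01):
    """Return the first training fraction after which the next step
    improves accuracy by less than `min_gain`.

    `curve` is a list of (training_fraction, accuracy) pairs sorted by fraction.
    """
    for (fraction, accuracy), (_, next_accuracy) in zip(curve, curve[1:]):
        if next_accuracy - accuracy < min_gain:
            return fraction
    return curve[-1][0]  # no plateau found within the tested fractions


# Placeholder numbers for illustration only; NOT results from the study.
curve = [(0.01, 0.70), (0.05, 0.82), (0.10, 0.88), (0.50, 0.885), (1.00, 0.89)]
print("Plateau starts at training fraction:", plateau_fraction(curve))
```

Running the same rule per algorithm and per landscape would show whether the plateau shifts, which is the dependence noted above.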

What This Means: Practical Insights for Practitioners

No Single "Right Answer"

Different methodological choices (composite length, algorithm, training data size) lead to different but potentially equally valid maps. The best approach depends on your landscape, your resources, and what you're trying to learn.

Use Ensemble Approaches

Rather than trusting a single algorithm, combine multiple algorithms and temporal windows. Areas where multiple approaches agree can be treated with higher confidence. Disagreement zones need field validation.
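A minimal sketch of this idea counts, for each pixel, how many maps call it irrigated, then separates unanimous pixels from disputed ones. The unanimity rule and the example maps are assumptions for illustration; a majority threshold would be an equally reasonable choice.

```python
def irrigation_votes(binary_maps):
    """Per-pixel count of maps that label the pixel as irrigated (1)."""
    return [sum(pixel_labels) for pixel_labels in zip(*binary_maps)]


def classify_consensus(votes, n_maps):
    """Label a pixel by whether the ensemble is unanimous."""
    if votes == n_maps:
        return "irrigated"
    if votes == 0:
        return "not irrigated"
    return "disputed"  # candidate for field validation


# Hypothetical maps of the same 6 pixels from four algorithm/window combinations
ensemble = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 0],
]

votes = irrigation_votes(ensemble)
labels = [classify_consensus(v, len(ensemble)) for v in votes]
print(votes)
print(labels)
```

The vote counts double as a simple agreement map: high counts mark consistent detections, intermediate counts mark where to send a field team first.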

Invest Smartly in Training Data

You don't need perfect, complete training data. Focus on a high-quality labeled subset of about 5-10% of the full training data. This gives you ~90% of the accuracy gains for a fraction of the effort.

Always Validate in the Field

Satellite-based maps are hypotheses, not ground truth. Use field validation to understand where your maps are right (and where they're wrong). Uncertainty is honest—plan for it.
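Field checks are usually summarized by comparing mapped labels with what was observed on the ground. The sketch below computes overall, user's, and producer's accuracy for the irrigated class from paired labels; the function name and the sample points are hypothetical.

```python
def accuracy_report(mapped, observed):
    """Compare mapped labels with field observations (1 = irrigated, 0 = not)."""
    if len(mapped) != len(observed) or not mapped:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(mapped, observed))
    tp = sum(m == 1 and o == 1 for m, o in pairs)  # correctly mapped irrigation
    fp = sum(m == 1 and o == 0 for m, o in pairs)  # false detections
    fn = sum(m == 0 and o == 1 for m, o in pairs)  # missed irrigation
    tn = sum(m == 0 and o == 0 for m, o in pairs)  # correctly mapped non-irrigation
    return {
        "overall_accuracy": (tp + tn) / len(pairs),
        # Share of mapped-irrigated points that really are irrigated
        "users_accuracy": tp / (tp + fp) if tp + fp else None,
        # Share of truly irrigated points that the map found
        "producers_accuracy": tp / (tp + fn) if tp + fn else None,
    }


# Hypothetical validation points: map label vs field observation
mapped = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
observed = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]
print(accuracy_report(mapped, observed))
```

Reporting user's and producer's accuracy separately shows whether a map tends to over-detect or to miss irrigation, which overall accuracy alone hides.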

Context Matters

Homogeneous irrigation zones are easier to map. Heterogeneous, fragmented landscapes are harder. Choose your methods based on landscape characteristics, not just what works elsewhere.

Be Transparent About Uncertainty

Map accuracy varies by location and landscape type. Report confidence levels, not just final maps. Help decision-makers understand where the data is reliable and where it's exploratory.
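One way to report accuracy by landscape type is to group validation points by stratum before scoring them. The sketch below assumes each validation record carries a landscape label; the stratum names and records are invented for illustration.

```python
from collections import defaultdict


def accuracy_by_stratum(records):
    """Share of correctly mapped points per stratum.

    `records` is an iterable of (stratum, mapped_label, observed_label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for stratum, mapped_label, observed_label in records:
        total[stratum] += 1
        correct[stratum] += int(mapped_label == observed_label)
    return {stratum: correct[stratum] / total[stratum] for stratum in total}


# Hypothetical validation records: (landscape type, map label, field label)
records = [
    ("homogeneous scheme", 1, 1),
    ("homogeneous scheme", 1, 1),
    ("homogeneous scheme", 0, 0),
    ("homogeneous scheme", 1, 1),
    ("fragmented mosaic", 1, 0),
    ("fragmented mosaic", 0, 1),
    ("fragmented mosaic", 1, 1),
    ("fragmented mosaic", 0, 0),
]

for stratum, accuracy in accuracy_by_stratum(records).items():
    print(f"{stratum}: {accuracy:.2f}")
```

A per-stratum table like this can accompany the map so that readers see where it is reliable and where it is exploratory.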

