Why Irrigation Mapping is Complex

Discovering the challenges of detecting smallholder irrigation in diverse African landscapes using satellite data

PhD Research • Remote Sensing • Machine Learning

The Challenge

Smallholder irrigation in sub-Saharan Africa is vital for food security and rural livelihoods, yet nobody knows exactly where it is. Government statistics are incomplete, and traditional surveys are expensive and slow.

Satellite imagery offers hope—we can see the Earth at high resolution from space. But detecting scattered, small-scale irrigation plots in complex, diverse landscapes is surprisingly difficult.

Why is it hard?

  • Scale: Plots are often tiny (< 1 hectare), requiring very high-resolution imagery
  • Heterogeneity: Mixed cropland, natural vegetation, and water bodies create classification confusion
  • Temporal dynamics: Irrigation patterns shift with seasons and cropping cycles
  • Data scarcity: Limited training data in most African regions
  • Methodological choices: Different algorithms, temporal windows, and parameters yield different results

This raises a critical question: How do we know if our satellite-based maps are actually detecting real irrigation?

Study Areas: Research was conducted in Mozambique, in Manica and Gaza provinces (Manica, Xai-Xai, Chokwe). Click a box to open the related interactive demo.
📊 The Problem in Numbers

Irrigation in Africa

2.5M hectares of smallholder irrigation exist in sub-Saharan Africa

30-40% are not documented in official agricultural statistics

Impact: Invisible to policy makers, development planners, and climate adaptation programs

Satellite Data Challenge

Most plots: < 1 hectare in size

Temporal coverage: Satellite revisit times range from 5 to 16 days

Impact: Small plots and long revisit times make detection prone to error

Temporal Strategy Matters

One of the first choices in satellite-based irrigation mapping is how long a period of imagery to aggregate into a single analysis unit. This is called the "composite length."

The Trade-off

Short composites (2-3 months): Capture rapid changes and can identify scattered plots with detail. But they're noisier—clouds, shadows, and sensor artifacts create false detections.

Long composites (6-12 months): Smooth out noise and create cleaner signals. But they average out the details—scattered, dispersed irrigation may disappear into the seasonal background.
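As a rough illustration of this trade-off, the sketch below builds median composites of different lengths from a synthetic NDVI time series. All shapes, values, and the assumed 10-day revisit interval are illustrative assumptions, not the study's actual data:

```python
import numpy as np

# Hypothetical stand-in: a year of NDVI observations, shaped (time, height, width),
# at roughly a 10-day revisit. Values are random for illustration only.
rng = np.random.default_rng(0)
ndvi = rng.uniform(0.1, 0.8, size=(36, 100, 100))

def median_composite(stack, window_steps):
    """Collapse the time axis into non-overlapping median composites.

    window_steps: number of time steps per composite (e.g. 6 steps ~ 2 months
    at a 10-day revisit). Shorter windows keep more temporal detail but also
    more noise; longer windows smooth both.
    """
    n = stack.shape[0] // window_steps
    trimmed = stack[: n * window_steps]
    blocks = trimmed.reshape(n, window_steps, *stack.shape[1:])
    return np.median(blocks, axis=1)  # (n_composites, height, width)

two_month = median_composite(ndvi, 6)      # 6 composites per year
twelve_month = median_composite(ndvi, 36)  # 1 composite per year
print(two_month.shape, twelve_month.shape)
```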

What This Means

There is no perfect answer. Your choice of temporal window directly affects which irrigated areas you detect and which you miss.

Interactive: Use the controls to switch between 2-, 3-, 6-, and 12-month composites. Shorter windows reveal scattered plots but include noise. Longer windows are cleaner but may miss dispersed irrigation.
📊 Accuracy by Temporal Window

Key Findings

In homogeneous zones: Longer composites perform best (94-96% accuracy). The consistent signal wins.

In heterogeneous zones: Shorter composites capture more true irrigation. Longer composites miss 15-25% of real plots.

Bottom line: Your choice of composite length directly impacts map accuracy and depends on your landscape type.

Read the full methodology →

Interactive: Compare four algorithms (RF, SVM, Neural Network, KNN) on the same satellite data. Dark blue = high agreement. Lighter = disagreement zones (high uncertainty).

Algorithm Choice Changes Everything

Once you have satellite data, the next question is: which machine learning algorithm should you use to classify irrigation?

The Variety

There are dozens of algorithms—Random Forest, Support Vector Machines (SVM), Neural Networks, K-Nearest Neighbors, and many more. Each has different strengths and weaknesses.

Random Forest is fast and robust. SVM excels with limited training data. Neural Networks can capture complex patterns. KNN is simple but can overfit. There's no "one true algorithm."

The Problem

On the same satellite data, different algorithms produce different maps. In areas where they disagree, we don't know which is right without going to the field to check.
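A minimal sketch of this comparison, fitting the four algorithm families mentioned above on the same data with scikit-learn. The synthetic feature matrix, labels, and hyperparameters are illustrative assumptions, not the study's actual setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for per-pixel features (e.g. composite band statistics)
# and irrigation labels; real inputs would come from the satellite composites.
X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

classifiers = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "NN": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Fit each algorithm on the same training data; their maps will differ in places.
predictions = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    predictions[name] = clf.predict(X_test)
    print(name, "accuracy:", round(clf.score(X_test, y_test), 3))
```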

📊 Algorithm Performance Comparison

Algorithm Consensus Matters

High agreement zones: 90%+ accuracy. All algorithms agree = high confidence.

Medium agreement zones: 70-85% accuracy. Some uncertainty, but likely real irrigation.

Low agreement zones: < 70% accuracy. This is where field validation becomes critical.

Lesson: Use algorithm agreement as a confidence measure. Don't rely on a single algorithm.
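As a sketch of agreement as a confidence measure, the example below counts how many of four per-pixel classifications agree on the majority class and bins the result into confidence levels. The prediction array is randomly generated for illustration; in practice it would hold the maps produced above:

```python
import numpy as np

# Hypothetical per-pixel predictions from four classifiers
# (1 = irrigated, 0 = not irrigated), shaped (n_algorithms, n_pixels).
rng = np.random.default_rng(1)
preds = rng.integers(0, 2, size=(4, 10_000))

votes = preds.sum(axis=0)                 # algorithms voting "irrigated"
agreement = np.maximum(votes, 4 - votes)  # algorithms agreeing on the majority class

# 2 = all four agree, 1 = three agree, 0 = a 2-2 split (prioritise for field checks).
confidence = np.select([agreement == 4, agreement == 3], [2, 1], default=0)
labels = {2: "high", 1: "medium", 0: "low"}
print({labels[k]: int((confidence == k).sum()) for k in (2, 1, 0)})
```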

Interactive: Compare 1% vs 100% training data across algorithms. Accuracy plateaus after ~10% of data. Different algorithms respond differently to data scarcity.

Training Data: How Much is Enough?

Machine learning algorithms need examples—labeled data showing "this is irrigation" and "this is not". In remote, data-poor regions, collecting training data is expensive and time-consuming.

The Question

How much training data do you really need? Is it worth the effort to collect comprehensive training data, or can you get good results with minimal effort?

The Answer (Spoiler)

There's a sweet spot. Accuracy improves dramatically from 1% to ~10% of data, then plateaus. After that, you get diminishing returns for your effort.

But exactly where this sweet spot lies depends on your landscape complexity and algorithm choice.
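A small sketch of this kind of experiment, training the same classifier on increasing fractions of a labelled pool and scoring on a held-out set. The synthetic data, fractions, and resulting scores are illustrative, not the study's reported numbers:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labelled irrigation samples.
X, y = make_classification(n_samples=5000, n_features=12, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for fraction in (0.01, 0.05, 0.10, 0.25, 1.0):
    n = max(int(len(X_pool) * fraction), 10)  # subset of the (shuffled) pool
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_pool[:n], y_pool[:n])
    print(f"{fraction:>5.0%} of training pool -> accuracy {clf.score(X_test, y_test):.3f}")
```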

📊 The Training Data Plateau

The Diminishing Returns Curve

Set 1 (1% data): Accuracy ~75-80%. High variance across algorithms.

Sets 2-3 (5-10% data): Accuracy jumps to ~88-92%. The sweet spot is here.

Set 8 (100% data): Accuracy plateaus at ~93-95%. Marginal gains only.

Practical takeaway: Invest in 5-10% quality training data. After that, effort is better spent on field validation.

What This Means: Practical Insights for Practitioners

🎯 There's No Single "Right Answer"

Different methodological choices (composite length, algorithm, training data size) lead to different but potentially equally valid maps. The best approach depends on your landscape, your resources, and what you're trying to learn.

📊 Use Ensemble Approaches

Rather than trusting a single algorithm, combine multiple algorithms and temporal windows. Areas where multiple approaches agree are higher confidence. Disagreement zones need field validation.
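One way such an ensemble could be assembled, sketched on hypothetical binary maps from several algorithm and temporal-window combinations. Array sizes, values, and thresholds are assumptions for illustration:

```python
import numpy as np

# Hypothetical stack of binary irrigation maps: e.g. 4 algorithms x 3 composite
# lengths = 12 candidate maps over the same pixels (random values for illustration).
rng = np.random.default_rng(2)
maps = rng.integers(0, 2, size=(12, 500, 500))

# Fraction of ensemble members that call each pixel "irrigated".
support = maps.mean(axis=0)

high_confidence = support >= 0.75                          # most members agree
needs_field_check = (support > 0.25) & (support < 0.75)    # disagreement zone
print("high-confidence pixels:", int(high_confidence.sum()))
print("pixels flagged for field validation:", int(needs_field_check.sum()))
```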

💡 Invest Smartly in Training Data

You don't need perfect, complete training data. Focus on 5-10% high-quality labeled examples. This gives you ~90% of the accuracy gains for a fraction of the effort.

🔍 Always Validate in the Field

Satellite-based maps are hypotheses, not ground truth. Use field validation to understand where your maps are right (and where they're wrong). Uncertainty is honest—plan for it.

🌍 Context Matters

Homogeneous irrigation zones are easier to map. Heterogeneous, fragmented landscapes are harder. Choose your methods based on landscape characteristics, not just what works elsewhere.

📈 Be Transparent About Uncertainty

Map accuracy varies by location and landscape type. Report confidence levels, not just final maps. Help decision-makers understand where the data is reliable and where it's exploratory.

PhD Research

About This Research

This work explores methodological challenges in remote sensing-based irrigation mapping for smallholder systems in sub-Saharan Africa. Research conducted in Mozambique with support from the World Bank and development partners.

Want to learn more?

📄 Read the Full Paper
💻 View the Code (GitHub)
📊 Access the Data
← Back to All Demonstrations

© 2025 Resilience BV | Geospatial Lab