Why Irrigation Mapping is Complex
Discovering the challenges of detecting smallholder irrigation in diverse African landscapes using satellite data.
The Challenge
Smallholder irrigation in sub-Saharan Africa is vital for food security and rural livelihoods, yet nobody knows exactly where it is. Government statistics are incomplete, and traditional surveys are expensive and slow.
Satellite imagery offers hope—we can see the Earth at high resolution from space. But detecting scattered, small-scale irrigation plots in complex, diverse landscapes is surprisingly difficult.
Why is it hard?
- Scale: Plots are often tiny (< 1 hectare), requiring very high-resolution imagery
- Heterogeneity: Mixed cropland, natural vegetation, and water create confusion
- Temporal dynamics: Irrigation patterns shift with seasons and cropping cycles
- Data scarcity: Limited training data in most African regions
- Methodological choices: Different algorithms, temporal windows, and parameters yield different results
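To see why plot size matters, here is a small calculation of how many pixels a plot of a given area spans at a few pixel sizes. The resolutions are rounded, illustrative values: about 30 m for Landsat multispectral bands, 10 m for Sentinel-2 visible and near-infrared bands, and around 3 m for some commercial constellations. The calculation ignores plot shape and mixed edge pixels.

```python
def pixels_per_plot(plot_area_ha: float, pixel_size_m: float) -> float:
    """Approximate pixel count over a plot (ignores plot shape and mixed edge pixels)."""
    plot_area_m2 = plot_area_ha * 10_000  # 1 hectare = 10,000 square metres
    return plot_area_m2 / pixel_size_m ** 2

# Illustrative pixel sizes: ~30 m (Landsat multispectral), ~10 m (Sentinel-2
# visible/NIR), ~3 m (some commercial constellations).
for pixel_size_m in (30, 10, 3):
    for plot_area_ha in (0.25, 1.0):
        n = pixels_per_plot(plot_area_ha, pixel_size_m)
        print(f"{plot_area_ha:.2f} ha plot, {pixel_size_m:>2} m pixels: ~{n:,.1f} pixels")
```

A quarter-hectare plot covers fewer than three 30 m pixels, and most of those will mix the plot with its surroundings. This is one reason small plots push mapping toward finer resolution.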
This raises a critical question: How do we know if our satellite-based maps are actually detecting real irrigation?
Temporal Strategy Matters
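One of the first choices in satellite-based irrigation mapping is how long a time window to analyze at once. This is called the "composite length": all observations acquired within the window are combined into a single composite image, for example by taking the per-pixel median.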
The Trade-off
Short composites (2-3 months): Capture rapid changes and can resolve scattered plots in detail. But they are noisier: clouds, shadows, and sensor artifacts create false detections.
Long composites (6-12 months): Smooth out noise and produce cleaner signals. But they average out the details, so scattered, dispersed irrigation may disappear into the seasonal background.
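The trade-off can be sketched on a single pixel. This minimal example assumes a per-window median composite, which is one common choice; the article does not specify its compositing statistic. It uses an invented monthly NDVI series for one irrigated plot. The series has a rainy-season green-up, one cloud-contaminated month (clouds typically pull NDVI toward zero), and a short dry-season irrigated crop in months 7-8.

```python
from statistics import median

# Hypothetical monthly NDVI for one irrigated plot (made-up values):
# rainy-season green-up in months 1-4, a cloudy month 3,
# and a short dry-season irrigated crop in months 7-8.
ndvi = [0.55, 0.60, 0.05, 0.50, 0.30, 0.22,
        0.62, 0.65, 0.25, 0.20, 0.20, 0.25]

def composite(series, window):
    """Median composite over consecutive, non-overlapping windows of `window` months."""
    if len(series) % window:
        raise ValueError("series length must be a multiple of the window")
    return [median(series[i:i + window]) for i in range(0, len(series), window)]

for window in (2, 3, 6, 12):
    values = composite(ndvi, window)
    print(f"{window:>2}-month composites:", [round(v, 3) for v in values])
```

With these values, each effect in the text shows up:

- **2-month windows.** The median of two observations is just their mean, so the cloudy month drags the second composite down to about 0.28.
- **3-month windows.** The composites keep the dry-season peak at 0.62.
- **6-month windows.** The composite for the dry half-year drops to 0.25.
- **12-month window.** The single composite is about 0.28, close to the non-irrigated background.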
What This Means
There is no perfect answer. Your choice of temporal window directly affects which irrigated areas you detect and which you miss.
Composite Length
Figure: 2-, 3-, 6-, and 12-month composites compared. Shorter windows reveal scattered plots but include noise. Longer windows are cleaner but may miss dispersed irrigation.
Algorithm Choice
Figure: Four algorithms (Random Forest, SVM, Neural Network, KNN) applied to the same satellite data. Dark blue marks high agreement; lighter shades mark disagreement zones (high uncertainty).
Algorithm Choice Changes Everything
Once you have satellite data, the next question is: which machine learning algorithm should you use to classify irrigation?
The Variety
There are dozens of candidate algorithms, including Random Forest, Support Vector Machines (SVM), Neural Networks, and K-Nearest Neighbors (KNN). Each has different strengths and weaknesses.
Random Forest is fast and robust. SVMs often perform well with limited training data. Neural Networks can capture complex patterns. KNN is simple but can overfit, especially when the number of neighbors (k) is small. There's no "one true algorithm."
The Problem
On the same satellite data, different algorithms produce different maps. In areas where they disagree, we don't know which is right without going to the field to check.
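One way to quantify how much two algorithms' maps differ is the share of pixels where they assign different labels. This is a minimal sketch: the algorithm names are only labels, and the 0/1 maps are invented. Real maps would be rasters, typically handled as arrays, not short lists.

```python
from itertools import combinations

# Hypothetical per-pixel outputs (1 = irrigated, 0 = not) from four classifiers
# applied to the same eight pixels. Values are invented for illustration.
maps = {
    "RF":  [1, 1, 0, 0, 1, 0, 0, 1],
    "SVM": [1, 1, 0, 1, 0, 0, 0, 1],
    "NN":  [1, 0, 0, 1, 1, 0, 0, 1],
    "KNN": [1, 1, 0, 0, 0, 1, 0, 0],
}

def disagreement_rate(a, b):
    """Fraction of pixels where two binary maps assign different labels."""
    if len(a) != len(b):
        raise ValueError("maps must cover the same pixels")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Compare every pair of algorithms once.
for (name_a, map_a), (name_b, map_b) in combinations(maps.items(), 2):
    print(f"{name_a} vs {name_b}: {disagreement_rate(map_a, map_b):.0%} of pixels differ")
```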
Training Data Size
Figure: Accuracy of each algorithm with training sets from 1% up to 100% of the available labeled data. Accuracy plateaus after about 10%, and different algorithms respond differently to data scarcity.
Training Data: How Much is Enough?
Machine learning algorithms need examples—labeled data showing "this is irrigation" and "this is not". In remote, data-poor regions, collecting training data is expensive and time-consuming.
The Question
How much training data do you really need? Is it worth the effort to collect comprehensive training data, or can you get good results with minimal effort?
The Answer (Spoiler)
There's a sweet spot. Accuracy improves dramatically as the training set grows from 1% to about 10% of the available labeled data, then plateaus. Beyond that point, additional labels bring diminishing returns for the effort.
But where this sweet spot falls depends on your landscape complexity and algorithm choice.
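You can locate the sweet spot for your own landscape with a learning-curve experiment. The procedure is:

1. Subsample the training set at increasing fractions.
2. Retrain the classifier on each subsample.
3. Score every model on the same held-out test set.

This stdlib-only sketch swaps in a simple nearest-centroid classifier instead of RF, SVM, a neural network, or KNN. The two features (dry-season NDVI and seasonal NDVI amplitude) and their distributions are invented, so the resulting curve says nothing about real data. With scikit-learn, `sklearn.model_selection.learning_curve` automates the same loop.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

def make_samples(n, irrigated):
    """Synthetic pixels: (dry-season NDVI, seasonal NDVI amplitude) from invented distributions."""
    mean = (0.55, 0.20) if irrigated else (0.25, 0.35)
    return [((random.gauss(mean[0], 0.08), random.gauss(mean[1], 0.08)), int(irrigated))
            for _ in range(n)]

train = make_samples(500, True) + make_samples(500, False)
test = make_samples(1000, True) + make_samples(1000, False)

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def fit_nearest_centroid(samples):
    """Mean feature vector per class: a deliberately simple stand-in classifier."""
    return {label: centroid([x for x, y in samples if y == label]) for label in (0, 1)}

def predict(model, x):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(x, model[label])))

def subsample(samples, fraction):
    """Stratified subsample: keep `fraction` of each class, at least one example per class."""
    out = []
    for label in (0, 1):
        pool = [s for s in samples if s[1] == label]
        k = max(1, round(fraction * len(pool)))
        out.extend(random.sample(pool, k))
    return out

def accuracy(model, samples):
    return sum(predict(model, x) == y for x, y in samples) / len(samples)

curve = {}
for fraction in (0.01, 0.05, 0.10, 0.25, 0.50, 1.00):
    model = fit_nearest_centroid(subsample(train, fraction))
    curve[fraction] = accuracy(model, test)
    print(f"{fraction:>5.0%} of training data -> test accuracy {curve[fraction]:.3f}")
```

On real imagery, you would repeat each fraction over several random draws, because a single small subsample can be lucky or unlucky.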
What This Means: Practical Insights for Practitioners
No Single "Right Answer"
Different methodological choices (composite length, algorithm, training data size) lead to different but potentially equally valid maps. The best approach depends on your landscape, your resources, and what you're trying to learn.
Use Ensemble Approaches
Rather than trusting a single algorithm, combine multiple algorithms and temporal windows. Areas where multiple approaches agree can be mapped with higher confidence; disagreement zones need field validation.
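A minimal ensemble might take a majority vote across every (algorithm, composite length) run, using the share of runs that agree as a confidence score. This is one possible sketch, not the article's method:

- The maps are invented.
- The 0.8 confidence threshold is an assumption.
- The rule that sends ties to "irrigated" is an arbitrary assumption.

```python
# Hypothetical binary maps (1 = irrigated) for the same four pixels, keyed by
# (algorithm, composite length in months). All values are invented.
runs = {
    ("RF", 3): [1, 1, 0, 0],
    ("RF", 12): [1, 0, 0, 0],
    ("SVM", 3): [1, 1, 0, 1],
    ("SVM", 12): [1, 0, 0, 0],
    ("KNN", 3): [1, 1, 1, 1],
    ("KNN", 12): [1, 0, 0, 0],
}

def ensemble(maps, threshold=0.8):
    """Per pixel: majority-vote label, share of runs backing it, and a confidence flag."""
    n_runs = len(maps)
    out = []
    for votes in zip(*maps.values()):
        share_irrigated = sum(votes) / n_runs
        label = 1 if share_irrigated >= 0.5 else 0  # ties go to "irrigated" (arbitrary)
        agreement = max(share_irrigated, 1 - share_irrigated)
        flag = "high confidence" if agreement >= threshold else "field-validate"
        out.append((label, agreement, flag))
    return out

for pixel, (label, agreement, flag) in enumerate(ensemble(runs)):
    print(f"pixel {pixel}: label={label}, agreement={agreement:.2f} -> {flag}")
```

In this toy case, pixel 1 splits exactly along composite length: every 3-month run calls it irrigated and every 12-month run does not. That is the kind of disagreement worth checking on the ground.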
Invest Smartly in Training Data
You don't need perfect, complete training data. Focus on a high-quality labeled set of about 5-10% of the available samples. This gives you roughly 90% of the accuracy gains for a fraction of the effort.
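When planning a labeling campaign, the 5-10% budget can be spread across landscape strata. This hypothetical sketch labels 8% of each stratum's candidate pixels, which is within the range above. It also applies a floor of 20 labels per stratum, which is an assumption and not from the text. The floor keeps small but distinctive landscapes from ending up with only a handful of examples. The stratum names and counts are invented.

```python
import math

def allocate_labels(strata_sizes, percent=8, minimum=20):
    """Labels per stratum: `percent` of its candidate pixels, at least `minimum`,
    and never more than the stratum holds."""
    return {
        name: min(size, max(minimum, math.ceil(size * percent / 100)))
        for name, size in strata_sizes.items()
    }

# Hypothetical candidate-pixel counts per landscape stratum.
strata = {
    "large irrigation schemes": 4000,
    "fragmented smallholder mosaic": 2500,
    "wetland margins": 150,
}
plan = allocate_labels(strata)
total = sum(plan.values())
for name, n in plan.items():
    print(f"{name}: {n} labels")
print(f"total: {total} of {sum(strata.values())} candidates ({total / sum(strata.values()):.1%})")
```

Here the small wetland stratum gets 20 labels instead of 12. The overall budget stays just over 8%.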
Always Validate in the Field
Satellite-based maps are hypotheses, not ground truth. Use field validation to understand where your maps are right (and where they're wrong). Uncertainty is honest—plan for it.
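Field checks can be summarized with a confusion matrix and the usual remote-sensing accuracy measures:

- **Overall accuracy:** the share of sites where the map matches the field observation.
- **Producer's accuracy:** 1 minus the omission error.
- **User's accuracy:** 1 minus the commission error.

This sketch reports these for the "irrigated" class. It assumes a simple random sample of field sites. A rigorous assessment would also weight results by the sampling design. The ten sites are hypothetical.

```python
def accuracy_report(mapped, observed):
    """Overall, producer's and user's accuracy for the 'irrigated' class (1) from paired labels."""
    if len(mapped) != len(observed):
        raise ValueError("need one mapped label per field observation")
    pairs = list(zip(mapped, observed))
    tp = sum(m == 1 and o == 1 for m, o in pairs)  # mapped irrigated, irrigated in the field
    fp = sum(m == 1 and o == 0 for m, o in pairs)  # commission error: mapped but not irrigated
    fn = sum(m == 0 and o == 1 for m, o in pairs)  # omission error: irrigated but missed
    tn = sum(m == 0 and o == 0 for m, o in pairs)
    return {
        "overall": (tp + tn) / len(pairs),
        "producers_irrigated": tp / (tp + fn) if tp + fn else float("nan"),
        "users_irrigated": tp / (tp + fp) if tp + fp else float("nan"),
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
    }

# Hypothetical: map labels at 10 field-visited sites vs what was observed on the ground.
mapped   = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
observed = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
report = accuracy_report(mapped, observed)
print(report)
```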
Context Matters
Homogeneous irrigation zones are easier to map. Heterogeneous, fragmented landscapes are harder. Choose your methods based on landscape characteristics, not just what works elsewhere.
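One simple, hypothetical proxy for how fragmented a landscape is counts pairs of neighboring pixels in a classified map. It is the share of horizontally or vertically adjacent pairs whose labels differ: 0 for a uniform block, rising toward 1 for a fine mosaic. This is an illustrative assumption rather than a metric from the article, and richer landscape metrics exist. The two 4x4 grids are invented.

```python
def edge_fraction(grid):
    """Share of 4-neighbour pixel pairs whose labels differ: 0 = uniform, 1 = maximally mixed."""
    rows, cols = len(grid), len(grid[0])
    differing = total = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right-hand neighbour
                total += 1
                differing += int(grid[r][c] != grid[r][c + 1])
            if r + 1 < rows:  # neighbour below
                total += 1
                differing += int(grid[r][c] != grid[r + 1][c])
    return differing / total if total else 0.0

# Hypothetical 4x4 label maps (1 = irrigated).
scheme = [[1, 1, 1, 1],
          [1, 1, 1, 1],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
mosaic = [[1, 0, 1, 0],
          [0, 1, 0, 1],
          [1, 0, 1, 0],
          [0, 1, 0, 1]]
print(f"scheme: {edge_fraction(scheme):.2f}, mosaic: {edge_fraction(mosaic):.2f}")
```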
Be Transparent About Uncertainty
Map accuracy varies by location and landscape type. Report confidence levels, not just final maps. Help decision-makers understand where the data is reliable and where it's exploratory.
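Reporting confidence can be as simple as attaching an interval to each per-landscape accuracy figure. This sketch uses the Wilson score interval for a proportion, one standard choice for binomial data. The validation counts are invented.

```python
import math

def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("need at least one validation sample")
    p = correct / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)

# Hypothetical validation results per landscape type: (correct, total sites checked).
results = {"homogeneous schemes": (92, 100), "fragmented mosaic": (27, 40)}
for stratum, (correct, n) in results.items():
    low, high = wilson_interval(correct, n)
    print(f"{stratum}: accuracy {correct / n:.0%} (95% CI {low:.0%}-{high:.0%}, n={n})")
```

In this made-up example, the homogeneous stratum's interval is roughly 0.85-0.96, while the fragmented stratum's is roughly 0.52-0.80. Lower accuracy and fewer validation sites both widen the interval.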