Hand-Labeled DARPA LAGR Datasets


Algorithms designed for time-evolving domains have historically been evaluated on synthetic data, e.g., the "moving hyperplane" class of artificial data, where concept drift is introduced manually and any correlation to real-world problems has not been established. This motivated the creation of natural datasets taken from the problem domain. The natural datasets used here are taken from logged field tests conducted by DARPA evaluators, and have been shown to contain time-varying (drifting) concepts.

Representative Images


Scenarios and Lighting Conditions

Overall, three scenarios are considered. Each scenario is associated with two distinct image sequences, each representing a different lighting condition, for a total of six datasets. The terrain appearing in the datasets varies greatly, and includes various combinations of ground type (mulch, dirt); foliage; natural obstacles (trees, dense shrubs); and man-made obstacles (hay bales). Lighting conditions range from overcast with good color definition (e.g., DS1B, shown above), to very sunny, causing shadows and saturation (e.g., DS2A). Additional descriptions and representative images from each dataset are available in the article cited under "More information" below.


Each dataset consists of a 100-frame hand-labeled image sequence. Each image was manually labeled, with each pixel being placed into one of three classes: Obstacle, Groundplane, or Unknown. If it was difficult for a human to tell what a certain area of an image was, even when using higher-level context, then that region was labeled as Unknown. On average, approximately 80% of each image was labeled as either Obstacle or Groundplane, with the remaining 20% labeled as Unknown.

Working with the Datasets

The datasets are distributed as MATLAB 6-compatible *.mat files, which can be read in via the load() function. Each MAT file represents a single frame from the robot log files and contains the raw RGB image as well as the disparity information (so you can do your own stereo processing if desired). Also included in each MAT file is an integer "mask" of the image giving a pixelwise labeling: 0 means ground plane, 1 means obstacle, and 2 means "this pixel was not labeled by a human." Unlabeled areas have meaning; they may be regions for which the terrain class was hard to determine (even with context), or they may be "don't cares" (e.g., sky).
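For readers working outside MATLAB, the following is a minimal Python sketch of how one frame file might be inspected and its mask summarized. It rests on several assumptions: that SciPy's scipy.io.loadmat can read these files (it supports the older MAT-file formats), and that the mask has been pulled out as a 2-D array of integers. The variable names stored inside each MAT file are not given on this page, so the sketch lists them rather than guessing. The function names list_mat_variables and label_fractions are made up for illustration.

```python
from collections import Counter

# Label codes as described in the text above.
LABEL_NAMES = {0: "ground plane", 1: "obstacle", 2: "unlabeled"}


def list_mat_variables(path):
    """Return {variable name: array shape} for one frame's MAT file.

    The variable names are not documented here, so inspect this
    mapping to find the image, disparity, and mask entries.
    """
    from scipy.io import loadmat  # third-party; imported only when needed

    contents = loadmat(path)
    return {
        name: getattr(value, "shape", None)
        for name, value in contents.items()
        if not name.startswith("__")  # skip loadmat's metadata entries
    }


def label_fractions(mask):
    """Fraction of pixels per class in a 2-D integer mask.

    `mask` can be a NumPy array or a list of rows. Values other than
    0, 1, 2 count toward the total but are not reported.
    """
    counts = Counter(int(v) for row in mask for v in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("mask is empty")
    return {name: counts.get(code, 0) / total
            for code, name in LABEL_NAMES.items()}


# Toy 2x4 mask standing in for a real frame's mask.
toy_mask = [[0, 0, 1, 2],
            [0, 1, 1, 2]]
print(label_fractions(toy_mask))
# {'ground plane': 0.375, 'obstacle': 0.375, 'unlabeled': 0.25}
```

Averaging the "unlabeled" fraction over the 100 frames of a dataset should give a figure in the neighborhood of the 20% quoted above, which serves as a quick check that the mask was read correctly.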

More information

For further information on these datasets, including additional representative images, see:

Michael J. Procopio, Jane M. Mulligan, and Greg Grudic. "Learning Terrain Segmentation with Classifier Ensembles for Autonomous Robot Navigation in Unstructured Environments." Journal of Field Robotics (2009).



Special thanks to Wei Xu (at the University of Colorado at Boulder) and to Sharon Procopio for their assistance in labeling these images.


If you use this data in your research, we ask that you cite it using the following BibTeX fields:

     author = {Michael J. Procopio},
     title = {Hand-Labeled {DARPA} {LAGR} Datasets},
     howpublished = {Available at \url{http://www.mikeprocopio.com/labeledlagrdata.html}},
     year = {2007}


The datasets can be downloaded as individual ZIP archives using the links below.

Dataset 1A (DS1A) - 100 Frames From LAGR Test 11 (452 MB)

Dataset 1B (DS1B) - 100 Frames From LAGR Test 11 (464 MB)

Dataset 2A (DS2A) - 100 Frames From LAGR Test 9 (440 MB)

Dataset 2B (DS2B) - 100 Frames From LAGR Test 9 (468 MB)

Dataset 3A (DS3A) - 100 Frames From LAGR Test 14 (492 MB)

Dataset 3B (DS3B) - 100 Frames From LAGR Test 14 (445 MB)

Additional Unlabeled Data

In addition to the labeled data above, a significant number of unlabeled frames are available for each of the six datasets. These unlabeled frames occur later in the corresponding robot test run and, as such, exhibit stronger concept drift and include terrain not present in the earlier frames (1 to 100). These supplemental datasets, which start at Frame 101 (picking up where the labeled datasets above leave off), are available for download below.

Dataset 1A (DS1A) - 203 Supplemental Frames From LAGR Test 11 (101-303) (543 MB)

Dataset 1B (DS1B) - 154 Supplemental Frames From LAGR Test 11 (101-264) (435 MB)

Dataset 2A (DS2A) - 524 Supplemental Frames From LAGR Test 9 (101-624) (1.31 GB)