Carnegie Mellon University

Human Detection and Tracking in Agriculture

NREC developed advanced machine vision techniques for safety around agricultural vehicles. Robotics offers the opportunity to improve efficiency on the farm, but these systems must reliably detect other workers to ensure their safety.

Enabling the full promise of robotics in agriculture requires reliable detection and tracking of human coworkers so that people and machines can effectively and safely perform required tasks. Many agricultural machines are powerful and potentially dangerous, and certain tasks require humans to work close to these machines. Other applications may need to enforce a safety buffer, and agricultural fields generally have minimal access controls. Even smaller agricultural robots often need to know where the people in their environment are in order to complete their tasks effectively.

Our previous work resulted in a spatially distributed multi-vehicle system of autonomous tractors that shared task responsibilities with multiple human co-workers to accomplish agricultural operations in a citrus orchard. This system has demonstrated over 2400 km of autonomous operation and performed significant useful work at a higher productivity level than current methods. The system includes a sophisticated obstacle detection system, but a key limiting factor was the reliable detection of people when partially occluded by tree branches and weeds, when lying on the ground, or when in other non-standard poses [1,2].

[1] Carnegie Mellon University. "Integrated Automation for Sustainable Specialty Crops." http://www.rec.ri.cmu.edu/usda/index.html

[2] S. J. Moorehead, C. K. Wellington, B. J. Gilmore, and C. Vallespi, “Automating orchards: A system of autonomous tractors for orchard maintenance,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS) Workshop on Agricultural Robotics, 2012.

The aim of this work was to advance the state of the art in detection and tracking of people in agricultural environments. We benchmarked many current methods in pedestrian detection and developed new ones, and released the dataset below [3,4]. We hope that this common benchmark allows the field to move forward with a spirit of both competition and cooperation.

[3] T. Tabor, Z. Pezzementi, C. Vallespi, and C. Wellington, “People in the Weeds: Pedestrian Detection Goes Off-road,” in 2015 IEEE International Symposium on Safety, Security, and Rescue Robotics, Purdue University, West Lafayette, IN, 2015.

[4] Z. Pezzementi, T. Tabor, P. Hu, J. K. Chang, D. Ramanan, C. Wellington, B. P. Wisely Babu, H. Herman. “Comparing Apples and Oranges: Off-Road Pedestrian Detection on the NREC Agricultural Person-Detection Dataset.” arXiv preprint arXiv:1707.07169.

The NREC Person Detection Dataset is a collection of off-road videos taken in an apple orchard and orange grove. The videos are collected with a set of visible people in a variety of outfits, locations, and times. We encourage you to train a detector on our dataset and submit your curves for display on this webpage.

Labels are provided in Pascal VOC format and images are provided as rectified PNGs. A training set has been partitioned for algorithm training. A full validation set has been partitioned for algorithm tuning and development results. Finally, a test set is provided for final evaluation and publication. We ask that the test set be used only after completion of development, in order to preserve the integrity of the dataset.
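As an illustration of reading such labels, here is a minimal sketch that parses one annotation in the standard Pascal VOC XML layout using only the Python standard library. The sample filename, class name, and box coordinates are invented for the example and are not taken from the dataset; the scripts in the evaluation repository remain the reference for how the actual files are organized.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the standard Pascal VOC layout. The filename,
# class name, and coordinates are made up for illustration only.
SAMPLE_XML = """
<annotation>
  <filename>frame_000123.png</filename>
  <size><width>720</width><height>480</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>160</xmax><ymax>210</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Coordinates are read as floats first so values like "100.0" also parse.
        coords = tuple(
            int(float(bndbox.findtext(key)))
            for key in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *coords))
    return filename, boxes


filename, boxes = parse_voc(SAMPLE_XML)
print(filename, boxes)
```

To read a file from disk instead of a string, `ET.parse(path).getroot()` can replace the `ET.fromstring` call.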

Details, analysis, and initial results on the data set can be found in our paper. Please cite this paper for any work making use of the data set:

Z. Pezzementi, T. Tabor, P. Hu, J. K. Chang, D. Ramanan, C. Wellington, B. P. Wisely Babu, H. Herman. “Comparing Apples and Oranges: Off-Road Pedestrian Detection on the NREC Agricultural Person-Detection Dataset.” arXiv preprint arXiv:1707.07169.

Scripts for working with the dataset are available at: https://github.com/zpezz/nrecAgPersonEval

The benchmark only requires the "apples left labeled" and "oranges left labeled" sets. The right images are provided for stereo. Additional left and right images, including 7 frames (1 second) before the labeled data begins, are available in the unlabeled files. These can be used to compute motion features for detection or for visual odometry and new view synthesis benchmarking. Finally, the unassigned.zip file includes additional labeled data not included in the dataset, for instance, videos taken at night.

PLEASE READ: The links for each data set will take you to a corresponding folder on Box.com. Each folder contains .zip files of the data. The files are labeled numerically: "example-file-name-1.zip, example-file-name-2.zip, etc." Please download them and open them in order, starting with 1.
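For anyone scripting the download step, the following sketch orders the numbered files by their trailing number and extracts them lowest first. It assumes each numbered .zip is an independent archive that can be extracted on its own, which this page does not confirm; the function names and the folder and destination arguments are hypothetical.

```python
import re
import zipfile
from pathlib import Path


def part_number(path):
    """Extract the trailing integer from names like 'example-file-name-3.zip'."""
    match = re.search(r"-(\d+)\.zip$", Path(path).name)
    if match is None:
        raise ValueError(f"no part number in {path!r}")
    return int(match.group(1))


def in_download_order(paths):
    """Sort numerically, so that part 10 comes after part 2 rather than before."""
    return sorted(paths, key=part_number)


def extract_all(folder, dest):
    """Extract every numbered .zip in `folder` into `dest`, lowest number first.

    Assumption: each file is a standalone archive, not one piece of a split zip.
    """
    for path in in_download_order(Path(folder).glob("*.zip")):
        with zipfile.ZipFile(path) as archive:
            archive.extractall(dest)


names = [
    "example-file-name-10.zip",
    "example-file-name-2.zip",
    "example-file-name-1.zip",
]
ordered = in_download_order(names)
print(ordered)
```

A plain alphabetical sort would place part 10 before part 2, which is why the sort key converts the number to an integer.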

Benchmark
apples left labeled 
oranges left labeled

Right Stereo images for Benchmark
apples right labeled
oranges right labeled

Other Images from Benchmark videos
(1 second of video before labels, images not subsampled, variable length after labels)
apples left unlabeled
apples right unlabeled
oranges left unlabeled
oranges right unlabeled

Pose data in KITTI odometry format
(requires Benchmark labeled and unlabeled data for alignment)
poses
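In the KITTI odometry convention, each line of a pose file holds 12 numbers: a 3x4 matrix [R | t] written row by row, where R is a rotation and t a translation. The sketch below parses text in that layout; the two sample lines are invented, and how pose lines map to particular image frames in this dataset is not shown here.

```python
def parse_kitti_poses(text):
    """Parse KITTI-odometry-style pose text into (rotation, translation) pairs.

    Each non-empty line must contain 12 floats: a row-major 3x4 matrix [R | t].
    """
    poses = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        values = [float(v) for v in line.split()]
        if not values:
            continue  # skip blank lines
        if len(values) != 12:
            raise ValueError(f"line {line_number}: expected 12 values, got {len(values)}")
        rows = [values[0:4], values[4:8], values[8:12]]
        rotation = [row[:3] for row in rows]     # 3x3 rotation matrix
        translation = [row[3] for row in rows]   # x, y, z
        poses.append((rotation, translation))
    return poses


# Two made-up poses: the identity, then a shift of 0.5 in x and 2.0 in z.
SAMPLE_POSES = (
    "1 0 0 0 0 1 0 0 0 0 1 0\n"
    "1 0 0 0.5 0 1 0 0 0 0 1 2.0\n"
)

poses = parse_kitti_poses(SAMPLE_POSES)
print(poses[1][1])
```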

Other Videos
unassigned video

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Below are standings on the various benchmark metrics for the dataset. Please see https://github.com/zpezz/nrecAgPersonEval for the detailed definition of each category and implementation of the evaluation criteria.

ROCs are shown for a bounding box overlap (intersection over union: IoU) requirement of 0.5. Values shown in the table are average detection rate, which averages performance over IoU values from 0.3 to 0.7 in steps of size 0.1. Averages over each constituent ROC are computed by sampling the curve at false positive rates between 10^-3 and 10^-1 in steps of 10^(1/4).
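The following is a rough sketch of that computation, not the official implementation, which lives in the evaluation repository linked above. It computes IoU for two boxes, builds the nine false positive rate sample points (each 10^(1/4) times the previous one, from 10^-3 to 10^-1), and averages detection rate over them and over the five IoU thresholds. The step-style lookup on the ROC and the use of a plain arithmetic mean are assumptions made for this example, as are all function and variable names.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Exponents -3, -2.75, ..., -1 give nine log-spaced sample points.
FPR_SAMPLES = [10 ** (-3 + k / 4) for k in range(9)]
IOU_THRESHOLDS = [0.3, 0.4, 0.5, 0.6, 0.7]


def detection_rate_at(roc, fpr):
    """Best detection rate among ROC points whose false positive rate is <= fpr.

    `roc` is a list of (false_positive_rate, detection_rate) pairs.
    Assumption: step lookup, no interpolation between points.
    """
    return max((rate for x, rate in roc if x <= fpr), default=0.0)


def average_detection_rate(rocs_by_iou):
    """Average over the FPR samples of each ROC, then over the IoU thresholds.

    Assumption: arithmetic means at both levels.
    """
    per_iou = []
    for threshold in IOU_THRESHOLDS:
        roc = rocs_by_iou[threshold]
        rates = [detection_rate_at(roc, fpr) for fpr in FPR_SAMPLES]
        per_iou.append(sum(rates) / len(rates))
    return sum(per_iou) / len(per_iou)


# Two 10x10 boxes offset by 5 in x: intersection 50, union 150.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))

# A made-up detector with a flat detection rate of 0.5 at every threshold.
flat_rocs = {t: [(0.0, 0.5)] for t in IOU_THRESHOLDS}
flat_average = average_detection_rate(flat_rocs)
print(overlap, flat_average)
```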

To submit your own results, contact humandetection@nrec.ri.cmu.edu

The video results for the top three performers can be viewed by clicking the button below:

Test Set Detections Video

| Algorithm | Baseline | Env=Orange | Env=Apple | All | Occ=Clear | Occ=Partial | Occ=Heavy | Scale=Large | Scale=Medium | Scale=Small | Pose=Typical | Pose=Unusual | Motion=Static | Motion=Moving |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MFC | 59.4 | 59.1 | 65.8 | 58.4 | 67.3 | 47.2 | 16.8 | 69.0 | 70.7 | 47.6 | 62.7 | 41.1 | 56.4 | 70.4 |
| RPN+BF | 44.2 | 46.0 | 48.5 | 43.5 | 52.1 | 31.8 | 9.9 | 59.7 | 67.7 | 15.7 | 48.8 | 14.0 | 40.5 | 57.9 |
| MSCNN | 54.6 | 55.0 | 56.6 | 53.8 | 60.7 | 40.9 | 7.3 | 72.7 | 59.5 | 30.3 | 59.5 | 19.1 | 53.0 | 60.0 |
| Detectnet | 9.0 | 11.0 | 6.9 | 8.9 | 10.5 | 6.1 | 0.3 | 17.3 | 4.4 | 1.7 | 10.2 | 0.7 | 10.5 | 7.8 |

MFC
Ref: Pezzementi, Z., Tabor, T., Hu, P., Chang, J., Ramanan, D., Wellington, C., Babu, B., Herman, H. Comparing Apples and Oranges: The NREC Agricultural Person-Detection Dataset. In submission.

RPN+BF
Ref: Zhang, L., Lin, L., Liang, X., and He, K. Is faster R-CNN doing well for pedestrian detection? ECCV 2016.
Notes: Using default settings on their FOS implementation

MSCNN
Ref: Cai, Z., Fan, Q., Feris, R. S., and Vasconcelos, N. A unified multi-scale deep convolutional neural network for fast object detection. ECCV 2016
Notes: Using default settings on their FOS implementation

Detectnet
Ref: Tao, A., Barker, J., and Sarathy, S. (2016). Detectnet: Deep neural network for object detection in digits. Website.
Notes: Using default settings on their FOS implementation

Photos

Human Detection and Tracking in Agriculture by the National Robotics Engineering Center.