Computer vision datasets

(kept by the computer vision and ML group of the LIRIS laboratory, INSA-Lyon)

Robotics simulators

Mobile agents, navigation

A useful classification on the AI-Thor website

2019 3D Deep-RL and reasoning without a supercomputer, by Inria-Chroma and LIRIS: high-level reasoning with VizDoom (arXiv paper)
2019 StreetLearn Dataset Navigation, from StreetView, by DeepMind.
2019 Habitat-Sim by FAIR. Built on SUNCG. Supports various tasks, with or without language.
2019 Obstacle Tower Challenge by Unity.
2019 RobotIx Robot interactions, Pepper robot
2018 CHALET Cornell House Agent Learning Environment. CG, not photo-realistic, manipulable, 58 rooms, 10 houses. Still at version 0.1, i.e. unstable.
2018 SynCity Vehicles, Unity. Commercial: price?
2018 Holodeck Agent or Drone, Unreal-4, single or multi-agent
2018 Matterport Indoor. Photo-realism, no physics, 90 buildings. Natural language interaction with robots.
2018 Gibson Real world images, indoor
2018 Chalet rendered images, indoor
2017 Minos Photo-realism, 90 buildings. Goal-directed navigation in complex indoor environments.
2017 Minos-SunCG Not photorealistic; customizable. 45,000 houses.
2017 Minos-Matterport3D Photorealistic; 90 buildings.
2017 HoME Simulated indoor, SUNCG (procedurally created)
2017 AI-Thor Allen Institute for AI, Stanford, CMU. Photo-realism, physics/actionable, customizable, 32 rooms.
2017 Udacity Self driving car simulator based on Unity
2017 CARLA Autonomous driving. Python + ROS support. Depth, LIDAR, semantic segmentation. Dynamic weather, multiple cars.
2017 SynCity Drones & driving.
2017 MS Airsim Drone or car, game engine.
2016 VizDoom
2002 Gazebo Very general robotics simulator

Walking, crawling etc.

2018 Learning Acrobatics by Watching YouTube BAIR; articulated agents perform actions seen on YouTube.

Aerial imagery

2018 Large Scale Aerial Dataset Serge Belongie, CVPR 2018.

Semantic Full Scene Labelling

2017 Hyperspectral Koblenz
2017 Mapillary Vistas Dataset 25000 high resolution images.
2017 Stanford 2D-3D-Semantics Dataset: 100k images, indoor, with segmentation and 3D labels (surface normals etc.); S. Savarese; Paper
2016 ADE20K
2016 ETH Video segmentation dataset
2016 Cityscapes dataset (2000 densely annotated frames, 20000 total)
CamVid dataset
Stanford Dataset
KITTI dataset
SIFT Flow dataset
NYU Depth v2 dataset

Places dataset 2.5 million images with 205 labeled scene categories.

Synthetically created datasets

2017 Physically-Based Rendering for Indoor Scene Understanding 300 000 images.

Gesture recognition

Review on datasets: http://www.sciencedirect.com/science/article/pii/S1077314214001568#
2015 INRIA LARSEN Dataset 12 Kinect video sequences of people in cluttered environment including MoCap ground truth
2013 Dexter 1: A Dataset for Evaluation of 3D Articulated Hand Motion Tracking
2013 ChAirGest multimodal dataset (10 gestures for close HCI, Kinect + accelerometer data)
2013 Sheffield Kinect Gesture (SKIG) Dataset
2014, 2013, 2012 ChaLearn Gesture Dataset
2012 The ASLAN action similarity labeling challenge
2012 MSRC-12 Kinect Gesture Dataset

Full body pose estimation

2016 COCO 2016 Keypoint Challenge 90k RGB images
2015 ChaLearn Looking at People 2015 - Track 1: Human Pose Recovery 8000 RGB images
2014 H3.6M Full body pose, 32 joints, multiple RGB cameras + SwissRanger TOF camera
2014 MultiHumanPose Shelf & Campus Datasets, Multi-camera RGB images. Calibration available. 3200 images per camera, but ground truth is available for only 300 frames for Shelf and 270 frames for Campus.
2014 Poses in the Wild 30 video sequences (~30 RGB frames each)
2014 MPII Human Pose Dataset ~21K RGB images, also annotated with action classes
2014 PARSE dataset 300 RGB images, D. Ramanan. Additional images
2013 FLIC (Frames Labeled in Cinema) dataset : 3987+1016 RGB images
2013 KTH Multiview Football Dataset : 800 time frames, captured from 3 views (ground truth joint annotation & calibration data available) + 5907 images (ground truth joint annotation available)
2011 Utrecht Multi-Person Motion (UMPM) Multi-cameras (RGB), multiple activities. Camera calibration & ground truth available.
2010 "We are family" stickmen - multiple people per image 525 images
2010 LSP - Leeds Sports Pose Dataset 2000 RGB images
2008 Buffy pose dataset RGB images
2006 HumanEva dataset Multi-camera RGB / Grayscale images

Hand pose estimation

2017 Big Hand Dataset 2.2M Images
2015 FingerPaint dataset
2015 HandNet, Technion
2015 CVRR VIVA Challenge: detection, tracking and gesture recognition
2014 ICVL Hand Posture Dataset
2014 Microsoft Research hand tracking dataset
2014 General-HANDS dataset
2014 NYU Hand Pose Dataset

Action recognition

Surveys and dataset lists

Survey paper with descriptions and tables: PDF ( J.M. Chaquet, E.J. Carmona, A. Fernández-Caballero, A Survey of Video Datasets for Human Action and Activity Recognition, Computer Vision and Image Understanding, 2013)
Survey paper with descriptions and tables : Orit Kliper-Gross, Tal Hassner, and Lior Wolf, The Action Similarity Labeling Challenge, PAMI 2012. PDF
A list of datasets by Kevin Murphy
This page references a lot of datasets (including HARL) : https://www.cs.utexas.edu/~chaoyeh/web_action_data/dataset_list.html

Datasets

2017 Moments in Time Monfort et al., arXiv 2017.
2017 From Lifestyle VLOGs to Everyday Interactions 572k videos, Fouhey et al., arXiv 2017.
2017 The Kinetics Human Action Video Dataset, 300k videos, DeepMind, arXiv 2017.
2017 Procedural Human Action Videos De Souza, Gaidon et al., CVPR 2017.
2017 20BN-something-something Dataset 256,591 labeled videos, Goyal et al., ICCV 2017.
2017 20BN-jester Dataset 148,092 videos.
2017 Edinburgh Ceilidh Overhead Video Data 16 dances with 2 dance patterns. Overhead video. Tracked people.
2017 AVA 64k videos, 60 classes, localized w/ bounding boxes
2017 PKU-MMD multi-modal human action understanding 51 classes, 66 subjects, 1076 videos, ~20 actions per video.
2016 NTU RGB+D Action Recognition Dataset (Rose Lab)
2015 Dementia Ambient Care: Multi-Sensing Monitoring for Intelligent Remote Management and Decision Support
2015 MEXaction2 action detection and localization dataset
2015 A2D 7 actor classes x 8 actions, >=99 video / class
2015 Activity-net 203 classes, 137 video per class, from the web
2014 Sports 1-M (Google) 1 million youtube videos, 487 classes
2014 Robocoffee dataset of human manipulation actions
2014 DogCentric Activity Dataset (first-person views from dogs)
2014 Northwestern-UCLA Multiview Action 3D Dataset (MoreInfoMultiAction3D)
2013 SBU-Kinect-Interaction dataset
2013 http://crcv.ucf.edu/ICCV13-Action-Workshop/
2013 Florence3D-Action dataset
2013 JHMDB dataset (fully annotated subset of HMDB dataset, including person binary mask and positions of joints)
2013 50 Salads dataset (RGB + Depth), accelerometers on kitchen tools
2013 JPL First-Person Interaction dataset
2013 Hollywood3D
2013 YouCook Dataset
2013 WorkoutSU-10 Exercise Dataset
2013 Berkeley Multimodal Human Action Database (MHAD)
2012 UCF101 dataset
2012 UCLA Courtyard Dataset
2012 UT Kinect Action Dataset
2011 The LIRIS dataset
2011 RGBD-HuDaAct Dataset
2011 VIRAT Video Dataset surveillance, 12 types of events
2011 HMDB51 : Large Video Database for Human Motion 51 action categories, 6849 clips.
2010 Olympic sports dataset 16 classes of sports.
2010 TV Human Interactions dataset For video retrieval (handshakes, high fives, hugs and kisses)
2010 UT Interaction dataset groundtruth : time + bounding boxes (shaking hands, pointing, hugging, pushing, kicking and punching)
2010 UT Tower dataset groundtruth : bounding boxes, foreground masks (pointing, standing, digging, walking, carrying, running, waving 1, waving 2, and jumping)
2010 Multiple Cameras Fall Dataset
2009 Collective Activity Dataset
2009 The Hollywood2 dataset
2009 Cornell Human Activities Dataset
2009 The MSR dataset (hand clapping, hand waving, boxing)
2009 University of Rochester Activities of Daily Living Dataset
2009 TUM Kitchen Dataset
2008 UIUC Action dataset Groundtruth: foreground masks (walking, running, jumping, waving, jumping jacks, clapping, jumping from sit-up, raise one hand, stretching out, turning, sitting to standing, crawling, pushing up and standing to sitting)
2007 Drinking and Smoking ("Coffee and Cigarettes") Including bounding boxes on key frames!
2007 The CASIA Action database outdoor, (walking, running, bending, jumping, crouching, fainting, wandering and punching a car)
2005 The VISOR dataset
2005 The ETISEO dataset (walking, running, sitting, lying, crouching, holding, pushing, jumping, pick up, puts down, fighting, queueing, tailgating, meeting and exchanging an object)
2004 The KTH dataset, Multi-KTH Dataset (Motion sequence with 6 persons each performing different KTH action. Includes camera motion, zoom and structured background with multiple planes.)
2004 The Behave dataset (Abnormal crime behavior)
2002 The CAVIAR dataset (Shopping mall; bounding boxes, activity label)
2001 The Weizmann dataset

Various UCF datasets: UCF50, UCF sports, UCF aerial actions, UCF youtube, UCF Crowd segmentation (eye gaze annotations for the UCF sports dataset).
Various MSR Action Recognition Datasets: MSRGesture3D, MSRDailyActivity3D, MSRAction3D

Multiview datasets:
- 2010 VideoWeb Dataset focuses on interactions (people meeting, people following, vehicles turning, people dispersing, shaking hands, gesturing, waving, hugging, and pointing)
- 2010 MuHAVi: Multicamera Human Action Video Data
- 2009 i3DPost Multi-view Dataset Groundtruth includes 3D mesh models (walking, running, jumping, bending, hand-waving, jumping in place, sitting-stand up, running-falling, walking-sitting, running-jumping-walking, handshaking, pulling, and facial-expressions)
- 2009 PETS 2009
- 2007 PETS 2007
- 2006 INRIA IXMAS dataset 5 cameras, daily living (nothing, checking watch, crossing arms, scratching head, sitting down, getting up, turning around, walking, waving, punching, kicking, pointing, picking up, throwing (over head), and throwing (from bottom up))
- 2006 HumanEva dataset Multi-camera RGB / Grayscale images

Egocentric datasets:
- 2013, 2012 Georgia Tech egocentric activities (GTEA) dataset
- 2013 JPL First-Person Interaction dataset

Accelerometer actions:
- 2013 50 Salads dataset (RGB + Depth), accelerometers on kitchen tools
- 2012 The UCF-iPhone dataset

Body part segmentation

Touch gesture recognition

Itekube-7 Touch Dataset Touch table dataset by Itekube + LIRIS

Object recognition and segmentation

Pedestrian detection

2014 PETA Dataset 19000 images

Motion capture

https://sites.google.com/a/cgspeed.com/cgspeed/motion-capture

Visual Question Answering

2016 VQA Dataset 200,000 real scene images from MSCOCO along with 1 million questions. Versions: v1, v2, VQA-CP.
2016 CLEVR Diagnostic dataset for compositional language and elementary visual reasoning (synthetic data). Versions: CLEVR, CLEVR Humans, CoGenT.
2016 Visual Genome 108,000 real scene images (MSCOCO & YFCC100M intersection) along with 1.7 million questions. It is a general-purpose dataset, as it provides many annotations in addition to question/answer pairs: object instances, relationships, etc.
2016 Visual Dialog 123,000 images from MSCOCO. Each image is annotated with a dialog composed of 10 question-answer pairs.
2018 VizWiz answering visual questions from blind people.
2017 GuessWhat?! visual object discovery through multi-modal dialogue. 66,000 images from MSCOCO along with 160,000 dialogues (822,000 question-answer pairs).
2016 Visual7W grounded question answering in images. It is a subset of Visual Genome, 47,000 images from MSCOCO along with 328,000 question-answer pairs.
