10 Open Datasets You Can Use For Computer Vision Projects

Computer vision is accelerating progress in almost every industry. Computer vision technologies are changing the way machines interpret images and video, and large technology companies around the globe are applying them in domains such as healthcare and autonomous driving, among others. To build a robust deep learning model for computer vision, one must train it on high-quality datasets.

In this article, we list 10 high-quality datasets that you can use for computer vision projects.

1 – CIFAR-10

CIFAR-10 is a popular computer vision dataset collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It is used for object recognition and consists of 60,000 32×32 colour images in 10 classes, with 6,000 images per class. The data is divided into five training batches and one test batch of 10,000 images each, giving 50,000 training images and 10,000 test images.
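The batches are distributed in several formats. As a minimal sketch, assuming the "Python version" of the download, where each batch file (for example `data_batch_1`) unpickles to a dictionary whose `b"data"` entry holds 10,000 rows of 3,072 byte values and whose `b"labels"` entry holds the class indices 0 to 9, the code below reads pixels back out of one flattened row. Each row stores the 1,024 red values first, then the green and the blue values, each channel in row-major order. The helper names `unpickle`, `pixel_rgb` and `to_hwc` are illustrative, not part of any library.

```python
import pickle

IMAGE_SIDE = 32
CHANNEL_SIZE = IMAGE_SIDE * IMAGE_SIDE  # 1,024 values per colour channel


def unpickle(path):
    """Load one CIFAR-10 batch file from the Python version of the dataset."""
    with open(path, "rb") as fo:
        # With encoding="bytes" the dictionary keys come back as b"data", b"labels".
        return pickle.load(fo, encoding="bytes")


def pixel_rgb(row, r, c):
    """Return the (R, G, B) value at image row r, column c of one 3,072-value record.

    The record holds all red values first, then green, then blue,
    and each channel is stored in row-major order.
    """
    offset = r * IMAGE_SIDE + c
    return (row[offset], row[CHANNEL_SIZE + offset], row[2 * CHANNEL_SIZE + offset])


def to_hwc(row):
    """Convert a flattened record into a 32x32 nested list of (R, G, B) tuples."""
    return [[pixel_rgb(row, r, c) for c in range(IMAGE_SIDE)] for r in range(IMAGE_SIDE)]
```

With NumPy, the same reordering is usually written as `batch[b"data"].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)`, which yields an array of shape (N, 32, 32, 3).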

2 – Cityscapes

Cityscapes is a large-scale, publicly available dataset for computer vision projects that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities. It includes high-quality pixel-level annotations of 5,000 frames, in addition to a larger set of 20,000 weakly (coarsely) annotated frames.
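Because images and annotations come as separate packages, a common first step is pairing each image with its label file. The sketch below assumes the directory and file-naming layout of the standard downloads, with images under `leftImg8bit/<split>/<city>/` and fine annotations under `gtFine/<split>/<city>/`. `annotation_path` is a hypothetical helper; passing `kind="gtCoarse"` targets the coarse annotations under the same naming assumption.

```python
from pathlib import PurePosixPath

IMAGE_SUFFIX = "_leftImg8bit.png"


def annotation_path(image_path: str, kind: str = "gtFine", layer: str = "labelIds") -> str:
    """Return the annotation file that pairs with a Cityscapes left-camera image.

    Assumed layout: <root>/leftImg8bit/<split>/<city>/<city>_<seq>_<frame>_leftImg8bit.png
    paired with <root>/<kind>/<split>/<city>/<city>_<seq>_<frame>_<kind>_<layer>.png
    """
    p = PurePosixPath(image_path)
    if not p.name.endswith(IMAGE_SUFFIX):
        raise ValueError(f"not a leftImg8bit image: {image_path!r}")
    stem = p.name[: -len(IMAGE_SUFFIX)]  # e.g. "aachen_000000_000019"
    city_dir = p.parent                  # <root>/leftImg8bit/<split>/<city>
    split_dir = city_dir.parent          # <root>/leftImg8bit/<split>
    root = split_dir.parent.parent       # <root>
    return str(root / kind / split_dir.name / city_dir.name / f"{stem}_{kind}_{layer}.png")
```

For example, `annotation_path("cityscapes/leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png")` returns `"cityscapes/gtFine/train/aachen/aachen_000000_000019_gtFine_labelIds.png"`.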

3 – Fashion MNIST

Fashion-MNIST is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image associated with a label from 10 classes. The dataset comes with an automatic benchmarking system based on scikit-learn that covers 129 classifiers with different parameters.
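Fashion-MNIST uses the same IDX file format as the original MNIST (for example `train-images-idx3-ubyte.gz` and `train-labels-idx1-ubyte.gz`), so it can be read with standard-library Python alone. The sketch below is a minimal parser under that assumption; it handles only the unsigned-byte case these files use, and `parse_idx` and `load_idx_gz` are illustrative names.

```python
import gzip
import math
import struct

# Label indices 0-9 as listed in the Fashion-MNIST repository.
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def parse_idx(raw: bytes):
    """Parse an uncompressed IDX buffer of unsigned bytes into (shape, flat bytes)."""
    # Header: two zero bytes, a type code (0x08 = unsigned byte), then the number of dimensions.
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("expected an IDX file of unsigned bytes")
    # Each dimension size follows as a big-endian 32-bit unsigned integer.
    shape = struct.unpack(f">{ndim}I", raw[4:4 + 4 * ndim])
    data = raw[4 + 4 * ndim:]
    if len(data) != math.prod(shape):
        raise ValueError("payload size does not match the header")
    return shape, data


def load_idx_gz(path: str):
    """Read a gzip-compressed IDX file such as train-images-idx3-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())
```

For the training images the returned shape should be (60000, 28, 28), and pixel i of image n sits at `data[n * 784 + i]`.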

4 – ImageNet

ImageNet, one of the most popular datasets for computer vision projects, is an image database organised according to the WordNet hierarchy. WordNet contains more than 100,000 synsets (sets of synonymous words or phrases, each describing one concept), and ImageNet aims to provide an average of 1,000 images to illustrate each synset, with the goal of offering tens of millions of cleanly sorted images for most of the concepts in the hierarchy.
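ImageNet identifies each synset by a WordNet ID (wnid): the letter `n` followed by an eight-digit WordNet noun offset, for example `n01440764`. The sketch below shows one way to walk the hierarchy; it assumes the is-a relations are available as whitespace-separated `parent child` wnid pairs, one per line, which is an assumption to check against whichever copy of the hierarchy you download. All helper names are hypothetical.

```python
def wnid_to_offset(wnid: str) -> int:
    """Split a noun wnid such as 'n01440764' into its numeric WordNet offset."""
    if len(wnid) != 9 or wnid[0] != "n" or not wnid[1:].isdigit():
        raise ValueError(f"not a noun wnid: {wnid!r}")
    return int(wnid[1:])


def parse_is_a(lines):
    """Build a child -> [parents] map from assumed 'parent child' wnid pairs."""
    parents = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        parent, child = line.split()
        parents.setdefault(child, []).append(parent)
    return parents


def ancestors(wnid, parents):
    """Return every synset above wnid; a synset may have more than one parent."""
    seen = set()
    stack = list(parents.get(wnid, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen
```

Using an explicit stack and a `seen` set keeps the walk iterative and visits shared ancestors only once.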

5 – IMDB-Wiki Dataset

The IMDB-Wiki dataset is one of the largest publicly available datasets of face images with gender and age labels for training. It contains a total of 523,051 face images: 460,723 images of 20,284 celebrities obtained from IMDb, and 62,328 images from Wikipedia.
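The age labels are not stored directly. The dataset authors describe computing age from each person's date of birth and the year the photo was taken, assuming the photo was taken in the middle of that year, and the metadata stores the date of birth as a MATLAB serial date number. The sketch below assumes the `dob` and `photo_taken` fields have already been read from the metadata files (for example with `scipy.io.loadmat`); the helper names are illustrative.

```python
from datetime import date

# MATLAB day 1 is 1 January of year 0; Python ordinal 1 is 1 January of year 1,
# which is 366 days later because year 0 is a leap year in the proleptic calendar.
MATLAB_TO_PYTHON_OFFSET = 366


def matlab_datenum_to_date(datenum: float) -> date:
    """Convert a MATLAB serial date number (fractional part ignored) to a date."""
    return date.fromordinal(int(datenum) - MATLAB_TO_PYTHON_OFFSET)


def age_at_photo(dob_datenum: float, photo_taken_year: int) -> int:
    """Whole years between the date of birth and 1 July of the photo year."""
    born = matlab_datenum_to_date(dob_datenum)
    taken = date(photo_taken_year, 7, 1)  # assumed mid-year capture date
    years = taken.year - born.year
    if (taken.month, taken.day) < (born.month, born.day):
        years -= 1  # birthday had not yet occurred by 1 July
    return years
```

`date.fromordinal` raises `ValueError` for dates before year 1, so it is prudent to filter implausible values before training.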

6 – Kinetics-700

Kinetics-700 is a large-scale, high-quality dataset of URL links to YouTube video clips covering a diverse range of human-focused actions. It consists of approximately 650,000 video clips covering 700 human action classes, with at least 600 clips per class. Each clip lasts around 10 seconds and is labelled with a single action class.
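Because Kinetics-700 is distributed as links rather than video files, clips have to be fetched from YouTube and trimmed to the annotated segment. The sketch below assumes the annotation CSV has `label`, `youtube_id`, `time_start` and `time_end` columns (verify against your copy), turns each row into a watch URL plus segment, and counts clips per class so the per-class totals can be checked on the videos you actually obtained. `read_kinetics_csv` and `clips_per_class` are hypothetical helpers.

```python
import csv
from collections import Counter


def read_kinetics_csv(lines):
    """Parse Kinetics-style annotation rows into dicts with a URL and clip segment.

    Assumes the columns label, youtube_id, time_start and time_end (in seconds).
    """
    clips = []
    for row in csv.DictReader(lines):
        start, end = float(row["time_start"]), float(row["time_end"])
        clips.append({
            "label": row["label"],
            "url": f"https://www.youtube.com/watch?v={row['youtube_id']}",
            "start": start,
            "duration": end - start,  # around 10 seconds for Kinetics clips
        })
    return clips


def clips_per_class(clips):
    """Count clips for each action label."""
    return Counter(clip["label"] for clip in clips)
```

`csv.DictReader` accepts any iterable of lines, so an open file works directly: `with open(path, newline="") as f: clips = read_kinetics_csv(f)`.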

7 – MS COCO

COCO (Common Objects in COntext) is a large-scale object detection, segmentation, and captioning dataset. It contains photos of 91 easily recognisable object types, with a total of 2.5 million labelled instances in 328,000 images.
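COCO annotations ship as JSON files with top-level `images`, `annotations` and `categories` lists, and each bounding box is stored as `[x, y, width, height]` in pixels measured from the image's top-left corner. As a minimal standard-library sketch (the `pycocotools` package is the more complete option), the code below groups boxes by image file and converts them to corner coordinates; `load_boxes` and `load_boxes_from_file` are illustrative names.

```python
import json
from collections import defaultdict


def load_boxes(coco: dict) -> dict:
    """Map each image file name to (category name, x_min, y_min, x_max, y_max) tuples."""
    category_names = {c["id"]: c["name"] for c in coco["categories"]}
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        # [x, y, width, height] with the origin at the top-left corner;
        # crowd annotations (iscrowd = 1) are kept as-is here.
        x, y, w, h = ann["bbox"]
        boxes[file_names[ann["image_id"]]].append(
            (category_names[ann["category_id"]], x, y, x + w, y + h))
    return dict(boxes)


def load_boxes_from_file(path: str) -> dict:
    """Read an annotation file such as instances_val2017.json and group its boxes."""
    with open(path, encoding="utf-8") as f:
        return load_boxes(json.load(f))
```

Images without any annotations simply do not appear in the returned dictionary.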

8 – MPII Human Pose Dataset

The MPII Human Pose dataset is used to evaluate articulated human pose estimation. It includes around 25,000 images containing over 40,000 people with annotated body joints. Each image is extracted from a YouTube video and is provided with preceding and following unannotated frames. Overall, the dataset covers 410 human activities, and each image has an activity label.
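Results on MPII are commonly reported with PCKh: the percentage of annotated joints whose predicted location lies within a fraction (typically 0.5) of the person's head segment length. The sketch below assumes the 16-joint ordering documented for the dataset and, following widely used evaluation code, takes the head size as 0.6 times the diagonal of the annotated head rectangle; treat that constant and the helper names as assumptions rather than the official implementation.

```python
import math

# Joint ids 0-15 in the order documented for MPII Human Pose.
JOINT_NAMES = [
    "r_ankle", "r_knee", "r_hip", "l_hip", "l_knee", "l_ankle",
    "pelvis", "thorax", "upper_neck", "head_top",
    "r_wrist", "r_elbow", "r_shoulder", "l_shoulder", "l_elbow", "l_wrist",
]


def head_size(head_box):
    """Head segment length from a head rectangle (x1, y1, x2, y2); 0.6 is an assumed factor."""
    x1, y1, x2, y2 = head_box
    return 0.6 * math.hypot(x2 - x1, y2 - y1)


def pckh(predictions, ground_truth, head_boxes, alpha=0.5):
    """Fraction of annotated joints predicted within alpha * head size.

    predictions and ground_truth hold one list of 16 (x, y) points per person;
    a ground-truth entry of None marks an unannotated joint, which is skipped.
    """
    correct = total = 0
    for pred, gt, box in zip(predictions, ground_truth, head_boxes):
        threshold = alpha * head_size(box)
        for p, g in zip(pred, gt):
            if g is None:
                continue
            total += 1
            if math.dist(p, g) <= threshold:
                correct += 1
    return correct / total if total else 0.0
```

Normalising by head size makes the threshold scale with how large each person appears in the image.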

9 – Open Images

Open Images is one of the largest existing datasets with object location annotations. It consists of around 9 million images annotated with image-level labels, object bounding boxes, object segmentation masks, and visual relationships. In total, it contains 16 million bounding boxes for 600 object classes on 1.9 million images.
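Open Images stores its boxes in CSV files whose coordinates are normalised to the range [0, 1] (columns `XMin`, `XMax`, `YMin`, `YMax`) and whose `LabelName` column holds a machine-generated class ID rather than a readable name. The minimal sketch below converts boxes to pixel coordinates. It assumes you supply each image's width and height yourself, for example by reading the downloaded images, along with a mapping from class ID to display name, such as one built from the class-description CSV released with the dataset. `boxes_by_image` is a hypothetical helper.

```python
import csv


def to_pixels(row, width, height):
    """Convert one box row with normalised [0, 1] coordinates to pixel corners."""
    return (float(row["XMin"]) * width, float(row["YMin"]) * height,
            float(row["XMax"]) * width, float(row["YMax"]) * height)


def boxes_by_image(lines, image_sizes, class_names):
    """Group boxes as (name, x_min, y_min, x_max, y_max) per ImageID.

    image_sizes maps ImageID -> (width, height); the box CSV does not include sizes.
    class_names maps LabelName IDs to readable names.
    """
    out = {}
    for row in csv.DictReader(lines):
        size = image_sizes.get(row["ImageID"])
        if size is None:
            continue  # size unknown, e.g. the image was not downloaded
        name = class_names.get(row["LabelName"], row["LabelName"])
        out.setdefault(row["ImageID"], []).append((name, *to_pixels(row, *size)))
    return out
```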

10 – The 20BN-Something-Something Dataset V2

The 20BN-Something-Something dataset is a large collection of densely labelled video clips showing humans performing pre-defined basic actions with everyday objects. It was created by a large number of crowd workers and is designed to help ML models develop a fine-grained understanding of basic actions that occur in the physical world.
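Each clip's caption is built from a template with object placeholders, so one action class appears with many different objects. The sketch below assumes the annotation layout of the V2 release. Under that assumption, each entry carries `id`, `template` (for example `Putting [something] into [something]`) and `placeholders` fields, and a separate labels file maps the template text, with its brackets removed, to a class index stored as a string. Verify those field names against your download; the helper names are illustrative.

```python
import re

# A placeholder slot is any bracketed phrase, e.g. "[something]" or "[Something]".
SLOT = re.compile(r"\[[^\]]*\]")


def template_to_class_name(template: str) -> str:
    """Drop the brackets, e.g. 'Putting [something] into [something]' -> 'Putting something into something'."""
    return template.replace("[", "").replace("]", "")


def fill_template(template: str, placeholders: list) -> str:
    """Substitute the annotated object names into the template's slots, in order."""
    if len(SLOT.findall(template)) != len(placeholders):
        raise ValueError("placeholder count does not match the template")
    values = iter(placeholders)
    return SLOT.sub(lambda _match: next(values), template)


def class_indices(entries: list, labels: dict) -> list:
    """Return (video id, class index) pairs; labels maps class name -> index string."""
    return [(e["id"], int(labels[template_to_class_name(e["template"])])) for e in entries]
```

Keying classes on the template rather than the filled caption is what groups clips such as "Putting a pen into a cup" and "Putting a coin into a jar" under one action.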