TLDR: This research explores automatic image colorization using two deep learning approaches: classification with Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs). Evaluating on the CIFAR-10 dataset, the study found that while both methods can colorize images, the GAN-based approach generally achieves higher pixel accuracy and PSNR, though it is more computationally intensive. A user study also indicated that GAN-generated images were more realistic to human observers.
Image colorization, the process of adding color to grayscale images, has seen significant advances in computer vision. The task is inherently ill-posed: only the luminance channel is available, and the two missing chrominance channels must be inferred, so many plausible colorizations exist for a single input. However, the context of a scene, such as a blue sky or green grass, provides crucial cues for predicting color. Thanks to the abundance of color images available, deep learning models can be trained at scale to learn these complex relationships.
Traditionally, image colorization has been approached as a regression problem, which often overlooks the fact that a single grayscale image can have multiple plausible color interpretations. This research explores two modern deep learning strategies: classification and adversarial learning, building upon previous works and adapting them for specific scenarios.
Exploring Different Approaches
The study delves into two primary methods for automatic image colorization:
Classification-based Colorization: Instead of predicting continuous color values, this approach treats colorization as a classification problem. The ‘ab’ channels of the CIE Lab color space (which encode color independently of lightness) are quantized into 313 discrete pairs, and a U-Net-like network is trained to predict a probability distribution over these color bins for each pixel. The final colorized image is obtained by mapping the predicted distributions back to ab values and recombining them with the input lightness channel. Unlike some prior works, this study did not apply class rebalancing, finding that it disrupted training in their specific setup.
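To make the formulation concrete, here is a minimal, hypothetical PyTorch sketch of per-pixel classification over quantized ab bins. The tiny network, the `ab_to_bin` helper, and the placeholder bin grid are illustrative stand-ins, not the paper's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

Q = 313  # number of quantized ab pairs

def ab_to_bin(ab, grid):
    """Map each pixel's (a, b) value to the index of its nearest quantized bin.

    ab:   (B, 2, H, W) ground-truth chrominance
    grid: (Q, 2) the Q quantized ab centers
    """
    b, _, h, w = ab.shape
    flat = ab.permute(0, 2, 3, 1).reshape(-1, 2)            # (B*H*W, 2)
    dists = torch.cdist(flat, grid)                         # (B*H*W, Q)
    return dists.argmin(dim=1).reshape(b, h, w)             # (B, H, W) bin indices

class ColorClassifier(nn.Module):
    """Toy stand-in for the U-Net-like encoder-decoder used in the paper."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, Q, 1),                             # per-pixel logits over Q bins
        )

    def forward(self, l_channel):                            # (B, 1, H, W) lightness input
        return self.body(l_channel)                          # (B, Q, H, W)

# One hypothetical training step (no class rebalancing, as in the study):
model = ColorClassifier()
grid = torch.randn(Q, 2)                                     # placeholder for the real ab grid
l, ab = torch.rand(8, 1, 32, 32), torch.rand(8, 2, 32, 32)   # fake CIFAR-10-sized batch
logits = model(l)
target = ab_to_bin(ab, grid)
loss = F.cross_entropy(logits, target)                       # per-pixel classification loss
loss.backward()
```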
GAN-based Colorization: Generative Adversarial Networks (GANs) consist of two competing neural networks: a generator and a discriminator. Here, the generator takes a grayscale image and tries to produce a realistic colorized version, while the discriminator learns to distinguish real color images from generated ones. Trained in opposition, the generator produces increasingly convincing colorizations. This research uses a conditional GAN, in which the generator is conditioned on the input grayscale image, with a modified U-Net architecture as the generator. The CIE Lab color space is again used to separate lightness from color, so the generator predicts only the ‘ab’ channels.
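The adversarial training logic can be summarized in a short, hedged sketch. The tiny convolutional stacks below stand in for the paper's modified U-Net generator and its discriminator; only the conditional training step itself is the point.

```python
import torch
import torch.nn as nn

# Stand-in networks: G maps the L channel to ab; D scores (L, ab) pairs as real/fake.
G = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(64, 2, 3, padding=1), nn.Tanh())
D = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(64, 1, 3, stride=2, padding=1),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

l, ab_real = torch.rand(8, 1, 32, 32), torch.rand(8, 2, 32, 32) * 2 - 1

# Discriminator step: real (L, ab) pairs vs. generated pairs.
ab_fake = G(l).detach()
d_real = D(torch.cat([l, ab_real], dim=1))
d_fake = D(torch.cat([l, ab_fake], dim=1))
loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to fool the discriminator on conditioned fakes.
ab_fake = G(l)
d_fake = D(torch.cat([l, ab_fake], dim=1))
loss_g = bce(d_fake, torch.ones_like(d_fake))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```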
Experiments and Findings
The models were evaluated using the CIFAR-10 dataset, which comprises 60,000 small images (32×32 pixels) across 10 classes. Both models were trained using the Adam optimizer, with specific learning rates and epochs tailored to each approach. Training times varied, with the classification model taking about 4.5 hours and the GAN model taking 4 hours on different GPU setups.
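A plausible data-preparation step for this setup, shown as an assumption rather than the authors' exact pipeline, is to load CIFAR-10 and split each image into its Lab lightness input and ab target:

```python
import numpy as np
import torchvision
from skimage import color

cifar = torchvision.datasets.CIFAR10(root="./data", train=True, download=True)

def split_lab(pil_img):
    """Convert an RGB image to CIE Lab and return the L input and ab target."""
    rgb = np.asarray(pil_img) / 255.0          # (32, 32, 3) in [0, 1]
    lab = color.rgb2lab(rgb)                   # L in [0, 100], ab roughly in [-110, 110]
    l = lab[..., :1] / 50.0 - 1.0              # normalize L to [-1, 1]
    ab = lab[..., 1:] / 110.0                  # normalize ab to about [-1, 1]
    return l.astype(np.float32), ab.astype(np.float32)

l, ab = split_lab(cifar[0][0])                 # grayscale input and color target for one image
```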
To assess performance, several metrics were used: pixel-wise accuracy (measuring how many pixels’ colors are within a small error threshold of the original), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index (SSIM). PSNR and SSIM are common measures of image quality and similarity.
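The first two metrics are straightforward to compute. The sketch below shows one reasonable implementation of pixel-wise accuracy and PSNR (the exact error threshold used in the paper is an assumption here); SSIM is typically taken from an existing library rather than written by hand.

```python
import torch

def pixel_accuracy(pred, target, thresh=0.05):
    """Fraction of pixels whose predicted color is within `thresh` of the original
    (images assumed scaled to [0, 1]); the threshold value is illustrative."""
    close = (pred - target).abs().max(dim=1).values <= thresh   # max error over channels
    return close.float().mean().item()

def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB."""
    mse = torch.mean((pred - target) ** 2)
    return (10 * torch.log10(max_val ** 2 / mse)).item()

pred, gt = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)
print(pixel_accuracy(pred, gt), psnr(pred, gt))
```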
The results showed that while both classification and GAN methods could colorize grayscale images to an acceptable visual degree, the GAN-based approach generally outperformed the classification method. GANs achieved significantly higher pixel-wise accuracy and PSNR values, indicating better overall colorization quality. The SSIM values were comparable between the two methods. Interestingly, both models struggled more with generating accurate colors in the red (R) channel compared to green (G) and blue (B).
A user study was also conducted, in which 16 students were asked to identify the ground-truth image from a set that included classification-generated and GAN-generated images. The results indicated that GAN outputs were more successful at fooling users: 40.69% of GAN-generated images were mistaken for ground truth, compared to 4.80% of classification-generated images. Users also rated GAN-generated images higher in realism and quality.
Implementation and Future Directions
The models were implemented in PyTorch, with the architectures adapted to the small 32×32 images of the CIFAR-10 dataset. The researchers also used TensorBoard to monitor colorization results during training.
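Logging intermediate colorizations to TensorBoard can be as simple as the following sketch; the tag name and the batch of images are placeholders.

```python
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/colorization")
colorized = torch.rand(8, 3, 32, 32)                        # placeholder batch of RGB results in [0, 1]
writer.add_images("val/colorized", colorized, global_step=0)
writer.close()
```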
In conclusion, this project successfully compared and evaluated the performance of convolutional neural networks and generative adversarial networks for automatic image colorization. While both are effective, the Conditional Deep Convolutional Generative Adversarial Network (C-DCGAN) demonstrated superior performance, albeit with higher computational demands. Future work includes experimenting with higher-resolution datasets like ImageNet or MS COCO, exploring different classifier backbones like ResNet, and investigating other generative models such as VAEs.
For more technical details, you can refer to the full research paper: Automatic Image Colorization with Convolutional Neural Networks and Generative Adversarial Networks.


