License: CC BY-NC-ND 4.0
arXiv:2403.01476v1 [cs.CV] 03 Mar 2024

CCC: Color Classified Colorization

Mrityunjoy Gain, Avi Deb Raha, and Rameswar Debnath*, Member, IEEE Computer Science and Engineering Discipline, Khulna University, Khulna 9208, Bangladesh
E-mail: {gain1624, dev1611, rdebnath}@cseku.ac.bd
Abstract

Automatic colorization of gray images containing objects of different colors and sizes is challenging because of inter- and intra-object color variation, and because the main objects often occupy only a small area amid extensive backgrounds. The learning process tends to favor dominant features, which results in a biased model. As with the class imbalance problem, a weighted function that imposes a higher weight on minority features can address this feature imbalance problem. In this paper, we formulate colorization as a multinomial classification problem and apply a weighted function to the classes. We propose a set of formulas to transform color values into color classes and vice versa. Class optimization and a balanced feature distribution are the keys to good performance, and both the class levels and the feature distribution are fully data-driven. Based on the class appearance observed in a very large collection of real-world images, we propose 215 color classes for our colorization task. During training, we propose a class-weighted function based on the true class appearance in each batch to ensure proper color saturation of individual objects. We establish a trade-off between major (frequently appearing) and minor (rarely appearing) classes that yields balanced class prediction by eliminating the dominance of major classes over minor ones. Because we apply regularization to enhance the stability of the minor classes, occasional minor noise may appear at object edges. We propose a novel object-selective color harmonization method empowered by the Segment Anything Model (SAM) to refine these edges. We also propose a new color image evaluation metric, the Chromatic Number Ratio (CNR), to quantify the richness of color components. We compare our proposed model with state-of-the-art models on five datasets, ADE, CelebA, COCO, Oxford 102 Flower, and ImageNet, both qualitatively and quantitatively.
The experimental results show that our proposed model outperforms other models in visual quality and CNR while maintaining satisfactory performance on regression (MSE, PSNR), similarity (SSIM, LPIPS, UIQI), and generative (FID) criteria.

Index Terms:
Colorization, Minority Features, Feature Balancing, Chromatic Number

I Introduction

Figure 1: Imbalanced feature distribution makes the regression task biased
Figure 2: Special priority can effectively regulate the minor class
Figure 3: CCC can overcome feature imbalance in colorization by considering the regression task as a classification task and imposing higher weights on minor classes.

Human vision perceives thousands of colors, which makes object identification easier. Color images are a popular medium for creative expression and reminiscence. Colorizing images from antiquity, medicine, industry, and astronomy helps convey their meaning. Colorized content continues to captivate the public, from remastered versions of vintage black-and-white movies to colored books and online automatic colorization bots.
Colorization is the process of assigning color components to a grayscale image. The problem is non-linear and ill-posed, since a single gray image admits multiple plausible colorizations; for example, a fruit may be light green, yellow, or red. Natural colorization therefore aims to predict a credible color distribution rather than only reproduce the intensity values of the gray image, and its output is not restricted to the colors of the ground-truth image.
Researchers have applied various methods to image colorization, including user-guided [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 31, 36, 51, 52] and learning-based methods [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 28, 29, 30, 32, 33, 34, 35, 37]. Traditional user-guided methods require significant human interaction, which limits their practicality. Learning-based strategies, including classical regression [21, 24, 28, 29, 50, 53], object segmentation [25, 32, 33, 35], generative approaches [13, 14, 15, 16, 17, 18, 19, 34, 37], and feature balancing [20, 22, 23, 30], are now more popular. Deep learning approaches, particularly regression-based ones, are easier to implement and require less human labor. Deep Neural Networks (DNNs) learn representative features and hidden structural knowledge from data during training: the loss function provides feedback for refining the model's parameters, and the network adjusts its weights in proportion to the error. However, imbalanced class distributions can cause learning models to misclassify minority-class observations, making the predicted class probabilities unreliable [26, 27].
The colorization problem is governed primarily by the feature distribution, and an unbalanced distribution causes imbalance in the training process. Desaturated color components are more prevalent in training images, which degrades the prediction of saturated colors. This bias can cause the hues of smaller objects to merge with the background, making the learning task biased. Handling feature imbalance is therefore essential for achieving the desired learning outcomes.
Class imbalance is often resolved by resampling the dataset or by using weighted functions that increase the weight of minority classes. In training, the features of a sample determine the gradient directions of the loss function. The spatial resolution of a sample determines the input dimension of a learning model, and likewise the output dimension of a colorization model. Defining rules that transform feature values into class values therefore allows feature imbalance to be handled with the same tools used for class imbalance.

We propose a method to transform continuous color values into discrete color classes, and vice versa, so that the model predicts a distribution of possible colors for each pixel. Our analysis revealed that only 215 of the 400 initial color classes appear frequently, so we reduced the class set to 215, because redundant classes lowered the classification accuracy. To address the class imbalance issue, we determine the class weights from the true classes of each batch during training, assigning higher weights to rarely appearing classes. This adjustment aims to alleviate desaturation and the bias toward predominant features. We also propose a SAM-empowered[57] object-selective color harmonization method to further refine object edges.

Our proposed model reformulates the regression problem as a classification problem, building on CIC[30]. CIC[30] defines static classes, but self-supervised learning depends heavily on the patterns and variety in the data, so we formulate our problem in a data-driven manner. Moreover, we integrate SAM, which suppresses color bleeding and provides object-selective color harmonization in cases where the model fails. We also examine experimentally the impact of retaining classes according to their appearance in the images (more than 500 occurrences). The main improvements of CCC over CIC[30] are summarized in Tab. I:

TABLE I: Differences between CIC[30] and our proposed CCC
Contents                                                                  CIC[30]   CCC
Formula for color class conversion and vice versa                         No        Yes
Data-driven class point optimization                                      No        Yes
Task generalization (adaptability to similar feature imbalance problems)  No        Yes
Data-driven class weight formulation                                      No        Yes
Segmentation-based edge refinement                                        No        Yes
Chromatic diversity evaluation metric                                     No        Yes

The essence of our method is shown in Fig. I. The following are the contributions to this work:

  1. We propose a set of formulas to transform continuous two-channel color values into discrete single-channel color classes and vice versa. Using these baseline formulas, any regression problem with feature imbalance can be reformulated as a classification problem.

  2. We optimize the class levels of the colorization problem by analyzing a large number of diverse images.

  3. We propose a class re-weighting formula that draws larger gradients from misclassified, rarely appearing classes, so that all classes contribute to the loss in a balanced way. This removes feature bias, as well as desaturation and over-saturation, from the color distribution and ensures balanced prediction.

  4. We propose a novel object-selective color harmonization method empowered by the Segment Anything Model (SAM) to refine object edges.

  5. We propose a new color image evaluation metric, the Chromatic Number Ratio (CNR), which quantifies the richness of color classes in generated images relative to ground-truth images, providing a comprehensive measure of the color spectrum.

  6. We present extensive quantitative and qualitative results demonstrating that our method significantly outperforms existing state-of-the-art baselines and produces reasonable results.

The rest of the paper is structured as follows: Sec. II reviews the relevant literature; Sec. III presents CCC in full, including the problem formulation and solution approach; Sec. IV describes the SAM-empowered color harmonization; Sec. V reports the experimental results and a comparative analysis with other state-of-the-art techniques; and Sec. VI concludes the paper.

II Related Literature

Image colorization mainly falls into two categories: user-guided and learning-based colorization.

II-A User Guided Colorization

The user-guided colorization mainly falls into two categories: scribble-based and example-based.
Scribble-based colorization Scribble-based colorization propagates colors from user-drawn scribbles to the remaining regions of an image. Techniques include optimizing color propagation[2], combining non-iterative techniques with adaptive edge extraction[1], introducing color blending[3], propagating color effectively[4] in pattern-continuous and intensity-continuous regions[5], and incorporating U-Net structures[31].
Example-based colorization Example-based colorization reduces user effort by transferring colors from a reference image to the grayscale target. Approaches operate on global color statistics[6], segmented regions[7, 8, 36], superpixels[51, 9], or individual pixels[10, 52]. However, hand-crafted similarity metrics can be error-prone when the reference and target differ significantly in intensity and content[11, 12].

II-B Learning Based Colorization

Learning-based colorization automatically applies color to grayscale or black-and-white photographs using CNNs trained on large datasets. Its main challenge is balancing features between the objects of interest and the background.
Basic Regression Based Colorization: These methods use a conventional CNN or a specialized architecture such as InceptionNet, VGGNet, ResNet, or DenseNet to estimate the color channels from a grayscale image, with gradients computed from a regression loss. Examples include automatic methods[28, 1, 29, 24], encoder-decoder based colorization models[53, 21], and iColoriT[50].
Object Segmentation Based Colorization These models segment the objects within an image, learn color assignment segment-wise or object-wise, and assign colors to segments using techniques such as spatial connections or global color coherence. They have been built on various techniques, including semantic segmentation[25, 33, 32], adversarial edge-aware models[35], and point annotations.
GAN Based Colorization GAN-based colorization models combine discriminator and generator networks to produce realistic, aesthetically pleasing colorized photographs. These models use semantic information[13], CapsNet[34], GAN encoders[14], and other techniques[17, 16] to improve colorization results. Applications include colorizing ethnic costumes, GAN-encoder-based colorization, and colorizing medical images[18, 37, 15, 19]. Techniques such as transfer learning and deep convolutional GANs have also been developed for various applications.
Feature Balancing for Colorization Zhang et al.[30] proposed automatic colorization with a CNN that classifies each pixel into predetermined quantized color levels and assigns colors based on the predicted levels. An et al.[23] used a VGG-16 CNN model with a color rebalancing technique to address the feature imbalance problem. Larsson et al.[20] used an unbalanced classification loss, and Gain et al.[22] proposed a deep localized network for image colorization.

III Color Classified Colorization

III-A Color Space

Conventional RGB is the most commonly used color space, consisting of red, green, and blue channels. However, it does not separate color from content information, which makes it unsuitable for color manipulation tasks such as colorization. CIE LAB[38] is a suitable choice because it separates color information from lightness (content) information, allowing the color to be manipulated while the content stays unchanged. In the La*b* (LAB) space, L denotes the lightness of the picture, with values in [0, 100]; as L increases, colors become brighter. The a* channel represents the green-red axis and the b* channel the blue-yellow axis: positive a* indicates red, negative a* green, positive b* yellow, and negative b* blue. Both channels typically fall within [-128, 127].
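To make the value ranges and sign conventions concrete, the following is a minimal pure-Python sketch of the standard sRGB-to-CIELAB conversion. It assumes 8-bit sRGB input and the D65 reference white; the function names are our own, and in practice a library routine such as scikit-image's `skimage.color.rgb2lab` would normally be used instead.

```python
def _srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    """CIELAB companding function (cube root with a linear toe)."""
    d = 6 / 29
    return t ** (1 / 3) if t > d ** 3 else t / (3 * d ** 2) + 4 / 29


def srgb_to_lab(r: int, g: int, b: int) -> tuple[float, float, float]:
    """Convert one 8-bit sRGB pixel to (L*, a*, b*) under a D65 white."""
    rl, gl, bl = (_srgb_to_linear(v / 255) for v in (r, g, b))
    # Linear sRGB -> CIE XYZ (D65 primaries)
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white
    fx, fy, fz = _f(x / xn), _f(y / yn), _f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)
```

For example, white maps to L ≈ 100 with a* ≈ b* ≈ 0 and black to L = 0, while pure red gives a positive a*, pure green a negative a*, pure yellow a positive b*, and pure blue a negative b*.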

Figure 4: Color class conversion

III-B Problem Definition

Colorization is the task of predicting the color channels from a given gray channel. The lightness (L) channel of the La*b* color space can be mapped to the gray channel (intensity) and vice versa[38], and RGB can be mapped to LAB and vice versa. The task is defined in Eqs. 1, 2, 3, and 4.

$$\mathcal{X}_{ab} = f(\mathcal{X}_{L}) \tag{1}$$
$$\mathrm{Distance}_{min}(\mathcal{Y}_{ab}, \mathcal{X}_{ab}) \tag{2}$$
$$\mathcal{X}_{Lab} = \mathrm{concat}(\mathcal{X}_{L}, \mathcal{X}_{ab}) \tag{3}$$
$$\mathcal{X}_{L} \in \mathbb{R}^{H \times W \times 1}, \quad \mathcal{X}_{ab} \in \mathbb{R}^{H \times W \times 2}, \quad \mathcal{Y}_{ab} \in \mathbb{R}^{H \times W \times 2} \tag{4}$$

where $\mathcal{X}_{L}$ is the lightness channel, $\mathcal{X}_{ab}$ is the predicted color channels, $\mathcal{X}_{Lab}$ is the predicted color image, $\mathcal{Y}_{ab}$ is the ground-truth color channels, $f(\cdot)$ is the mapping function learned by the deep network, $\mathrm{Distance}_{min}(\cdot)$ is the objective function (any suitable loss function) that the optimizer minimizes, $\mathbb{R}$ denotes the real numbers, and $H$ and $W$ are the image height and width.

Theoretically, the values of the a* and b* channels are continuous within [-128, 127], so predicting them is a regression problem. The $\mathrm{Distance}_{min}(\cdot)$ of Eq. 2 can therefore be the L1, L2, Huber, Log-cosh, or a similar regression loss, as shown in Eqs. 5, 6, 7, and 8, where $N$ is the number of predicted values.

$$L_{1}(\mathcal{Y}_{ab}, \mathcal{X}_{ab}) = \frac{1}{N}\sum_{N} \left|\mathcal{Y}_{ab} - \mathcal{X}_{ab}\right| \tag{5}$$
$$L_{2}(\mathcal{Y}_{ab}, \mathcal{X}_{ab}) = \frac{1}{N}\sum_{N} \left(\mathcal{Y}_{ab} - \mathcal{X}_{ab}\right)^{2} \tag{6}$$
$$L_{\delta} = \begin{cases} \frac{1}{N}\sum_{N} \frac{1}{2}\left(\mathcal{Y}_{ab} - \mathcal{X}_{ab}\right)^{2}, & \left|\mathcal{Y}_{ab} - \mathcal{X}_{ab}\right| < \delta \\ \frac{1}{N}\sum_{N} \delta\left(\left|\mathcal{Y}_{ab} - \mathcal{X}_{ab}\right| - \frac{1}{2}\delta\right), & \text{otherwise} \end{cases} \tag{7}$$
$$\text{Log-Cosh}(\mathcal{Y}_{ab}, \mathcal{X}_{ab}) = \frac{1}{N}\sum_{N} \log\left(\cosh\left(\mathcal{Y}_{ab} - \mathcal{X}_{ab}\right)\right) \tag{8}$$
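The losses in Eqs. 5 to 8 can be sketched element-wise as below. This is an illustrative sketch over flat lists of values (for example, one flattened a* or b* channel), with δ = 1 as an assumed default for the Huber loss; it is not the authors' code.

```python
import math
from typing import Sequence


def l1_loss(y: Sequence[float], x: Sequence[float]) -> float:
    """Eq. 5: mean absolute error."""
    return sum(abs(yi - xi) for yi, xi in zip(y, x)) / len(y)


def l2_loss(y: Sequence[float], x: Sequence[float]) -> float:
    """Eq. 6: mean squared error."""
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / len(y)


def huber_loss(y: Sequence[float], x: Sequence[float], delta: float = 1.0) -> float:
    """Eq. 7: quadratic for residuals below delta, linear above it."""
    total = 0.0
    for yi, xi in zip(y, x):
        r = abs(yi - xi)
        total += 0.5 * r * r if r < delta else delta * (r - 0.5 * delta)
    return total / len(y)


def log_cosh_loss(y: Sequence[float], x: Sequence[float]) -> float:
    """Eq. 8: mean of log(cosh(residual)); fine for a*b* residuals up to 255."""
    return sum(math.log(math.cosh(yi - xi)) for yi, xi in zip(y, x)) / len(y)
```

All four are symmetric in the residual, so each one pulls an ambiguous prediction toward a central value of the plausible colors rather than toward any single mode.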

Background regions such as clouds, soil, pavement, and walls dominate real-world images, leading to an imbalanced distribution of features. Handling this imbalance is crucial because the smaller feature subsets are often the features of interest for the learning task. The ambiguity and multimodality of colorization make the above loss functions vulnerable: for a pixel with several plausible colors, the L2 loss is minimized by the mean of those colors, so the averaging effect favors the color values that predominantly cover the ground-truth image. In an imbalanced feature distribution, training is therefore biased toward the larger feature subsets. Because the distribution of a*b* values is skewed toward desaturated values, the colors of small objects tend to disappear from the model's predictions.
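A toy calculation with hypothetical numbers makes the averaging effect concrete. Suppose 90% of the training examples show an object gray (a* = 0) and 10% show it red (a* = 60). The sketch below searches for the best constant prediction under each loss: under L2 it is the mean, a washed-out a* = 6, and under L1 it is the median, a* = 0, so the red is lost either way.

```python
from statistics import mean, median

# Hypothetical a* values for one ambiguous object: 9 gray, 1 red.
values = [0.0] * 9 + [60.0]


def best_constant(loss, candidates):
    """Return the constant prediction that minimizes the summed loss."""
    return min(candidates, key=lambda c: sum(loss(v - c) for v in values))


candidates = range(-10, 71)
l2_best = best_constant(lambda r: r * r, candidates)  # squared error
l1_best = best_constant(abs, candidates)              # absolute error

print(l2_best, mean(values))    # 6 6.0
print(l1_best, median(values))  # 0 0.0
```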

Figure 5: Color class to visual color conversion

III-C Solution Approach

Continuous Color Range to Discrete Color Classes The a* and b* color channels are continuous within the [-128, 127] range. Each a*b* pair, together with a lightness value L, forms an RGB color pixel. The a*b* pairs lie in a 2-D color space whose two axes are a* and b*. For a fixed L, a small change in the a*b* pair has no noticeable psychovisual effect, because human perception of an image does not normally involve quantitative analysis of every pixel value. Colorization is usually treated as a regression problem in which the model predicts the continuous quantities a* and b* for a given L. Taking advantage of this psychovisual property, the colorization problem can instead be represented as a classification problem in which the model predicts a discrete class level for each a*b* pair. To formulate the problem, the a*b* color space is divided into bins of a fixed grid size, and each bin is assigned a discrete class level, as given in Eq. 9.

$$\mathcal{C} = \left\lfloor \frac{b^{*}_{i} + \beta}{\alpha} \right\rfloor \cdot \Delta + \left\lfloor \frac{a^{*}_{i} + \beta}{\alpha} \right\rfloor, \quad \forall i \in N \tag{9}$$

where $a^{*}$ and $b^{*}$ are the continuous color channels, $\mathcal{C}$ is the discrete color class, $\alpha^{2}$ is the area of a bin, $\beta$ is a shifting constant that shifts the a*b* color values into the positive quadrant, $\Delta$ is the number of bins along each of the a* and b* axes, and $N$ is the total number of pixels.
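With the parameters used later in this section (α = 10, β = 100, Δ = 20), Eq. 9 can be sketched as follows. We assume floor division for the bin index and, as a safeguard not stated in the text, clip the bin indices so that values outside the practical range [-100, 99] land in the edge bins. `ab_to_class` is a hypothetical helper name.

```python
ALPHA, BETA, DELTA = 10, 100, 20  # settings used later in this section


def ab_to_class(a, b, alpha=ALPHA, beta=BETA, delta=DELTA):
    """Eq. 9: map one (a*, b*) pair to a single class index.

    Floor division gives the bin index; the clipping to [0, delta - 1]
    is our addition, not part of Eq. 9.
    """
    col = int((a + beta) // alpha)  # bin index along a* (horizontal)
    row = int((b + beta) // alpha)  # bin index along b* (vertical)
    col = min(max(col, 0), delta - 1)
    row = min(max(row, 0), delta - 1)
    return row * delta + col
```

For example, (a*, b*) = (0, 0) falls in class 10 · 20 + 10 = 210, and the corners (-100, -100) and (99, 99) map to classes 0 and 399.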
Color Class to Visual Color Mapping To generate a color image, we must recover a*b* pairs from the color classes $\mathcal{C}$ predicted by the learning model. Each bin corresponds to a fixed color class level $\mathcal{C}$ determined by $a^{*}$ and $b^{*}$. Eqs. 10 and 11 invert Eq. 9, mapping each class back to the center of its bin.

$$a^{*\prime} = \left[(\mathcal{C}_{i} \bmod \Delta) \cdot \alpha\right] - \beta + \frac{\alpha}{2}, \quad \forall i \in N \tag{10}$$
$$b^{*\prime} = \left[\left\lfloor \frac{\mathcal{C}_{i}}{\Delta} \right\rfloor \cdot \alpha\right] - \beta + \frac{\alpha}{2}, \quad \forall i \in N \tag{11}$$

According to these equations, each reconstructed value is the center of its bin, so the quantization error for each a* or b* value is at most $\frac{\alpha}{2}$, reached at the lower edge of a bin. A larger $\alpha$ reduces the number of classes but makes the representation lossier, because a wide continuous range is collapsed into a single class; on the other hand, a problem with fewer classes is easier to learn. A smaller $\alpha$ increases the number of classes, and in colorization more classes make the classification less accurate. The number of class levels in the a*b* color space must therefore be tuned so that the quantized a*b* values still describe the image's color content.
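The inverse mapping of Eqs. 10 and 11, together with the α trade-off discussed above, can be checked numerically with the sketch below. It assumes integer a* values in the practical range [-100, 99] and β = 100; `class_to_ab` and `max_quantization_error` are hypothetical helper names.

```python
def class_to_ab(c: int, alpha: int = 10, beta: int = 100,
                delta: int = 20) -> tuple[float, float]:
    """Eqs. 10-11: return the (a*, b*) center of the bin for class c."""
    a = (c % delta) * alpha - beta + alpha / 2
    b = (c // delta) * alpha - beta + alpha / 2
    return a, b


def max_quantization_error(alpha: int, beta: int = 100) -> float:
    """Largest |a* - a*'| over integer a* in [-beta, beta - 1]."""
    delta = 2 * beta // alpha  # bins per channel (alpha must divide 2 * beta)
    worst = 0.0
    for a in range(-beta, beta):
        col = (a + beta) // alpha                      # Eq. 9, a* part only
        a_rec, _ = class_to_ab(col, alpha, beta, delta)  # row 0, so c = col
        worst = max(worst, abs(a - a_rec))
    return worst


for alpha in (5, 10, 20):
    n_classes = (200 // alpha) ** 2
    print(alpha, n_classes, max_quantization_error(alpha))
# prints: 5 1600 2.5 / 10 400 5.0 / 20 100 10.0 (one line each)
```

Halving α quadruples the number of classes while only halving the worst-case error, which is the trade-off the paragraph above describes.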
Color Class reduction Based on Practical Appearance

Figure 6: Real-time appearance of Color classes

The a* and b* color values are theoretically continuous within [-128, 127], but in practice they fall within [-100, 99]. We first map this practical range onto a single plane of 400 color classes by setting α = 10, β = 100, and Δ = 20 in Eq. 9. This forms a 20 × 20 grid of bins in which the horizontal axis indicates a* and the vertical axis indicates b*, and each cell is assigned a class value. The class matrix is shown in Fig. 4, and Fig. 5 shows the visual color of each class. The colors vary smoothly across the grid: neighboring bins have nearly identical colors, and the color changes gradually from block to block.
To determine which of the 400 color classes actually occur, we extracted the classes of 35040 images from the Places365 validation dataset. The images were downsampled to 56 × 56 to reduce the number of class samples, yielding 109885440 class samples over 400 class levels, where each pixel is one class sample. A class level is retained for training only if it has at least 500 samples (0.000455% of all samples). Class levels with fewer than 500 samples are mapped to their nearest retained class level using fixed-centroid k-means clustering, as shown in Eq. 12. The final set contains 215 of the 400 color classes, each with at least 500 pixels; their appearance is shown in Fig. 6 and their colors in Fig. 7. Choosing the class set is a key design decision: fewer classes make the model less error-prone but may leave some colors outside the available bins. To keep rare color values in the predicted distribution, these colors must remain active during training.

$$\mathrm{kmeans}(\mathcal{C}, \mu) = \arg\min \sum_{i=1}^{k} \sum_{\mathcal{C}} \left\| c - \mu_{i} \right\|^{2} \tag{12}$$

where $\mathcal{C}$ is the input color class vector, $\mu$ is the set of color classes approved for training, which serve as fixed centroids, and $k$ is the number of color classes (215). Because the centroids are fixed, the assignment runs only once: each class is assigned to its nearest centroid, and the centroid values are never updated.
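The thresholding and fixed-centroid remapping can be sketched as below. Two details are our assumptions, since the text does not spell them out: "nearest" is measured as the Euclidean distance between bin centers in the a*b* plane, and the retained classes are re-indexed to 0..k-1 so they line up with the network's output channels. `bin_center` and `build_class_map` are hypothetical helper names.

```python
import math
from collections import Counter


def bin_center(c, alpha=10, beta=100, delta=20):
    """(a*, b*) center of class c, following Eqs. 10-11."""
    return ((c % delta) * alpha - beta + alpha / 2,
            (c // delta) * alpha - beta + alpha / 2)


def build_class_map(samples, threshold=500):
    """Keep classes with at least `threshold` samples, then assign every
    observed class to its nearest kept class. The kept classes act as fixed
    centroids, so a single assignment pass is enough (no centroid updates)."""
    counts = Counter(samples)
    kept = sorted(c for c, n in counts.items() if n >= threshold)
    if not kept:
        raise ValueError("no class reaches the threshold")
    mapping = {}
    for c in counts:
        center = bin_center(c)
        # Assumption: "nearest" = Euclidean distance between bin centers.
        mapping[c] = min(kept, key=lambda k: math.dist(center, bin_center(k)))
    # Assumption: re-index kept classes to 0..k-1 to match output channels.
    index = {k: i for i, k in enumerate(kept)}
    return mapping, index
```

In the setting described above, `samples` would hold the per-pixel classes of the downsampled images, and `index` would end up with 215 entries.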

Network Architecture We build our model on an encoder-decoder architecture. For the encoder, which serves as our feature extractor, we use DenseNet[54], a high-level feature extractor well suited to generating accurate color values. For the decoder, we use a conventional CNN. The network architecture of our proposed method is shown in Fig. 8.

Figure 7: Visualization of real-time appearance of color classes
Figure 8: Color classified Colorization
  • Feature Extraction DenseNet's dense connections mitigate vanishing gradients and semantic information loss during feature extraction; within each dense block, every layer receives the concatenated outputs of the preceding layers. We adapt the network to grayscale input by changing the first convolutional layer and discard the final linear layer to obtain an $\frac{H}{32} \times \frac{W}{32} \times 1024$ feature representation.

  • Colorization Network The network takes the $\frac{H}{32} \times \frac{W}{32} \times 1024$ feature representation as input and applies several convolutional and up-sampling layers. Up-sampling uses the basic nearest-neighbor method. The network outputs a $56 \times 56 \times 215$ color class distribution.
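The pure-Python sketch below traces tensor shapes through the decoder and shows what nearest-neighbor up-sampling does. It assumes a 224 × 224 input, so the 7 × 7 encoder output needs three 2× up-sampling stages to reach the stated 56 × 56 output; the intermediate channel widths (512, 256, 128) and the final 1 × 1 convolution are hypothetical, since the text does not list them.

```python
def upsample_nearest_2x(grid):
    """Nearest-neighbor 2x up-sampling of a 2-D list: every value is
    copied into a 2 x 2 block."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat along width
        out.append(wide)
        out.append(list(wide))  # repeat along height
    return out


def decoder_shapes(height=224, width=224, n_classes=215,
                   stage_channels=(512, 256, 128)):
    """Trace (H, W, C) from the H/32 x W/32 x 1024 DenseNet features through
    a hypothetical decoder with one 2x nearest-neighbor up-sampling per stage
    and a final 1 x 1 convolution to class scores."""
    h, w = height // 32, width // 32
    shapes = [(h, w, 1024)]
    for c in stage_channels:
        h, w = 2 * h, 2 * w
        shapes.append((h, w, c))
    shapes.append((h, w, n_classes))
    return shapes


for shape in decoder_shapes():
    print(shape)
```

Nearest-neighbor up-sampling adds no parameters and never invents values between neighbors, which keeps the learned convolutions responsible for refining the upsampled class scores.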

Loss Calculation Colorization is generally considered a regression problem because the color values are continuous. Since we transform the continuous color values into discrete color classes, we treat the problem as a classification problem and use the cross-entropy loss instead of MSE or another regression loss. The loss function is shown in Eq. 13.

$$Loss_{CE} = -\sum_{H,W} \mathcal{W}_{c} \sum_{\mathcal{C}} \mathcal{K} \cdot \log(\overline{\mathcal{K}}) \tag{13}$$

where $H$ and $W$ are the height and width of the output distribution $\mathcal{K}$, $\mathcal{K}$ is the true color class (one-hot encoded), $\overline{\mathcal{K}}$ is the estimated color class distribution, and $\mathcal{W}_{c}$ is the weight vector of the color classes, applied at each pixel according to its true class $c$. $\mathcal{W}_{c}$ is defined in Eq. 14.

$$\mathcal{W}_{c} = \frac{1}{n_{\mathcal{C}}}, \quad \forall c \in \mathcal{C} \tag{14}$$

where $n_{\mathcal{C}} = 215$ is the number of color classes.
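Eqs. 13 and 14 can be sketched in NumPy as follows. The sketch assumes the true distribution $\mathcal{K}$ is one-hot (one color class per pixel) and applies the class weight inside the class sum, so each pixel contributes $\mathcal{W}_c \cdot (-\log \overline{\mathcal{K}})$ for its true class $c$. The function name and the (H, W, C) array layout are illustrative assumptions.

```python
import numpy as np

def weighted_cross_entropy(probs, target, weights, eps=1e-12):
    """Eq. 13: -sum_{H,W} sum_C W_c * K * log(K_bar).

    probs   : (H, W, C) predicted class probabilities (K_bar)
    target  : (H, W) integer true class per pixel (K as a one-hot map)
    weights : (C,) per-class weights W_c
    """
    n_classes = probs.shape[-1]
    one_hot = np.eye(n_classes)[target]          # (H, W, C) one-hot K
    per_term = weights * one_hot * np.log(probs + eps)
    return -per_term.sum()

N_C = 215
uniform = np.full(N_C, 1.0 / N_C)  # Eq. 14: W_c = 1 / n_C for every class
```

With uniform weights, every pixel is scaled by the same factor $1/n_{\mathcal{C}}$, so rare and frequent classes contribute in proportion to how often they occur, which is the imbalance addressed next.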
Class Confusion Based Weights Adjustment In realistic images, color classes are not represented equally. Grayish color classes appear in a much larger proportion than bright color classes because of the large background areas. In the categorical cross-entropy loss, each true class receives the same weight $\frac{1}{n_{\mathcal{C}}}$ during loss calculation, as shown in Eq. 14. Because the minor color classes have far smaller counts, their contribution to the gradient fades iteration by iteration. To preserve the rarely appearing color classes, we assign them larger weights than the frequently appearing classes. However, this increases the global loss, so the weights must balance plausible colors against a low loss. To achieve this trade-off, we propose the weighting formula in Eq. 15.

$$\mathcal{W}_{new} = \frac{\max_{c\in\mathcal{C}}(N_{c})}{N_{c}\cdot\Upsilon + \max_{c\in\mathcal{C}}(N_{c})\cdot\Phi}, \quad \forall c\in\mathcal{C} \tag{15}$$

where $\mathcal{C}$ is the set of color classes of a particular batch, $\max_{c\in\mathcal{C}}(N_{c})$ is the appearance count of the most frequent class, $N_{c}$ is the appearance count of class $c$, $\mathcal{W}_{new}$ is the new weight vector of the batch, and $\Upsilon$ and $\Phi$ are trade-off factors, where $\Upsilon \in (0,1]$ and $\Phi = \frac{1}{n_{\mathcal{C}}}$.
We first compute each class weight by dividing the count of the most frequent class in the batch by that class's own count, for each of the 215 classes. This sets the weight of the most frequent class to 1 and scales the others up proportionally. However, it makes the weights of very infrequent classes extremely large. To strike a balance, we introduce a trade-off mechanism: the denominator becomes the product of the individual class count ($N_{c}$) and a trade-off factor ($\Upsilon$), plus the term $\max(N_{c})\cdot\Phi$. This additional term limits the influence of rare class occurrences (no weight can exceed $1/\Phi$), giving a more balanced class weighting. The loss function is therefore modified as shown in Eq. 16.

$$Loss_{CBCE} = -\sum_{H,W}\mathcal{W}_{new}\sum_{\mathcal{C}}\mathcal{K}\cdot\log(\overline{\mathcal{K}}) \tag{16}$$
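The batch-level class-balanced loss of Eqs. 15 and 16 can be sketched as below. The sketch assumes the counts $N_c$ are taken from the true class map of the current batch, that $\mathcal{K}$ is one-hot, and uses $\Upsilon = 0.5$ and $\Phi = 1/n_{\mathcal{C}}$ as defaults; the function names are hypothetical. Two limits follow directly from Eq. 15: the most frequent class receives weight $1/(\Upsilon + \Phi)$ (about 1.98 with these values), and a class whose count approaches zero approaches the cap $1/\Phi = n_{\mathcal{C}} = 215$.

```python
import numpy as np

N_C = 215          # number of color classes
UPSILON = 0.5      # trade-off factor, range (0, 1]
PHI = 1.0 / N_C    # trade-off factor Phi = 1 / n_C

def batch_class_weights(target, n_classes=N_C, upsilon=UPSILON, phi=PHI):
    """Eq. 15: W_new[c] = max(N) / (N_c * upsilon + max(N) * phi)."""
    counts = np.bincount(target.ravel(), minlength=n_classes).astype(float)
    n_max = counts.max()
    return n_max / (counts * upsilon + n_max * phi)

def class_balanced_ce(probs, target, eps=1e-12):
    """Eq. 16: the cross-entropy of Eq. 13 with W_c replaced by W_new.

    probs  : (B, H, W, C) softmax probabilities
    target : (B, H, W) integer true class per pixel
    """
    w_new = batch_class_weights(target, probs.shape[-1])
    # With a one-hot K, only the true-class term survives the sum over C.
    p_true = np.take_along_axis(probs, target[..., None], axis=-1)[..., 0]
    return -(w_new[target] * np.log(p_true + eps)).sum()
```

Because the weights are recomputed from each batch, a class that is rare in one batch but common in another is up-weighted only where it is actually under-represented.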

Color Class Estimation The network outputs an $H\times W\times C\times batch$ tensor. Applying the softmax function of Eq. 17 over the class dimension, we obtain a probability distribution from which we extract an $H\times W\times 1\times batch$ class representation.

$$\overline{\mathcal{K}}_{i} = \sigma(\mathcal{K})_{i} = \frac{e^{\mathcal{K}_{i}}}{\sum_{j=1}^{k} e^{\mathcal{K}_{j}}} \tag{17}$$
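The class-estimation step can be sketched in NumPy as follows. The text does not state how a single class is selected from the softmax distribution; this sketch assumes the most probable class (arg-max) per pixel and uses a batch-first (batch, H, W, C) layout, both of which are assumptions.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Eq. 17, stabilized by subtracting the per-pixel maximum logit."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def estimate_classes(logits):
    """(batch, H, W, C) logits -> (batch, H, W, 1) color-class map (arg-max)."""
    probs = softmax(logits, axis=-1)
    return probs.argmax(axis=-1)[..., None]
```

Subtracting the maximum does not change the softmax output but prevents overflow in `np.exp` for large logits.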
Algorithm 1 SAM Empowered Object-Selective Color Harmonization
Input: gray image, predicted $a^{*}b^{*}$
Output: edge-harmonized $a^{*}b^{*}$
  Extract segments ($\mathcal{S}$) from the gray image using SAM
  for each segment $s$ in $\mathcal{S}$ do
     if the number of pixels in $s$ > $\Psi$ then
        Extract $\mathcal{S}^{a}$ from $a^{*}$ using the coordinates of $s$
        Calculate the mode value ($\mathcal{M}_{a}$) of $\mathcal{S}^{a}$
        for each pixel ($\mathcal{P}$) in $\mathcal{S}^{a}$ do
           if $|\mathcal{P}-\mathcal{M}_{a}| > \delta_{a}$ then
              Replace the value of $\mathcal{P}$ with $\mathcal{M}_{a}$
           end if
        end for
        Extract $\mathcal{S}^{b}$ from $b^{*}$ using the coordinates of $s$
        Calculate the mode value ($\mathcal{M}_{b}$) of $\mathcal{S}^{b}$
        for each pixel ($\mathcal{P}$) in $\mathcal{S}^{b}$ do
           if $|\mathcal{P}-\mathcal{M}_{b}| > \delta_{b}$ then
              Replace the value of $\mathcal{P}$ with $\mathcal{M}_{b}$
           end if
        end for
     end if
  end for
  return edge-harmonized $a^{*}b^{*}$

III-D Chromatic Diversity

We propose a novel color image evaluation metric named Chromatic Number Ratio (CNR). The CNR quantifies the richness of color classes in a generated image relative to its ground-truth image by comparing the number of distinct color classes in each, giving a direct measure of the color diversity of the generated images. The metric is defined in Eq. 18.

$$CNR = \frac{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left(1 - \sum_{k=0}^{i-1}\sum_{l=0}^{n-1}[\mathcal{P}_{i,j} = \mathcal{P}_{k,l}]\right)}{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left(1 - \sum_{k=0}^{i-1}\sum_{l=0}^{n-1}[\mathcal{G}_{i,j} = \mathcal{G}_{k,l}]\right)} \tag{18}$$

where $\mathcal{P}_{i,j}$ and $\mathcal{G}_{i,j}$ are the color class values at row $i$ and column $j$ of the generated color image $\mathcal{P}$ and the ground-truth image $\mathcal{G}$, respectively, and $m$ and $n$ are the image's dimensions in the color class space. The outer summations iterate through all rows ($i$) and columns ($j$) of the image in color class space. The inner summations compare each pixel ($\mathcal{P}_{i,j}$ / $\mathcal{G}_{i,j}$) with the previous pixels in the image to check for uniqueness. $[\mathcal{P}_{i,j} = \mathcal{P}_{k,l}]$ and $[\mathcal{G}_{i,j} = \mathcal{G}_{k,l}]$ are indicator functions that return 1 if the condition is true (the pixel values are equal) and 0 otherwise.

Through the CNR, we aim to show how many different color components appear in the colorized images. We initially set the number of color classes to 400. However, in the training set, frequently appearing color classes dominate the minor classes, so the model overlooks the minor classes in its predictions. We aim to ensure that the minor classes also appear in the predicted distribution while keeping the other evaluation metrics at satisfactory levels. A CNR value of 1 indicates that the generated and ground-truth images contain the same number of distinct color classes. The predicted distribution can also pick up more minor color classes, which makes the result more visually plausible and pushes the CNR above 1.
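A short sketch of CNR follows. It reads Eq. 18 as the text describes it, counting the distinct color classes in the generated and ground-truth class maps with `np.unique` rather than evaluating the pairwise double sum literally. Both inputs are assumed to be integer class maps of the same image, and the function name is hypothetical; how per-image ratios are aggregated over a dataset is not specified here.

```python
import numpy as np

def chromatic_number_ratio(pred_classes, gt_classes):
    """CNR = (# distinct classes in prediction) / (# distinct classes in ground truth)."""
    n_pred = np.unique(pred_classes).size
    n_gt = np.unique(gt_classes).size
    return n_pred / n_gt

pred = np.array([[1, 2], [2, 3]])  # three distinct color classes
gt = np.array([[1, 1], [2, 2]])    # two distinct color classes
print(chromatic_number_ratio(pred, gt))  # 1.5
```

A value above 1, as in this toy example, means the prediction uses more distinct color classes than the ground truth.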

Figure 9: Some results of our proposed method compared to other state-of-the-art methods. Columns, left to right: Gray, DeOldify[43], Iizuka[24], Larsson[20], CIC[30], Zhang[31], Su[25], Gain[22], DD[55], CCC, and Ground Truth.

IV SAM Empowered Object-Selective Color Harmonization

Because we force regularization of the minor classes, a little noise sometimes appears at object edges. To polish these edges, we propose SAM-empowered object-selective color harmonization. SAM is a segmentation model with zero-shot generalization to unfamiliar objects and images without additional training[57]. The procedure is given in Algorithm 1: within each SAM segment larger than $\Psi$ pixels, every $a^{*}$ (or $b^{*}$) value that deviates from the segment's mode by more than $\delta_{a}$ (or $\delta_{b}$) is replaced by that mode.
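Algorithm 1 can be sketched in NumPy as below. The sketch assumes the SAM segments are already available as boolean masks with the same height and width as the $a^{*}$ and $b^{*}$ maps (running SAM itself is outside its scope), and it uses the Tab. II values $\Psi = 500$ and $\delta_a = \delta_b = 8$ as defaults. The function names are hypothetical, and ties in the mode are resolved toward the smallest value, which the algorithm does not specify.

```python
import numpy as np

def segment_mode(values):
    """Most frequent value in a 1-D array (ties go to the smallest value)."""
    uniq, counts = np.unique(values, return_counts=True)
    return uniq[np.argmax(counts)]

def harmonize_edges(a, b, masks, psi=500, delta_a=8.0, delta_b=8.0):
    """Sketch of Algorithm 1 on predicted a* and b* maps of shape (H, W).

    masks: iterable of boolean (H, W) arrays, one per segment (assumed to come
    from SAM). Returns harmonized copies; the inputs are left unchanged.
    """
    a, b = a.copy(), b.copy()
    for mask in masks:
        if mask.sum() <= psi:  # only segments with more than psi pixels are processed
            continue
        for channel, delta in ((a, delta_a), (b, delta_b)):
            seg = channel[mask]                   # S^a or S^b (boolean indexing copies)
            mode = segment_mode(seg)              # M_a or M_b
            seg[np.abs(seg - mode) > delta] = mode
            channel[mask] = seg                   # write the harmonized values back
    return a, b
```

Because the mode is taken per segment, a segment dominated by one color keeps that color, and only outlying pixels, typically along edges, are pulled toward it.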

V Experiment

Datasets We train the proposed model on the Places365 training set[40]. The model is trained in a self-supervised manner: we provide no external labels, and the supervisory signals are instead generated from the input data during training. For testing, we use multiple datasets, including the Places365 test set[40], which contains 328.5k images from 365 scene categories. In addition, we randomly select 50 images from the ImageNet1k Validation[47], Oxford 102 Flower[48], CelebFaces (CelebA)[49], and COCO[56] datasets.

TABLE II: Hyper-parameter and trade-off factor values for CCC
| H. P. | $\alpha$ | $\beta$ | $\Delta$ | $\Upsilon$ | $\Phi$ | $\delta_a$ | $\delta_b$ | $\Psi$ |
|---|---|---|---|---|---|---|---|---|
| Value | 10 | 100 | 20 | 0.5 | 0.0046 | 8 | 8 | 500 |

Implementation Set Up The experiments were conducted on a workstation with an NVIDIA GeForce RTX 2080 Ti graphics processing unit (GPU). The network was implemented using PyTorch[41] version 1.28 in Python version 3.10.9. During training, the batch size is set to 64, and the Adam optimizer is used with a learning rate of $1\times 10^{-3}$ and momentum parameters $\beta_1 = 0.5$ and $\beta_2 = 0.999$ to update the network parameters. Each ground-truth a*b* tensor is resized to $56\times 56$ to reduce the complexity of the loss calculation. We systematically explore a range of hyper-parameters and trade-off factors for our model and determine their values through methodical experimental analysis. The corresponding values are listed in Tab. II.
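For reproduction, the settings of Tab. II and the paragraph above can be gathered into one configuration object. This is a hypothetical sketch: the class and field names are illustrative, and $\alpha$, $\beta$, and $\Delta$ are defined earlier in the paper and are only copied here. In PyTorch, the optimizer settings would map to `torch.optim.Adam(model.parameters(), lr=cfg.lr, betas=cfg.adam_betas)`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CCCConfig:
    # Tab. II hyper-parameters and trade-off factors
    alpha: float = 10.0
    beta: float = 100.0
    delta: float = 20.0
    upsilon: float = 0.5            # trade-off factor in Eq. 15, range (0, 1]
    phi: float = 0.0046             # trade-off factor in Eq. 15, approx. 1 / n_C
    delta_a: float = 8.0            # a* tolerance in Algorithm 1
    delta_b: float = 8.0            # b* tolerance in Algorithm 1
    psi: int = 500                  # minimum segment size (pixels) in Algorithm 1
    # Training setup from the implementation paragraph
    n_classes: int = 215
    batch_size: int = 64
    lr: float = 1e-3
    adam_betas: tuple = (0.5, 0.999)
    target_size: tuple = (56, 56)   # resolution of the ground-truth a*b* tensor

cfg = CCCConfig()
```

Keeping $\Phi$ next to `n_classes` makes it easy to check that the tabulated 0.0046 is $1/215 \approx 0.00465$ truncated to four decimals.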

TABLE III: Quantitative comparison of our proposed method with the baseline and SOTA methods using the visuals of Fig. 9.
| Method | MSE↓ | PSNR↑ | SSIM↑ | LPIPS↓ | UIQI↑ | FID↓ | CNR↑ |
|---|---|---|---|---|---|---|---|
| DeOldify | 0.0064 | 21.71 | 0.898 | 0.183 | 0.872 | 0.395 | 0.615 |
| Iizuka | 0.0042 | 22.09 | 0.885 | 0.171 | 0.871 | 0.361 | 0.376 |
| Larsson | 0.0052 | 21.51 | 0.878 | 0.197 | 0.879 | 0.340 | 0.585 |
| CIC | 0.0061 | 20.82 | 0.864 | 0.194 | 0.864 | 0.319 | 0.619 |
| Zhang | 0.0054 | 21.78 | 0.878 | 0.162 | 0.862 | 0.303 | 0.526 |
| Su | 0.0071 | 20.94 | 0.854 | 0.233 | 0.865 | 0.300 | 0.648 |
| Gain | 0.0059 | 21.01 | 0.871 | 0.210 | 0.871 | 0.315 | 0.691 |
| DD | 0.0051 | 21.68 | 0.878 | 0.165 | 0.869 | 0.276 | 0.798 |
| CCC (ours) | 0.0055 | 21.46 | 0.878 | 0.207 | 0.872 | 0.288 | 0.884 |

Evaluation Metrics We use mean squared error (MSE[39]), peak signal-to-noise ratio (PSNR[42]), structural similarity index measure (SSIM[42]), learned perceptual image patch similarity (LPIPS[45]), universal image quality index (UIQI[44]), Fréchet inception distance (FID[46]), and our proposed Chromatic Number Ratio (CNR) to compare our model quantitatively with the baselines and state-of-the-art (SOTA) colorization methods.
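MSE and PSNR are directly related: for pixel values scaled to [0, 1], PSNR = 10 · log10(1 / MSE). The helper below is a minimal sketch under that range assumption (the text does not state the scale used). Note that averaging per-image PSNR values is in general not the same as computing PSNR from the average MSE.

```python
import numpy as np

def mse(pred, gt):
    """Mean squared error between two images of the same shape."""
    return float(np.mean((np.asarray(pred, float) - np.asarray(gt, float)) ** 2))

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE)."""
    err = mse(pred, gt)
    return float("inf") if err == 0 else 10.0 * np.log10(max_val ** 2 / err)
```

For example, a uniform error of 0.1 on every pixel gives MSE = 0.01 and PSNR = 20 dB.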
Comparison with Baselines and SOTA: We compare our model with eight baseline and SOTA methods: DeOldify[43], Iizuka[24], Larsson[20], CIC[30], Zhang[31], Su[25], Gain[22], and DD[55]. In Fig. 9, we visually compare the results of these methods on eight images alongside the gray inputs and the ground truth. The figure shows that our CCC method visually outperforms the baselines and SOTA methods: it effectively colors the minor objects along with the major ones. In Tab. III, we quantitatively evaluate the images of Fig. 9. DeOldify has the best SSIM, Iizuka the best MSE and PSNR, Larsson the best UIQI, Zhang the best LPIPS, DD the best FID, and CCC the best CNR. Visually, however, CCC produces a more plausible combination of major and minor colors than the others. This indicates that MSE, PSNR, SSIM, LPIPS, and FID are not fully suitable for verifying the presence of minor colors. Ensuring a high CNR first and then maintaining those criteria may be the best available approach for producing both major and minor colors in the generated images. In Tab. IV, we evaluate our model against seven baseline and SOTA methods across three datasets using regression criteria. Our method achieves the best MSE and PSNR on the 'Oxford Flower' dataset, while on 'ADE' and 'Celeba' it trails most of the other methods. We attribute this to the datasets' color content: 'ADE' predominantly features natural scenes and 'Celeba' focuses on human faces, both typically presenting a limited range of color combinations. In contrast, the 'Oxford Flowers' dataset contains a diverse array of flower species, each with a distinct and varied color palette. This diversity makes it a more complex and challenging colorization benchmark and highlights the ability of our method to handle a wide range of colors.

TABLE IV: Regression loss comparison of our proposed method with the baseline and SOTA methods using multiple datasets.
| Method | ADE MSE↓ | ADE PSNR↑ | Celeba MSE↓ | Celeba PSNR↑ | Ox Flower MSE↓ | Ox Flower PSNR↑ |
|---|---|---|---|---|---|---|
| DeOldify | 0.0043 | 25.66 | 0.0045 | 26.06 | 0.0295 | 16.46 |
| Iizuka | 0.0035 | 26.22 | 0.0045 | 26.00 | 0.0211 | 18.01 |
| Larsson | 0.0037 | 25.94 | 0.0058 | 26.66 | 0.0245 | 16.85 |
| CIC | 0.0053 | 24.33 | 0.0056 | 24.79 | 0.0261 | 17.16 |
| Zhang | 0.0036 | 26.07 | 0.0041 | 26.78 | 0.0295 | 16.80 |
| Su | 0.0038 | 25.37 | 0.0046 | 25.70 | 0.0265 | 16.81 |
| DD | 0.0039 | 25.22 | 0.0066 | 25.70 | 0.0273 | 16.88 |
| CCC | 0.0058 | 24.03 | 0.0061 | 24.12 | 0.0201 | 18.06 |

In Tab. V, we evaluate our model against the same methods across three datasets using similarity criteria. On every dataset, the SSIM and UIQI of our method are within 0.05 of the best method, while the minor color structure is preserved. It is usually easier to achieve high similarity by ignoring the minor color features and focusing only on the major ones; our method maintains satisfactory similarity while preserving the minor color features.

TABLE V: Structural Similarity comparison of our proposed method with the baseline and SOTA methods using multiple datasets.
| Method | ADE SSIM↑ | ADE UIQI↑ | Celeba SSIM↑ | Celeba UIQI↑ | Ox Flower SSIM↑ | Ox Flower UIQI↑ |
|---|---|---|---|---|---|---|
| DeOldify | 0.96 | 0.96 | 0.94 | 0.94 | 0.82 | 0.81 |
| Iizuka | 0.95 | 0.96 | 0.95 | 0.94 | 0.80 | 0.82 |
| Larsson | 0.95 | 0.96 | 0.94 | 0.93 | 0.82 | 0.83 |
| CIC | 0.95 | 0.95 | 0.93 | 0.92 | 0.81 | 0.80 |
| Zhang | 0.96 | 0.96 | 0.95 | 0.93 | 0.81 | 0.81 |
| Su | 0.92 | 0.96 | 0.93 | 0.93 | 0.77 | 0.81 |
| DD | 0.96 | 0.96 | 0.93 | 0.92 | 0.81 | 0.80 |
| CCC | 0.91 | 0.95 | 0.92 | 0.92 | 0.80 | 0.80 |

In Tab. VI, we evaluate our model against the same methods across three datasets using the LPIPS and FID criteria. Our method achieves the best LPIPS and FID on the 'Oxford Flower' dataset and ties for the best LPIPS on 'Celeba'. We attribute the Oxford Flower result to its higher color diversity compared with 'ADE' and 'Celeba'.

TABLE VI: Learned perceptual image patch similarity (LPIPS) and Fréchet inception distance (FID) comparison of our proposed method with the baseline and SOTA methods using multiple datasets.
| Method | ADE LPIPS↓ | ADE FID↓ | Celeba LPIPS↓ | Celeba FID↓ | Ox. Flower LPIPS↓ | Ox. Flower FID↓ |
|---|---|---|---|---|---|---|
| DeOldify | 0.15 | 0.48 | 0.13 | 0.43 | 0.35 | 3.85 |
| Iizuka | 0.16 | 1.05 | 0.16 | 0.45 | 0.31 | 3.57 |
| Larsson | 0.16 | 0.62 | 0.14 | 0.37 | 0.34 | 2.42 |
| CIC | 0.18 | 1.31 | 0.17 | 0.58 | 0.35 | 4.20 |
| Zhang | 0.14 | 1.12 | 0.13 | 0.49 | 0.34 | 4.72 |
| Su | 0.21 | 1.24 | 0.18 | 0.28 | 0.41 | 4.51 |
| DD | 0.16 | 0.30 | 0.16 | 0.18 | 0.32 | 1.54 |
| CCC | 0.15 | 0.90 | 0.13 | 0.43 | 0.30 | 1.51 |

In Tab. VII, we evaluate our model against the same methods across five datasets using the CNR criterion. Our method achieves the highest CNR on every dataset. The main objective of our model is to ensure the presence of minor colors along with the major ones. Preserving minor colors makes colorized images more diverse, because a typical image contains one or two major colors and a larger number of minor colors.

TABLE VII: CNR comparison of our proposed method with the baseline and SOTA methods using multiple datasets.
| Method | ADE | Celeba | COCO | Ox. Flower | ImageNet |
|---|---|---|---|---|---|
| DeOldify | 0.77 | 0.62 | 1.43 | 0.69 | 0.61 |
| Iizuka | 0.78 | 0.51 | 1.49 | 0.58 | 0.49 |
| Larsson | 0.77 | 0.64 | 0.73 | 0.73 | 0.63 |
| CIC | 0.81 | 0.86 | 1.57 | 0.67 | 0.58 |
| Zhang | 0.73 | 0.66 | 1.05 | 0.66 | 0.49 |
| Su | 0.77 | 0.80 | 2.89 | 0.66 | 0.66 |
| DD | 1.25 | 1.07 | 2.23 | 0.88 | 0.94 |
| CCC | 1.90 | 1.33 | 3.53 | 0.96 | 1.13 |

VI Conclusion

Automatic colorization of grayscale photographs with objects of varying colors and sizes is complex due to inter- and intra-object color variations and the limited area occupied by the principal objects. The learning process often favors dominant features, leading to biased models. A weighted function can address this feature imbalance by assigning greater importance to minority features. In this paper, we propose a set of formulas to convert color values into corresponding color classes and vice versa. To achieve optimal performance, we optimize the class levels and establish a trade-off between the weights of major and minor classes, so that both types of classes are considered for accurate class prediction. We also propose SAM-empowered object-selective color harmonization, which refines the object edges where the regularization of minor classes occasionally introduces noise. We propose a novel color image evaluation metric, the Chromatic Number Ratio (CNR), to quantify color component richness. We evaluated our model against eight baseline and SOTA models using five datasets. The experimental findings show that the proposed model surpasses previous models in visual quality and CNR while maintaining satisfactory performance on the regression criteria (MSE, PSNR), the similarity criteria (SSIM, LPIPS, UIQI), and the generative criterion (FID).

References

  • [1] Yi-Chin Huang, Yi-Shin Tung, Jun-Cheng Chen, Sung-Wen Wang, and Ja-Ling Wu. An adaptive edge detection based colorization algorithm and its applications. In ICM, 2005.
  • [2] Anat Levin, Dani Lischinski, and Yair Weiss. Colorization using optimization. In SIGGRAPH, pages 689-694, 2004.
  • [3] Liron Yatziv and Guillermo Sapiro. Fast image and video colorization using chrominance blending. IEEE Transactions on Image Processing, volume 15, number 5, pages 1120-1129, 2006.
  • [4] Yingge Qu, Tien-Tsin Wong, Pheng-Ann Heng. Manga colorization. ACM Transactions on Graphics (ToG), volume 25, number 3, pages 1214-1220, 2006.
  • [5] Qing Luan, Fang Wen, Daniel Cohen-Or, Lin Liang, Ying-Qing Xu, Heung-Yeung Shum. Natural image colorization. In ECRT, pages 309-320, 2007.
  • [6] Tomihisa Welsh, Michael Ashikhmin, Klaus Mueller. Transferring color to greyscale images. In ACCGIT, pages 277-280, 2002.
  • [7] Revital Ironi, Daniel Cohen-Or, Dani Lischinski. Colorization by Example. Rendering Techniques, volume 29, pages 201-210, 2005.
  • [8] Yu-Wing Tai, Jiaya Jia, Chi-Keung Tang. Local color transfer via probabilistic segmentation by expectation-maximization. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 1, pages 747-754, 2005.
  • [9] Alex Yong-Sang Chia, Shaojie Zhuo, Raj Kumar Gupta, Yu-Wing Tai, Siu-Yeung Cho, Ping Tan, Stephen Lin. Semantic colorization with internet images. ACM Transactions on Graphics (ToG), volume 30, number 6, pages 1-8, 2011.
  • [10] Xiaopei Liu, Liang Wan, Yingge Qu, Tien-Tsin Wong, Stephen Lin, Chi-Sing Leung, Pheng-Ann Heng. Intrinsic colorization. In SIGGRAPH, pages 1-9, 2008.
  • [11] Austin Sousa, Rasoul Kabirzadeh, Patrick Blaes. Automatic colorization of grayscale images. Department of Electrical Engineering, Stanford University, 2013.
  • [12] Mingming He, Dongdong Chen, Jing Liao, Pedro V Sander, Lu Yuan. Deep exemplar-based colorization. ACM Transactions on Graphics (TOG), volume 37, number 4, pages 1-16, 2018.
  • [13] Di Wu, Jianhou Gan, Juxiang Zhou, Jun Wang, Wei Gao. Fine-grained semantic ethnic costume high-resolution image colorization with conditional GAN. International Journal of Intelligent Systems, volume 37, number 5, pages 2952-2968, 2022.
  • [14] Yanze Wu, Xintao Wang, Yu Li, Honglun Zhang, Xun Zhao, Ying Shan. Towards vivid and diverse image colorization with generative color prior. In CVPR, pages 14377-14386, 2021.
  • [15] Min Wu, Xin Jin, Qian Jiang, Shin-jye Lee, Wentao Liang, Guo Lin, Shaowen Yao. Remote sensing image colorization using symmetrical multi-scale DCGAN in YUV color space. The Visual Computer, volume 37, pages 1707-1729, 2021.
  • [16] Haojie Guo, Zhe Guo, Zhaojun Pan, Xuewen Liu. Bilateral Res-Unet for Image Colorization with Limited Data via GANs. In ICTAI, pages 729-735, 2021.
  • [17] Hyojin Bahng, Seungjoo Yoo, Wonwoong Cho, David Keetae Park, Ziming Wu, Xiaojuan Ma, Jaegul Choo. Coloring with words: Guiding image colorization through text-based palette generation. In ECCV, pages 431-447, 2018.
  • [18] Yihuai Liang, Dongho Lee, Yan Li, Byeong-Seok Shin. Unpaired medical image colorization using generative adversarial network. Multimedia Tools and Applications, volume 81, number 19, pages 26669-26683, 2022.
  • [19] Shengrui Zang, Min Chen, Zhenhua Ai, Jieru Chi, Guowei Yang, Chenglizhao Chen, Teng Yu. Texture-aware gray-scale image colorization using a bistream generative adversarial network with multi scale attention structure. Engineering Applications of Artificial Intelligence, volume 122, pages 106094, 2023.
  • [20] Gustav Larsson, Michael Maire, Gregory Shakhnarovich. Learning representations for automatic colorization. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14, pages 577-593, 2016.
  • [21] Mrityunjoy Gain, Md Arifur Rahman, Rameswar Debnath, Mrim M Alnfiai, Abdullah Sheikh, Mehedi Masud, Anupam Kumar Bairagi. An Improved Encoder-Decoder CNN with Region-Based Filtering for Vibrant Colorization. Comput. Syst. Sci. Eng., volume 46, number 1, pages 1059-1077, 2023.
  • [22] Mrityunjoy Gain, Rameswar Debnath. A Novel Unbiased Deep Learning Approach (DL-Net) in Feature Space for Converting Gray to Color Image. IEEE Access, volume 11, pages 78918-78933, 2023.
  • [23] Jiancheng An, Kpeyiton Koffi Gagnon, Qingnan Shi, Hongwei Xie, Rui Cao. Image colorization with convolutional neural networks. In CISP-BMEI, pages 1-4, 2019.
  • [24] Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Ishikawa. Let there be color! Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM Transactions on Graphics (ToG), volume 35, number 4, pages 1-11, 2016.
  • [25] Jheng-Wei Su, Hung-Kuo Chu, Jia-Bin Huang. Instance-aware image colorization. In CVPR, pages 7968-7977, 2020.
  • [26] K Ruwani M Fernando, Chris P Tsokos. Dynamically weighted balanced loss: class imbalanced learning and confidence calibration of deep neural networks. IEEE Transactions on Neural Networks and Learning Systems, volume 33, number 7, pages 2940-2951, 2021.
  • [27] Byron C Wallace, Issa J Dahabreh. Improving class probability estimates for imbalanced data. Knowledge and Information Systems, volume 41, number 1, pages 33-52, 2014.
  • [28] Ryan Dahl. Automatic colorization, 2016. [Online]. Available: https://tinyclouds.org/colorize.
  • [29] Federico Baldassarre, Diego González Morín, Lucas Rodés-Guirao. Deep koalarization: Image colorization using CNNs and Inception-ResNet-v2. arXiv preprint arXiv:1712.03400, 2017.
  • [30] Richard Zhang, Phillip Isola, Alexei A Efros. Colorful image colorization. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III 14, pages 649-666, 2016.
  • [31] Richard Zhang, Jun-Yan Zhu, Phillip Isola, Xinyang Geng, Angela S Lin, Tianhe Yu, Alexei A Efros. Real-time user-guided image colorization with learned deep priors. ACM Transactions on Graphics (TOG), volume 36, number 4, pages 1-11, 2017.
  • [32] Lili Xia, Zhiyong Qu, Jianpeng An, Zhongke Gao. A Weakly Supervised Method with Colorization for Nuclei Segmentation using Point Annotations. IEEE Transactions on Instrumentation and Measurement, 2023.
  • [33] Mennatullah Hesham, Heba Khaled, Hossam Faheem. Image colorization using Scaled-YOLOv4 detector. International Journal of Intelligent Computing and Information Sciences, volume 21, number 3, pages 107-118, 2021.
  • [34] Gokhan Ozbulak. Image colorization by capsule networks. In CVPR Workshops, 2019.
  • [35] Guangqian Kong, Huan Tian, Xun Duan, Huiyun Long. Adversarial edge-aware image colorization with semantic segmentation. IEEE Access, volume 9, pages 28194-28203, 2021.
  • [36] Guillaume Charpiat, Matthias Hofmann, Bernhard Schölkopf. Automatic image colorization via multimodal predictions. In Computer Vision–ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part III 10, pages 126-139, 2008.
  • [37] Sandra Treneska, Eftim Zdravevski, Ivan Miguel Pires, Petre Lameski, Sonja Gievska. Gan-based image colorization for self-supervised visual feature learning. Sensors. volume 22, number 4, pages 1599, 2022.
  • [38] Alan R. Robertson. The CIE 1976 color-difference formulae. Color Research & Application. volume 2, number 1, pages 7-11, 1977.
  • [39] Zhou Wang, Alan C. Bovik. Mean squared error: Love it or leave it? A new look at signal fidelity measures. IEEE Signal Processing Magazine, volume 26, number 1, pages 98-117, 2009.
  • [40] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 40, number 6, pages 1452-1464, 2017.
  • [41] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, and others. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, volume 32, 2019.
  • [42] Alain Hore, Djemel Ziou. Image quality metrics: PSNR vs. SSIM. In ICPR, pages 2366-2369, 2010.
  • [43] J. Antic. DeOldify: A deep learning based project for colorizing and restoring old images (and video!). GitHub repository, 2018. https://github.com/jantic/DeOldify.
  • [44] Zhou Wang, Alan C. Bovik. A universal image quality index. IEEE Signal Processing Letters, volume 9, number 3, pages 81-84, 2002.
  • [45] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, pages 586-595, 2018.
  • [46] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, volume 30, 2017.
  • [47] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248-255, 2009.
  • [48] Maria-Elena Nilsback, Andrew Zisserman. Automated flower classification over a large number of classes. In 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pages 722-729, 2008.
  • [49] Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang. Deep Learning Face Attributes in the Wild. In ICCV, December 2015.
  • [50] Jooyeol Yun, Sanghyeon Lee, Minho Park, Jaegul Choo. iColoriT: Towards Propagating Local Hints to the Right Region in Interactive Colorization by Leveraging Vision Transformer. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1787-1796, 2023.
  • [51] Raj Kumar Gupta, Alex Yong-Sang Chia, Deepu Rajan, Ee Sin Ng, Zhiyong Huang. Image colorization using similar images. In ACM International Conference on Multimedia (MM), pages 369-378, 2012.
  • [52] Aurélie Bugeau, Vinh-Thong Ta, Nicolas Papadakis. Variational exemplar-based image colorization. IEEE Transactions on Image Processing, volume 23, number 1, pages 298-307, 2013.
  • [53] Tram-Tran Nguyen-Quynh, Soo-Hyung Kim, Nhu-Tai Do. Image colorization using the global scene-context style and pixel-wise semantic segmentation. IEEE Access, volume 8, pages 214098-214114, 2020.
  • [54] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, Kilian Q Weinberger. Densely connected convolutional networks. In CVPR, pages 4700-4708, 2017.
  • [55] Xiaoyang Kang, Tao Yang, Wenqi Ouyang, Peiran Ren, Lingzhi Li, Xuansong Xie. DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders. In ICCV, pages 328-338, 2023.
  • [56] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C Lawrence Zitnick. Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V, pages 740-755, 2014.
  • [57] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick. Segment Anything. In ICCV, pages 4015-4026, October 2023.