Publications
(* Corresponding author, † Equal contribution)
2026
- Neurocomputing: Reference-Guided Transformer for Face Super-Resolution
  Min-Yeong Kim, Seung-Wook Kim, and Keunsoo Koh
  Neurocomputing, 2026
@article{reffsr, author = {Kim, Min-Yeong and Kim, Seung-Wook and Koh, Keunsoo}, title = {Reference-Guided Transformer for Face Super-Resolution}, journal = {Neurocomputing}, year = {2026}, }
2025
- ICCV: FedWSQ: Efficient federated learning with weight standardization and distribution-aware non-uniform quantization
  Seung-Wook Kim, Seongyeol Kim, Jiah Kim, Seowon Ji, and Se-Ho Lee
  In Proc. IEEE/CVF International Conference on Computer Vision (ICCV), 2025
Federated learning (FL) often suffers from performance degradation due to key challenges such as data heterogeneity and communication constraints. To address these limitations, we present a novel FL framework called FedWSQ, which integrates weight standardization (WS) and the proposed distribution-aware non-uniform quantization (DANUQ). WS enhances FL performance by filtering out biased components in local updates during training, thereby improving the robustness of the model against data heterogeneity and unstable client participation. In addition, DANUQ minimizes quantization errors by leveraging the statistical properties of local model updates. As a result, FedWSQ significantly reduces communication overhead while maintaining superior model accuracy. Extensive experiments on FL benchmark datasets demonstrate that FedWSQ consistently outperforms existing FL methods across various challenging FL settings, including extreme data heterogeneity and ultra-low-bit communication scenarios.
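As a rough illustration of the two ingredients (not the authors' code), here is a minimal numpy sketch: per-output-channel weight standardization, plus a toy non-uniform quantizer; the level table, shapes, and the choice to quantize the standardized weights are our assumptions.

import numpy as np

def weight_standardize(w: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Standardize conv weights per output channel (out_ch, in_ch, kH, kW).

    WS removes the mean and rescales to unit variance, which the paper
    credits with filtering biased components out of local updates.
    """
    flat = w.reshape(w.shape[0], -1)
    mean = flat.mean(axis=1, keepdims=True)
    std = flat.std(axis=1, keepdims=True)
    return ((flat - mean) / (std + eps)).reshape(w.shape)

def nonuniform_quantize(update: np.ndarray, levels: np.ndarray) -> np.ndarray:
    """Toy non-uniform quantizer: snap each value to its nearest level.
    DANUQ derives its levels from update statistics; ours are made up."""
    idx = np.abs(update[..., None] - levels).argmin(axis=-1)
    return levels[idx]

w = np.random.randn(8, 4, 3, 3)
levels = np.array([-1.6, -0.5, 0.0, 0.5, 1.6])  # illustrative, not DANUQ's table
print(nonuniform_quantize(weight_standardize(w), levels).shape)  # (8, 4, 3, 3)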
@inproceedings{fedwsq, author = {Kim, Seung-Wook and Kim, Seongyeol and Kim, Jiah and Ji, Seowon and Lee, Se-Ho}, title = {FedWSQ: Efficient federated learning with weight standardization and distribution-aware non-uniform quantization}, year = {2025}, address = {Honolulu, USA}, url = {https://arxiv.org/abs/2506.23516}, booktitle = {Proc. IEEE/CVF International Conference on Computer Vision (ICCV)}, }

- TMM: Image enhancement based on pigment representation
  Se-Ho Lee, Keunsoo Koh, and Seung-Wook Kim
  IEEE Transactions on Multimedia, 2025
This paper presents a novel and efficient image enhancement method based on pigment representation. Unlike conventional methods where the color transformation is restricted to pre-defined color spaces like RGB, our method dynamically adapts to input content by transforming RGB colors into a high-dimensional feature space referred to as pigments. The proposed pigment representation offers adaptability and expressiveness, achieving superior image enhancement performance. The proposed method involves transforming input RGB colors into high-dimensional pigments, which are then reprojected individually and blended to refine and aggregate the information of the colors in pigment spaces. Those pigments are then transformed back into RGB colors to generate an enhanced output image. The transformation and reprojection parameters are derived from a visual encoder, which adaptively estimates such parameters based on the content of the input image. Extensive experimental results demonstrate the superior performance of the proposed method over state-of-the-art methods in image enhancement tasks, including image retouching and tone mapping, while maintaining relatively low computational complexity and small model size.
@article{pigment, author = {Lee, Se-Ho and Koh, Keunsoo and Kim, Seung-Wook}, title = {Image enhancement based on pigment representation}, journal = {IEEE Transactions on Multimedia}, year = {2025}, }

- SPL: Adaptive video demoiréing network with subtraction-guided alignment
  Seung-Hun Ok, Young-Min Choi, Seung-Wook Kim*, and Se-Ho Lee
  IEEE Signal Processing Letters, 2025
In this letter, we propose an adaptive video demoiréing network (AVDNet), which dynamically suppresses moiré patterns in video environments by leveraging both the spectral and temporal characteristics of moiré artifacts. It consists of two key modules: the adaptive bandpass block (ABB) and the subtraction-guided alignment block (SGAB). ABB performs frame-wise demoiréing in the implicit frequency domain using an adaptive bandpass filter that modulates its response to match the moiré spectral characteristics of each frame. SGAB exploits subtraction maps between adjacent frames to guide alignment and suppress the temporal propagation of moiré artifacts. Experimental results demonstrate that AVDNet outperforms state-of-the-art methods quantitatively and qualitatively while maintaining a compact model size and low computational cost.
@article{alignment, author = {Ok, Seung-Hun and Choi, Young-Min and Kim, Seung-Wook and Lee, Se-Ho}, title = {Adaptive video demoiréing network with subtraction-guided alignment}, journal = {IEEE Signal Processing Letters}, year = {2025}, }
2024
- ICOIN: Decentralized and versatile edge encoding methods for task-oriented communication systems
  Hoon Lee, and Seung-Wook Kim*
  In Proc. International Conference on Information Networking (ICOIN), 2024
In this paper, we develop task-oriented edge networks in which separate edge nodes (ENs) perform decentralized inference processes with the aid of a cloud. Individual ENs compress their local observations into uplink messages using task-oriented encoder neural networks (NNs). Then, the cloud carries out a remote inference task by leveraging the received signals. We develop a fronthaul-cooperative DNN architecture along with proper uplink coordination protocols. Inspired by the nomographic function, an efficient cloud inference model becomes an integration of a number of shallow DNNs. This modularized architecture supports versatile computations that are independent of the number of ENs. Numerical results demonstrate the viability of the proposed method for optimizing task-oriented edge networks.
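A minimal numpy sketch of the nomographic idea, under our own assumptions about shapes and activations: each EN encodes its local observation into a message, and the cloud sums the messages before a shallow post-processing network, so the cloud model does not change when ENs are added or removed.

import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

def en_encoder(obs, W):          # task-oriented encoder at one edge node
    return relu(obs @ W)         # compress the local observation to a message

def cloud_infer(messages, V):    # cloud combines messages by summation,
    pooled = np.sum(messages, 0) # so adding/removing ENs needs no resizing
    return relu(pooled @ V)      # shallow post-processing network

# Three ENs with 16-dim local observations and 8-dim uplink messages.
W = [rng.normal(size=(16, 8)) for _ in range(3)]
V = rng.normal(size=(8, 4))
msgs = np.stack([en_encoder(rng.normal(size=16), Wk) for Wk in W])
print(cloud_infer(msgs, V).shape)  # (4,)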
@inproceedings{Decentralized, author = {Lee, Hoon and Kim, Seung-Wook}, title = {Decentralized and versatile edge encoding methods for task-oriented communication systems}, year = {2024}, address = {Ho Chi Minh City, Vietnam}, url = {https://ieeexplore.ieee.org/abstract/document/10572144}, booktitle = {Proc. International Conference on Information Networking (ICOIN)}, }

- EL: Enhanced Blur-Robust Monocular Depth Estimation via Self-Supervised Learning
  Chi-Hun Sung, Seong-Yeol Kim, Ho-Ju Shin, Se-Ho Lee, and Seung-Wook Kim*
  Electronics Letters, 2024
This letter presents a novel self-supervised learning strategy to improve the robustness of a monocular depth estimation (MDE) network against motion blur. Motion blur, a common problem in real-world applications like autonomous driving and scene reconstruction, often hinders accurate depth perception. Conventional MDE methods are effective under controlled conditions but struggle to generalise their performance to blurred images. To address this problem, we generate blur-synthesised data to train a robust MDE model without the need for preprocessing, such as deblurring. By incorporating self-distillation techniques and using blur-synthesised data, the depth estimation accuracy for blurred images is significantly enhanced without additional computational or memory overhead. Extensive experimental results demonstrate the effectiveness of the proposed method, enhancing existing MDE models to accurately estimate depth information across various blur conditions.
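A small numpy sketch of blur-synthesised training data, assuming a simple line-shaped point spread function and FFT-based convolution; the paper's actual blur model may differ.

import numpy as np

def motion_blur_kernel(length: int, angle_deg: float, size: int = 15) -> np.ndarray:
    """Line-shaped PSF: a crude stand-in for the paper's blur synthesis."""
    k = np.zeros((size, size), dtype=np.float64)
    c = size // 2
    theta = np.deg2rad(angle_deg)
    for t in np.linspace(-length / 2, length / 2, num=4 * size):
        r = int(round(c + t * np.sin(theta)))
        col = int(round(c + t * np.cos(theta)))
        if 0 <= r < size and 0 <= col < size:
            k[r, col] = 1.0
    return k / k.sum()

def blur(img: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Circular 2D convolution via FFT; assumes a single-channel float image."""
    pad = np.zeros_like(img)
    kh, kw = k.shape
    pad[:kh, :kw] = k
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(pad)))

sharp = np.random.rand(64, 64)
blurry = blur(sharp, motion_blur_kernel(length=9, angle_deg=30))  # training pair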
@article{depth, author = {Sung, Chi-Hun and Kim, Seong-Yeol and Shin, Ho-Ju and Lee, Se-Ho and Kim, Seung-Wook}, title = {Enhanced Blur-Robust Monocular Depth Estimation via Self-Supervised Learning}, journal = {Electronics Letters}, year = {2024}, }

- JVCIR: DCPNet: Deformable Control Point Network for Image Enhancement
  Se-Ho Lee, and Seung-Wook Kim*
  Journal of Visual Communication and Image Representation, 2024
In this paper, we present a novel image enhancement network consisting of global and local color enhancement. The proposed network model is constructed using global transformation functions, which are formed by a set of piece-wise quadratic curves, and a local color enhancement network based on an encoder–decoder architecture. To adaptively and dynamically control the ranges of each piece-wise curve, we introduce deformable control points (DCPs), which determine the overall structure of the global transformation functions. The parameters for piece-wise quadratic curve fitting and the DCPs are estimated using the proposed DCP network (DCPNet). DCPNet processes a down-sampled image to derive the DCP parameters: the DCP offsets and the curve parameters. Then, we obtain a set of DCPs from the DCP offsets and connect each adjacent DCP pair by using the curve parameter to construct a global transformation function for each color channel. The original input images are then transformed based on the resulting transformation functions to obtain globally enhanced images. Finally, the intermediate image is fed into the local enhancement network, which has a U-Net architecture, to produce the spatially enhanced images. Extensive experimental results demonstrate the superiority of the proposed method over state-of-the-art methods in various image enhancement tasks, such as image retouching, tone-mapping, and underwater image enhancement.
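The global transformation can be pictured with a small numpy sketch; the quadratic parameterization below, q(t) = y0 + (y1 - y0)t + a t(t - 1), which stays anchored at both control points for any curvature a, is our assumption rather than DCPNet's exact formulation.

import numpy as np

def piecewise_quadratic(x, xs, ys, curv):
    """Map intensities x through quadratic segments joining control points.

    xs, ys: control-point positions/values (monotone xs in [0, 1]);
    curv: one curvature coefficient per segment.
    """
    x = np.clip(x, xs[0], xs[-1])
    seg = np.clip(np.searchsorted(xs, x, side="right") - 1, 0, len(xs) - 2)
    t = (x - xs[seg]) / (xs[seg + 1] - xs[seg])
    return ys[seg] + (ys[seg + 1] - ys[seg]) * t + curv[seg] * t * (t - 1)

xs = np.array([0.0, 0.3, 0.7, 1.0])   # deformable control-point positions
ys = np.array([0.0, 0.45, 0.8, 1.0])  # their target values
a = np.array([-0.05, 0.02, -0.03])    # per-segment curvature (illustrative)
img = np.random.rand(32, 32)
enhanced = piecewise_quadratic(img, xs, ys, a)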
@article{DCPNet, author = {Lee, Se-Ho and Kim, Seung-Wook}, title = {DCPNet: Deformable Control Point Network for Image Enhancement}, journal = {Journal of Visual Communication and Image Representation}, year = {2024}, }

- KICS: Color Filter Array Mapping Using Generative Adversarial Networks
  Seong-Yeol Kim, Chi-Hun Sung, and Seung-Wook Kim*
  KICS Journal, 2024
The lack of paired data is a critical problem in raw image mapping since it is hard to capture the color filter arrays (CFAs) of the same scene from different cameras. This paper introduces a novel RGBW/RGB CFA data generation method using generative adversarial networks (GANs). The experimental results confirm that the performance of the RGBW-to-RGB CFA mapping can be improved by using the proposed data generation method based on GANs.
@article{color, author = {Kim, Seong-Yeol and Sung, Chi-Hun and Kim, Seung-Wook}, title = {Color Filter Array Mapping Using Generative Adversarial Networks}, journal = {KICS Journal}, year = {2024}, }

- IEEE IoTJ: Task-Oriented Edge Networks: Decentralized Learning Over Wireless Fronthaul
  Hoon Lee, and Seung-Wook Kim*
  IEEE Internet of Things Journal, 2024
This article studies task-oriented edge networks where multiple edge Internet of Things nodes execute machine learning tasks with the help of powerful deep neural networks (DNNs) at a network cloud. Separate edge nodes (ENs) result in a partially observable system where they can only get partitioned features of the global network states. These local observations need to be forwarded to the cloud via resource-constrained wireless fronthaul links. Individual ENs compress their local observations into uplink fronthaul messages using task-oriented encoder DNNs. Then, the cloud carries out a remote inference task by leveraging the received signals. Such a distributed topology requires a decentralized training and decentralized execution (DTDE) learning framework for designing edge-cloud cooperative inference rules and their decentralized training strategies. First, we develop a fronthaul-cooperative DNN architecture along with proper uplink coordination protocols suitable for wireless fronthaul interconnection. Inspired by the nomographic function, an efficient cloud inference model becomes an integration of a number of shallow DNNs. This modularized architecture supports versatile computations that are independent of the number of ENs. Next, we present a decentralized training algorithm for the separate edge-cloud DNNs over downlink wireless fronthaul channels. An appropriate downlink coordination protocol is proposed, which backpropagates gradient vectors wirelessly from the cloud to the ENs. Numerical results demonstrate the viability of the proposed DTDE framework for optimizing task-oriented edge networks.
@article{colos, author = {Lee, Hoon and Kim, Seung-Wook}, title = {Task-Oriented Edge Networks: Decentralized Learning Over Wireless Fronthaul}, journal = {IEEE Internet of Things Journal}, year = {2024}, }
2023
- JVCIR: Dual-Branch Vision Transformer for Blind Image Quality Assessment
  Se-Ho Lee, and Seung-Wook Kim
  Journal of Visual Communication and Image Representation, 2023
Blind image quality assessment (BIQA) has always been a challenging problem due to the absence of reference images. In this paper, we propose a novel dual-branch vision transformer for BIQA, which simultaneously considers both local distortions and global semantic information. It first extracts dual-scale features from the backbone network, and then each scale feature is fed into one of the transformer encoder branches as a local feature embedding to consider the scale-variant local distortions. Each transformer branch obtains the context of global image distortion as well as the local distortion by adopting content-aware embedding. Finally, the outputs of the dual-branch vision transformer are combined by using multiple feed-forward blocks to predict the image quality scores effectively. Experimental results demonstrate that the proposed BIQA method outperforms conventional methods on six public BIQA datasets.
@article{dual, author = {Lee, Se-Ho and Kim, Seung-Wook}, title = {Dual-Branch Vision Transformer for Blind Image Quality Assessment}, journal = {Journal of Visual Communication and Image Representation}, year = {2023}, }
2022
- CVPR: XYDeblur: Divide and Conquer for Single Image Deblurring
  Seo-Won Ji†, Jeong-Min Lee†, Seung-Wook Kim†, Jun-Pyo Hong, Seung-Jin Baek, Seung-Won Jung, and Sung-Jea Ko
  In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Many convolutional neural networks (CNNs) for single image deblurring employ a U-Net structure to estimate latent sharp images. Although a single lane of encoder-decoder architecture has long been proven effective in image restoration tasks, it overlooks the characteristic of deblurring, where a blurry image is generated from complicated blur kernels caused by tangled motions. Toward an effective network architecture, we present complemental sub-solutions learning with a one-encoder-two-decoder architecture for single image deblurring. Observing that multiple decoders successfully learn to decompose information in the encoded features into directional components, we further improve both the network efficiency and the deblurring performance by rotating and sharing kernels exploited in the decoders, which prevents the decoders from separating unnecessary components such as color shift. As a result, our proposed network shows superior results compared to U-Net while preserving the number of network parameters, and the use of the proposed network as the base network can improve the performance of existing state-of-the-art deblurring networks.
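The kernel-sharing idea can be sketched in one numpy call: the second decoder reuses the first decoder's spatial taps rotated by 90 degrees, so no extra parameters are introduced. The surrounding network wiring is omitted and is our assumption.

import numpy as np

def rotate_kernels(w: np.ndarray, k90: int = 1) -> np.ndarray:
    """Reuse one decoder's conv weights (out, in, kH, kW) in the other
    decoder by rotating the spatial taps 90 degrees."""
    return np.rot90(w, k=k90, axes=(2, 3)).copy()

w_dec1 = np.random.randn(16, 16, 3, 3)   # decoder 1 weights
w_dec2 = rotate_kernels(w_dec1)          # decoder 2: shared, rotated, free
assert w_dec2.shape == w_dec1.shape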
@inproceedings{xydeblur, author = {Ji, Seo-Won and Lee, Jeong-Min and Kim, Seung-Wook and Hong, Jun-Pyo and Baek, Seung-Jin and Jung, Seung-Won and Ko, Sung-Jea}, title = {XYDeblur: Divide and Conquer for Single Image Deblurring}, year = {2022}, address = {New Orleans, USA}, url = {https://openaccess.thecvf.com/content/CVPR2022/html/Ji_XYDeblur_Divide_and_Conquer_for_Single_Image_Deblurring_CVPR_2022_paper.html}, booktitle = {Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, }

- IEEE Access: Blur-Robust Object Detection Using Feature-Level Deblurring via Self-Guided Knowledge Distillation
  Sung-Jin Cho, Seung-Wook Kim*, Seung-Won Jung, and Sung-Jea Ko
  IEEE Access, 2022
Images captured from real-world environments often include blur artifacts resulting from camera movement, dynamic object motion, or defocus. Although such blur artifacts are inevitable, most object detection methods do not have special considerations for them; therefore, they may fail to detect objects in blurry images. One possible solution is applying image deblurring prior to object detection. However, this solution is computationally demanding, and its performance heavily depends on the image deblurring results. In this study, we propose a novel blur-aware object detection framework. First, we construct a synthetic but realistic dataset by applying a diverse set of motion blur kernels to blur-free images. Subsequently, we leverage self-guided knowledge distillation between the teacher and student networks that perform object detection using blur-free and blurry images, respectively. The teacher and student networks share most of their network parameters and jointly learn in a fully-supervised manner. The teacher network provides image features as hints for feature-level deblurring and also renders soft labels for the training of the student network. Guided by the hints and the soft labels from the teacher, the student network learns and expands its knowledge of object detection in blurry images. Experimental results show that the proposed framework improves the robustness of several widely used object detectors against image blurs.
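A hedged numpy sketch of the training signal: a feature-level hint loss between the teacher (sharp input) and student (blurry input) plus a soft-label KL term. The weighting, temperature, and all detection-specific pieces are assumptions, not the paper's exact objective.

import numpy as np

def softmax(z, T=1.0):
    e = np.exp((z - z.max(-1, keepdims=True)) / T)
    return e / e.sum(-1, keepdims=True)

def distill_loss(f_teacher, f_student, z_teacher, z_student,
                 alpha=0.5, T=2.0):
    """Hint + soft-label objective in the spirit of the paper."""
    hint = np.mean((f_teacher - f_student) ** 2)   # feature-level deblurring hint
    p_t, p_s = softmax(z_teacher, T), softmax(z_student, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-8) - np.log(p_s + 1e-8)), axis=-1).mean()
    return alpha * hint + (1 - alpha) * kl

f_t, f_s = np.random.randn(2, 64), np.random.randn(2, 64)   # pooled features
z_t, z_s = np.random.randn(2, 10), np.random.randn(2, 10)   # class logits
print(distill_loss(f_t, f_s, z_t, z_s))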
@article{blur, author = {Cho, Sung-Jin and Kim, Seung-Wook and Jung, Seung-Won and Ko, Sung-Jea}, title = {Blur-Robust Object Detection Using Feature-Level Deblurring via Self-Guided Knowledge Distillation}, journal = {IEEE Access}, year = {2022}, }
2021
- IEEE Access: Compression Artifacts Reduction Using Fusion of Multiple Restoration Networks
  Sung-Jin Cho, Jae Ryun Chung, Seung-Wook Kim, Seung-Won Jung, and Sung-Jea Ko
  IEEE Access, 2021
Lossy video compression achieves coding gains at the expense of the quality loss of the decoded images. Owing to the success of deep learning techniques, especially convolutional neural networks (CNNs), many compression artifacts reduction (CAR) techniques have been used to significantly improve the quality of decoded images by applying CNNs which are trained to predict the original artifact-free images from the decoded images. Most existing video compression standards control the compression ratio using a quantization parameter (QP), so the quality of the decoded images is strongly QP-dependent. Training individual CNNs for predetermined QPs is one of the common approaches to dealing with different levels of compression artifacts. However, compression artifacts are also dependent on the local characteristics of an image. Therefore, a CNN trained for a specific QP cannot fully remove the compression artifacts of all images, even those encoded using the same QP. In this paper, we introduce a pixel-precise network selection network (PNSNet). From multiple reconstructed images obtained using multiple QP-specific CAR networks, PNSNet is trained to find the best CAR network for each pixel. The output of PNSNet is then used as an explicit spatial attention channel for an image fusion network that combines the multiple reconstructed images. Experimental results demonstrate that the quality of decoded images can be significantly improved by the proposed multiple CAR network fusion method.
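The fusion step can be sketched as a per-pixel soft selection over the QP-specific restorations; treating PNSNet's output as softmax weights is our simplification of the attention-plus-fusion pipeline.

import numpy as np

def fuse(restored: np.ndarray, selection_logits: np.ndarray) -> np.ndarray:
    """Blend N QP-specific restorations (N, H, W) with a per-pixel soft
    selection map (N, H, W): an assumed soft analogue of PNSNet's output."""
    w = np.exp(selection_logits - selection_logits.max(axis=0))
    w /= w.sum(axis=0)
    return (w * restored).sum(axis=0)

restored = np.random.rand(3, 48, 48)    # outputs of 3 CAR networks
logits = np.random.randn(3, 48, 48)     # pixel-precise selection scores
fused = fuse(restored, logits)          # (48, 48) combined result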
@article{compression, author = {Cho, Sung-Jin and Chung, Jae Ryun and Kim, Seung-Wook and Jung, Seung-Won and Ko, Sung-Jea}, title = {Compression Artifacts Reduction Using Fusion of Multiple Restoration Networks}, journal = {IEEE Access}, year = {2021}, }

- IEEE Network: Edge Network-Assisted Real-Time Object Detection Framework for Autonomous Driving
  Seung-Wook Kim, Keun-Soo Ko, Han-Eul Ko, and Victor Leung
  IEEE Network, 2021
Computer vision tasks such as object detection are crucial for the operations of autonomous vehicles (AVs). Results of many tasks, even those requiring high computational power, can be obtained within a short delay by offloading them to edge clouds. However, although edge clouds are exploited, real-time object detection cannot always be guaranteed due to dynamic channel quality. To mitigate this problem, we propose an edge-network-assisted real-time object detection framework (EODF). In EODF, AVs extract the regions of interest (RoIs) of the captured image when the channel quality is not sufficiently good to support real-time object detection. Then, AVs compress the image data on the basis of the RoIs and transmit the compressed data to the edge cloud. In so doing, real-time object detection can be achieved owing to the reduced transmission latency. To verify the feasibility of our framework, we evaluate the probability that the results of object detection are not received within the inter-frame duration (i.e., the outage probability) and their accuracy. From the evaluation, we demonstrate that the proposed EODF provides the results to AVs in real time and achieves satisfactory accuracy.
@article{edge, author = {Kim, Seung-Wook and Ko, Keun-Soo and Ko, Han-Eul and Leung, Victor}, title = {Edge Network-Assisted Real-Time Object Detection Framework for Autonomous Driving}, journal = {IEEE Network}, year = {2021}, }

- TNNLS: PEPSI++: Fast and lightweight network for image inpainting
  Yong-Goo Shin, Min-Cheol Sagong, Yoon-Jae Yeo, Seung-Wook Kim, and Sung-Jea Ko
  IEEE Transactions on Neural Networks and Learning Systems, 2021
Among the various generative adversarial network (GAN)-based image inpainting methods, a coarse-to-fine network with a contextual attention module (CAM) has shown remarkable performance. However, due to two stacked generative networks, the coarse-to-fine network needs numerous computational resources, such as convolution operations and network parameters, which result in low speed. To address this problem, we propose a novel network architecture called parallel extended-decoder path for semantic inpainting (PEPSI) network, which aims at reducing the hardware costs and improving the inpainting performance. PEPSI consists of a single shared encoding network and parallel decoding networks called coarse and inpainting paths. The coarse path produces a preliminary inpainting result to train the encoding network for the prediction of features for the CAM. Simultaneously, the inpainting path generates higher inpainting quality using the refined features reconstructed via the CAM. In addition, we propose Diet-PEPSI that significantly reduces the network parameters while maintaining the performance. In Diet-PEPSI, to capture the global contextual information with low hardware costs, we propose novel rate-adaptive dilated convolutional layers that employ the common weights but produce dynamic features depending on the given dilation rates. Extensive experiments comparing the performance with state-of-the-art image inpainting methods demonstrate that both PEPSI and Diet-PEPSI improve the quantitative scores, i.e., the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), as well as significantly reduce hardware costs, such as computational time and the number of network parameters.
@article{pepsi++, author = {Shin, Yong-Goo and Sagong, Min-Cheol and Yeo, Yoon-Jae and Kim, Seung-Wook and Ko, Sung-Jea}, title = {PEPSI++: Fast and lightweight network for image inpainting}, journal = {IEEE Transactions on Neural Networks and Learning Systems}, year = {2021}, }
2020
- ICCE: Parallel Feature Pyramid Network for Image Denoising
  Sung-Jin Cho, Kwang-Hyun Uhm, Seung-Wook Kim, Seo-Won Ji, and Sung-Jea Ko
  In Proc. IEEE International Conference on Consumer Electronics (ICCE), 2020
Image denoising is a classical and essential task in consumer electronics equipped with cameras. Recently, convolutional neural network (CNN)-based denoising methods have been widely studied. These methods adopt single-scale features to separate image structures from the noisy observation. Single-scale features, however, have limitations in covering the full characteristics of image structures at different scales. In this paper, we propose a novel denoising network that makes use of a multi-scale feature pyramid in which each feature map represents the characteristics of image structures at a different scale. We then combine these multi-scale features to obtain the contextual information and utilize it to effectively generate clear denoised results. Experimental results show that our network achieves superior performance to conventional methods.
@inproceedings{parallel, author = {Cho, Sung-Jin and Uhm, Kwang-Hyun and Kim, Seung-Wook and Ji, Seo-Won and Ko, Sung-Jea}, title = {Parallel Feature Pyramid Network for Image Denoising}, year = {2020}, address = {Las Vegas, USA}, url = {https://ieeexplore.ieee.org/abstract/document/9043111}, booktitle = {Proc. IEEE International Conference on Consumer Electronics (ICCE)}, }

- IEEE Access: Simple but Effective Scale Estimation for Monocular Visual Odometry in Road Driving Scenarios
  Fan Ming†, Seung-Wook Kim†, Sung-Tae Kim, Jee-Young Sun, and Sung-Jea Ko
  IEEE Access, 2020
In large-scale environments, scale drift is a crucial problem of monocular visual simultaneous localization and mapping (SLAM). A common solution is to utilize the camera height, which can be obtained using the reconstructed 3D ground points (3DGPs) from two successive frames, as prior knowledge. Increasing the number of 3DGPs by using more preceding frames is a natural extension of this solution for estimating a more precise camera height. However, merely employing multiple frames in conventional methods is hard to apply directly in real-world scenarios because the vehicle motion and inaccurate feature matching inevitably cause large uncertainty and noisy 3DGPs. In this study, we propose an elaborate method to collect confident 3DGPs from multiple frames for robust scale estimation. First, we gather 3DGP candidates that can be seen in more than a predefined number of frames. To verify the 3DGP candidates, we filter out the 3D points at the exterior of the road region obtained by a deep-learning-based road segmentation model. In addition, we formulate an optimization problem constrained by a simple but effective geometric assumption that the normal vector of the ground plane lies in the null space of a movement vector of the camera center, and provide a closed-form solution. ORB-SLAM with the proposed scale estimation method achieves an average translation error of 1.19% on the KITTI dataset, which outperforms state-of-the-art conventional monocular visual SLAM methods in road driving scenarios.
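The closed-form geometric step admits a compact numpy sketch, under our own assumptions about data layout: the ground normal is the direction least aligned with the camera-center movements (smallest right singular vector), and the known camera height then fixes the scale.

import numpy as np

def ground_normal(cam_moves: np.ndarray) -> np.ndarray:
    """Closed-form normal: the right singular vector of the stacked
    camera-center movements (K, 3) with the smallest singular value,
    i.e. the direction approximately orthogonal to all movements."""
    _, _, vt = np.linalg.svd(cam_moves)
    return vt[-1]

def scale_from_height(n, ground_pts, cam_center, true_height):
    """Estimate camera height from 3DGPs, then the metric scale factor."""
    est_height = np.median(np.abs((ground_pts - cam_center) @ n))
    return true_height / est_height

moves = np.random.randn(10, 3) * [1.0, 0.02, 1.0]    # mostly planar motion
n = ground_normal(moves)                             # ~ (0, +/-1, 0)
pts = np.c_[np.random.randn(50), -1.65 * np.ones(50), np.random.randn(50)]
print(scale_from_height(n, pts, np.zeros(3), true_height=1.65))  # ~ 1.0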
@article{simple, author = {Ming, Fan and Kim, Seung-Wook and Kim, Sung-Tae and Sun, Jee-Young and Ko, Sung-Jea}, title = {Simple but Effective Scale Estimation for Monocular Visual Odometry in Road Driving Scenarios}, journal = {IEEE Access}, year = {2020}, }

- IEEE Access: Quaternary Census Transform Based on the Human Visual System for Stereo Matching
  Seo-Won Ji, Seung-Wook Kim, Sung-Ho Chae, Dong-Pan Lim, and Sung-Jea Ko
  IEEE Access, 2020
The census transform is a non-parametric local transform that is widely used in stereo matching. This transform encodes the structural information of a local patch into a binary code stream representing the relative intensity ordering of the pixels within the patch. Despite its high performance in stereo matching, the census transform often generates identical binary code streams for two different patches because it simply thresholds the pixels within the patch at the center pixel intensity. To overcome this problem, we introduce a quaternary census transform that encodes the local structural information into a quaternary code stream by employing both the relative intensity ordering and the minimum visibility threshold of the human eye known as the just-noticeable difference. Moreover, because the human eye activates different areas of the retina based on brightness, the patch size for the proposed quaternary census transform adaptively varies depending on the luminance of each pixel. Experimental results on well-known Middlebury stereo datasets prove that the proposed transform outperforms the other census transform-based methods in terms of the accuracy of stereo matching.
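A numpy sketch of the encoding, with an assumed four-level banding around the center pixel; the paper additionally varies the patch size with luminance, which is omitted here.

import numpy as np

def quaternary_census(patch: np.ndarray, jnd: float) -> np.ndarray:
    """Encode a patch against its center pixel into 4 levels using a
    just-noticeable-difference band; the exact banding is our assumption."""
    c = patch[patch.shape[0] // 2, patch.shape[1] // 2]
    codes = np.full(patch.shape, 1, dtype=np.uint8)   # slightly below center
    codes[patch < c - jnd] = 0                        # clearly darker
    codes[patch >= c] = 2                             # slightly above center
    codes[patch > c + jnd] = 3                        # clearly brighter
    return codes

patch = np.random.randint(0, 256, size=(5, 5)).astype(np.float64)
print(quaternary_census(patch, jnd=4.0))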
@article{Quaternary, author = {Ji, Seo-Won and Kim, Seung-Wook and Chae, Sung-Ho and Lim, Dong-Pan and Ko, Sung-Jea}, title = {Quaternary Census Transform Based on the Human Visual System for Stereo Matching}, journal = {IEEE Access}, year = {2020}, }

- IEEE Access: Fast and Accurate 3D Hand Pose Estimation via Recurrent Neural Network for Capturing Hand Articulations
  Cheol-Hwan Yoo, Seo-Won Ji, Yong-Goo Shin, Seung-Wook Kim, and Sung-Jea Ko
  IEEE Access, 2020
3D hand pose estimation from a single depth image plays an important role in computer vision and human-computer interaction. Although recent hand pose estimation methods using convolutional neural networks (CNNs) have shown notable improvements in accuracy, most of them have a limitation in that they rely on a complex network structure without fully exploiting the articulated structure of the hand. A hand, which is an articulated object, is composed of six local parts: the palm and five independent fingers. Each finger consists of sequential joints that provide constrained motion, referred to as a kinematic chain. In this paper, we propose a hierarchically-structured convolutional recurrent neural network (HCRNN) with six branches that estimate the 3D position of the palm and five fingers independently. The palm position is predicted via fully-connected layers. Each sequential joint, i.e., finger position, is obtained using a recurrent neural network (RNN) to capture the spatial dependencies between adjacent joints. Then the output features of the palm and finger branches are concatenated to estimate the global hand position. HCRNN directly takes the depth map as an input without a time-consuming data conversion, such as 3D voxels and point clouds. Experimental results on public datasets demonstrate that the proposed HCRNN not only outperforms most 2D CNN-based methods using the depth image as their input but also achieves competitive results with state-of-the-art 3D CNN-based methods at a highly efficient running speed of 285 fps on a single GPU.
@article{fast, author = {Yoo, Cheol-Hwan and Ji, Seo-Won and Shin, Yong-Goo and Kim, Seung-Wook and Ko, Sung-Jea}, title = {Fast and Accurate 3D Hand Pose Estimation via Recurrent Neural Network for Capturing Hand Articulations}, journal = {IEEE Access}, year = {2020}, }

- SPL: Simple yet effective way for improving the performance of lossy image compression
  Yoon-Jae Yeo, Yong-Goo Shin, Min-Cheol Sagong, Seung-Wook Kim, and Sung-Jea Ko
  IEEE Signal Processing Letters, 2020
Lossy image compression methods based on deep neural networks (DNNs) include a quantization process between the encoder and decoder networks as an essential part of increasing the compression rate. However, the quantization operation impedes the flow of gradients and often disturbs the optimal learning of the encoder, which results in distortion in the reconstructed images. To alleviate this problem, this paper presents a simple yet effective way to enhance the performance of lossy image compression without imposing training overhead or modifying the original network architectures. In the proposed method, we utilize an auxiliary branch, called a shortcut, which directly connects the encoder and decoder. Since the shortcut does not include the quantization process, it supports the optimal learning of the encoder by passing the accurate gradient. Furthermore, to assist the decoder, which should handle the additional feature maps obtained via the shortcut, we also propose a residual refinement unit (RRU) following the quantizer. The experimental results show that an image compression network trained with the proposed method remarkably improves the performance in terms of peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and multi-scale structural similarity (MS-SSIM).
@article{lossy, author = {Yeo, Yoon-Jae and Shin, Yong-Goo and Sagong, Min-Cheol and Kim, Seung-Wook and Ko, Sung-Jea}, title = {Simple yet effective way for improving the performance of lossy image compression}, journal = {IEEE Signal Processing Letters}, year = {2020}, }
2019
- CVPR: PEPSI: Fast image inpainting with parallel decoding network
  Min-Cheol Sagong, Yong-Goo Shin, Seung-Wook Kim, Seung Park, and Sung-Jea Ko
  In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Recently, a generative adversarial network (GAN)-based method employing the coarse-to-fine network with the contextual attention module (CAM) has shown outstanding results in image inpainting. However, this method requires numerous computational resources due to its two-stage process for feature encoding. To solve this problem, in this paper, we present a novel network structure called PEPSI: parallel extended-decoder path for semantic inpainting. PEPSI can reduce the number of convolution operations by adopting a structure consisting of a single shared encoding network and a parallel decoding network with coarse and inpainting paths. The coarse path produces a preliminary inpainting result with which the encoding network is trained to predict features for the CAM. At the same time, the inpainting path creates a higher-quality inpainting result using refined features reconstructed by the CAM. PEPSI not only reduces the number of convolution operations by almost half compared to the conventional coarse-to-fine networks but also exhibits superior performance to other models in terms of testing time and quantitative scores.
@inproceedings{pepsi, author = {Sagong, Min-Cheol and Shin, Yong-Goo and Kim, Seung-Wook and Park, Seung and Ko, Sung-Jea}, title = {PEPSI: Fast image inpainting with parallel decoding network}, year = {2019}, address = {Long Beach, USA}, url = {https://openaccess.thecvf.com/content_CVPR_2019/html/Sagong_PEPSI__Fast_Image_Inpainting_With_Parallel_Decoding_Network_CVPR_2019_paper}, booktitle = {Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, }

- ICEIC: Learning an object detector using zoomed object regions
  Sung-Jin Cho, Seung-Wook Kim, Kwang-Hyun Uhm, Hyong-Keun Kook, and Sung-Jea Ko
  In Proc. International Conference on Electronics, Information, and Communication, 2019
The single shot multi-box detector (SSD) is one of the first real-time detectors that uses a convolutional neural network (CNN) and achieves state-of-the-art detection performance. However, owing to the semantic gap between the feature layers of the CNN, the SSD has room for improvement. In this paper, we propose a novel training scheme to enhance the performance of the SSD. In object detection, a ground truth (GT) box is a bounding box enclosing an object boundary. To improve the semantic level of the feature maps, we generate additional GT boxes by zooming in to and out from the original GT boxes. Experimental results show that the SSD trained with our scheme outperforms the original one on a public dataset.
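The augmentation reduces to scaling each GT box about its center; a numpy sketch with illustrative zoom factors (the paper's actual factors are not stated here):

import numpy as np

def zoom_boxes(boxes: np.ndarray, factors=(0.8, 1.2)) -> np.ndarray:
    """Scale (x1, y1, x2, y2) GT boxes about their centers to emulate the
    zoom-in/zoom-out augmentation."""
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    out = []
    for f in factors:
        out.append(np.stack([cx - f * w / 2, cy - f * h / 2,
                             cx + f * w / 2, cy + f * h / 2], axis=1))
    return np.concatenate(out, axis=0)

gt = np.array([[10.0, 20.0, 50.0, 80.0]])
print(zoom_boxes(gt))   # two extra GT boxes per original box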
@inproceedings{learning, author = {Cho, Sung-Jin and Kim, Seung-Wook and Uhm, Kwang-Hyun and Kook, Hyong-Keun and Ko, Sung-Jea}, title = {Learning an object detector using zoomed object regions}, year = {2019}, address = {Auckland, New Zealand}, url = {https://ieeexplore.ieee.org/abstract/document/8706381}, booktitle = {Proc. International Conference on Electronics, Information, and Communication}, }

- ICCVW: AIM 2019 Challenge on RAW to RGB Mapping: Methods and Results
  Ignatov et al.
  In Proc. IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019
@inproceedings{aim2019, author = {Ignatov, Andrey and others}, title = {AIM 2019 Challenge on RAW to RGB Mapping: Methods and Results}, year = {2019}, address = {Seoul, South Korea}, url = {https://ieeexplore.ieee.org/document/9022218}, booktitle = {Proc. IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)}, }

- ICCVW: W-Net: Two-Stage U-Net With Misaligned Data for Raw-to-RGB Mapping
  Kwang-Hyun Uhm, Seung-Wook Kim, Seo-Won Ji, Sung-Jin Cho, Jun-Pyo Hong, and Sung-Jea Ko
  In Proc. IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019
@inproceedings{wnet, author = {Uhm, Kwang-Hyun and Kim, Seung-Wook and Ji, Seo-Won and Cho, Sung-Jin and Hong, Jun-Pyo and Ko, Sung-Jea}, title = {W-Net: Two-Stage U-Net With Misaligned Data for Raw-to-RGB Mapping}, year = {2019}, address = {Seoul, South Korea}, url = {https://arxiv.org/abs/1911.08656}, booktitle = {Proc. IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)}, }

- ICCVW: Reverse and Boundary Attention Network for Road Segmentation
  Jee-Young Sun, Seung-Wook Kim, Sang-Won Lee, Ye-Won Kim, and Sung-Jea Ko
  In Proc. IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019
@inproceedings{rev_attention, author = {Sun, Jee-Young and Kim, Seung-Wook and Lee, Sang-Won and Kim, Ye-Won and Ko, Sung-Jea}, title = {Reverse and Boundary Attention Network for Road Segmentation}, year = {2019}, address = {Seoul, South Korea}, url = {https://ieeexplore.ieee.org/document/9022120}, booktitle = {Proc. IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)}, }

- CVPRW: Evaluating parameterization methods for convolutional neural network (CNN)-based image operators
  Seung-Wook Kim, Sung-Jin Cho, Kwang-Hyun Uhm, Seo-Won Ji, Sang-Won Lee, and Sung-Jea Ko
  In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2019
@inproceedings{param_conv, author = {Kim, Seung-Wook and Cho, Sung-Jin and Uhm, Kwang-Hyun and Ji, Seo-Won and Lee, Sang-Won and Ko, Sung-Jea}, title = {Evaluating parameterization methods for convolutional neural network (CNN)-based image operators}, year = {2019}, address = {Long Beach, USA}, url = {https://ieeexplore.ieee.org/document/9025633}, booktitle = {Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW)}, }

- CVPRW: NTIRE 2019 challenge on real image denoising: Methods and Results
  Abdelhamed et al.
  In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2019
@inproceedings{nitre2019, author = {Abdelhamed, Abdelrahman and others}, title = {NTIRE 2019 challenge on real image denoising: Methods and Results}, year = {2019}, address = {Long Beach, USA}, url = {https://ieeexplore.ieee.org/document/9025399}, booktitle = {Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW)}, }

- SPIC: An optimization framework for inverse tone mapping using a single low dynamic range image
  Ming Fan, Dae-Hong Lee, Seung-Wook Kim, Seung Park, and Sung-Jea Ko
  Signal Processing: Image Communication, 2019
Conventional inverse tone-mapping (ITM) methods tend to produce contrast distortions such as contrast loss and contrast reversal in reconstructed high dynamic range (HDR) images. This paper proposes a novel ITM optimization framework based on the assumption that the input low dynamic range (LDR) image is similar to the LDR image obtained by tone mapping a true HDR image. In the proposed framework, an HDR image is initially reconstructed by applying a conventional tone-mapping function in a reverse manner, and then the reconstructed HDR image is iteratively modified toward the optimum HDR image by minimizing the difference between the input LDR image and a tone-mapped LDR image obtained from the reconstructed HDR image. The experimental results demonstrate that the proposed framework effectively reconstructs a high-quality HDR image and outperforms other conventional methods in terms of objective quality.
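A toy numpy sketch of the iterative refinement, with a Reinhard-style operator, a crude initialization, and a plain gradient step standing in for the paper's tone-mapping function and optimizer:

import numpy as np

def tmo(h):                       # stand-in global operator (Reinhard-style)
    return h / (1.0 + h)

def inverse_tone_map(ldr, iters=500, lr=2.0):
    """Iteratively modify the HDR estimate so that tone-mapping it again
    reproduces the input LDR; operator, init, and step size are stand-ins."""
    h = np.array(ldr, dtype=np.float64)          # crude initial HDR estimate
    for _ in range(iters):
        residual = tmo(h) - ldr                  # mismatch with observed LDR
        grad = residual / (1.0 + h) ** 2         # d tmo / dh = 1 / (1 + h)^2
        h = np.clip(h - lr * grad, 0.0, None)    # gradient step on 0.5||.||^2
    return h

ldr = np.random.rand(16, 16) * 0.8
hdr = inverse_tone_map(ldr)       # approaches ldr / (1 - ldr) for this operator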
@article{dynamic, author = {Fan, Ming and Lee, Dae-Hong and Kim, Seung-Wook and Park, Seung and Ko, Sung-Jea}, title = {An optimization framework for inverse tone mapping using a single low dynamic range image}, journal = {Signal Processing: Image Communication}, year = {2019}, }

- EL: Context-aware encoding for clothing semantic parsing
  Cheol-Hwan Yoo, Yong-Goo Shin, Seung-Wook Kim, and Sung-Jea Ko
  Electronics Letters, 2019
Clothing parsing is a special type of semantic segmentation in which each pixel is assigned a clothing label. Unlike general scene semantic segmentation, stylish match (e.g. skirts + blouse, jeans + T-shirt) is an important cue for recognising fine-grained categories in clothing parsing. In this Letter, the authors propose a context-aware outfit encoder (COE), as a side branch, that drives the convolutional neural network to take the stylish match into account for clothing parsing. The proposed COE provides information on matching clothes that can be utilised to significantly improve the prediction accuracy of the base network. Experimental results show that a fully convolutional network and MobileNet with the COE improve the mean intersection over union of those without the COE by 2.5% and 2.8%, respectively, on the CFPD dataset.
@article{clothing, author = {Yoo, Cheol-Hwan and Shin, Yong-Goo and Kim, Seung-Wook and Ko, Sung-Jea}, title = {Context-aware encoding for clothing semantic parsing}, journal = {Electronics Letters}, year = {2019}, }
2018
- ECCV: Parallel feature pyramid network for object detection
  Seung-Wook Kim, Hyong-Keun Kook, Jee-Young Sun, Mun-Cheon Kang, and Sung-Jea Ko
  In Proc. European Conference on Computer Vision (ECCV), 2018
Recently developed object detectors employ a convolutional neural network (CNN) by gradually increasing the number of feature layers with a pyramidal shape instead of using a featurized image pyramid. However, the different abstraction levels of the CNN feature layers often limit the detection performance, especially on small objects. To overcome this limitation, we propose a CNN-based object detection architecture, referred to as a parallel feature pyramid (FP) network (PFPNet), where the FP is constructed by widening the network width instead of increasing the network depth. First, we adopt spatial pyramid pooling and some additional feature transformations to generate a pool of feature maps with different sizes. In PFPNet, the additional feature transformation is performed in parallel, which yields feature maps with similar levels of semantic abstraction across the scales. We then resize the elements of the feature pool to a uniform size and aggregate their contextual information to generate each level of the final FP. The experimental results confirmed that PFPNet increases the performance of the latest version of the single-shot multi-box detector (SSD) by 6.4% mAP and, in particular, by 7.8% AP_small on the MS-COCO dataset.
@inproceedings{pyramid, author = {Kim, Seung-Wook and Kook, Hyong-Keun and Sun, Jee-Young and Kang, Mun-Cheon and Ko, Sung-Jea}, title = {Parallel feature pyramid network for object detection}, year = {2018}, address = {Munich, Germany}, url = {https://openaccess.thecvf.com/content_ECCV_2018/html/Seung-Wook_Kim_Parallel_Feature_Pyramid_ECCV_2018_paper.html}, booktitle = {Proc. European Conference on Computer Vision (ECCV)}, }

- ITC-CSCC: Bi-Directional feature pyramid network for object detection
  Sung-Jin Cho, Seung-Wook Kim, Jee-Young Sun, Kwang-Hyun Uhm, and Sung-Jea Ko
  In Proc. International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC), 2018
@inproceedings{bi, author = {Cho, Sung-Jin and Kim, Seung-Wook and Sun, Jee-Young and Uhm, Kwang-Hyun and Ko, Sung-Jea}, title = {Bi-Directional feature pyramid network for object detection}, year = {2018}, address = {Bangkok, Thailand}, url = {}, booktitle = {Proc. International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC)}, }

- CBMS: A novel gastric ulcer differentiation system using convolutional neural networks
  Jee-Young Sun, Sang-Won Lee, Mun-Cheon Kang, Seung-Wook Kim, and Sung-Jea Ko
  In Proc. IEEE International Symposium on Computer-Based Medical Systems (CBMS), 2018
Gastric cancer can present itself as a gastric ulcer, which can mimic a benign gastric ulcer. In this paper, we introduce an objective and precise gastric ulcer differentiation system based on a deep convolutional neural network (CNN), which can support specialists by improving the diagnostic accuracy of the endoscopic examination of gastric ulcers. We first generated a new dataset consisting of endoscopic images of gastric ulcers and their corresponding type labels obtained by biopsy. We then designed various ulcer differentiation models using classification or detection networks and evaluated the performance of the models on the new dataset. Experimental results confirm that the classification network-based method shows performance comparable to doctors’ diagnoses, and the detection network-based one, which first detects ulcer regions and then determines the type of ulcer based on the detection results, exhibits the best performance. The proposed method provides an unbiased diagnosis, and it outperforms endoscopic diagnoses performed by the specialists in terms of total accuracy.
@inproceedings{gastric, author = {Sun, Jee-Young and Lee, Sang-Won and Kang, Mun-Cheon and Kim, Seung-Wook and Ko, Sung-Jea}, title = {A novel gastric ulcer differentiation system using convolutional neural networks}, year = {2018}, address = {Karlstad, Sweden}, url = {https://ieeexplore.ieee.org/abstract/document/8417263}, booktitle = {Proc. IEEE International Symposium on Computer-Based Medical Systems (CBMS)}, }

- ICEIC: Object detection with multi-scale context aggregation
  Hyong-Keun Kook, Seung-Wook Kim, Sang-Won Lee, Young-Hyun Kim, and Sung-Jea Ko
  In Proc. International Conference on Electronics, Information, and Communication, 2018
@inproceedings{aggregation, author = {Kook, Hyong-Keun and Kim, Seung-Wook and Lee, Sang-Won and Kim, Young-Hyun and Ko, Sung-Jea}, title = {Object detection with multi-scale context aggregation}, year = {2018}, address = {Hawaii, USA}, url = {https://scholar.korea.ac.kr/handle/2021.sw.korea/20385}, booktitle = {Proc. International Conference on Electronics, Information, and Communication}, }

- ICEIC: Single shot object detection using spatial pyramid pooling
  Seung-Wook Kim, Hyong-Keun Kook, Young-Hyun Kim, Jee-Young Sun, and Sung-Jea Ko
  In Proc. International Conference on Electronics, Information, and Communication, 2018
@inproceedings{single, author = {Kim, Seung-Wook and Kook, Hyong-Keun and Kim, Young-Hyun and Sun, Jee-Young and Ko, Sung-Jea}, title = {Single shot object detection using spatial pyramid pooling}, year = {2018}, address = {Hawaii, USA}, url = {}, booktitle = {Proc. International Conference on Electronics, Information, and Communication}, }

- EL: CNN-based UGS method using Cartesian-to-polar coordinate transformation
  Bo-Sang Kim, Jee-Young Sun, Seung-Wook Kim, Mun-Cheon Kang, and Sung-Jea Ko
  Electronics Letters, 2018
The main concern of user-guided segmentation (UGS) is to achieve high segmentation accuracy with minimal user interaction. A novel convolutional neural network (CNN)-based UGS method is proposed, which employs a single click as the user interaction. In the proposed method, the input image in the Cartesian coordinate system is first converted into the polar transformed image with the user-guided point (UGP) as the origin of the polar coordinate system. The transformed image not only effectively delivers the UGP to the CNN, but also enables a single-scale convolution kernel to act as a multi-scale kernel, whose receptive field in the Cartesian coordinate system is altered based on the UGP without any extra parameters. In addition, a feature selection module (FSM) is introduced and utilised to additionally extract radial and angular features from the polar transformed image. Experimental results demonstrate that the proposed CNN using the polar transformed image improves the segmentation accuracy (mean intersection over union) by 3.69% on PASCAL VOC 2012 dataset compared with the CNN using the Cartesian coordinate image. The FSM achieves additional performance improvement of 1.32%. Moreover, the proposed method outperforms the conventional non-CNN-based UGS methods by 12.61% on average.
@article{UGS, author = {Kim, Bo-Sang and Sun, Jee-Young and Kim, Seung-Wook and Kang, Mun-Cheon and Ko, Sung-Jea}, title = {CNN-based UGS method using Cartesian-to-polar coordinate transformation}, journal = {Electronics Letters}, year = {2018}, }

- SPIC: High dynamic range image tone mapping based on asymmetric model of retinal adaptation
  Dae-Hong Lee, Ming Fan, Seung-Wook Kim, Mun-Cheon Kang, and Sung-Jea Ko
  Signal Processing: Image Communication, 2018
Global tone mapping operators based on a symmetrical model of the retinal response to light tend to produce a low dynamic range (LDR) image that loses details of its corresponding high dynamic range (HDR) image in bright or dark areas. In this paper, we introduce a new asymmetric sigmoid curve (ASC) based on a model of retinal adaptation that encompasses the symmetrical S-shaped curve, and present two global tone mapping operators using the ASC. In the proposed method, an ASC-based tone mapping function is obtained by using a well-known classic photography technique called the zone system. In addition, a contrast-enhancing tone mapping function is introduced by formulating a bi-criteria optimization problem with the luminance histogram of an input HDR image and the ASC-based mapping function. Experimental results demonstrate that the proposed method enhances global contrast while preserving image details in the tone-mapped LDR image. Moreover, objective assessment results using an image quality metric indicate that the proposed method compares favorably with state-of-the-art global tone mapping operators.
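A generic asymmetric sigmoid makes the idea concrete; the parameterization below (different exponents on either side of an adaptation point, continuous at the joint) is illustrative, not the paper's exact ASC:

import numpy as np

def asymmetric_sigmoid(L, m=0.18, a=0.9, b=0.6):
    """Asymmetric S-curve on luminance: exponent a shapes the response
    below the adaptation point m, exponent b the roll-off above it."""
    out = np.empty_like(L, dtype=np.float64)
    lo = L < m
    out[lo] = 0.5 * (L[lo] / m) ** a              # shoulder below adaptation
    out[~lo] = 1.0 - 0.5 * (m / L[~lo]) ** b      # softer roll-off above
    return out

lum = np.random.rand(8, 8) * 10 + 1e-3            # HDR luminance
ldr = asymmetric_sigmoid(lum)                     # mapped into (0, 1)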
@article{retinal, author = {Lee, Dae-Hong and Fan, Ming and Kim, Seung-Wook and Kang, Mun-Cheon and Ko, Sung-Jea}, title = {High dynamic range image tone mapping based on asymmetric model of retinal adaptation}, journal = {Signal Processing: Image Communication}, year = {2018}, }

- SPIC: A novel contrast enhancement forensics based on convolutional neural networks
  Jee-Young Sun, Seung-Wook Kim, Sang-Won Lee, and Sung-Jea Ko
  Signal Processing: Image Communication, 2018
Contrast enhancement (CE), one of the most popular digital image retouching technologies, is frequently utilized for malicious purposes. As a consequence, verifying the authenticity of digital images in CE forensics has recently drawn significant attention. Current CE forensic methods can be performed using relatively simple handcrafted features based on first- and second-order statistics, but these methods have encountered difficulties in detecting modern counter-forensic attacks. In this paper, we present a novel CE forensic method based on convolutional neural network (CNN). To the best of our knowledge, this is the first work that applies CNN to CE forensics. Unlike the conventional CNN in other research fields that generally accepts the original image as its input, in the proposed method, we feed the CNN with the gray-level co-occurrence matrix (GLCM), which contains traceable features for CE forensics and is always of the same size, even for input images of different resolutions. By learning the hierarchical feature representations and optimizing the classification results, the proposed CNN can extract a variety of appropriate features to detect the manipulation. The performance of the proposed method is compared to that of three conventional forensic methods. The comparative evaluation is conducted within a dataset consisting of unaltered images, contrast-enhanced images, and counter-forensically attacked images. The experimental results indicate that the proposed method outperforms conventional forensic methods in terms of forgery-detection accuracy, especially in dealing with counter-forensic attacks.
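The CNN input can be sketched directly: a GLCM for a single pixel offset, which is always levels x levels regardless of the image resolution (the offset choice and normalization are our assumptions):

import numpy as np

def glcm(img, levels=256, dx=1, dy=0):
    """Gray-level co-occurrence matrix: counts of value pairs (p, q) at a
    fixed spatial offset (dx, dy), normalized to a joint histogram."""
    g = np.zeros((levels, levels), dtype=np.float64)
    h, w = img.shape
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    dst = img[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    np.add.at(g, (src.ravel(), dst.ravel()), 1.0)
    return g / g.sum()

img = np.random.randint(0, 256, size=(64, 64))
print(glcm(img).shape)   # (256, 256) regardless of input resolution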
@article{forensics, author = {Sun, Jee-Young and Kim, Seung-Wook and Lee, Sang-Won and Ko, Sung-Jea}, title = {A novel contrast enhancement forensics based on convolutional neural networks}, journal = {Signal Processing: Image Communication}, year = {2018}, }
2017
- SRT: Automatic facial pore analysis system using multi-scale pore detection
  Jee-Young Sun, Seung-Wook Kim, Sung-Ho Lee, Jae-Eun Choi, and Sung-Jea Ko
  Skin Research and Technology, 2017
@article{pore, author = {Sun, Jee-Young and Kim, Seung-Wook and Lee, Sung-Ho and Choi, Jae-Eun and Ko, Sung-Jea}, title = {Automatic facial pore analysis system using multi-scale pore detection}, journal = {Skin Research and Technology}, year = {2017}, }

- TCE: Content-preserving video stitching method for multi-camera systems
  Bo-Sang Kim, Kang-A Choi, Won-Jae Park, Seung-Wook Kim, and Sung-Jea Ko
  IEEE Transactions on Consumer Electronics, 2017
In this paper, a novel content-preserving video stitching method is proposed. Video stitching has to deal with moving objects in the overlapped area (OA), which often cause problems of structural misalignment or ghost effects. To this end, the proposed method first finds an optimal seam that does not pass through the moving objects by using an extended dynamic programming (DP) technique based on energy minimization. Then, content-aware adaptive blending is performed, which effectively reduces the color discontinuity while restricting the ghost effect caused by moving objects in the OA. In addition, to reduce the computational complexity, a partial seam-update (PSU) scheme is proposed, in which the seam is re-calculated only for the part of the seam passing through the moving object. Experimental results demonstrate that the proposed method is superior to conventional ones in terms of both subjective quality and computational complexity. In addition, this method achieves real-time performance on a mobile platform, making it applicable to consumer electronic devices.
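The seam search is classic dynamic programming over an energy map; in this sketch we simply assign high energy to (assumed known) moving-object pixels so the optimal seam routes around them:

import numpy as np

def min_energy_seam(energy: np.ndarray) -> np.ndarray:
    """Vertical-seam DP: each row extends the cheapest of the three
    neighbours above; returns one column index per row."""
    h, w = energy.shape
    cost = energy.copy()
    for y in range(1, h):
        left = np.r_[np.inf, cost[y - 1, :-1]]
        right = np.r_[cost[y - 1, 1:], np.inf]
        cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)
    seam = np.empty(h, dtype=np.int64)
    seam[-1] = int(cost[-1].argmin())
    for y in range(h - 2, -1, -1):                 # backtrack the optimum
        x = seam[y + 1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam[y] = lo + int(cost[y, lo:hi].argmin())
    return seam

energy = np.random.rand(20, 30)
energy[5:15, 10:20] += 10.0        # pretend a moving object sits here
print(min_energy_seam(energy))     # seam avoids columns 10-19 in rows 5-14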
@article{stitching, author = {Kim, Bo-Sang and Choi, Kang-A and Park, Won-Jae and Kim, Seung-Wook and Ko, Sung-Jea}, title = {Content-preserving video stitching method for multi-camera systems}, journal = {IEEE Transactions on Consumer Electronics}, year = {2017}, }

- JVCIR: High-dimensional feature extraction using bit-plane decomposition of local binary patterns for robust face recognition
  Cheol-Hwan Yoo, Seung-Wook Kim, June-Young Jung, and Sung-Jea Ko
  Journal of Visual Communication and Image Representation, 2017
Transforming an original image into a high-dimensional (HD) feature has been proven to be effective in classifying images. This paper presents a novel feature extraction method utilizing the HD feature space to improve the discriminative ability for face recognition. We observed that the local binary pattern can be decomposed into bit-planes, each of which has scale-specific directional information of the face image. Each bit-plane not only has the inherent local-structure of the face image but also has an illumination-robust characteristic. By concatenating all the decomposed bit-planes, we generate an HD feature vector with an improved discriminative ability. To reduce the computational complexity while preserving the incorporated local structural information, a supervised dimension reduction method, the orthogonal linear discriminant analysis, is applied to the HD feature vector. Extensive experimental results show that existing classifiers with the proposed feature outperform those with other conventional features under various illumination, pose, and expression variations.
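A numpy sketch of the decomposition: standard 8-neighbour LBP codes split into eight binary bit-planes, each holding one neighbour's comparison (the concatenation and OLDA steps described above are omitted):

import numpy as np

def lbp_bitplanes(img: np.ndarray):
    """8-neighbour LBP codes, then the eight binary bit-planes; each plane
    keeps a single neighbour's comparison against the center pixel."""
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.uint8) << bit)
    planes = [(code >> b) & 1 for b in range(8)]   # parts of the HD feature
    return code, np.stack(planes)

img = np.random.randint(0, 256, size=(16, 16)).astype(np.uint8)
code, planes = lbp_bitplanes(img)
print(code.shape, planes.shape)   # (14, 14) (8, 14, 14)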
@article{face, author = {Yoo, Cheol-Hwan and Kim, Seung-Wook and Jung, June-Young and Ko, Sung-Jea}, title = {High-dimensional feature extraction using bit-plane decomposition of local binary patterns for robust face recognition}, journal = {Journal of Visual Communication and Image Representation}, year = {2017}, }

- EL: Illumination normalisation using convolutional neural network with application to face recognition
  Young-Hyun Kim, Hoon Kim, Seung-Wook Kim, Hyo-Young Kim, and Sung-Jea Ko
  Electronics Letters, 2017
A novel illumination normalisation (IN) method using a convolutional neural network (CNN) is proposed. The proposed network is composed of the local pattern extraction (LPE) and illumination elimination (IE) layers. The LPE layers model the relationships between the pixels in each local region in order to handle various types of local shadow and shading in the face image. Based on the commonly used assumption about the illumination field, the IE layers generate illumination-insensitive ratio images by calculating the ratio between the output pairs produced from the LPE layers. The final feature map, obtained by combining the ratio images, possesses an improved discriminative ability for face recognition (FR). For training the proposed network, the results produced by Weber fraction-based IN methods are utilised as ground truths. The experimental results demonstrate that the proposed network performs better in terms of FR accuracy compared with the conventional non-CNN-based method, and it can be combined with any CNN-based face classifier.
@article{Illumination, author = {Kim, Young-Hyun and Kim, Hoon and Kim, Seung-Wook and Kim, Hyo-Young and Ko, Sung-Jea}, title = {Illumination normalisation using convolutional neural network with application to face recognition}, journal = {Electronics Letters}, year = {2017}, }
2016
- ICEIC: Improved pedestrian detection using joint aggregated channel features
  Joon-Yeon Kim, Seung-Wook Kim, Hyo-Young Kim, Won-Jae Park, and Sung-Jea Ko
  In Proc. International Conference on Electronics, Information, and Communication, 2016
@inproceedings{pedestrian, author = {Kim, Joon-Yeon and Kim, Seung-Wook and Kim, Hyo-Young and Park, Won-Jae and Ko, Sung-Jea}, title = {Improved pedestrian detection using joint aggregated channel features}, year = {2016}, address = {Da Nang, Vietnam}, booktitle = {Proc. International Conference on Electronics, Information, and Communication}, } - JDTCamera-Based Color Calibration Method for Multiple Flat-Panel Displays Using SmartphoneJune-Young Jung, Seung-Wook Kim, Seung Park, Byeong-Doo Choi, and Sung-Jea KoIEEE Journal of Display Technology, 2016
@article{Calibration, author = {Jung, June-Young and Kim, Seung-Wook and Park, Seung and Choi, Byeong-Doo and Ko, Sung-Jea}, title = {Camera-Based Color Calibration Method for Multiple Flat-Panel Displays Using Smartphone}, journal = {IEEE Journal of Display Technology}, year = {2016}, } - TCELBP-ferns-based feature extraction for robust facial recognitionJune-Young Jung, Seung-Wook Kim, Cheol-Hwan Yoo, Won-Jae Park, and Sung-Jea KoIEEE Transactions on Consumer Electronics, 2016
Most facial recognition (FR) systems first extract discriminative features from a facial image and then perform classification. This paper proposes a feature for representing human facial traits, together with a low-dimensional feature extraction method using orthogonal linear discriminant analysis (OLDA). The proposed feature relies on a local binary pattern to represent texture information and on random ferns to build a structural model. By concatenating their feature vectors, the proposed method obtains a high-dimensional descriptor of the input facial image. In general, the feature dimension is highly related to the discriminative ability, but higher-dimensional features are more expensive to compute, so dimensionality reduction is essential for practical FR applications. OLDA is employed to reduce the dimension of the extracted features while improving discriminative performance. On a representative FR database, the proposed method achieves a higher recognition rate and lower computational complexity than existing FR methods. It also performs strongly on a facial image database containing disguises.
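A minimal sketch of the dimensionality-reduction step, using scikit-learn's LinearDiscriminantAnalysis as a readily available stand-in for OLDA and random data in place of the LBP-ferns descriptors:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy data standing in for high-dimensional LBP-ferns descriptors:
# 200 faces of 10 identities, each a 5000-dimensional feature vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5000))
y = rng.integers(0, 10, size=200)

# LDA projects to at most (n_classes - 1) = 9 discriminant dimensions.
lda = LinearDiscriminantAnalysis(n_components=9)
X_low = lda.fit_transform(X, y)
print(X_low.shape)  # (200, 9)
```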
@article{LBP, author = {Jung, June-Young and Kim, Seung-Wook and Yoo, Cheol-Hwan and Park, Won-Jae and Ko, Sung-Jea}, title = {LBP-ferns-based feature extraction for robust facial recognition}, journal = {IEEE Transactions on Consumer Electronics}, year = {2016}, } - Signal ProcessingRetinex-based illumination normalization using class-based illumination subspace for robust face recognitionSeung-Wook Kim, June-Young Jung, Cheol-Hwan Yoo, and Sung-Jea KoSignal Processing, 2016
Recent illumination normalization (IN) methods first decompose a face image into a reflectance (R)-image having a lighting-invariant characteristic and an illuminance (I)-image including shading and shadowing effects. An illumination-normalized I-image is then obtained by eliminating the lighting-dependent image variations (LDIV) from the I-image. Finally, the normalized I- and R-images are recombined for face recognition (FR). However, the decomposed reflectance is often contaminated with lighting effects. Moreover, the lighting normalization tends to remove valuable discriminant information in the I-image. To address these problems, we employ the local edge-preserving filter to generate the R-image, whereby the lighting-invariant information is well preserved. In addition, we propose a subspace-based IN method that can retain the large facial structure in the I-image. To construct the proposed subspace, we calculate the LDIV within the same class of people from the training database of face images. Then, we apply the singular value decomposition to the calculated LDIV to obtain the basis images of the subspace. By projecting the I-image onto these basis images, we can effectively extract and eliminate the LDIV from the I-image without discarding the discriminant information. Experimental results confirm that FR with the proposed method outperforms that with existing IN methods under varying lighting conditions.
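The subspace construction can be sketched in a few lines of NumPy; the shapes are hypothetical and the edge-preserving R/I decomposition is omitted:

```python
import numpy as np

def lighting_subspace(I_images_by_class, k=5):
    """Build an illumination-variation subspace: within each class, the
    differences from the class mean capture the lighting-dependent image
    variations (LDIV); SVD of the stacked differences gives basis images."""
    diffs = []
    for imgs in I_images_by_class:          # imgs: (n, h*w) for one person
        diffs.append(imgs - imgs.mean(axis=0, keepdims=True))
    D = np.vstack(diffs)                    # all LDIV samples
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:k]                           # top-k basis images, (k, h*w)

def normalize_illuminance(I, basis):
    """Project the I-image onto the lighting basis and remove that part."""
    coeffs = basis @ I
    return I - basis.T @ coeffs

rng = np.random.default_rng(1)
classes = [rng.normal(size=(8, 32 * 32)) for _ in range(5)]
B = lighting_subspace(classes, k=4)
I_norm = normalize_illuminance(rng.normal(size=32 * 32), B)
print(I_norm.shape)  # (1024,)
```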
@article{Retinex, author = {Kim, Seung-Wook and Jung, June-Young and Yoo, Cheol-Hwan and Ko, Sung-Jea}, title = {Retinex-based illumination normalization using class-based illumination subspace for robust face recognition}, journal = {Signal Processing}, year = {2016}, } - EL2D histogram equalisation based on the human visual systemSeung-Wook Kim, Byeong-Doo Choi, Won-Jae Park, and Sung-Jea KoElectronics Letters, 2016
Histogram equalisation (HE) methods using the 2D histogram (2DH) have achieved great success in contrast enhancement. The 2DH is constructed from the occurrences of local pixel pairs (LPPs), each consisting of a pixel and one of its surrounding pixels. However, 2DH-based methods often produce over-stretching artefacts because the low-textured regions that dominate most images induce spikes at certain LPPs in the 2DH. To solve this problem, the 2DH is constructed by exploiting two properties of the human visual system (HVS): the HVS discriminates brightness better in dark regions, according to Weber’s law, and it is less sensitive to visual artefacts in higher-textured regions. To create a spike-free 2DH, a weighting function reflecting these two properties of the HVS is designed for the LPPs. Compared with popular 2DH-based methods, HE with the proposed 2DH effectively enhances the image contrast while achieving the best perceptual similarity score between the input and output images.
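A simplified sketch of the weighted-2DH pipeline; the weighting function below (local contrast divided by brightness, echoing Weber's law) is a toy choice of my own, not the paper's:

```python
import numpy as np

def hvs_weighted_2dh_equalize(img, eps=1.0):
    """Simplified 2DH equalisation: count local pixel pairs (p, q) of the
    right/down neighbours, weight each pair to suppress spikes from flat
    regions (small |p - q|) and to favour dark-region contrast, then
    equalise with the weighted marginal CDF."""
    h2d = np.zeros((256, 256))
    for dy, dx in [(0, 1), (1, 0)]:
        p = img[:img.shape[0] - dy, :img.shape[1] - dx].ravel()
        q = img[dy:, dx:].ravel()
        # Weight grows with local contrast and shrinks with brightness.
        w = np.abs(p.astype(float) - q) / (p + eps)
        np.add.at(h2d, (p, q), w)
    marginal = h2d.sum(axis=1)
    cdf = np.cumsum(marginal)
    cdf = cdf / cdf[-1]
    lut = np.round(255 * cdf).astype(np.uint8)
    return lut[img]

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
out = hvs_weighted_2dh_equalize(img)
```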
@article{histogram, author = {Kim, Seung-Wook and Choi, Byeong-Doo and Park, Won-Jae and Ko, Sung-Jea}, title = {2D histogram equalisation based on the human visual system}, journal = {Electronics Letters}, year = {2016}, }
2015
- ICGHITAutomatic color calibration method for multiple display system using smart phoneMing Fan, June-Woo Yun, Keun-Young Byun, Seung-Wook Kim, and Sung-Jea KoIn Proc. International Conference on Green and Human Information Technology, 2015
@inproceedings{automatic, author = {Fan, Ming and Yun, June-Woo and Byun, Keun-Young and Kim, Seung-Wook and Ko, Sung-Jea}, title = {Automatic color calibration method for multiple display system using smart phone}, year = {2015}, address = {Da Nang, Vietnam}, booktitle = {Proc. International Conference on Green and Human Information Technology}, } - NeurocomputingRandom projection-based partial feature extraction for robust face recognitionChunfei Ma, June-Young Jung, Seung-Wook Kim, and Sung-Jea KoNeurocomputing, 2015
In this paper, a novel feature extraction method for robust face recognition (FR) is proposed. The proposed method combines a simple yet effective dimensionality-increasing (DI) method with an information-preserving dimensionality-reduction (DR) method. For the proposed DI method, we employ rectangle filters, which sum the pixel values within a randomized rectangular window on the face image to extract the feature. By convolving the face image with all possible rectangle filters of various locations and scales, the face image is projected from the image space to a very high-dimensional feature space where more discriminative information can be incorporated. To significantly reduce the computational complexity while preserving the most informative features, we adopt a random projection method based on compressed sensing theory for DR. Unlike traditional holistic feature extraction methods, which require a time-consuming, data-dependent training procedure, the proposed method is partial-based and data-independent. Extensive experimental results on representative FR databases show that, compared with conventional feature extraction methods, the proposed method not only achieves higher recognition accuracy but also shows better robustness to corruption, occlusion, and disguise.
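A minimal NumPy sketch of both steps, using an integral image for constant-time rectangle sums and a plain Gaussian matrix as a stand-in for the compressed-sensing projection; all sizes are illustrative:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero top row/left column for easy sums."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def random_rectangle_features(img, n_rects=2000, seed=0):
    """DI step: sum the pixels inside many random rectangles (constant
    time each via the integral image), giving a high-dimensional feature."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    ii = integral_image(img)
    y0 = rng.integers(0, h, n_rects); y1 = rng.integers(0, h, n_rects)
    x0 = rng.integers(0, w, n_rects); x1 = rng.integers(0, w, n_rects)
    ya, yb = np.minimum(y0, y1), np.maximum(y0, y1) + 1
    xa, xb = np.minimum(x0, x1), np.maximum(x0, x1) + 1
    return ii[yb, xb] - ii[ya, xb] - ii[yb, xa] + ii[ya, xa]

def random_projection(feat, out_dim=128, seed=1):
    """DR step: data-independent Gaussian random projection."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(out_dim, feat.size)) / np.sqrt(out_dim)
    return R @ feat

face = np.random.rand(64, 64)
hd = random_rectangle_features(face)   # high-dimensional
ld = random_projection(hd)             # compact
print(hd.shape, ld.shape)              # (2000,) (128,)
```

Note that neither step is trained on data, which is exactly the data-independent property the abstract emphasizes.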
@article{Random, author = {Ma, Chunfei and Jung, June-Young and Kim, Seung-Wook and Ko, Sung-Jea}, title = {Random projection-based partial feature extraction for robust face recognition}, journal = {Neurocomputing}, year = {2015}, }
2014
- ICEICEnhanced illumination normalization for LBP-based face recognitionSeung-Wook Kim, June-Young Jung, Seung Park, and Sung-Jea KoIn Proc. International Conference on Electronics, Information, and Communication, 2014
@inproceedings{illumination, author = {Kim, Seung-Wook and Jung, June-Young and Park, Seung and Ko, Sung-Jea}, title = {Enhanced illumination normalization for LBP-based face recognition}, year = {2014}, address = {Kota Kinabalu, Malaysia}, booktitle = {Proc. International Conference on Electronics, Information, and Communication}, } - TCEEigen directional bit-planes for robust face recognitionLei Lei, Seung-Wook Kim, Won-Jae Park, Dae-Hwan Kim, and Sung-Jea KoIEEE Transactions on Consumer Electronics, 2014
A visible image-based face recognition system can be seriously degraded in real-life environments by various factors including illumination changes, expression changes, occlusion, and disguise. In this paper, a novel feature descriptor for robust face recognition, Eigen Directional Bit-Plane (EDBP), is introduced to address these issues. It is observed that Local Binary Pattern (LBP) can be decomposed into 8 directional bit-planes (DBP), each of which represents certain directional information of the facial image. Principal Component Analysis (PCA) is then applied to the DBP space to obtain a more compact feature, the EDBP. For face recognition, the proposed EDBP is integrated into conventional state-of-the-art classification methods. Simulation results demonstrate that classifiers with EDBP outperform those with existing feature descriptors under illumination changes, expression changes, occlusion, and disguise.
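A toy sketch of the PCA step over directional bit-plane vectors (the decomposition itself resembles the bit-plane sketch after the 2017 JVCIR entry above); the shapes and number of components are assumptions:

```python
import numpy as np

def edbp_basis(dbp_vectors, k=32):
    """Toy EDBP sketch: given directional bit-plane features, one row per
    training face, apply PCA to obtain a compact 'eigen' basis."""
    mean = dbp_vectors.mean(axis=0)
    X = dbp_vectors - mean
    # Principal axes from SVD of the centred data matrix.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return mean, Vt[:k]

rng = np.random.default_rng(2)
train = rng.random((100, 8 * 62 * 62))   # e.g. 8 bit-planes of 62x62 LBP maps
mean, basis = edbp_basis(train, k=32)
probe = rng.random(8 * 62 * 62)
feature = basis @ (probe - mean)         # compact EDBP descriptor
print(feature.shape)                     # (32,)
```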
@article{Eigen, author = {Lei, Lei and Kim, Seung-Wook and Park, Won-Jae and Kim, Dae-Hwan and Ko, Sung-Jea}, title = {Eigen directional bit-planes for robust face recognition}, journal = {IEEE Transactions on Consumer Electronics}, year = {2014}, }
2013
- ICCESensor fusion-based people counting system using the active appearance modelsSeung-Wook Kim, June-Young Jung, Seung-Jun Lee, Aldo W. Morales, and Sung-Jea KoIn Proc. IEEE International Conference on Consumer Electronics (ICCE), 2013
This paper presents a novel, robust people counting system using the active appearance model (AAM). Conventional people counting methods that utilize monoscopic or stereoscopic image data often fail under illumination changes and in crowded environments. The proposed algorithm uses both the vision and depth images captured by a vision-plus-depth camera mounted on the ceiling. A 3D human model is then constructed from the depth image using the AAM to segment and recognize human objects. Experimental results show that the proposed algorithm achieves over 97% accuracy in various testing environments.
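For flavour, here is a toy stand-in for the counting stage that replaces the paper's AAM-based 3D modelling with simple height thresholding and connected components on the depth map; all thresholds are assumptions:

```python
import numpy as np
from scipy import ndimage

def count_people_from_depth(depth, cam_height=2.8, min_head=1.4, min_area=50):
    """With a ceiling-mounted depth camera, pixels sufficiently above the
    floor form head/shoulder blobs; count blobs above a minimum area.
    This crude segmentation stands in for the AAM of the paper."""
    height_map = cam_height - depth            # metres above the floor
    mask = height_map > min_head               # likely head/shoulder regions
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, np.arange(1, n + 1))
    return int(np.sum(sizes >= min_area))      # discard tiny blobs

depth = np.full((120, 160), 2.8)               # empty floor
depth[40:70, 50:80] = 1.1                      # one person-sized blob
print(count_people_from_depth(depth))          # 1
```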
@inproceedings{sensor, author = {Kim, Seung-Wook and Jung, June-Young and Lee, Seung-Jun and Morales, Aldo W. and Ko, Sung-Jea}, title = {Sensor fusion-based people counting system using the active appearance models}, year = {2013}, address = {Las Vegas, USA}, url = {https://ieeexplore.ieee.org/abstract/document/6486797}, booktitle = {Proc. IEEE International Conference on Consumer Electronics (ICCE)}, }