
Journal Information

  • Journal title:

    Image and Vision Computing

  • Chinese title: 影像与视觉计算
  • Impact factor: 1.474
  • ISSN: 0262-8856
  • Publisher: -
  • Description: -
2,258 results
  • [Machine translation] Spec-Net and Spec-CGAN: Deep learning models to remove specular reflection from faces
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Jan. issue
    Abstract: The process of splitting an image into specular and diffuse components is a fundamental problem in computer vision: most computer vision algorithms, such as image segmentation and tracking, assume diffuse surfaces, so the existence of specular reflection can mislead them into incorrect decisions. Existing decomposition methods tend to work well for images with low specularity and high chromaticity, but they fail in cases of high-intensity specular light and on images with low chromaticity. In this paper, we address the problem of removing high-intensity specularity from low-chromaticity images (faces). We introduce a new dataset, Spec-Face, comprising face images corrupted with specular lighting and corresponding ground-truth diffuse images. We also introduce two deep learning models for specularity removal, Spec-Net and Spec-CGAN. Spec-Net takes an intensity channel as input and produces an output image that is very close to ground truth, while Spec-CGAN takes an RGB image as input and produces a diffuse image very similar to the ground-truth RGB image. On Spec-Face, with Spec-Net, we obtain a peak signal-to-noise ratio (PSNR) of 3.979, a local mean squared error (LMSE) of 0.000071, a structural similarity index (SSIM) of 0.899, and a Frechet Inception Distance (FID) of 20.932. With Spec-CGAN, we obtain a PSNR of 3.360, an LMSE of 0.000098, an SSIM of 0.707, and an FID of 31.699. With Spec-Net and Spec-CGAN, it is now feasible to perform specularity removal automatically prior to other critical, complex vision processes on real-world images, i.e., faces. This will potentially improve the performance of algorithms later in the processing stream, such as face recognition and skin cancer detection. (C) 2019 Elsevier B.V. All rights reserved.
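For reference, a minimal sketch of how one of the reported metrics, PSNR, is computed between a predicted diffuse image and its ground truth; the function and NumPy usage are illustrative, not the authors' evaluation code:

```python
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio for images with values in [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy usage: compare a noisy copy of a random "diffuse" image to the original.
gt = np.random.rand(64, 64, 3)
pred = np.clip(gt + 0.05 * np.random.randn(64, 64, 3), 0.0, 1.0)
print(f"PSNR: {psnr(pred, gt):.2f} dB")
```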
  • [Machine translation] An integrated ship segmentation method based on a discriminator and an extractor
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Jan. issue
    Abstract: Ship segmentation is an important task in maritime surveillance systems. A great deal of research on image segmentation has been done in the past few years, but problems arise when directly applying it to ship segmentation against a complex maritime background. The interference factors that decrease segmentation performance usually stem from the peculiarities of the complex maritime background, such as sea fog, large wakes and large waves. To deal with these interference factors, this paper presents an integrated ship segmentation method based on a discriminator and an extractor (ISDE). Different from traditional segmentation methods, our method consists of two components: an Interference Factor Discriminator (IFD) and a Ship Extractor (SE). SqueezeNet is employed to implement the IFD, which in a first step judges which interference factors are contained in the input image, while DeepLabv3+ and an improved DeepLabv3+ are employed to implement the SE, which in a second step finally extracts the ships. We collect a ship segmentation dataset and conduct intensive experiments on it. The experimental results demonstrate that our method outperforms state-of-the-art methods in terms of segmentation accuracy, especially for images containing sea fog. Moreover, our method runs in real time. (C) 2019 Elsevier B.V. All rights reserved.
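A minimal sketch of the two-stage routing the abstract describes: a discriminator classifies the interference, then the matching extractor segments the image. The toy models and the brightness-based routing rule are placeholders of mine, not the paper's SqueezeNet/DeepLabv3+ implementations:

```python
from typing import Callable, Dict
import numpy as np

def ifd(image: np.ndarray) -> str:
    """Toy interference-factor discriminator (stands in for SqueezeNet)."""
    return "fog" if image.mean() > 0.7 else "clear"

def deeplab_standard(image: np.ndarray) -> np.ndarray:
    return image > 0.5    # placeholder mask (stands in for DeepLabv3+)

def deeplab_improved(image: np.ndarray) -> np.ndarray:
    return image > 0.6    # placeholder mask tuned for foggy scenes

EXTRACTORS: Dict[str, Callable[[np.ndarray], np.ndarray]] = {
    "clear": deeplab_standard,
    "fog": deeplab_improved,
}

def segment_ships(image: np.ndarray) -> np.ndarray:
    """ISDE-style pipeline: discriminate interference first, then extract."""
    return EXTRACTORS[ifd(image)](image)

mask = segment_ships(np.random.rand(128, 128))
```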
  • [Machine translation] Depth prediction from 2D images: A taxonomy and an evaluation study
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Jan. issue
    Abstract: Among the various cues that help us understand and interact with our surroundings, depth is of particular importance. It allows us to move in space and grab objects to complete different tasks. Therefore, depth prediction has been an active research field for decades, and many algorithms have been proposed to retrieve depth. Some imitate human vision and compute depth through triangulation on correspondences found between pixels or handcrafted features in different views of the same scene. Others rely on simple assumptions and semantic knowledge of the structure of the scene to get the depth information. Recently, numerous algorithms based on deep learning have emerged from the computer vision community. They implement the same principles as the non-deep-learning methods and leverage the ability of deep neural networks to automatically learn important features that help solve the task. By doing so, they produce new state-of-the-art results and show encouraging prospects. In this article, we propose a taxonomy of deep learning methods for depth prediction from 2D images, retaining the training strategy as the sorting criterion. Indeed, some methods are trained in a supervised manner, which means depth labels are needed during training, while others are trained in an unsupervised manner; in that case, the models learn to perform a different task, such as view synthesis, and depth is only a by-product of this learning. In addition to this taxonomy, we also evaluate nine models on two similar datasets without retraining. Our analysis showed that (i) most models are sensitive to sharp discontinuities created by shadows or colour contrasts and (ii) the post-processing applied to the results before computing the commonly used metrics can change the model ranking. Moreover, we showed that most metrics agree with each other and are thus redundant. (C) 2019 Elsevier B.V. All rights reserved.
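As a point of reference, a hedged sketch of two error measures commonly used in the depth-prediction literature (absolute relative error and RMSE); the abstract does not say which metrics the study compares, so these are illustrative:

```python
import numpy as np

def depth_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Absolute relative error and RMSE over pixels with valid ground truth."""
    valid = gt > 0                       # skip pixels without a depth label
    p, g = pred[valid], gt[valid]
    return {
        "abs_rel": float(np.mean(np.abs(p - g) / g)),
        "rmse": float(np.sqrt(np.mean((p - g) ** 2))),
    }

print(depth_metrics(np.full((4, 4), 2.1), np.full((4, 4), 2.0)))
```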
  • [Machine translation] Transfer learning in computer vision tasks: Remember where you come from
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Jan. issue
    Abstract: Fine-tuning pre-trained deep networks is a practical way of benefiting from the representation learned on a large database while having relatively few examples to train a model. This adjustment is nowadays routinely performed so as to benefit from the latest improvements of convolutional neural networks trained on large databases. Fine-tuning requires some form of regularization, which is typically implemented by weight decay driving the network parameters towards zero. This choice conflicts with the motivation for fine-tuning, as starting from a pre-trained solution aims at taking advantage of the previously acquired knowledge. Hence, regularizers promoting an explicit inductive bias towards the pre-trained model have recently been proposed. This paper demonstrates the versatility of this type of regularizer across transfer learning scenarios. We replicated experiments on three state-of-the-art approaches in image classification, image segmentation, and video analysis to compare the relative merits of regularizers. These tests show systematic improvements compared to weight decay. Our experimental protocol puts forward the versatility of a regularizer that is easy to implement and operate, and we ultimately recommend it as the new baseline for future approaches to transfer learning that rely on fine-tuning. (C) 2019 Elsevier B.V. All rights reserved.
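A minimal PyTorch sketch of the kind of regularizer discussed: an L2 penalty that pulls fine-tuned weights toward the pre-trained starting point rather than toward zero. The function name and strength value are my own, and the paper's exact regularizers may differ:

```python
import torch

def penalty_to_start(model: torch.nn.Module,
                     start: dict, strength: float = 1e-2) -> torch.Tensor:
    """L2 penalty toward the pre-trained weights (an 'L2-SP'-style bias),
    in contrast to weight decay, which penalizes distance to zero."""
    terms = [((p - start[n]) ** 2).sum()
             for n, p in model.named_parameters() if n in start]
    return strength * torch.stack(terms).sum()

model = torch.nn.Linear(8, 2)                       # stand-in network
start = {n: p.detach().clone() for n, p in model.named_parameters()}
x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
loss = torch.nn.functional.cross_entropy(model(x), y) \
       + penalty_to_start(model, start)
loss.backward()
```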
  • [Machine translation] Out-of-region keypoint localization for 6D pose estimation
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Jan. issue
    Abstract: This paper addresses the problem of instance-level 6D pose estimation from a single RGB image. Our approach simultaneously detects objects and recovers poses by predicting the 2D image locations of the object's 3D bounding-box vertices. Specifically, we focus on the challenge of locating virtual keypoints outside the object region proposals, and propose a boundary-based keypoint representation which incorporates classification and regression schemes to reduce the output space. Moreover, our method predicts localization confidences and alleviates the influence of difficult keypoints by a voting process. We implement the proposed method on top of a 2D detection pipeline while bridging the feature gap between detection and pose estimation. Our network has real-time processing capability, running at 30 fps on a GTX 1080Ti GPU. For single-object and multi-object pose estimation on two benchmark datasets, our approach achieves competitive or superior performance compared with state-of-the-art RGB-based pose estimation methods. (C) 2019 Elsevier B.V. All rights reserved.
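The abstract does not spell out the voting process; as a point of reference, a generic confidence-weighted vote over candidate keypoint predictions, with all names my own, might look like:

```python
import numpy as np

def vote_keypoint(candidates: np.ndarray, conf: np.ndarray) -> np.ndarray:
    """Fuse N candidate 2D locations for one keypoint by a
    confidence-weighted average, down-weighting unreliable predictions."""
    w = conf / conf.sum()
    return (candidates * w[:, None]).sum(axis=0)

# Three candidates for the same virtual keypoint, with their confidences.
cands = np.array([[100.0, 52.0], [103.0, 50.0], [140.0, 80.0]])
print(vote_keypoint(cands, np.array([0.8, 0.7, 0.1])))
```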
  • [Machine translation] A multi-stream deep learning model for human action recognition
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Jan. issue
    Abstract: Human action recognition is one of the most important and challenging topics in the field of image processing. Unlike object recognition, action recognition requires motion feature modeling, which contains not only spatial but also temporal information. In this paper, we use multiple models to characterize both global and local motion features. Global motion patterns are represented efficiently by depth-based 3-channel motion history images (MHIs), while the local spatial and temporal patterns are extracted from the skeleton graph. The decisions of these two streams are fused. Finally, domain knowledge, namely the object/action dependency, is considered. The proposed framework is evaluated on two RGB-D datasets. The experimental results show the effectiveness of our approach, with performance comparable to the state of the art. (C) 2019 Elsevier B.V. All rights reserved.
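The abstract does not give the fusion rule; a common late-fusion sketch, where the convex weighting below is an assumption of mine, looks like:

```python
import numpy as np

def fuse_decisions(p_mhi: np.ndarray, p_skeleton: np.ndarray,
                   w: float = 0.5) -> int:
    """Late (decision-level) fusion of two streams' class posteriors:
    a weighted average of per-class probabilities, then argmax."""
    fused = w * p_mhi + (1.0 - w) * p_skeleton
    return int(np.argmax(fused))

print(fuse_decisions(np.array([0.2, 0.8]), np.array([0.6, 0.4]), w=0.3))
```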
  • [Machine translation] Region-based fitting of overlapping ellipses and its application to cell segmentation
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Jan. issue
    Abstract: We present RFOVE, a region-based method for approximating an arbitrary 2D shape with an automatically determined number of possibly overlapping ellipses. RFOVE is completely unsupervised, operates without any assumption or prior knowledge of the object's shape, and extends and improves the Decremental Ellipse Fitting Algorithm (DEFA) [1]. Both RFOVE and DEFA solve the multi-ellipse fitting problem by performing model selection guided by the minimization of the Akaike Information Criterion on a suitably defined shape complexity measure. However, in contrast to DEFA, RFOVE minimizes an objective function that allows for ellipses with a higher degree of overlap and thus achieves better ellipse-based shape approximation. A comparative evaluation of RFOVE against DEFA on several standard datasets shows that RFOVE achieves better shape coverage with simpler models (fewer ellipses). As a practical exploitation of RFOVE, we present its application to the problem of detecting and segmenting potentially overlapping cells in fluorescence microscopy images. Quantitative results obtained on three public datasets (one synthetic and two with more than 4000 actual stained cells) show the superiority of RFOVE over the state of the art in overlapping-cell segmentation. (C) 2019 Elsevier B.V. All rights reserved.
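Both methods select the number of ellipses by minimizing the Akaike Information Criterion. A generic sketch of such a selection loop follows; the `fit_with_k` callback and the five-parameters-per-ellipse count are illustrative, and RFOVE's actual objective differs in how it scores overlap:

```python
def aic(n_params: int, log_likelihood: float) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def select_num_ellipses(fit_with_k, k_max: int) -> int:
    """Try k = 1..k_max ellipses, keep the count with the lowest AIC.
    `fit_with_k(k)` must return the log-likelihood of the best k-ellipse fit."""
    scores = {k: aic(5 * k, fit_with_k(k)) for k in range(1, k_max + 1)}
    return min(scores, key=scores.get)

# Toy usage: a fake fit whose likelihood saturates after 3 ellipses.
print(select_num_ellipses(lambda k: 10.0 * min(k, 3) - 0.1 * k, k_max=8))
```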
  • [Machine translation] Fine-grained image retrieval via piecewise cross entropy loss
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Jan. issue
    Abstract: Fine-grained image retrieval is an important problem in computer vision. It is more challenging than content-based image retrieval because there is small diversity across different classes but large diversity within the same class. Recently, the cross entropy loss has been utilized to make a Convolutional Neural Network (CNN) generate distinguishing features for fine-grained image retrieval, and further improvement can be obtained with extra operations, such as a Normalize-Scale layer. In this paper, we propose a variant of the cross entropy loss, named the Piecewise Cross Entropy loss function, to enhance model generalization and promote retrieval performance. Besides, the Piecewise Cross Entropy loss is easy to implement. We evaluate the proposed scheme on two standard fine-grained retrieval benchmarks and obtain significant improvements over the state of the art, with gains of 11.8% and 3.3% over previous work on CARS196 and CUB-200-2011, respectively. (C) 2019 Published by Elsevier B.V. All rights reserved.
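The abstract does not give the loss's exact form; one plausible piecewise variant, sketched below under that assumption, keeps ordinary cross entropy for uncertain samples but flattens the loss once the model is already confident:

```python
import math
import torch
import torch.nn.functional as F

def piecewise_cross_entropy(logits: torch.Tensor, target: torch.Tensor,
                            thresh: float = 0.9) -> torch.Tensor:
    """Hedged sketch of a piecewise CE variant: ordinary cross entropy for
    uncertain samples, but once the true-class probability exceeds
    `thresh` the loss is held at a constant (cutting its gradient), so
    easy examples stop pushing the model toward overconfidence. The
    paper's exact piecewise form is not given in the abstract."""
    p_true = F.softmax(logits, dim=1).gather(1, target.unsqueeze(1)).squeeze(1)
    ce = F.cross_entropy(logits, target, reduction="none")
    flat = torch.full_like(ce, -math.log(thresh))   # constant: no gradient
    return torch.where(p_true > thresh, flat, ce).mean()

logits, target = torch.randn(4, 10), torch.randint(0, 10, (4,))
print(piecewise_cross_entropy(logits, target))
```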
  • [Machine translation] Concept-label-oriented multi-label learning for product concept data
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Jan. issue
    Abstract: In the design field, designers usually retrieve images for reference according to product attributes when designing new proposals. To obtain the attributes of a product, designers spend a great deal of time and effort collecting product images and annotating them with multiple labels. However, the labels of product images represent concepts of subjective perception, which makes multi-label learning more challenging, as it must imitate human aesthetics rather than merely discriminate appearance. In this paper, a Feature Correlation Learning (FCL) network is proposed to solve this problem by exploiting the potential feature correlations of product images. Given a product image, the FCL network calculates the features of different levels and their correlations via gram matrices. The FCL network is aggregated with DenseNet to predict the labels of the input product image. The proposed method is compared with several outstanding multi-label learning methods, as well as DenseNet. Experimental results demonstrate that the proposed method outperforms the state of the art on the multi-label learning problem for product image data. (C) 2019 Published by Elsevier B.V. All rights reserved.
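A minimal sketch of the gram-matrix correlation computation the abstract mentions, applied to a convolutional feature map; the (B, C, H, W) layout and the normalization constant are common conventions assumed here, not taken from the paper:

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Channel-wise feature correlations of a (B, C, H, W) feature map:
    G[b] = F F^T over flattened spatial positions, normalized by size."""
    b, c, h, w = features.shape
    f = features.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # shape (B, C, C)

print(gram_matrix(torch.randn(1, 8, 16, 16)).shape)
```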
  • [Machine translation] Face presentation attack detection in mobile scenarios: A comprehensive evaluation
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Jan. issue
    Abstract: The vulnerability of face recognition systems to different presentation attacks has aroused increasing concern in the biometric community. Face presentation attack detection (PAD) techniques, which aim to distinguish real face samples from spoof artifacts, are an efficient countermeasure. In recent years, various methods have been proposed to address 2D face presentation attacks, including photo print attacks and video replay attacks. However, it is difficult to tell which methods perform better for these attacks, especially in practical mobile authentication scenarios, since there is no systematic evaluation or benchmark of the state-of-the-art methods on a common ground (i.e., using the same databases and protocols). Therefore, this paper presents a comprehensive evaluation of several representative face PAD methods (30 in total) on three public mobile spoofing datasets to quantitatively compare their detection performance. Furthermore, the generalization ability of existing methods is tested under cross-database testing scenarios to expose possible database bias. We also summarize meaningful observations and give insights that will help promote both academic research and practical applications. (C) 2019 Elsevier B.V. All rights reserved.
  • [Machine translation] SAANet: Spatial adaptive alignment network for object detection in autonomous driving
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Feb. issue
    Abstract: Images and point clouds are beneficial for object detection in a visual navigation module for autonomous driving. The spatial relationships between different objects at different times in a bimodal space can vary significantly, and it is difficult to combine bimodal descriptions into a unified model that detects objects effectively within an acceptable amount of time. In addition, conventional voxelization methods resolve point clouds into voxels at a global level and often overlook local attributes of the voxels. To address these problems, we propose a novel fusion-based deep framework named SAANet. SAANet utilizes a spatial adaptive alignment (SAA) module to align point cloud features and image features by automatically discovering the complementary information between point clouds and images. Specifically, we transform the point clouds into 3D voxels and introduce local orientation encoding to represent the point clouds. Then, we use a sparse convolutional neural network to learn a point cloud feature. Simultaneously, a ResNet-like 2D convolutional neural network is used to extract an image feature. Next, the point cloud feature and image feature are fused by our SAA block to derive a comprehensive feature. Then, the labels and 3D boxes for objects are learned using a multi-task learning network. Finally, an experimental evaluation on the KITTI benchmark demonstrates the advantages of our method in terms of average precision and inference time, as compared to previous state-of-the-art results for 3D object detection. (C) 2020 Elsevier B.V. All rights reserved.
  • [Machine translation] Skin detection and lightweight encryption for privacy protection in real-time surveillance applications
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Feb. issue
    Abstract: An individual's privacy is a significant concern in surveillance videos. Existing research into locating individuals on the basis of detecting their skin has focused either on different techniques for detecting human skin or on protecting individuals from the consequences of applying such techniques. This paper considers both lines of research and proposes a hybrid scheme for human skin detection and subsequent privacy protection by utilizing color information under dynamically varying illumination and environmental conditions. For those purposes, dynamic and explicit skin-detection approaches are implemented, simultaneously considering multiple color spaces, i.e., RGB, perceptual (HSV) and orthogonal (YCbCr), and then detecting human skin with the proposed Combined Threshold Rule (CTR)-based segmentation. Comparative qualitative and quantitative detection results, with an average accuracy of 93.73%, imply that the proposed scheme achieves considerable accuracy without incurring a training cost. Once skin detection has been performed, the detected skin pixels (including false positives) are encrypted; standard AES-CFB encryption of the skin pixels is shown to be preferable to encrypting the whole video frame. The scheme preserves the behavior of the subjects within the video, so subsequent image processing and behavior analysis, if required, can be performed by an authorized user. The experimental results are encouraging: the average encryption time is 8.268 s and the Encryption Space Ratio (ESR) averages 7.25% for high-definition video (1280 x 720 pixels/frame). A performance comparison in terms of Correct Detection Rate (CDR) showed an average of 91.5% for CTR-based segmentation, compared with using only one color space for segmentation: 85.86% for RGB, 80.93% for HSV and an average of 84.8% for YCbCr. This implies that the proposed combination of color-space skin identifications has a higher ability to detect skin accurately. Security analysis confirmed that the proposed scheme could be a suitable choice for real-time surveillance applications operating on resource-constrained devices. (C) 2019 Elsevier B.V. All rights reserved.
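A hedged sketch of a combined-threshold skin mask in the spirit of the CTR the abstract describes, requiring agreement between per-colorspace tests; the threshold values below are common literature heuristics, not the paper's calibrated ones:

```python
import cv2
import numpy as np

def skin_mask_ctr(bgr: np.ndarray) -> np.ndarray:
    """Combined-rule skin mask: a pixel counts as skin only if it passes
    range tests in both HSV and YCbCr. Thresholds are illustrative."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    ycc = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)   # OpenCV order: Y, Cr, Cb
    m_hsv = cv2.inRange(hsv, (0, 40, 60), (25, 150, 255))
    m_ycc = cv2.inRange(ycc, (0, 133, 77), (255, 173, 127))
    return cv2.bitwise_and(m_hsv, m_ycc)           # AND = combined rule

mask = skin_mask_ctr(np.zeros((32, 32, 3), dtype=np.uint8))
```

Only the pixels flagged by such a mask would then be passed to the AES-CFB encryption step, which is what keeps the scheme lightweight relative to encrypting the full frame.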
  • [Machine translation] Single image dehazing via a dual-fusion method
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Feb. issue
    Abstract: Single image dehazing is a challenging task because of hue and brightness distortion problems. In this paper, we propose a dual-fusion method for single image dehazing. A segmentation method divides the image into sky and non-sky regions. To properly optimize the transmission, a multi-region fusion method is proposed for smoothing the single image. An exposure fusion method built on a brightness transform function effectively removes the haze from the image. Experimental results show that this method outperforms state-of-the-art dehazing methods in terms of both efficiency and dehazing effect. (C) 2019 Elsevier B.V. All rights reserved.
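A generic sketch of the region-fusion step implied by the sky/non-sky split; the soft-mask blend below is my own illustration and does not reproduce the paper's transmission optimization or brightness transform:

```python
import numpy as np

def dual_fusion(sky_restored: np.ndarray, nonsky_restored: np.ndarray,
                sky_mask: np.ndarray) -> np.ndarray:
    """Blend two region-wise restored images with a soft sky mask in [0, 1]."""
    m = sky_mask[..., None]                # broadcast over color channels
    return m * sky_restored + (1.0 - m) * nonsky_restored

out = dual_fusion(np.ones((8, 8, 3)), np.zeros((8, 8, 3)),
                  np.linspace(0, 1, 64).reshape(8, 8))
```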
  • [Machine translation] Collective Sports: A multi-task dataset for collective activity recognition
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Feb. issue
    Abstract: Collective activity recognition is an important subtask of human action recognition, where the existing datasets are mostly limited. In this paper, we look into this issue and introduce the Collective Sports (C-Sports) dataset, a novel benchmark dataset for multi-task recognition of both collective activities and sports categories. Various state-of-the-art techniques are evaluated on this dataset, together with multi-task variants, which demonstrate increased performance. From the experimental results, we can say that while the sports categories of the videos are inferred accurately, there is still room for improvement in collective activity recognition, especially regarding the ability to generalize to previously unseen sports categories. In order to evaluate this ability, we introduce a novel evaluation protocol called unseen sports, where training and testing are carried out on disjoint sets of sports categories. The relatively low recognition performance under this protocol indicates that the recognition models tend to be influenced by the surrounding context rather than focusing on the essence of the collective activities. We believe the C-Sports dataset will stir further interest in this research direction. (C) 2020 Elsevier B.V. All rights reserved.
  • [Machine translation] A fast and accurate iterative method for the camera pose estimation problem
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Feb. issue
    Abstract: This paper presents a fast and accurate iterative method for the camera pose estimation problem. Dependence on initial values is reduced by replacing the unknown angular parameters with three independent non-angular parameters. Image point coordinates are treated as observations with errors, and a new model is built using a conditional adjustment with parameters for relative orientation. This model allows the errors in the observations to be estimated. The estimated observation errors are then used iteratively to detect and eliminate gross errors in the adjustment. A total of 22 synthetic datasets and 10 real datasets are used to compare the proposed method with the traditional iterative method, the 5-point-RANSAC method and the state-of-the-art 5-point-USAC method. Preliminary results show that our proposed method is not only faster than the other methods, but also more accurate and stable. (C) 2019 Elsevier B.V. All rights reserved.
  • [Machine translation] Multi-feature fusion image retrieval using constrained dominant sets
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Feb. issue
    Abstract: Aggregating different image features for image retrieval has recently been shown to be effective. While highly effective, though, the question of how to uplift the impact of the best features for a specific query image persists as an open computer vision problem. In this paper, we propose a computationally efficient approach to fuse several hand-crafted and deep features, based on the probabilistic distribution of a given membership score of a constrained cluster, in an unsupervised manner. First, we introduce an incremental nearest neighbor (NN) selection method, whereby we dynamically select the k-NN of the query. We then build several graphs from the obtained NN sets and employ constrained dominant sets (CDS) on each graph G to assign edge weights that account for the intrinsic manifold structure of the graph and to detect false matches to the query. Finally, we elaborate the computation of a feature's positive-impact weight (PIW) based on the dispersive degree of the characteristic vector. To this end, we exploit the entropy of the cluster membership-score distribution. In addition, the final NN set bypasses a heuristic voting scheme. Experiments on several retrieval benchmark datasets show that our method improves on the state-of-the-art result. (C) 2019 Elsevier B.V. All rights reserved.
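A hedged sketch of the entropy-based weighting idea: a peaked (low-entropy) membership-score distribution suggests the feature retrieves a coherent cluster for the query, so it earns a larger weight. The exact entropy-to-PIW mapping is not given in the abstract, so the inverse form below is an assumption:

```python
import numpy as np

def positive_impact_weight(membership: np.ndarray) -> float:
    """Map a cluster membership-score distribution to a feature weight
    via its entropy: lower entropy (more peaked) -> larger weight."""
    p = membership / membership.sum()
    entropy = -np.sum(p * np.log(p + 1e-12))
    return float(1.0 / (1.0 + entropy))

print(positive_impact_weight(np.array([0.90, 0.05, 0.05])))  # peaked -> high
print(positive_impact_weight(np.array([0.34, 0.33, 0.33])))  # flat   -> low
```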
  • [Machine translation] Online maximum a posteriori tracking of multiple objects using a sequential trajectory prior
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Feb. issue
    Abstract: In this paper, we address the problem of online multi-object tracking within the Maximum a Posteriori (MAP) framework. Given the observations up to the current frame, we estimate the optimal object trajectories via two MAP estimation stages: object detection and data association. By introducing the sequential trajectory prior, i.e., the prior information from previous frames about "good" trajectories, into the two MAP stages, the inference of optimal detections is refined and the association correctness between trajectories and detections is enhanced. Furthermore, the sequential trajectory prior allows the two MAP stages to interact with each other in a sequential manner, which jointly optimizes the detections and trajectories to facilitate online multi-object tracking. Compared with existing methods, our approach is able to alleviate the association ambiguity caused by noisy detections and frequent inter-object interactions without using sophisticated association likelihood models. Experiments on publicly available challenging datasets demonstrate that our approach provides superior tracking performance over state-of-the-art algorithms in various complex scenes. (C) 2019 Elsevier B.V. All rights reserved.
  • [Machine translation] Coupled generative adversarial network for heterogeneous face recognition
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Feb. issue
    Abstract: The large modality gap between faces captured in different spectra makes heterogeneous face recognition (HFR) a challenging problem. In this paper, we present a coupled generative adversarial network (CpGAN) to address the problem of matching non-visible facial imagery against a gallery of visible faces. Our CpGAN architecture consists of two sub-networks: one dedicated to the visible spectrum and the other to the non-visible spectrum. Each sub-network consists of a generative adversarial network (GAN) architecture. Inspired by dense networks, which are capable of maximizing the information flow among features at different levels, we utilize a densely connected encoder-decoder structure as the generator in each GAN sub-network. The proposed CpGAN framework uses multiple loss functions to force the features from each sub-network to be as close as possible for the same identities in a common latent subspace. To achieve realistic photo reconstruction while preserving the discriminative information, we also add a perceptual loss function to the coupling loss function. An ablation study shows the effectiveness of the different loss functions in optimizing the proposed method. Moreover, the superiority of the model over state-of-the-art HFR models is demonstrated on multiple datasets. (C) 2019 Elsevier B.V. All rights reserved.
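A hedged reconstruction of the kind of composite objective the abstract describes; the symbols (coupling weight λ_c, perceptual weight λ_p, latent mappings φ) and the ℓ2 form of the coupling term are my assumptions, since the abstract does not give the exact losses:

```latex
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{GAN}}^{\mathrm{vis}}
  + \mathcal{L}_{\mathrm{GAN}}^{\mathrm{nonvis}}
  + \lambda_c \,\bigl\lVert \phi_{\mathrm{vis}}(x_{\mathrm{vis}})
      - \phi_{\mathrm{nonvis}}(x_{\mathrm{nonvis}}) \bigr\rVert_2^2
  + \lambda_p \, \mathcal{L}_{\mathrm{perc}}
```

Here the coupling term pulls features of the same identity from the two sub-networks together in the common latent subspace, while the perceptual term preserves discriminative detail during photo reconstruction.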
  • [Machine translation] Bottom-up unsupervised image segmentation using FC-Dense U-Net based deep representation clustering and multidimensional feature fusion based region merging
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Feb. issue
    Abstract: Recent advances in system resources make deep learning approaches easier to apply in computer vision. In this paper, we propose a deep learning-based unsupervised approach for natural image segmentation. Image segmentation aims to partition an image into regions representing the various objects in the image. Our method consists of fully convolutional dense network based unsupervised deep-representation-oriented clustering, followed by shallow-feature-based high-dimensional region merging to produce the final segmented image. We evaluate the proposed approach on the BSD300 database and compare it with several classical and some recent deep learning-based unsupervised segmentation methods. The experimental results show that the proposed method is comparable with these methods and confirm its efficacy.
  • [Machine translation] Learning reliable spatial and spatial-variation regularization correlation filters for visual tracking
    • Authors: -
    • Journal: Image and Vision Computing
    • 2020, Feb. issue
    Abstract: Single-object tracking is a significant and challenging computer vision problem. Recently, discriminative correlation filters (DCF) have shown excellent performance, but they suffer from a theoretical defect: the boundary effect, caused by the periodic assumption on training samples, greatly limits tracking performance. The spatially regularized DCF (SRDCF) introduces a spatial regularization that penalizes the filter coefficients depending on their spatial location, which improves tracking performance considerably. However, this simple regularization strategy imposes unequal penalties on the filter coefficients within the target area, which makes the filter learn a distorted object appearance model. In this paper, a novel spatial regularization strategy is proposed that utilizes a reliability map to approximate the target area and keep the penalty coefficients of the relevant region consistent. Besides, we introduce a spatial variation regularization component, the second-order difference of the filter, which smooths changes in the filter coefficients to prevent the filter from over-fitting the current frame. Furthermore, an efficient optimization algorithm based on the alternating direction method of multipliers (ADMM) is developed. Comprehensive experiments are performed on three benchmark datasets: OTB-2013, OTB-2015 and TempleColor-128, and our algorithm achieves more favorable performance than several state-of-the-art methods. Compared with SRDCF, our approach obtains absolute gains of 6.6% and 5.1% in mean distance precision on OTB-2013 and OTB-2015, respectively. Our approach runs in real time on a CPU. (C) 2020 Elsevier B.V. All rights reserved.
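A hedged reconstruction of the type of objective the abstract describes; the notation is mine (x_t are the features of frame t, f_t the multi-channel filter, y the desired response, m the reliability map), and my reading of the "second-order difference" as a smoothness term over consecutive filters may differ from the paper's exact formulation:

```latex
\min_{f_t}\;
  \Bigl\lVert \sum_{d} x_t^{d} * f_t^{d} - y \Bigr\rVert_2^2
  + \lambda_1 \sum_{d} \bigl\lVert m \odot f_t^{d} \bigr\rVert_2^2
  + \lambda_2 \sum_{d} \bigl\lVert f_t^{d} - 2 f_{t-1}^{d} + f_{t-2}^{d} \bigr\rVert_2^2
```

The first term is the standard correlation-filter data fit, the second keeps the penalty consistent over the reliability-approximated target area, and the third damps frame-to-frame filter changes; objectives of this separable form are typically split and solved with ADMM, as the abstract notes.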