Wednesday, 30 March 2022

Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer

1. Introduction

    

    Artistic portraits are popular in our daily lives, especially in industries related to comics, animation, posters, and advertising.
In this paper, we focus on exemplar-based portrait style transfer, a core problem that aims to transfer the style of an exemplar artistic portrait onto a target face.
    Recent studies on StyleGAN (a style-based generator architecture for generative adversarial networks) show high performance on artistic portrait generation via transfer learning with limited data. In this paper, we explore the more challenging task of exemplar-based high-resolution portrait style transfer by introducing a novel DualStyleGAN with flexible control over the dual styles of the original face domain and the extended artistic portrait domain. Unlike StyleGAN, DualStyleGAN provides a natural way of performing style transfer by characterizing the content and style of a portrait with an intrinsic style path and a new extrinsic style path, respectively. The delicately designed extrinsic style path enables our model to modulate both the color and complex structural styles hierarchically to precisely pastiche the style example.
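To make the dual-path idea concrete, here is a minimal, purely illustrative sketch: an intrinsic style code and an extrinsic style code are fused into per-layer style vectors that modulate the generator. The class name, dimensions, and fusion scheme are assumptions for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn

class DualPathModulation(nn.Module):
    """Illustrative only: an intrinsic style code (content, as in StyleGAN)
    and an extrinsic style code (the exemplar's artistic style) are fused
    into per-layer style vectors that modulate the generator."""

    def __init__(self, d_style=512, n_layers=18):
        super().__init__()
        # One fusion layer per generator layer (hypothetical scheme).
        self.fuse = nn.ModuleList(
            nn.Linear(2 * d_style, d_style) for _ in range(n_layers)
        )

    def forward(self, w_intrinsic, w_extrinsic):
        # Early layers would control coarse structure, later layers color,
        # mirroring the hierarchical control described above.
        w = torch.cat([w_intrinsic, w_extrinsic], dim=-1)
        return [f(w) for f in self.fuse]

# Example: 18 per-layer style vectors from the two 512-d codes.
styles = DualPathModulation()(torch.randn(1, 512), torch.randn(1, 512))
print(len(styles), styles[0].shape)  # 18 torch.Size([1, 512])
```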



2. Related Work


    StyleGAN synthesizes high-resolution face images with hierarchical style control. Prior work fine-tuned StyleGAN on limited cartoon data and found it promising for generating plausible cartoon faces. The original and fine-tuned models exhibit a reasonable degree of semantic alignment; however, this alignment weakens over the course of unconditional fine-tuning without valid supervision, eventually leading to a failure in layer swapping.

By comparison, our model has an explicit extrinsic style path that can be conditionally trained to characterize the structural styles. Moreover, supervision for learning diverse styles is provided via facial destylization.


3. Portrait Style Transfer via DualStyleGAN


    DualStyleGAN is built on a pre-trained StyleGAN, which can be transferred to a new domain while characterizing the styles of both the original and the extended domains. Unconditional fine-tuning translates the StyleGAN generative space as a whole, leading to a loss of diversity in the captured styles. Our key idea is to seek valid supervision for learning diverse styles and to explicitly model the two kinds of styles with two individual style paths.

    We train DualStyleGAN with a principled progressive strategy for robust conditional fine-tuning. Since the two domains might have a large appearance discrepancy, the challenge is to balance face realism against fidelity to the portrait. A possible solution to this problem is multi-stage facial destylization (a sketch follows the list below):

  • Stage I: Latent initialization. The artistic portrait S is first embedded into the StyleGAN latent space by an encoder E.
  • Stage II: Latent optimization. A face image is stylized by optimizing a latent code of g to reconstruct this image and applying this code to a fine-tuned model g′.
  • Stage III: Image embedding. The result has reasonable facial structures, providing valid supervision on how to deform and abstract the facial structures to imitate S.
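A minimal sketch of how these three stages could be wired together, assuming a pre-trained generator g, a fine-tuned copy g′ (g_prime below), and an encoder E; the names and the plain MSE reconstruction loss are illustrative assumptions, not the paper's actual objective:

```python
import torch
import torch.nn.functional as F

def destylize(S, E, g, g_prime, steps=100, lr=0.01):
    """Hedged sketch of multi-stage facial destylization.

    S       -- artistic portrait image tensor (assumed preprocessed)
    E       -- encoder mapping images to StyleGAN latent codes (assumed)
    g       -- original StyleGAN generator (assumed)
    g_prime -- generator fine-tuned on the artistic domain (assumed)
    """
    # Stage I: latent initialization -- embed S into the latent space.
    z = E(S).detach().clone().requires_grad_(True)

    # Stage II: latent optimization -- refine z so the fine-tuned model
    # reconstructs the portrait (MSE stands in for the real objective).
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.mse_loss(g_prime(z), S).backward()
        opt.step()

    # Stage III: image embedding -- render the optimized code with the
    # original generator, yielding a realistic face whose structure
    # follows S and can supervise training.
    with torch.no_grad():
        face = g(z)
    return face, z.detach()
```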


3.3. Progressive Fine-Tuning

  

    A progressive fine-tuning scheme is used to smoothly transform the generative space of DualStyleGAN towards the target domain. The scheme borrows the idea of curriculum learning to gradually increase the task difficulty in three stages:
  • Stage I: Color transfer on source domain
  • Stage II: Structure transfer on source domain.
  • Stage III: Style transfer on target domain.
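As a rough illustration of such a curriculum, a training driver could step through the three stages in order. The stage configuration, the 'layers' switch, and the helper functions (get_loader, train_one_epoch) are assumptions for the sketch, not details from the paper:

```python
# Hypothetical curriculum: which layers are trained and on which domain.
STAGES = [
    ("color transfer on source domain",     {"layers": "color", "data": "source"}),
    ("structure transfer on source domain", {"layers": "all",   "data": "source"}),
    ("style transfer on target domain",     {"layers": "all",   "data": "target"}),
]

def progressive_finetune(model, get_loader, train_one_epoch, epochs_per_stage=5):
    """Run the three curriculum stages in order (sketch).

    get_loader(domain)                     -- data loader for 'source'/'target' (assumed)
    train_one_epoch(model, loader, layers) -- one epoch of conditional
                                              fine-tuning on the given layers (assumed)
    """
    for name, cfg in STAGES:
        print(f"Stage: {name}")
        loader = get_loader(cfg["data"])
        for _ in range(epochs_per_stage):
            train_one_epoch(model, loader, cfg["layers"])
```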


4. Conclusion

    
    We extend StyleGAN to accept style conditions from new domains while preserving its style control in the original domain. This results in an interesting application: high-resolution exemplar-based portrait style transfer with a friendly data requirement. DualStyleGAN, which adds an extrinsic style path to StyleGAN, can effectively model and modulate the intrinsic and extrinsic styles for flexible and diverse artistic portrait generation. We show that valid transfer learning on DualStyleGAN can be achieved with a special architecture design and a progressive training strategy. We believe our idea of model extension, in terms of both architecture and data, can potentially be applied to other tasks such as more general image-to-image translation and knowledge distillation.

Monday, 28 March 2022

Unveiling COVID-19 from Chest X-ray with deep learning: a hurdles race with small data


Introduction:

The COVID-19 virus has rapidly spread across mainland China and into multiple countries worldwide. Early diagnosis is a key element for proper treatment of patients and prevention of the spread of the disease. Given the high tropism of COVID-19 for the respiratory airways and lung epithelium, identification of lung involvement in infected patients is relevant for treatment and monitoring of the disease.


Fig. 1: Example Chest X-Ray images of (a) non-COVID-19 infection and (b) COVID-19 viral infection


 

Figure 2: Example Chest X-Ray images from the dataset, which comprises 13,975 Chest X-Ray images across 13,870 patient cases from five open-access data repositories: (a) COVID-19 Image Data Collection, (b) COVID-19 Chest X-Ray Dataset Initiative, (c) RSNA Pneumonia Detection Challenge dataset, (d) ActualMed COVID-19 Chest X-Ray Dataset Initiative, and (e) COVID-19 Radiography Database.

 

Deep Learning for Chest X-Ray

Deep convolutional neural networks (DCNNs) are being constructed to analyze chest images, diagnose common thorax diseases, and differentiate between viral and non-viral pneumonia. Many common viruses can cause pneumonia, and each produces its own pattern of changes in the X-ray image, which means that cases of viral pneumonia vary considerably in visual appearance. Moreover, finding a dataset with enough positive samples poses another problem. It is therefore crucial to develop a model that can cope with these pathological variations and detect the virus with high accuracy.


Fig. 3: Original image (a) and extracted lung segmented image

 

How does it work?

CNNs can learn automatically from domain-specific images, which differentiates them from classical machine learning methods. Different strategies can be used to train a CNN architecture to achieve the desired accuracy and results. In this paper, a similar deep convolutional neural network is used for the analysis of chest X-rays. Since collecting medical data and reports is a difficult task, the dataset used is a combination of five open-source datasets.
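For intuition, here is a minimal, self-contained classifier in this spirit. It is an illustrative sketch, not the architecture from the paper; the layer sizes and the two-class head are assumptions.

```python
import torch
import torch.nn as nn

class ChestXRayCNN(nn.Module):
    """Minimal convolutional classifier for chest X-rays (illustrative only;
    not the model proposed in the paper)."""

    def __init__(self, num_classes=2):  # assumed: COVID vs. non-COVID
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_classes)
        )

    def forward(self, x):  # x: (batch, 1, H, W) grayscale X-ray
        return self.classifier(self.features(x))

# Example: classify a batch of four 224x224 grayscale X-rays.
logits = ChestXRayCNN()(torch.randn(4, 1, 224, 224))
print(logits.shape)  # torch.Size([4, 2])
```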

 

Conclusion

The paper proposes a deep convolutional neural network designed specifically for the detection of COVID-19 cases, applying computer vision and image analysis to Chest X-Ray images gathered from five open-access data repositories. The experimental results show that the proposed model achieved the best accuracy on the validation set. Furthermore, different model parameters were investigated and applied in order to gain deeper insight into the Chest X-Ray features critical for classifying COVID and non-COVID patients, which can aid clinicians in improved screening as well as improve trust and transparency.


Resources:

https://arxiv.org/ftp/arxiv/papers/2201/2201.09952.pdf

https://arxiv.org/pdf/2004.05405v1.pdf

The AVA Team

 

Restoring and attributing ancient texts

using deep neural networks

 

Introduction

The research from the article shows how models such as Ithaca can unlock the cooperative potential between artificial intelligence and historians, impacting the way that we study and write about one of the most important periods in human history.

 

Here we present Ithaca, a deep neural network for the textual restoration, geographical attribution and chronological attribution of ancient Greek inscriptions. Ithaca is designed to assist and expand the historian’s workflow.

 


Fig. 1 | Restoration of a damaged inscription. This inscription (Inscriptiones Graecae, volume 1, edition 3, document 4, face B (IG I3 4B)) records a decree concerning the Acropolis of Athens and dates to 485/4 BC. Marsyas, Epigraphic Museum, WikiMedia CC BY 2.5.

 

While Ithaca alone achieves 62% accuracy when restoring damaged texts, historians using Ithaca improved their accuracy from 25% to 72%, confirming the synergistic effect of this research tool. Ithaca can attribute inscriptions to their original location with an accuracy of 71% and can date them to within 30 years of their ground-truth ranges, redating key texts of Classical Athens and contributing to topical debates in ancient history.

 

 

Problem (Why?)

Ancient history relies on disciplines such as epigraphy, the study of inscribed texts known as inscriptions, for evidence of the thought, language, society and history of past civilizations. However, over the centuries many inscriptions have been damaged to the point of illegibility, transported far from their original location, and their date of writing is steeped in uncertainty.

Specialist epigraphers must then reconstruct the missing text, a process known as text restoration, and establish the original place and date of writing, tasks known as geographical attribution and chronological attribution, respectively. These three tasks are crucial steps towards placing an inscription both in history and within the world of the people who wrote and read it.

 

These tasks are non-trivial, and traditional methods in epigraphy involve highly complex, time-consuming and specialized workflows. When restoring damaged inscriptions, epigraphers rely on accessing vast repositories of information to find textual and contextual parallels. These repositories primarily consist of a researcher's mnemonic repertoire of parallels and, more recently, of digital corpora for performing 'string matching' searches. However, differences in the search query can exclude or obfuscate relevant results, and it is almost impossible to estimate the true probability distribution of possible restorations. Attributing an inscription is equally problematic: if it was moved, or if useful internal dating elements are missing, historians must find alternative criteria to attribute the place and date of writing (such as letterforms or dialects). Inevitably, a high level of generalization is often involved, and chronological attribution intervals can be very long.

 

 

Proposed Solution (What?)

Deep learning for epigraphy

Here we overcome the constraints of current epigraphic methods by using state-of-the-art machine learning research. Inspired by biological neural networks, deep neural networks can discover and harness intricate statistical patterns in vast quantities of data. This choice was due to two main reasons:

·        First, the variability of contents and context of the Greek epigraphic record, which makes it an excellent challenge for language processing;

·        and second, the availability of digitized corpora for ancient Greek, an essential resource for training machine learning models.

 

Methodology (How?)

We developed a pipeline to retrieve the unprocessed Packard Humanities Institute (PHI) dataset, which consists of the transcribed texts of 178,551 inscriptions. This process required rendering the text machine-actionable, normalizing epigraphic notations, reducing noise and efficiently handling all irregularities.

Each PHI inscription is assigned a unique numerical ID, and is labelled with metadata relating to the place and time of writing.

PHI lists a total of 84 ancient regions, whereas the chronological information is noted in a wide variety of formats, varying from historical eras to precise year intervals, written in several languages, lacking standardized notation and often using fuzzy wording.

After crafting an extended ruleset to process and filter the data (Methods), the resulting dataset, I.PHI, is to our knowledge the largest multitask dataset of machine-actionable epigraphic text, containing 78,608 inscriptions.

Ithaca: a model for epigraphic tasks

 

To begin, contextual information is captured more comprehensively by representing the inputs as words; however, parts of words may have been lost over the centuries. To address this challenge, we process the input text as character and word representations jointly, representing damaged, missing or unknown words with a special symbol '[unk]'.

Next, to enable large-scale processing, Ithaca's torso is based on a neural network architecture called the transformer [22], which uses an attention mechanism to weigh the influence of different parts of the input (such as characters and words) on the model's decision-making process. The attention mechanism is informed of the position of each part of the input text by concatenating the input character and word representations with their sequential positional information.
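A minimal sketch of such a joint character/word input representation; the vocabulary sizes, dimensions, and concatenation scheme are illustrative assumptions, not the published configuration:

```python
import torch
import torch.nn as nn

class JointInputEmbedding(nn.Module):
    """Sketch of an Ithaca-style joint character/word input representation.
    All sizes are illustrative, not taken from the paper."""

    def __init__(self, n_chars=34, n_words=10000, d_char=64, d_word=64,
                 d_pos=32, max_len=768):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d_char)
        self.word_emb = nn.Embedding(n_words, d_word)  # index 0 = '[unk]'
        self.pos_emb = nn.Embedding(max_len, d_pos)

    def forward(self, char_ids, word_ids):
        # char_ids, word_ids: (batch, seq_len); word_ids repeats each
        # word's id (or [unk]) at every character position it covers.
        pos = torch.arange(char_ids.size(1), device=char_ids.device)
        pos = pos.unsqueeze(0).expand_as(char_ids)
        # Concatenate character, word and positional representations
        # per position; the result feeds the transformer torso.
        return torch.cat(
            [self.char_emb(char_ids), self.word_emb(word_ids), self.pos_emb(pos)],
            dim=-1,
        )  # -> (batch, seq_len, d_char + d_word + d_pos)
```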

 


In the example shown in Fig. 2, the restoration head predicts the three missing characters; the geographical attribution head classifies the inscription among 84 regions; and the chronological attribution head dates it to between 800 BC and AD 800.

Interpreting the outputs

 

Our intention was to maximize the collaborative potential between historians and deep learning. Ithaca’s architecture was therefore designed to provide intelligible outputs, while featuring multiple visualization methods to augment the interpretability of the model’s predictive hypotheses.

 

For the task of restoration, instead of providing historians with a single restoration hypothesis, Ithaca offers a set of the top 20 decoded predictions ranked by probability (Fig. 3a). This first visualization facilitates the pairing of Ithaca's suggestions with historians' contextual knowledge, therefore assisting human decision-making. This is complemented by saliency maps, a method used to identify which unique input features contributed the most to the model's predictions, for both the restoration and attribution tasks (Fig. 3d and Extended Data Fig. 5a). For the geographical attribution task, Ithaca classifies the input text among 84 regions, and the ranked list of possible region predictions is visualized with both a map and a bar chart (Fig. 3b). Finally, to expand interpretability for the chronological attribution task, instead of outputting a single date value, we predict a categorical distribution over dates (Fig. 3c). In so doing, Ithaca can handle ground-truth labels more effectively, as the labels correspond to date intervals. More precisely, Ithaca discretizes all dates between 800 BC and AD 800 into 10-year bins, resulting in 160 decades. For example, the date range 300–250 BC is represented as 5 decades of equal 20% probability, whereas an inscription dated to 305 BC would be assigned to the single decade bin 300–310 BC with 100% probability.
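To make the discretization concrete, here is a small worked sketch; the function and the year convention (negative for BC) are assumptions for illustration, not the authors' code:

```python
def date_range_to_bins(start, end, bin_size=10, lo=-800, hi=800):
    """Spread a date interval uniformly over decade bins.

    Years follow the convention 'negative = BC' (e.g. -300 for 300 BC);
    bin i covers [lo + i*bin_size, lo + (i+1)*bin_size).
    """
    n_bins = (hi - lo) // bin_size               # 160 decades for 800 BC..AD 800
    first = (start - lo) // bin_size
    last = max(first, (end - 1 - lo) // bin_size)
    probs = [0.0] * n_bins
    for i in range(first, last + 1):
        probs[i] = 1.0 / (last - first + 1)
    return probs

# The date range 300-250 BC becomes 5 decade bins of 20% each.
p = date_range_to_bins(-300, -250)
print([round(x, 2) for x in p if x > 0])         # [0.2, 0.2, 0.2, 0.2, 0.2]
```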


Experimental evaluation

To compare performance in the three epigraphic tasks, we use four methods:

·        First, we evaluate the difficulty of the restoration task by assigning two evaluators with epigraphical expertise (‘ancient historian’) a set of damaged inscriptions to restore, using the training set to search for textual parallels.

·        Second, we provide the human experts with a ranked list of Ithaca’s top 20 restoration hypotheses to inform their predictions (‘ancient historian and Ithaca’), therefore assessing the true impact of our work as a cooperative research aid.

·        Third, as a computational baseline, we reimplement our previous work Pythia [15], a sequence-to-sequence recurrent neural network for the task of ancient-text restoration.

·        Finally, for the attribution tasks, we introduce an ablation of the epigrapher's workflow, the 'onomastics' baseline: annotators were tasked with attributing a set of texts, exclusively using the known distribution of Greek personal names across time and space to infer geographical and chronological indicia [27].

 

We introduce the following metrics to measure each method’s performance.

For restoration, to compensate for the lack of ground truth in damaged inscriptions, we artificially hide 1 to 10 characters of undamaged input texts and treat the original sequences as the target.

 

The first metric used is the character error rate (CER), which counts the normalized edit distance between the top predicted restoration sequence and the target sequence. Furthermore, we use top-k accuracy to measure whether the correct restoration, or the correct region label for geographical attribution, is among the top k predictions, therefore quantifying Ithaca's potential as an assistive tool.
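A compact sketch of how such a CER could be computed, reading the metric as Levenshtein edit distance normalized by target length (an assumption on our part, not the authors' code):

```python
def char_error_rate(predicted: str, target: str) -> float:
    """Character error rate: edit distance between the predicted and
    target restorations, normalized by the target length (sketch)."""
    m, n = len(predicted), len(target)
    # Classic dynamic-programming edit distance.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if predicted[i - 1] == target[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution
        prev = cur
    return prev[n] / max(n, 1)

# Example: score a hypothetical restoration against the hidden original.
print(char_error_rate("αθηναιων", "αθηναιοι"))  # 0.25 (2 edits / 8 chars)
```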

 

 

As shown in Table 1, for the task of restoration, Ithaca consistently outperforms the competing methods, scoring a 26.3% CER and 61.8% top 1 accuracy. Specifically, our model achieves a 2.2× lower (that is, better) CER compared with human experts, whereas Ithaca's top 20 predictions achieve a 1.5× improved performance compared with Pythia, with an accuracy of 78.3%. Notably, when pairing historians with Ithaca (ancient historian and Ithaca), human experts achieve an 18.3% CER and 71.7% top 1 accuracy, demonstrating a considerable 3.2× and 2.8× improvement over their original CER and top 1 scores. Regarding the attribution to regions, Ithaca has 70.8% top 1 and 82.1% top 3 predictive accuracy.

 

Finally, for chronological attribution, whereas the onomastics human baseline predictions are within an average of 144.4 and median of 94.5 years from the ground-truth date intervals, Ithaca’s predictions, based on the totality of texts, have an average distance of 29.3 years from the target dating brackets, with a median distance of only 3 years.

 


 

Conclusion

Historians may now use Ithaca’s interpretability-augmenting aids (such as saliency maps) to examine these predictions further and bring more clarity to Athenian history.

 

 

 

 


Bibliography:  https://www.nature.com/articles/s41586-022-04448-z.pdf

Sunday, 20 March 2022

Deep Rectangling for Image Stitching: A Learning Baseline




Problem:



Stitching together multiple images provides a wider field of view, but traditional stitching methods suffer from irregular boundaries and distortions. Cropping the stitched image defeats the purpose of the operation by restricting the field of view, while completion methods do not reproduce the scenery correctly. For high content fidelity, image rectangling has been used, but these methods struggle when the images contain few straight lines, as they rely on line detection for the mesh deformation.


Proposed solution:


To address the problem, a one-stage learning baseline is proposed: we predefine a rigid target mesh and predict the initial mesh with a fully convolutional network that estimates a content-aware mesh from a stitched image using a residual progressive regression strategy.

 


Methodology:


Feature Extractor: 

A stack of simple convolution-pooling blocks to extract high-level semantic features from the input.

Mesh motion regressor: 

After feature extraction, an adaptive pooling layer is utilized to fix the resolution of feature maps. Subsequently, we design a fully convolutional structure as the mesh motion regressor to predict the horizontal and vertical motions of every vertex based on the regular mesh.

Residual progressive regression:

Mesh motions are estimated accurately in a progressive manner. We warp the intermediate feature maps, improving performance with only a slight increase in computation. Then we design two regressors with the same structure to predict the primary mesh motions and the residual mesh motions, respectively, as sketched below.
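A minimal sketch of this design; the channel counts, grid size, and the warp() helper are assumptions for illustration, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class MeshRegressor(nn.Module):
    """Fully convolutional mesh-motion regressor (sketch; sizes are
    illustrative, not the paper's exact configuration)."""

    def __init__(self, in_ch=256, grid=(9, 9)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(grid),          # fix feature-map resolution
            nn.Conv2d(128, 2, 1),                # per-vertex (dx, dy) motion
        )

    def forward(self, feats):
        return self.net(feats)  # (batch, 2, grid_h, grid_w)

# Residual progressive regression: a primary regressor predicts coarse
# motions, features are warped accordingly (warp() assumed), and a second
# regressor of the same structure predicts a residual refinement.
def predict_mesh(feats, primary, residual, warp):
    coarse = primary(feats)
    refined = coarse + residual(warp(feats, coarse))
    return refined
```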

The motivation for image rectangling is that users are not satisfied with the irregular boundaries in stitched images. Therefore, the goal is to produce rectangular images that please most users.

Data:

As there is no proper dataset of pairs of stitched and rectangular images, we build a deep image rectangling dataset (DIR-D) with a wide range of irregular boundaries and scenes.

Results:

Stitched images are rectangled using different algorithms. The results are shown below, where the proposed solution produces fewer distortions in the rectangling results.

More cross-dataset results are displayed in the figure below, which shows the superiority of rectangling over other solutions such as cropping and completion.



The proposed learning solution is significantly better than the traditional solution in every metric on DIR-D. This remarkable improvement is attributed to the content-preserving property that can preserve both linear and non-linear structures.


 

Bibliography:

  • https://arxiv.org/pdf/2203.03831v1.pdf
