EXPERT SYSTEMS@UPT 2022: Team AVA

Restoring and attributing ancient texts using deep neural networks

Introduction

The research from the article shows how models such as Ithaca can unlock the cooperative potential between artificial intelligence and historians, impacting the way that we study and write about one of the most important periods in human history.

Here we present Ithaca, a deep neural network for the textual restoration, geographical attribution and chronological attribution of ancient Greek inscriptions. Ithaca is designed to assist and expand the historian’s workflow.

Fig. 1 | Restoration of a damaged inscription. This inscription (Inscriptiones Graecae, volume 1, edition 3, document 4, face B (IG I³ 4B)) records a decree concerning the Acropolis of Athens and dates to 485/4 BC. Marsyas, Epigraphic Museum, WikiMedia CC BY 2.5.

While Ithaca alone achieves 62% accuracy when restoring damaged texts, the use of Ithaca by historians improved their accuracy from 25% to 72%, confirming the synergistic effect of this research tool. Ithaca can attribute inscriptions to their original location with an accuracy of 71% and can date them to within 30 years of their ground-truth ranges, redating key texts of Classical Athens and contributing to topical debates in ancient history.

Problem (Why?)

Ancient history relies on disciplines such as epigraphy, the study of inscribed texts known as inscriptions, for evidence of the thought, language, society and history of past civilizations. However, over the centuries, many inscriptions have been damaged to the point of illegibility or transported far from their original location, and their date of writing is steeped in uncertainty.

Specialist epigraphers must then reconstruct the missing text, a process known as text restoration, and establish the original place and date of writing, tasks known as geographical attribution and chronological attribution, respectively. These three tasks are crucial steps towards placing an inscription both in history and within the world of the people who wrote and read it.

These tasks are non-trivial, and traditional methods in epigraphy involve highly complex, time-consuming and specialized workflows. When restoring damaged inscriptions, epigraphers rely on accessing vast repositories of information to find textual and contextual parallels. These repositories primarily consist of a researcher’s mnemonic repertoire of parallels and, more recently, of digital corpora for performing ‘string matching’ searches. However, differences in the search query can exclude or obfuscate relevant results, and it is almost impossible to estimate the true probability distribution of possible restorations. Attributing an inscription is equally problematic: if it was moved, or if useful internal dating elements are missing, historians must find alternative criteria to attribute the place and date of writing (such as letterforms and dialects). Inevitably, a high level of generalization is often involved (chronological attribution intervals can be very long).

Proposed Solution (What?)

Deep learning for epigraphy

Here we overcome the constraints of current epigraphic methods by using state-of-the-art machine learning research. Inspired by biological neural networks, deep neural networks can discover and harness intricate statistical patterns in vast quantities of data. We focus on ancient Greek inscriptions for two main reasons:

· First, the variability of contents and context of the Greek epigraphic record, which makes it an excellent challenge for language processing;

· and second, the availability of digitized corpora for ancient Greek, an essential resource for training machine learning models.

Methodology (How?)

To train Ithaca, we developed a pipeline to retrieve the unprocessed Packard Humanities Institute (PHI) dataset, which consists of the transcribed texts of 178,551 inscriptions. This process required rendering the text machine-actionable, normalizing epigraphic notations, reducing noise and efficiently handling all irregularities.

Each PHI inscription is assigned a unique numerical ID, and is labelled with metadata relating to the place and time of writing.

PHI lists a total of 84 ancient regions, whereas the chronological information is noted in a wide variety of formats, varying from historical eras to precise year intervals, written in several languages, lacking in standardized notation and often using fuzzy wording.

After crafting an extended ruleset to process and filter the data (Methods), the resulting dataset I.PHI is to our knowledge the largest multitask dataset of machine-actionable epigraphical text, containing 78,608 inscriptions.

Ithaca is a model for epigraphic tasks

To begin, contextual information is captured more comprehensively by representing the inputs as words; however, parts of words could have been lost over the centuries. To address this challenge, we process the input text as character and word representations jointly, representing damaged, missing or unknown words with a special symbol ‘[unk]’.
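The joint character and word input described above can be illustrated with a minimal sketch. The function name `joint_representation`, the use of `-` to mark one missing character, and the whitespace tokenization are assumptions made for illustration; they are not taken from Ithaca's actual preprocessing.

```python
UNK = "[unk]"   # special symbol for damaged, missing or unknown words
MISSING = "-"   # assumed marker for one missing character


def joint_representation(text):
    """Return (characters, words) as two lists of equal length.

    Each character position is paired with the word it belongs to.
    A word containing any missing character is replaced by UNK, and
    spaces are paired with an empty string.
    """
    chars, words = [], []
    for i, word in enumerate(text.split(" ")):
        if i > 0:
            chars.append(" ")
            words.append("")
        label = UNK if MISSING in word else word
        for ch in word:
            chars.append(ch)
            words.append(label)
    return chars, words


# The last word has three missing characters, so it becomes "[unk]"
# at the word level while its surviving characters are kept.
chars, words = joint_representation("δημο το αθη---ων")
```

In this sketch the surviving letters of a damaged word still reach the model through the character sequence, which is the motivation the text gives for processing both levels jointly.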

Next, to enable large-scale processing, Ithaca’s torso is based on a neural network architecture called the transformer, which uses an attention mechanism to weigh the influence of different parts of the input (such as characters and words) on the model’s decision-making process. The attention mechanism is informed of the position of each part of the input text by concatenating the input character and word representations with their sequential positional information.
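As a rough illustration of the two ideas in this paragraph, the sketch below concatenates per-position character, word and positional vectors, then computes scaled dot-product attention weights for one position. All vector values are made up, the helper names are hypothetical, and the learned query/key projections of a real transformer are omitted, so this is a simplification rather than Ithaca's architecture.

```python
import math


def concat_inputs(char_vecs, word_vecs, pos_vecs):
    """Concatenate the three per-position vectors into one input vector."""
    return [c + w + p for c, w, p in zip(char_vecs, word_vecs, pos_vecs)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Toy 2-dimensional embeddings for a 3-position input (made-up values).
char_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
word_vecs = [[0.5, 0.5], [0.5, 0.5], [0.0, 0.0]]
pos_vecs = [[0.0, 0.1], [0.1, 0.0], [0.1, 0.1]]

inputs = concat_inputs(char_vecs, word_vecs, pos_vecs)
# How strongly position 0 attends to each position, itself included.
weights = attention_weights(inputs[0], inputs)
```

The weights sum to one, so they can be read as the relative influence of each input position on the representation computed for position 0.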

In the example shown in Fig. 2, the restoration head predicts the three missing characters; the geographical attribution head classifies the inscription among 84 regions; and the chronological attribution head dates it to between 800 BC and AD 800.

Interpreting the outputs

Our intention was to maximize the collaborative potential between historians and deep learning. Ithaca’s architecture was therefore designed to provide intelligible outputs, while featuring multiple visualization methods to augment the interpretability of the model’s predictive hypotheses.

For the task of restoration, instead of providing historians with a single restoration hypothesis, Ithaca offers a set of the top 20 decoded predictions ranked by probability (Fig. 3a). This first visualization facilitates the pairing of Ithaca’s suggestions with historians’ contextual knowledge, therefore assisting human decision-making. This is complemented by saliency maps, a method used to identify which unique input features contributed the most to the model’s predictions, for both the restoration and attribution tasks (Fig. 3d and Extended Data Fig. 5a).

For the geographical attribution task, Ithaca classifies the input text among 84 regions, and the ranked list of possible region predictions is visually implemented with both a map and a bar chart (Fig. 3b).

Finally, to expand interpretability for the chronological attribution task, instead of outputting a single date value, we predict a categorical distribution over dates (Fig. 3c). By so doing, Ithaca can handle ground-truth labels more effectively, as the labels correspond to date intervals. More precisely, Ithaca discretizes all dates between 800 BC and AD 800 into 10-year bins, resulting in 160 decades. For example, the date range 300–250 BC is represented as 5 decades of equal 20% probability, whereas an inscription dated to 305 BC would be assigned to the single-decade bin 310–300 BC with 100% probability.
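The decade binning can be reproduced with a short sketch. It assumes years are stored as signed integers (negative for BC), that an interval covers the half-open range from its start to its end, and that the missing year zero can be ignored; the names `bin_index` and `target_distribution` are hypothetical.

```python
MIN_YEAR, MAX_YEAR, BIN_WIDTH = -800, 800, 10   # 800 BC to AD 800
NUM_BINS = (MAX_YEAR - MIN_YEAR) // BIN_WIDTH   # 160 decades


def bin_index(year):
    """Index of the 10-year bin containing a signed year in [-800, 800)."""
    return (year - MIN_YEAR) // BIN_WIDTH


def target_distribution(start, end=None):
    """Uniform distribution over the decade bins covered by [start, end).

    With a single year (end omitted), all probability goes to one bin.
    """
    if end is None or end == start:
        bins = [bin_index(start)]
    else:
        bins = list(range(bin_index(start), bin_index(end - 1) + 1))
    dist = [0.0] * NUM_BINS
    for b in bins:
        dist[b] = 1.0 / len(bins)
    return dist


# 300-250 BC spreads over five decades at 20% each.
interval = target_distribution(-300, -250)
# 305 BC falls entirely in the decade 310-300 BC (bin 49).
single = target_distribution(-305)
```

Representing the label as a distribution rather than a single year lets a training target express exactly as much uncertainty as the ground-truth interval contains.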

Experimental evaluation

To compare performance in the three epigraphic tasks, we use four methods:

· First, we evaluate the difficulty of the restoration task by assigning two evaluators with epigraphical expertise (‘ancient historian’) a set of damaged inscriptions to restore, using the training set to search for textual parallels.

· Second, we provide the human experts with a ranked list of Ithaca’s top 20 restoration hypotheses to inform their predictions (‘ancient historian and Ithaca’), therefore assessing the true impact of our work as a cooperative research aid.

· Third, as a computational baseline we reimplement our previous work Pythia, a sequence-to-sequence recurrent neural network for the task of ancient-text restoration.

· Finally, for the attribution tasks, we introduce an ablation of the epigrapher’s workflow, the ‘onomastics’ baseline: annotators were tasked with attributing a set of texts, exclusively using the known distribution of Greek personal names across time and space to infer geographical and chronological indicia.

We introduce the following metrics to measure each method’s performance.

For restoration, to obviate the lack of ground truths in damaged inscriptions, we artificially hide 1 to 10 characters of undamaged input text and treat the original sequences as the target.
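A minimal sketch of this masking step follows. It assumes the hidden characters form one contiguous span, that `?` marks a character to be predicted, and that the input text is non-empty; none of these details is stated in the text above, and the function name is hypothetical.

```python
import random

MASK = "?"  # assumed marker for a character the model must predict


def hide_characters(text, rng, min_len=1, max_len=10):
    """Hide a random contiguous span of min_len..max_len characters.

    Returns (masked_text, target), where target is the hidden span
    that serves as the ground truth for evaluation.
    """
    span = rng.randint(min_len, min(max_len, len(text)))
    start = rng.randint(0, len(text) - span)
    target = text[start:start + span]
    masked = text[:start] + MASK * span + text[start + span:]
    return masked, target


original = "δημο το αθηναιων"
rng = random.Random(0)  # seeded so the example is repeatable
masked, target = hide_characters(original, rng)
```

Because the hidden span comes from undamaged text, the original characters are known and can be compared directly with a model's restoration.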

The first metric used is the character error rate (CER), which counts the normalized differences between the top predicted restoration sequence and the target sequence. Furthermore, we use top-k accuracy to measure whether the correct restoration or region label for geographical attribution is among the top k predictions, therefore quantifying Ithaca’s potential as an assistive tool.
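Both metrics can be sketched in a few lines. The version of CER below assumes the "normalized differences" are Levenshtein edit distances divided by the target length and averaged over examples; that normalization is an assumption, and the function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(predictions, targets):
    """Mean edit distance, each normalized by its target length."""
    rates = [edit_distance(p, t) / len(t) for p, t in zip(predictions, targets)]
    return sum(rates) / len(rates)


def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of targets found among the first k ranked predictions."""
    hits = sum(t in ranked[:k] for ranked, t in zip(ranked_predictions, targets))
    return hits / len(targets)
```

The same `top_k_accuracy` function applies to restoration hypotheses and to region labels, since it only needs a ranked list of candidates and the correct answer for each example.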

As shown in Table 1, for the task of restoration, Ithaca consistently outperforms the competing methods, scoring a 26.3% CER and 61.8% top 1 accuracy. Specifically, our model achieves a 2.2× lower (that is, better) CER compared with human experts, whereas Ithaca’s top 20 predictions achieve a 1.5× improved performance compared with Pythia, with an accuracy of 78.3%. Notably, when pairing historians with Ithaca (ancient historian and Ithaca), human experts achieve an 18.3% CER and 71.7% top 1 accuracy, therefore demonstrating a considerable 3.2× and 2.8× improvement compared with their original CER and top 1 scores. Regarding the attribution to regions, Ithaca has 70.8% top 1 and 82.1% top 3 predictive accuracy.

Finally, for chronological attribution, whereas the onomastics human baseline predictions are within an average of 144.4 and median of 94.5 years from the ground-truth date intervals, Ithaca’s predictions, based on the totality of texts, have an average distance of 29.3 years from the target dating brackets, with a median distance of only 3 years.
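The distance from a ground-truth date interval can be sketched as below. The sketch assumes each prediction has been reduced to a single signed year (negative for BC) and that the distance is zero inside the interval and otherwise the gap to the nearest endpoint; how Ithaca derives a single year from its predicted distribution is not covered here, and the sample values are made up.

```python
from statistics import mean, median


def distance_to_interval(predicted_year, start, end):
    """Years between a predicted year and the interval [start, end].

    Years are signed integers (negative for BC); the distance is zero
    when the prediction falls inside the interval.
    """
    if predicted_year < start:
        return start - predicted_year
    if predicted_year > end:
        return predicted_year - end
    return 0


# Made-up predictions paired with made-up ground-truth intervals.
predictions = [-320, -275, -240]
intervals = [(-300, -250), (-300, -250), (-300, -250)]
distances = [distance_to_interval(p, s, e)
             for p, (s, e) in zip(predictions, intervals)]
summary = (mean(distances), median(distances))
```

Reporting both the mean and the median, as the evaluation does, shows whether a few badly dated texts dominate the average.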

Conclusion

Historians may now use Ithaca’s interpretability-augmenting aids (such as saliency maps) to examine its predictions further and bring more clarity to Athenian history.
