Here you will find all publications created and published as a result of the MediaVerse project. 
The most recent will be on top. 


MemeFier: Dual-stage Modality Fusion for Image Meme Classification

April 2023, Link

Christos Koutlis, Manos Schinas, Symeon Papadopoulos


Hate speech is a societal problem that has significantly grown through the Internet. New forms of digital content such as image memes have given rise to the spread of hate using multimodal means, which is far more difficult to analyse and detect than the unimodal case. Accurate automatic processing, analysis and understanding of this kind of content will facilitate the endeavor of hindering hate speech proliferation through the digital world. To this end, we propose MemeFier, a deep learning-based architecture for fine-grained classification of Internet image memes, utilizing a dual-stage modality fusion module. The first fusion stage produces feature vectors containing modality alignment information that captures non-trivial connections between the text and image of a meme. The second fusion stage leverages the power of a Transformer encoder to learn inter-modality correlations at the token level and yield an informative representation. Additionally, we consider external knowledge as an additional input, and background image caption supervision as a regularizing component. Extensive experiments on three widely adopted benchmarks, i.e., Facebook Hateful Memes, Memotion7k and MultiOFF, indicate that our approach competes with and in some cases surpasses the state of the art. Our code is available on this URL.

The visible subtitler: Blockchain technology towards right management and minting

February 2023, Link

Pilar Orero, Anna Fernandez Torner, Estella Oncins


Background: Subtitles are produced through different workflows and technologies: from fully automatic to human in open source web editors or in-house platforms, and increasingly through hybrid human-machine interaction. There is little agreement regarding subtitle copyright beyond the understanding that it is a derivative work. While same-language verbatim subtitles may have little room for creativity, interlingual subtitling is heavily dependent on the subtitler's skills to translate, prioritise, and condense information. These days creative subtitles are increasingly being used as one more aesthetic element in audiovisual narrative. Though they may be in the same language, the visual attributes that contribute to the narrative development make creative subtitles one more element that should be acknowledged and copyright protected.
Methods: The paper will present a short introduction to subtitling copyright. It will then describe centralised and decentralised copyright management — where blockchain technology can be applied to aid subtitler identification. A focus group with expert professional subtitlers was organised, and feedback is reported.
Conclusions: Subtitle copyright is country dependent, yet subtitling working practices and media asset distribution have no geographical borders. Blockchain technology, as a concept, could aid subtitle traceability. This can be useful beyond financial and moral right management and work towards media sustainability, allowing for reuse and repurposing of existing media assets.

MemeTector: Enforcing deep focus for meme detection

January 2023, Link

Christos Koutlis, Manos Schinas, Symeon Papadopoulos


Image memes, and specifically their widely known variation image macros, are a special new media type that combines text with images and is used in social media to playfully or subtly express humour, irony, sarcasm and even hate. It is important to accurately retrieve image memes from social media to better capture the cultural and social aspects of online phenomena and detect potential issues (hate-speech, disinformation). Essentially, the background image of an image macro is a regular image easily recognized as such by humans but cumbersome for the machine to do so due to feature map similarity with the complete image macro. Hence, accumulating suitable feature maps in such cases can lead to deep understanding of the notion of image memes. To this end, we propose a methodology that utilizes the visual part of image memes as instances of the regular image class and the initial image memes as instances of the image meme class to force the model to concentrate on the critical parts that characterize an image meme. Additionally, we employ a trainable attention mechanism on top of a standard ViT architecture to enhance the model’s ability to focus on these critical parts and make the predictions interpretable. Several training and test scenarios involving web-scraped regular images of controlled text presence are considered in terms of model robustness and accuracy. The findings indicate that light visual part utilization combined with sufficient text presence during training provides the best and most robust model, surpassing the state of the art. Source code and dataset are available here.

Leveraging Large-scale Multimedia Datasets to Refine Content Moderation Models

December 2022 (pre-print), Link

Ioannis Sarridis, Christos Koutlis, Olga Papadopoulou, Symeon Papadopoulos


The sheer volume of online user-generated content has rendered content moderation technologies essential in order to protect digital platform audiences from content that may cause anxiety, worry, or concern. Despite the efforts towards developing automated solutions to tackle this problem, creating accurate models remains challenging due to the lack of adequate task-specific training data. The fact that manually annotating such data is a highly demanding procedure that could severely affect the annotators’ emotional well-being is directly related to the latter limitation. In this paper, we propose the CM-Refinery framework that leverages large-scale multimedia datasets to automatically extend initial training datasets with hard examples that can refine content moderation models, while significantly reducing the involvement of human annotators. We apply our method on two model adaptation strategies designed with respect to the different challenges observed while collecting data, i.e. lack of (i) task-specific negative data or (ii) both positive and negative data. Additionally, we introduce a diversity criterion applied to the data collection process that further enhances the generalization performance of the refined models. The proposed method is evaluated on the Not Safe for Work (NSFW) and disturbing content detection tasks on benchmark datasets achieving 1.32% and 1.94% accuracy improvements compared to the state of the art, respectively. Finally, it significantly reduces human involvement, as 92.54% of data are automatically annotated in case of disturbing content while no human intervention is required for the NSFW task.

COVID-Related Misinformation Migration to BitChute and Odysee

November 2022, Link

Olga Papadopoulou, Evangelia Kartsounidou, Symeon Papadopoulos


The overwhelming amount of information and misinformation on social media platforms has created a new role that these platforms are inclined to take on, that of the Internet custodian. Mainstream platforms, such as Facebook, Twitter and YouTube, are under tremendous public and political pressure to combat disinformation and remove harmful content. Meanwhile, smaller platforms, such as BitChute and Odysee, have emerged and provide fertile ground for disinformation as a result of their low content-moderation policy. In this study, we analyze the phenomenon of removed content migration from YouTube to BitChute and Odysee. In particular, starting from a list of COVID-related videos removed from YouTube due to violating its misinformation policy, we find that ∼15% (1114 videos) of them migrated to the two low content-moderation platforms under study. This amounts to 4096 videos on BitChute and 1810 on Odysee. We present an analysis of this video dataset, revealing characteristics of misinformation dissemination similar to those on YouTube and other mainstream social media platforms. The BitChute–Odysee COVID-related dataset is publicly available for research purposes on misinformation analysis.

A Multi-Stream Fusion Network for Image Splicing Localization

November 2022, Link

Maria Siopi, Giorgos Kordopatis-Zilos, Polychronis Charitidis, Ioannis Kompatsiaris, Symeon Papadopoulos


Images have long been considered reliable evidence when corroborating facts. However, the latest advancements in the field of image editing and the wide availability of easy-to-use software create very big risks of image tampering by malicious actors.

In this paper, we address the problem of image splicing localization with a multi-stream network architecture that processes the raw RGB image in parallel with other handcrafted forensic signals.

DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval

August 2022 (pre-print), Link

Giorgos Kordopatis-Zilos, Christos Tzelepis, Symeon Papadopoulos, Ioannis Kompatsiaris, Ioannis Patras


In this paper, we address the problem of high performance and computationally efficient content-based video retrieval in large-scale datasets. Current methods typically propose either: (i) fine-grained approaches employing spatio-temporal representations and similarity calculations, achieving high performance at a high computational cost or (ii) coarse-grained approaches representing/indexing videos as global vectors, where the spatio-temporal structure is lost, providing low performance but also having low computational cost. In this work, we propose a Knowledge Distillation framework, which we call Distill-and-Select (DnS), that starting from a well-performing fine-grained Teacher Network learns: a) Student Networks at different retrieval performance and computational efficiency trade-offs and b) a Selection Network that at test time rapidly directs samples to the appropriate student to maintain both high retrieval performance and high computational efficiency. We train several students with different architectures and arrive at different trade-offs of performance and efficiency, i.e., speed and storage requirements, including fine-grained students that store index videos using binary representations. Importantly, the proposed scheme allows Knowledge Distillation in large, unlabelled datasets — this leads to good students. We evaluate DnS on five public datasets on three different video retrieval tasks and demonstrate a) that our students achieve state-of-the-art performance in several cases and b) that our DnS framework provides an excellent trade-off between retrieval performance, computational speed, and storage space. In specific configurations, our method achieves similar mAP with the teacher but is 20 times faster and requires 240 times less storage space. Our collected dataset and implementation are publicly available: this https URL.

MediaVerse Standard contribution (ISO/CD 6273) promoted in Japanese Magazine on Accessibility

July 2022, Link

Pilar Orero


Based on results from UAB’s pilot activities, Pilar Orero contributed to standardisation with ISO reporting on the activities of WD 6273, in which they are defining guidelines for assistive products for persons with impaired sensory functions.

Her findings were published in the magazine Incl (short for ‘Inclusion’), which appears six times a year and is distributed to 300-400 people and companies.

InDistill: Transferring Knowledge From Pruned Intermediate Layers

May 2022 (pre-print), Link

Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Ioannis Kompatsiaris


Deploying deep neural networks on hardware with limited resources, such as smartphones and drones, constitutes a great challenge due to their computational complexity. Knowledge distillation approaches aim at transferring knowledge from a large model to a lightweight one, also known as teacher and student respectively, while distilling the knowledge from intermediate layers provides an additional supervision to that task. The capacity gap between the models, the information encoding that collapses its architectural alignment, and the absence of appropriate learning schemes for transferring multiple layers restrict the performance of existing methods. In this paper, we propose a novel method, termed InDistill, that can drastically improve the performance of existing single-layer knowledge distillation methods by leveraging the properties of channel pruning to both reduce the capacity gap between the models and retain the architectural alignment. Furthermore, we propose a curriculum learning based scheme for enhancing the effectiveness of transferring knowledge from multiple intermediate layers. The proposed method surpasses state-of-the-art performance on three benchmark image datasets.
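The teacher-student transfer described in the abstract can be illustrated with a generic soft-target distillation loss. This is only a minimal sketch of classic logit-level knowledge distillation, not the InDistill method itself (which works on pruned intermediate layers); the function names and the temperature value are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the softened teacher to the softened student.

    The temperature (an assumed value) flattens both distributions so that
    the student also learns from the teacher's low-probability classes.
    The T^2 factor is the usual rescaling for soft-target distillation.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the two output distributions diverge.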

Culture meets immersive environments: a new media landscape across Europe

October 2021, Link

Marta Brescia-Zapata


The traditional media landscape is in the middle of a monumental shift: the new prosumer profile, the need for faster and more efficient communication, and the search for more user-driven and accessible multimedia experiences. New technologies (and more specifically, immersive environments) can provide great opportunities in the entertainment sector, and also in communication, learning, arts and culture. These technologies are gaining popularity due to the COVID-19 crisis as they enable interactive, hyper-personalised and engaging experiences anytime and anywhere. The EU-funded projects TRACTION (870610) and MEDIAVERSE (957252) are embracing new technologies in order to establish an effective participatory production workflow and are exploring novel audio-visual art representation formats. TRACTION will provide a bridge between opera professionals and specific communities at risk of exclusion based on trials, understood as experimental attempts at fostering an effective community dialogue between diverse individuals at risk of exclusion in three different situations, across three countries: Ireland, Portugal and Spain. MEDIAVERSE will enable the creation of a decentralised network of intelligent, automated, and accessible services, tools, and authoring platforms for digital asset management; legal and monetisable discovery and distribution of verified content, and barrier-free usage and integration in target media and platforms.


Marta Brescia-Zapata. Culture meets immersive environments: a new media landscape across Europe. In: Avanca Cinema 2021. pp. 1029-1033.

A Graph Diffusion Scheme for Decentralized Content Search based on Personalized PageRank

April 2022, Link

Nikolaos Giatsoglou, Emmanouil Krasanakis, Symeon Papadopoulos, Ioannis Kompatsiaris


Decentralization is emerging as a key feature of the future Internet. However, effective algorithms for search are missing from state-of-the-art decentralized technologies, such as distributed hash tables and blockchain. This is surprising, since decentralized search has been studied extensively in earlier peer-to-peer (P2P) literature. In this work, we adopt a fresh outlook for decentralized search in P2P networks that is inspired by advancements in dense information retrieval and graph signal processing. In particular, we generate latent representations of P2P nodes based on their stored documents and diffuse them to the rest of the network with graph filters, such as personalized PageRank. We then use the diffused representations to guide search queries towards relevant content. Our preliminary approach is successful in locating relevant documents in nearby nodes but the accuracy declines sharply with the number of stored documents, highlighting the need for more sophisticated techniques.


Giatsoglou, N., Krasanakis, E., Papadopoulos, S., Kompatsiaris, I.: A Graph Diffusion Scheme for Decentralized Content Search based on Personalized PageRank; to be published in the Decentralized Internet, Networks, Protocols, and Systems (DINPS 2022) workshop hosted at the 42nd IEEE International Conference on Distributed Computing Systems (ICDCS 2022); cite as: arXiv:2204.12902
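The idea summarised in this abstract, diffusing node representations with personalized PageRank and then using them to steer queries, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the graph, the teleport probability `alpha`, the two-dimensional embeddings and the greedy forwarding rule are all hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def ppr_diffuse(adjacency, embeddings, alpha=0.5, iterations=50):
    """Personalized PageRank diffusion of per-node embeddings.

    Each round, a node keeps a share alpha of its own embedding and mixes in
    the mean of its neighbours' current diffused embeddings.
    Assumes every node has at least one neighbour.
    """
    diffused = {node: list(vec) for node, vec in embeddings.items()}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            dim = len(embeddings[node])
            mean = [
                sum(diffused[n][k] for n in neighbours) / len(neighbours)
                for k in range(dim)
            ]
            updated[node] = [
                alpha * own + (1 - alpha) * m
                for own, m in zip(embeddings[node], mean)
            ]
        diffused = updated
    return diffused


def greedy_route(adjacency, diffused, query, start, max_hops=10):
    """Forward a query to the unvisited neighbour that best matches it."""
    path, visited, current = [start], {start}, start
    for _ in range(max_hops):
        candidates = [n for n in adjacency[current] if n not in visited]
        if not candidates:
            break
        current = max(candidates, key=lambda n: dot(diffused[n], query))
        visited.add(current)
        path.append(current)
    return path


# Hypothetical network: only node "d" stores a document about topic [1, 0].
adjacency = {
    "a": ["b", "x"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c"], "x": ["a", "y"], "y": ["x"],
}
embeddings = {node: [0.0, 0.0] for node in adjacency}
embeddings["d"] = [1.0, 0.0]

diffused = ppr_diffuse(adjacency, embeddings)
path = greedy_route(adjacency, diffused, [1.0, 0.0], "a")
```

In this toy network the diffused scores fall off with distance from the node holding the document, so a query starting at `a` is forwarded towards `d` instead of into the `x`/`y` branch.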

Leveraging Selective Prediction for Reliable Image Geolocation

March 2022, Link

Apostolos Panagiotopoulos, Giorgos Kordopatis-Zilos, and Symeon Papadopoulos


Reliable image geolocation is crucial for several applications, ranging from social media geo-tagging to media verification. State-of-the-art geolocation methods surpass human performance on the task of geolocation estimation from images. However, no method assesses the suitability of an image for this task, which results in unreliable and erroneous estimations for images containing ambiguous or no geolocation clues. In this paper, we define the task of image localizability, i.e. suitability of an image for geolocation, and propose a selective prediction methodology to address the task. In particular, we propose two novel selection functions that leverage the output probability distributions of geolocation models to infer localizability at different scales. Our selection functions are benchmarked against the most widely used selective prediction baselines, outperforming them in all cases. By abstaining from predicting non-localizable images, we improve geolocation accuracy from 27.8% to 70.5% at the city-scale, and thus make current geolocation models reliable for real-world applications.


Panagiotopoulos, A., Kordopatis-Zilos, G., Papadopoulos, S. (2022). Leveraging Selective Prediction for Reliable Image Geolocation. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.
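To make the notion of selective prediction concrete, the sketch below abstains whenever the model's top probability falls under a threshold. This maximum-probability rule is a common generic baseline, not one of the two selection functions proposed in the paper; the threshold, the probability vectors and the function names are illustrative assumptions.

```python
def select_and_predict(probabilities, threshold=0.5):
    """Return the index of the most probable cell, or None to abstain."""
    confidence = max(probabilities)
    if confidence < threshold:
        return None  # image treated as non-localizable
    return probabilities.index(confidence)


def selective_accuracy(batch, labels, threshold=0.5):
    """Return (coverage, accuracy on the images that were answered)."""
    answered = correct = 0
    for probabilities, label in zip(batch, labels):
        prediction = select_and_predict(probabilities, threshold)
        if prediction is None:
            continue
        answered += 1
        correct += int(prediction == label)
    coverage = answered / len(batch)
    accuracy = correct / answered if answered else 0.0
    return coverage, accuracy


# Hypothetical model outputs over three location cells for three images.
batch = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.1, 0.8, 0.1]]
labels = [0, 1, 1]
```

On these made-up outputs, a threshold of 0.5 drops the ambiguous second image, so accuracy is computed only over the two confident predictions. Lowering the threshold to 0.0 answers every image and lets the wrong guess count against accuracy, which is the coverage-versus-accuracy trade-off that abstaining exploits.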

Making Media Accessible to All

November 2021, Link

Andy Quested, Pradipta Biswas, Masahito Kawamori, Pilar Orero


Selecting an accessible media service is often a binary option – either on or off, where one option is supplied to all no matter the degree or need. Audience requirements are very different and range from 100% loss of a sense to occasional need for assistance. Furthermore, accessible media services continue to only address sight and sound assistance, which does not help participation for those with reduced motor functions or with understanding or learning difficulties – often more than one condition is present leading to co-morbidity.

Developers need to understand and incorporate the wide range of requirements for people with a range of abilities. A ‘one-size-fits-all’ approach can be the easiest option to implement, rather than developing different options for the same website, application or audiovisual media, for people with a range of abilities. Solutions are often not scalable when applied to platforms with a range of accessibility options.

The role of the ITU Audio Visual Accessibility Group is to investigate and suggest options and solutions that can be applied to any form of media no matter how produced, distributed, or reproduced.


Quested, A., Biswas, P., Kawamori, M., Orero, P.: Making Media Accessible to All. In: Proceedings of “Shape the Future: Research and Development Questions in Digital Accessibility”. Online Research Symposium, 10 November 2021

Testing Times: Introduction

October 2021, Link

Pilar Orero, David Hernández Falagán


That COVID-19 touched all walks of life is an understatement. With the risk of sounding frivolous, compared with other impacts, COVID-19 had direct implications in research, and particularly in funded research activities with a strict schedule. Luckily, in the field of audiovisual translation we do not require any live samples or animals to be fed while in lockdown. Still, experimental programmed tests with people required alternative approaches. This special issue presents the social distancing challenges faced in user-centric research methodologies when human interaction is required.


Orero, P. & D. Hernández Falagán (2021). Testing Times: Introduction. Journal of Audiovisual Translation, 4(2), 1–4.

Translation, Accessibility and Minorities

September 2021, Paper format

Pilar Orero


Audiovisual media in the 21st century is ubiquitous. Audiovisual media is technology driven, and its development elicits exponential complexity at all stages: creation, distribution, and consumption. Having access to media content on all devices and in all formats has for years now been a Human Right, since we live in the Information Society. The study of translation in this technology-driven society needs to be approached from a complex multidisciplinary collaboration, since it is difficult to separate the embedded media content from its technological creation tools. The discipline studying communication for all citizens in this new audiovisual media access context is Media Accessibility. The chapter first proposes a new hierarchy for Translation Studies in the Information Society era, where the consumer is at the center of research, hence user-centric approaches are the methodological context. The second part focuses on accessibility in general and media accessibility in particular. The chapter finishes by revisiting the concept of minorities, beyond languages, cultures, and people, since artificial intelligence will soon offer a new research context from which society may be reorganized, fairly or not.


Orero, Pilar (2021) Translation, Accessibility and Minorities. In E. Bielsa (ed.) The Routledge Handbook of Translation and Media. London: Taylor & Francis. pp. 384-399.

Audio Description Personalisation

September 2021, Paper format

Pilar Orero


Technology is opening new and fascinating opportunities along the audio description workflow: from production to consumption. For years now academics have been looking at alternative ways of offering audio descriptions aimed at meeting end user needs and expectations. In the past, technology didn’t allow for personalization beyond the sound volume or colour contrast. The shift from analog to digital opened the possibility to enjoy simultaneously different audio description styles and languages (Orero et al 2014). Nowadays most audio description components may be altered with a view to achieving a higher level of interaction with the audience, the venue requirements, the media genres and the different age groups or cultural background of users (Mazur and Chmiel 2016). This personalization was studied in the Pear Tree Stories (Orero 2008) from a narratological point of view (Mazur and Kruger 2012). Audio description research has also focused on delivery where alternative voicing strategies have been studied (Szarkowska 2011; Caruso 2012; Fernández-Torné & Matamala 2015). Making the content easier to understand has also been under analysis when modifying degrees of information explicitation, intonation, or speed (Cabeza-Cáceres 2013). How to make audio description easier to understand has also been studied through the impact on information recall (Orero 2012; Fresno 2014; Fresno et al 2014; Bernabé & Orero 2019). 

Technology is now enabling the delivery and user personalization of the many traits that impact audio description reception with a view to heightening its enjoyment and understanding – or vice versa? The chapter departs from the concept of personalisation and then describes the possible personalization features available today, pointing towards new research avenues, to conclude with a state of the art bibliography. The chapter has avoided technical terminology as much as possible.


Orero, Pilar (2021) Audio Description Personalisation. In Christopher Taylor and Elisa Perego (Eds) The Routledge Handbook of Audio Description. London: Taylor & Francis, pp. 121-134.

Let's put standardisation in practice: accessibility services and interaction

October 2020, Link

Estel·la Oncins and Pilar Orero


Technology is developing at a fast pace to produce new interactions, which turn into new communication barriers, some of which might be avoidable. Looking at recommendations from some accessibility standards at the design stage could solve many issues and help towards native accessible technology.

This article looks at existing standards related to accessibility and media communication. The first part of the article looks at different standardisation agencies and the need to produce harmonised standards for accessibility at IEC, ITU, ISO and W3C. The second part of the article outlines how standards are produced and implemented at a European level by the European Standardisation Organisations (CEN, CENELEC and ETSI). It then lists existing standards for each media accessibility service: subtitling, audio description, audio subtitling and sign language. Mention is made of Easy to Read as a new emerging accessibility modality. The final part of the article will provide conclusions and directions for further research.


Oncins, E., & Orero, P. (2021). Let’s put standardisation in practice: Accessibility services and interaction. Hikma, 20(1), 71-90. [pdf]

Holistic Requirements Analysis For Specifying New Systems For 3D Media Production and Promotion

July 2021, Link

Christos Mouzakis, Dimitrios Ververidis, Luis Miguel Girao, Nicolas Patz, Spiros Nikolopoulos and Ioannis Kompatsiaris


This paper presents a requirements specification analysis for driving the design of new systems that will allow 3D media creators to further promote and monetize their work. The provided requirements analysis is based on the IEEE 830 standard for requirements specification. It allows us to elucidate system requirements through existing (AS-IS) and envisioned (TO-BE) scenarios affected by the latest trends in design methodologies and content promotion in social media. A total of 30 tools for content creation, promotion and monetization are reviewed. The target groups, i.e. creator groups, are divided into 10 types according to their role in 3D media production. Based on this division, 10 candidate TO-BE scenarios have been identified, and out of these 10 scenarios we have selected 6 for validation by media creators. The validation was performed through a survey of 24 statements on a 5-point Likert scale, answered by 47 individuals from the domains of Media, Fine arts, Architecture, and Informatics. The evaluation results and comments collected can inform future systems design.


Mouzakis, C., Ververidis, D., Girao, L. M., Patz, N., Nikolopoulos, S., & Kompatsiaris, I. (2021). Holistic Requirements Analysis for Specifying New Systems for 3D Media Production and Promotion. Sustainability, 13(15), 8155. doi:10.3390/su13158155 [pdf]

Immersive captioning: developing a framework for evaluating user needs

December 2020, Link

Hughes, C.J., Zapata, M.B., Johnston, M. and Orero, P.


This article focuses on captioning for immersive environments and the research aims to identify how to display them for an optimal viewing experience. This work began four years ago with some partial findings. This second stage of research, built from the lessons learnt, focuses on the design requirements cornerstone: prototyping. A tool has been developed towards quick and realistic prototyping and testing. The framework integrates methods used in existing solutions. Given how easy it is to contrast and compare, the need to further the first framework was obvious. A second improved solution was developed, almost as a showcase on how ideas can quickly be implemented for user testing. After an overview on captions in immersive environments, the article describes its implementation, based on web technologies opening for any device with a web browser. This includes desktop computers, mobile devices and head mounted displays. The article finishes with a description of the new caption modes leading to improved methods, hoping to be a useful tool towards testing and standardisation.


Hughes, CJ, Zapata, MB, Johnston, M and Orero, P 2020, Immersive captioning: developing a framework for evaluating user needs, in: IEEE AIVR 2020: 3rd International Conference on Artificial Intelligence & Virtual Reality 2020, 14th-18th December 2020, Online.

Evaluating subtitle readability in media immersive environments

02 December 2020, Link

Pilar Orero, Marta Brescia-Zapata, Chris Hughes


The advances in VR technology have led to immersive videos rapidly gaining popularity. Accessibility to immersive media should be offered and subtitles are the most popular accessibility service. Research on subtitle readability has led to guidelines and standards (W3C, ISO/IEC/ITU 20071-23:2018). More research into subtitle presentation modes in 360º is needed in order to move towards understanding optimum readability. Evaluating readability for subtitles in immersive media environments requires a flexible and user-friendly framework for both creating the subtitles and presenting the generated subtitle file in a fully functional immersive video player, in order to understand the final view in the environment and assess its quality. This article starts by looking at the readability recommendations in W3C and ISO/IEC/ITU. The second part will describe the new features required in immersive subtitle presentations. The final section will describe the new web-based framework that allows the generation of immersive subtitles where readability may be tested. The framework has adopted a contrast and comparison approach towards instant readability evaluation.


Orero, P., Brescia-Zapata, M., Hughes, C. (2020). Evaluating subtitle readability in media immersive environments. In: DSAI 2020: 9th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion, pp. 51–54.

Easy to Read Standardisation: Some steps towards an international standard

December 2020, Link

Pilar Orero, Clara Delgado, Anna Matamala


Reading is a means to make Human Rights effective, mainly those related to full participation in society under equal conditions. Literacy is not natural but acquired, and it depends on many factors, from personal capabilities to access to education from a geographical or financial point of view. Even in more developed countries where education is compulsory until adolescence, a growing number of children do not fully develop their reading skills. This fact makes reading a universal barrier to equal opportunities. Learning to read is one solution to the problem; generating texts which are easier to read may also help. This article presents the existing Easy to Read standards, and describes some further standardisation requirements such as terminology, intended audience, workflows, formats, and languages that should be taken into consideration towards a 21st century Easy to Read recommendation.


Orero, P., Delgado, C., Matamala, A.: Easy to Read Standardisation: Some steps towards an international standard. In: DSAI 2020: 9th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion, pp. 44–46.