Here you will find all publications created and published as a result of the MediaVerse project.
The most recent will be on top. 

A click on the headlines will reveal the author(s) and an abstract.

Towards a Decentralized Solution for Copyrights Management in Audiovisual Translation and Media Accessibility

July 2023, Link

Iris Serrat-Roozen, Estella Oncins


With the development of new technologies in the audiovisual sector, significant changes are taking place in the way information is processed, distributed and accessed. In this regard, blockchain technology is undoubtedly at the epicentre of the technological revolution and, despite its undeniable application in different industries, it seems to remain ignored in some academic fields, particularly in Translation Studies. This technology can be used for various purposes in our field —translating data in blocks, creating a more transparent and secure workflow in the translation process, tracking translation quality— as well as to address copyright issues and to rethink the ways in which we use, reuse, distribute and monetise the content we create.

This paper addresses two key issues in the digital media industry, namely blockchain technology and intellectual property rights management, and presents an intellectual property rights (IPR) management tool developed as part of the MediaVerse project. In addition, we will analyse the results of two focus groups conducted to validate the effectiveness of this tool among audiovisual translators and media accessibility professionals. By exploring these critical issues and demonstrating the benefits of the IPR management tool, we aim to contribute to the ongoing discourse on digital media accessibility and its importance in the current media landscape.

Self-Supervised Video Similarity Learning

June 2023, Link

Giorgos Kordopatis-Zilos, Giorgos Tolias, Christos Tzelepis, Ioannis Kompatsiaris, Ioannis Patras, Symeon Papadopoulos


We introduce S^2VS, a video similarity learning approach with self-supervision. Self-Supervised Learning (SSL) is typically used to train deep models on a proxy task so as to have strong transferability on target tasks after fine-tuning. Here, in contrast to prior work, SSL is used to perform video similarity learning and address multiple retrieval and detection tasks at once with no use of labeled data. This is achieved by learning via instance-discrimination with task-tailored augmentations and the widely used InfoNCE loss together with an additional loss operating jointly on self-similarity and hard-negative similarity. We benchmark our method on tasks where video relevance is defined with varying granularity, ranging from video copies to videos depicting the same incident or event. We learn a single universal model that achieves state-of-the-art performance on all tasks, surpassing previously proposed methods that use labeled data. The code and pretrained models are publicly available.
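To illustrate the InfoNCE loss mentioned in this abstract, here is a minimal plain-Python sketch for a single anchor embedding. It is not the authors' implementation: the function name `info_nce`, the cosine similarity, and the temperature value are assumptions made for illustration, and the additional self-similarity and hard-negative loss is not reproduced.

```python
import math


def info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE loss for one anchor embedding (illustrative sketch only)."""

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    # Similarity logits: the positive pair first, then every negative pair.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]

    # Cross-entropy with the positive as the target class, computed with
    # the log-sum-exp trick for numerical stability.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[0]
```

The loss is close to zero when the anchor is far more similar to its positive (for example, an augmented view of the same video) than to any negative, and it grows when a negative is more similar than the positive.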

Digital Assets Rights Management through Smart Legal Contracts and Smart Contracts

June 2023, Link

Enrico Ferro, Marco Saltarella, Domenico Rotondi, Marco Giovanelli, Giacomo Corrias, Roberto Moncada, Andrea Cavallaro, Alfredo Favenza


Intellectual property rights (IPR) management needs to evolve in a digital world where not only companies but also many independent content creators contribute to our culture with their art, music, and videos. In this respect, blockchain has recently emerged as a promising infrastructure providing a trustworthy and immutable environment that, thanks to smart contracts, may enable more agile management of digital rights and streamline royalty payments. However, no widespread consensus has been reached on the ability of this technology to adequately manage and transfer IPR. This paper presents an innovative approach to digital rights management developed within the scope of an international research endeavour co-financed by the European Commission named MediaVerse. The approach proposes the combined usage of smart legal contracts and blockchain smart contracts to take care of the legally-binding contractual aspects of IPR and, at the same time, the need for notarization, rights transfer, and royalty payments. The work conducted represents a contribution to advancing the current literature on IPR management that may lead to an improved and fairer monetization process for content creators as a means of individual empowerment.

MAAM: Media Asset Annotation and Management

June 2023, Link

Manos Schinas, Panagiotis Galopoulos, Symeon Papadopoulos


Artificial intelligence can facilitate the management of large amounts of media content and enable media organisations to extract valuable insights from their data. Although AI for media understanding has made rapid progress in recent years, its deployment in applications and professional sectors poses challenges, especially to organizations with no AI expertise. This motivated the creation of the Media Asset Annotation and Management platform (MAAM) that employs state-of-the-art deep learning models to annotate and facilitate the management of image and video assets. Annotation models provided by MAAM include automatic captioning, object detection, action recognition and moderation models, such as NSFW and disturbing content classifiers. By annotating media assets with these models, MAAM can support easy navigation, filtering and retrieval of media assets. In addition, our platform leverages the power of deep learning to support advanced visual and multi-modal retrieval capabilities. This allows accurate identification of assets that convey a similar idea or concept, even if they are not visually identical, and supports a state-of-the-art reverse search facility for images and videos.

MemeFier: Dual-stage Modality Fusion for Image Meme Classification

April 2023, Link

Christos Koutlis, Manos Schinas, Symeon Papadopoulos


Hate speech is a societal problem that has significantly grown through the Internet. New forms of digital content such as image memes have given rise to the spread of hate using multimodal means, which is far more difficult to analyse and detect compared to the unimodal case. Accurate automatic processing, analysis and understanding of this kind of content will facilitate the endeavor of hindering hate speech proliferation through the digital world. To this end, we propose MemeFier, a deep learning-based architecture for fine-grained classification of Internet image memes, utilizing a dual-stage modality fusion module. The first fusion stage produces feature vectors containing modality alignment information that captures non-trivial connections between the text and image of a meme. The second fusion stage leverages the power of a Transformer encoder to learn inter-modality correlations at the token level and yield an informative representation. Additionally, we consider external knowledge as an additional input, and background image caption supervision as a regularizing component. Extensive experiments on three widely adopted benchmarks, i.e., Facebook Hateful Memes, Memotion7k and MultiOFF, indicate that our approach competes with and in some cases surpasses the state of the art. Our code is publicly available.

Empirical Evaluation of Easy Language Recommendations: A Systematic Literature Review from Journal Research in Catalan, English, and Spanish

March 2023, Link

Mariona González-Sordé, Anna Matamala


Easy Language is a language variety that aims to make information more comprehensible and, ultimately, more accessible. Content in this variety is written and designed following a set of recommendations that have been published in different guidelines. However, it remains uncertain to what extent these recommendations are backed up by empirical research. The aim of this study is to review the existing literature that evaluates current recommendations in Easy Language guidelines, on the basis of the following research questions: (a) is there empirical research that evaluates current international Easy Language recommendations? and, (b) if so, what current international Easy Language recommendations are supported by empirical research and what results were obtained? To this end, we conducted a systematic literature review based on journal articles in three languages: Catalan, English, and Spanish. First, a systematic search was designed and performed in 10 databases of different fields of science. Then, we reviewed every article that resulted from the search and found that 6 publications out of the initial 617 met the inclusion criteria and could be considered relevant for the study. Based on the data extracted from the included publications, and after an overall review of our systematic search results, we can safely state that there is indeed empirical research on some current Easy Language recommendations. Nevertheless, empirical research in the field (at least in the publication format and languages considered in our study) is not enough in terms of the number of publications, and the findings obtained are far from generalisable. Our literature review suggests future lines of research, and we hope that it fosters empirical studies in the field that help support the existing findings.

The Visible Subtitler: Blockchain Technology Towards Right Management and Minting

February 2023, Link

Pilar Orero, Anna Fernández-Torné, Estella Oncins


Background: Subtitles are produced through different workflows and technologies: from fully automatic to human in open source web editors or in-house platforms, and increasingly through hybrid human-machine interaction. There is little agreement regarding subtitle copyright beyond the understanding that it is a derivative work. While same-language verbatim subtitles may have little room for creativity, interlingual subtitling is heavily dependent on the subtitler's skills to translate, prioritise, and condense information. These days creative subtitles are increasingly being used as one more aesthetic element in audiovisual narrative. Though they may be in the same language, the visual attributes that contribute to the narrative development make creative subtitles one more element that should be acknowledged and copyright protected.
Methods: The paper will present a short introduction to subtitling copyright. It will then describe centralised and decentralised copyright management — where blockchain technology can be applied to aid subtitler identification. A focus group with expert professional subtitlers was organised, and feedback is reported.
Conclusions: Subtitle copyright is country dependent; still, subtitling working practices and media asset distribution have no geographical borders. Blockchain technology, as a concept, could aid subtitle traceability. This can be useful beyond financial and moral right management and work towards media sustainability, allowing for the reuse and repurposing of existing media assets.

MemeTector: Enforcing Deep Focus for Meme Detection

January 2023, Link

Christos Koutlis, Manos Schinas, Symeon Papadopoulos


Image memes, and specifically their widely known variation, image macros, are a special new media type that combines text with images and is used in social media to playfully or subtly express humour, irony, sarcasm and even hate. It is important to accurately retrieve image memes from social media to better capture the cultural and social aspects of online phenomena and detect potential issues (hate-speech, disinformation). Essentially, the background image of an image macro is a regular image easily recognized as such by humans but cumbersome for the machine to do so due to feature map similarity with the complete image macro. Hence, accumulating suitable feature maps in such cases can lead to deep understanding of the notion of image memes. To this end, we propose a methodology that utilizes the visual part of image memes as instances of the regular image class and the initial image memes as instances of the image meme class to force the model to concentrate on the critical parts that characterize an image meme. Additionally, we employ a trainable attention mechanism on top of a standard ViT architecture to enhance the model’s ability to focus on these critical parts and make the predictions interpretable. Several training and test scenarios involving web-scraped regular images of controlled text presence are considered in terms of model robustness and accuracy. The findings indicate that light visual part utilization combined with sufficient text presence during training provides the best and most robust model, surpassing the state of the art. Source code and dataset are publicly available.

Leveraging Large-Scale Multimedia Datasets to Refine Content Moderation Models

December 2022 (pre-print), Link

Ioannis Sarridis, Christos Koutlis, Olga Papadopoulou, Symeon Papadopoulos


The sheer volume of online user-generated content has rendered content moderation technologies essential in order to protect digital platform audiences from content that may cause anxiety, worry, or concern. Despite the efforts towards developing automated solutions to tackle this problem, creating accurate models remains challenging due to the lack of adequate task-specific training data. The fact that manually annotating such data is a highly demanding procedure that could severely affect the annotators’ emotional well-being is directly related to the latter limitation. In this paper, we propose the CM-Refinery framework that leverages large-scale multimedia datasets to automatically extend initial training datasets with hard examples that can refine content moderation models, while significantly reducing the involvement of human annotators. We apply our method on two model adaptation strategies designed with respect to the different challenges observed while collecting data, i.e. lack of (i) task-specific negative data or (ii) both positive and negative data. Additionally, we introduce a diversity criterion applied to the data collection process that further enhances the generalization performance of the refined models. The proposed method is evaluated on the Not Safe for Work (NSFW) and disturbing content detection tasks on benchmark datasets achieving 1.32% and 1.94% accuracy improvements compared to the state of the art, respectively. Finally, it significantly reduces human involvement, as 92.54% of data are automatically annotated in case of disturbing content while no human intervention is required for the NSFW task.

Media Accessibility: An Opportunity for Diversity, Inclusion and Education

December 2022, Link

Pilar Orero


Pilar Orero from UAB (Universitat Autònoma de Barcelona) wrote chapter 3 in “Accesibilidad, Comunicación y Educación para todas las personas” (translation: Accessibility, Communication and Education for everyone), and mentions MediaVerse. The book (written in Spanish) provides concepts, reflections and arguments on the importance of being a creator in the fields of communication, design and education, especially in a highly digitalized society. An emphasis lies on reflecting on barriers and conditions of accessibility as well as showing the corresponding political, social, cultural and technical dimensions. Orero’s chapter is titled “Media accessibility. An Opportunity for Diversity, Inclusion and Education”. She and her UAB colleagues also include such themes in their MediaVerse use case.

COVID-Related Misinformation Migration to BitChute and Odysee

November 2022, Link

Olga Papadopoulou, Evangelia Kartsounidou, Symeon Papadopoulos


The overwhelming amount of information and misinformation on social media platforms has created a new role that these platforms are inclined to take on, that of the Internet custodian. Mainstream platforms, such as Facebook, Twitter and YouTube, are under tremendous public and political pressure to combat disinformation and remove harmful content. Meanwhile, smaller platforms, such as BitChute and Odysee, have emerged and provide fertile ground for disinformation as a result of their low content-moderation policy. In this study, we analyze the phenomenon of removed content migration from YouTube to BitChute and Odysee. In particular, starting from a list of COVID-related videos removed from YouTube due to violating its misinformation policy, we find that ∼15% (1114 videos) of them migrated to the two low content-moderation platforms under study. This amounts to 4096 videos on BitChute and 1810 on Odysee. We present an analysis of this video dataset, revealing characteristics of misinformation dissemination similar to those on YouTube and other mainstream social media platforms. The BitChute–Odysee COVID-related dataset is publicly available for research purposes on misinformation analysis.

MediaVerse goes to ITACA

November 2022, Link

Anna Matamala, Estella Oncins


As part of one of MediaVerse’s use cases, our partners at UAB (Universitat Autònoma de Barcelona) are developing the ITACA programme. It offers face-to-face activities to secondary school students who, due to their socio-economic situation, may not consider pursuing higher education studies. Transmedia Catalonia has proposed an activity in which the young students will co-create an accessible 360° TikTok video and add audio description to it. The result will be shared via the MediaVerse platform. The objectives, methods, results and conclusions of the ITACA undertaking were summarized on a poster and presented at the 2022 “IV Congreso Internacional de Innovación Docente e Investigación en Educación Superior”. Find out more about the ITACA use case on the MediaVerse website.

A Multi-Stream Fusion Network for Image Splicing Localization

November 2022, Link

Maria Siopi, Giorgos Kordopatis-Zilos, Polychronis Charitidis, Ioannis Kompatsiaris, Symeon Papadopoulos


Images have long been considered reliable evidence when corroborating facts. However, the latest advancements in the field of image editing and the wide availability of easy-to-use software create significant risks of image tampering by malicious actors.

In this paper, we address the problem of image splicing localization with a multi-stream network architecture that processes the raw RGB image in parallel with other handcrafted forensic signals.

DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval

August 2022 (pre-print), Link

Giorgos Kordopatis-Zilos, Christos Tzelepis, Symeon Papadopoulos, Ioannis Kompatsiaris, Ioannis Patras


In this paper, we address the problem of high performance and computationally efficient content-based video retrieval in large-scale datasets. Current methods typically propose either: (i) fine-grained approaches employing spatio-temporal representations and similarity calculations, achieving high performance at a high computational cost or (ii) coarse-grained approaches representing/indexing videos as global vectors, where the spatio-temporal structure is lost, providing low performance but also having low computational cost. In this work, we propose a Knowledge Distillation framework, which we call Distill-and-Select (DnS), that starting from a well-performing fine-grained Teacher Network learns: a) Student Networks at different retrieval performance and computational efficiency trade-offs and b) a Selection Network that at test time rapidly directs samples to the appropriate student to maintain both high retrieval performance and high computational efficiency. We train several students with different architectures and arrive at different trade-offs of performance and efficiency, i.e., speed and storage requirements, including fine-grained students that store index videos using binary representations. Importantly, the proposed scheme allows Knowledge Distillation in large, unlabelled datasets, which leads to good students. We evaluate DnS on five public datasets on three different video retrieval tasks and demonstrate a) that our students achieve state-of-the-art performance in several cases and b) that our DnS framework provides an excellent trade-off between retrieval performance, computational speed, and storage space. In specific configurations, our method achieves similar mAP to the teacher but is 20 times faster and requires 240 times less storage space. Our collected dataset and implementation are publicly available.

MediaVerse Standard Contribution (ISO/CD 6273) promoted in Japanese Magazine on Accessibility

July 2022, Link

Pilar Orero


Based on results from UAB’s pilot activities, Pilar Orero contributed to standardisation with ISO, reporting on the activities of WD 6273, in which they are defining guidelines for assistive products for persons with impaired sensory functions.

Her findings were published in the magazine Incl (which comes from ‘Inclusion’), which is published 6 times a year and distributed to 300-400 people and companies.

InDistill: Transferring Knowledge From Pruned Intermediate Layers

May 2022 (pre-print), Link

Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Ioannis Kompatsiaris


Deploying deep neural networks on hardware with limited resources, such as smartphones and drones, constitutes a great challenge due to their computational complexity. Knowledge distillation approaches aim at transferring knowledge from a large model to a lightweight one, also known as teacher and student respectively, while distilling the knowledge from intermediate layers provides an additional supervision to that task. The capacity gap between the models, the information encoding that collapses its architectural alignment, and the absence of appropriate learning schemes for transferring multiple layers restrict the performance of existing methods. In this paper, we propose a novel method, termed InDistill, that can drastically improve the performance of existing single-layer knowledge distillation methods by leveraging the properties of channel pruning to both reduce the capacity gap between the models and retain the architectural alignment. Furthermore, we propose a curriculum learning based scheme for enhancing the effectiveness of transferring knowledge from multiple intermediate layers. The proposed method surpasses state-of-the-art performance on three benchmark image datasets.

A Graph Diffusion Scheme for Decentralized Content Search based on Personalized PageRank

April 2022, Link

Nikolaos Giatsoglou, Emmanouil Krasanakis, Symeon Papadopoulos, Ioannis Kompatsiaris


Decentralization is emerging as a key feature of the future Internet. However, effective algorithms for search are missing from state-of-the-art decentralized technologies, such as distributed hash tables and blockchain. This is surprising, since decentralized search has been studied extensively in earlier peer-to-peer (P2P) literature. In this work, we adopt a fresh outlook for decentralized search in P2P networks that is inspired by advancements in dense information retrieval and graph signal processing. In particular, we generate latent representations of P2P nodes based on their stored documents and diffuse them to the rest of the network with graph filters, such as personalized PageRank. We then use the diffused representations to guide search queries towards relevant content. Our preliminary approach is successful in locating relevant documents in nearby nodes but the accuracy declines sharply with the number of stored documents, highlighting the need for more sophisticated techniques.
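The diffusion step described in this abstract can be sketched with a small personalized PageRank iteration. This is a minimal sketch under stated assumptions, not the authors' code: the function name `personalized_pagerank` is hypothetical, the graph is undirected with at least one neighbour per node, and each node carries a single scalar relevance value rather than the latent document representations used in the paper.

```python
def personalized_pagerank(adjacency, personalization, alpha=0.15, iterations=50):
    """Diffuse a per-node signal over a graph with personalized PageRank.

    adjacency: dict mapping each node to a list of its neighbours
               (assumed undirected, every node has at least one neighbour).
    personalization: dict mapping each node to its initial signal value.
    alpha: restart probability, i.e. how strongly a node keeps its own signal.
    """
    scores = dict(personalization)
    for _ in range(iterations):
        updated = {}
        for node in adjacency:
            # Each neighbour spreads its current score evenly over its edges.
            incoming = sum(scores[nb] / len(adjacency[nb]) for nb in adjacency[node])
            updated[node] = alpha * personalization[node] + (1 - alpha) * incoming
        scores = updated
    return scores
```

In a search setting, a node holding a query could forward it to the neighbour with the highest diffused score; because the diffused signal fades with distance from its source, nearby content is easier to locate than distant content.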

Leveraging Selective Prediction for Reliable Image Geolocation

March 2022, Link

Apostolos Panagiotopoulos, Giorgos Kordopatis-Zilos, Symeon Papadopoulos


Reliable image geolocation is crucial for several applications, ranging from social media geo-tagging to media verification. State-of-the-art geolocation methods surpass human performance on the task of geolocation estimation from images. However, no method assesses the suitability of an image for this task, which results in unreliable and erroneous estimations for images containing ambiguous or no geolocation clues. In this paper, we define the task of image localizability, i.e. suitability of an image for geolocation, and propose a selective prediction methodology to address the task. In particular, we propose two novel selection functions that leverage the output probability distributions of geolocation models to infer localizability at different scales. Our selection functions are benchmarked against the most widely used selective prediction baselines, outperforming them in all cases. By abstaining from predicting non-localizable images, we improve geolocation accuracy from 27.8% to 70.5% at the city-scale, and thus make current geolocation models reliable for real-world applications.
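The idea of selective prediction can be sketched as follows. This is an illustrative sketch only: it uses a simple maximum-probability threshold as the selection function, which is an assumption standing in for the two selection functions proposed in the paper, and the names `selective_predict` and `selective_accuracy` are hypothetical.

```python
def selective_predict(probabilities, threshold=0.5):
    """Return the index of the most likely class, or None to abstain."""
    confidence = max(probabilities)
    if confidence < threshold:
        return None  # the input is treated as not reliably predictable
    return probabilities.index(confidence)


def selective_accuracy(batch, labels, threshold=0.5):
    """Return (accuracy on answered samples, coverage = fraction answered)."""
    answered = 0
    correct = 0
    for probabilities, label in zip(batch, labels):
        prediction = selective_predict(probabilities, threshold)
        if prediction is None:
            continue
        answered += 1
        if prediction == label:
            correct += 1
    accuracy = correct / answered if answered else 0.0
    coverage = answered / len(batch) if batch else 0.0
    return accuracy, coverage
```

Raising the threshold trades coverage for accuracy: fewer images receive a prediction, but the predictions that remain are more likely to be correct.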

Co-Creation for Social Inclusion: The CROMA project

December 2021, Link

Anna Matamala


Anna Matamala from UAB (Universitat Autònoma de Barcelona) contributed to the book “Innovación Docente e Investigación en Arte y Humanidades” (translation: “Teaching Innovation and Research in the Arts and Humanities”). The Spanish publication broadly focuses on work in the field of education and educational innovation. Matamala writes about co-creation for social inclusion and the CROMA project that is part of one of UAB’s MediaVerse use cases.

Making Media Accessible to All

November 2021, Link

Andy Quested, Pradipta Biswas, Masahito Kawamori, Pilar Orero


Selecting an accessible media service is often a binary option – either on or off, where one option is supplied to all no matter the degree or need. Audience requirements are very different and range from 100% loss of a sense to occasional need for assistance. Furthermore, accessible media services continue to only address sight and sound assistance, which does not help participation for those with reduced motor functions or with understanding or learning difficulties – often more than one condition is present leading to co-morbidity.

Developers need to understand and incorporate the wide range of requirements for people with a range of abilities. A ‘one-size-fits-all’ approach can be the easiest option to implement, rather than developing different options for the same website, application or audiovisual media, for people with a range of abilities. Solutions are often not scalable when applied to platforms with a range of accessibility options.

The role of the ITU Audio Visual Accessibility Group is to investigate and suggest options and solutions that can be applied to any form of media no matter how produced, distributed, or reproduced.

Testing Times: Introduction

October 2021, Link

Pilar Orero, David Hernández Falagán


That COVID-19 touched all walks of life is an understatement. With the risk of sounding frivolous, compared with other impacts, COVID-19 had direct implications in research, and particularly in funded research activities with a strict schedule. Luckily, in the field of audiovisual translation we do not require any live samples or animals to be fed while in lockdown. Still, experimental programmed tests with people required alternative approaches. This special issue presents the social distancing challenges faced in user-centric research methodologies when human interaction is required.

Culture meets Immersive Environments: A new Media Landscape across Europe

October 2021, Link

Marta Brescia-Zapata


The traditional media landscape is in the middle of a monumental shift: the new prosumer profile, the need for faster and more efficient communication, and the search for more user-driven and accessible multimedia experiences. New technologies (and more specifically, immersive environments) can provide great opportunities in the entertainment sector, and also in communication, learning, arts and culture. These technologies are gaining popularity due to the COVID-19 crisis as they enable interactive, hyper-personalised and engaging experiences anytime and anywhere. The EU-funded projects TRACTION (870610) and MEDIAVERSE (957252) are embracing new technologies in order to establish an effective participatory production workflow and are exploring novel audio-visual art representation formats. TRACTION will provide a bridge between opera professionals and specific communities at risk of exclusion based on trials, understood as experimental attempts at fostering an effective community dialogue between diverse individuals at risk of exclusion in three different situations, across three countries: Ireland, Portugal and Spain. MEDIAVERSE will enable the creation of a decentralised network of intelligent, automated, and accessible services, tools, and authoring platforms for digital asset management; legal and monetisable discovery and distribution of verified content, and barrier-free usage and integration in target media and platforms.

Audio Description Personalisation

September 2021, Paper format

Pilar Orero


Technology is opening new and fascinating opportunities along the audio description workflow: from production to consumption. For years now academics have been looking at alternative ways of offering audio descriptions aimed at meeting end user needs and expectations. In the past, technology didn’t allow for personalization beyond the sound volume or colour contrast. The shift from analog to digital opened the possibility to enjoy simultaneously different audio description styles and languages (Orero et al 2014). Nowadays most audio description components may be altered with a view to achieving a higher level of interaction with the audience, the venue requirements, the media genres and the different age groups or cultural background of users (Mazur and Chmiel 2016). This personalization was studied in the Pear Tree Stories (Orero 2008) from a narratological point of view (Mazur and Kruger 2012). Audio description research has also focused on delivery where alternative voicing strategies have been studied (Szarkowska 2011; Caruso 2012; Fernández-Torné & Matamala 2015). Making the content easier to understand has also been under analysis when modifying degrees of information explicitation, intonation, or speed (Cabeza-Cáceres 2013). How to make audio description easier to understand has also been studied through the impact on information recall (Orero 2012; Fresno 2014; Fresno et al 2014; Bernabé & Orero 2019). 

Technology is now enabling the delivery and user personalization of the many traits that impact audio description reception with a view to heightening its enjoyment and understanding – or vice versa? The chapter departs from the concept of personalisation and then describes the possible personalization features available today, pointing towards new research avenues, to conclude with a state of the art bibliography. The chapter has avoided technical terminology as much as possible.

Translation, Accessibility and Minorities

September 2021, Paper format

Pilar Orero


Audiovisual media in the 21st century is ubiquitous. Audiovisual media is technology driven, and its development elicits exponential complexity at all stages: creation, distribution, and consumption. Having access to media content on all devices and in all formats has been for years now a Human Right, since we live in the Information Society. The study of translation in this technology driven society needs to be approached from a complex multidisciplinary collaboration since it is difficult to separate the embedded media content from its technological creation tools. The discipline studying communication for all citizens in this new audiovisual media access context is Media Accessibility. The chapter first proposes a new hierarchy for Translation Studies in the Information Society era, where the consumer is at the center of research, hence user centric approaches are the methodological context. The second part focuses on accessibility in general and media accessibility in particular. The chapter finishes revisiting the concept of minorities, beyond languages, cultures, and people, since artificial intelligence will soon offer a new research context from where society may be reorganized, fairly or not.

Holistic Requirements Analysis For Specifying New Systems For 3D Media Production and Promotion

July 2021, Link

Christos Mouzakis, Dimitrios Ververidis, Luis Miguel Girao, Nicolas Patz, Spiros Nikolopoulos, Ioannis Kompatsiaris


This paper presents a requirements specification analysis for driving the design of new systems that will allow 3D media creators to further promote and monetize their work. The provided requirements analysis is based on the IEEE 830 standard for requirements specification. It allows us to elucidate system requirements through existing (AS-IS) and envisioned (TO-BE) scenarios affected by the latest trends in design methodologies and content promotion in social media. A total of 30 tools for content creation, promotion and monetization are reviewed. The target groups, i.e. creator groups, are divided into 10 types according to their role in 3D media production. Based on this division, 10 candidate TO-BE scenarios have been identified, and out of these 10 scenarios we have selected 6 for validation by media creators. The validation was performed through a survey of 24 statements on a 5-point Likert scale by 47 individuals from the domains of Media, Fine arts, Architecture, and Informatics. The evaluation results and comments collected can be useful for future systems design.

Easy to Read Standardisation: Some Steps Towards an International Standard

December 2020, Link

Pilar Orero, Clara Delgado, Anna Matamala


Reading is a means to make Human Rights effective, mainly those related to full participation in society under equal conditions. Literacy is not natural but acquired, and it depends on many factors, from personal capabilities to access to education from a geographical or financial point of view. Even in more developed countries where education is compulsory until adolescence, a growing number of children do not fully develop their reading skills. This fact makes reading a universal barrier towards equal opportunities. Learning to read is one solution to the problem; generating texts which are easier to read may also help. This article presents the existing Easy to Read standards, and describes some further standardisation requirements such as terminology, intended audience, workflows, formats, and languages that should be taken into consideration towards a 21st century Easy to Read recommendation.

Immersive Captioning: Developing a Framework for Evaluating User Needs

December 2020, Link

Chris Hughes, Marta Brescia-Zapata, Matthew Johnston, Pilar Orero


This article focuses on captioning for immersive environments, and the research aims to identify how to display captions for an optimal viewing experience. This work began four years ago with some partial findings. This second stage of research, built from the lessons learnt, focuses on the design requirements cornerstone: prototyping. A tool has been developed towards quick and realistic prototyping and testing. The framework integrates methods used in existing solutions. Given how easy it is to contrast and compare, the need to further the first framework was obvious. A second improved solution was developed, almost as a showcase on how ideas can quickly be implemented for user testing. After an overview of captions in immersive environments, the article describes its implementation, based on web technologies, which opens it to any device with a web browser. This includes desktop computers, mobile devices and head mounted displays. The article finishes with a description of the new caption modes leading to improved methods, hoping to be a useful tool towards testing and standardisation.

Evaluating Subtitle Readability in Media Immersive Environments

December 2020, Link

Pilar Orero, Marta Brescia-Zapata, Chris Hughes


The advances in VR technology have led to immersive videos rapidly gaining popularity. Accessibility to immersive media should be offered and subtitles are the most popular accessibility service. Research on subtitle readability has led to guidelines and standards (W3C, ISO/IEC/ITU 20071-23:2018). More research into subtitle presentation modes in 360° is needed in order to move towards understanding optimum readability. Evaluating readability for subtitles in immersive media environments requires a flexible and user-friendly framework for both creating the subtitles and presenting the generated subtitle file in a fully functional immersive video player, in order to understand the final view in the environment and assess its quality. This article starts by looking at the readability recommendations in W3C and ISO/IEC/ITU. The second part will describe the new features required in immersive subtitle presentations. The final section will describe the new web-based framework that allows the generation of immersive subtitles where readability may be tested. The framework has adopted a contrast and comparison approach towards instant readability evaluation.

Let's put Standardisation in Practice: Accessibility Services and Interaction

October 2020, Link

Estella Oncins, Pilar Orero


Technology is developing at a fast pace to produce new interactions, which turn into new communication barriers, some of which might be avoidable. Looking at recommendations from some accessibility standards at the design stage could solve many issues and help towards natively accessible technology.

This article looks at existing standards related to accessibility and media communication. The first part of the article looks at different standardisation agencies and the need to produce harmonised standards for accessibility at IEC, ITU, ISO and W3C. The second part of the article outlines how standards are produced and implemented at a European level by the European Standardisation Organisations (CEN, CENELEC and ETSI). It then lists existing standards for each media accessibility service: subtitling, audio description, audio subtitling and sign language. Mention is made of Easy to Read as a new emerging accessibility modality. The final part of the article will provide conclusions and directions for further research.