Here you will find all publications created and published as a result of the MediaVerse project.
The most recent will be on top.
A click on the headlines will reveal the author(s) and an abstract.
DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval
June 2021 (pre-print), Link
In this paper, we address the problem of high performance and computationally efficient content-based video retrieval in large-scale datasets. Current methods typically propose either: (i) fine-grained approaches employing spatio-temporal representations and similarity calculations, achieving high performance at a high computational cost or (ii) coarse-grained approaches representing/indexing videos as global vectors, where the spatio-temporal structure is lost, providing low performance but also having low computational cost. In this work, we propose a Knowledge Distillation framework, which we call Distill-and-Select (DnS), that starting from a well-performing fine-grained Teacher Network learns: a) Student Networks at different retrieval performance and computational efficiency trade-offs and b) a Selection Network that at test time rapidly directs samples to the appropriate student to maintain both high retrieval performance and high computational efficiency. We train several students with different architectures and arrive at different trade-offs of performance and efficiency, i.e., speed and storage requirements, including fine-grained students that store index videos using binary representations. Importantly, the proposed scheme allows Knowledge Distillation in large, unlabelled datasets, which leads to good students. We evaluate DnS on five public datasets on three different video retrieval tasks and demonstrate a) that our students achieve state-of-the-art performance in several cases and b) that our DnS framework provides an excellent trade-off between retrieval performance, computational speed, and storage space. In specific configurations, our method achieves similar mAP with the teacher but is 20 times faster and requires 240 times less storage space. Our collected dataset and implementation are publicly available.
InDistill: Transferring Knowledge From Pruned Intermediate Layers
May 2022 (pre-print), Link
Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Ioannis Kompatsiaris
Deploying deep neural networks on hardware with limited resources, such as smartphones and drones, constitutes a great challenge due to their computational complexity. Knowledge distillation approaches aim at transferring knowledge from a large model to a lightweight one, also known as teacher and student respectively, while distilling the knowledge from intermediate layers provides an additional supervision to that task. The capacity gap between the models, the information encoding that collapses its architectural alignment, and the absence of appropriate learning schemes for transferring multiple layers restrict the performance of existing methods. In this paper, we propose a novel method, termed InDistill, that can drastically improve the performance of existing single-layer knowledge distillation methods by leveraging the properties of channel pruning to both reduce the capacity gap between the models and retain the architectural alignment. Furthermore, we propose a curriculum learning based scheme for enhancing the effectiveness of transferring knowledge from multiple intermediate layers. The proposed method surpasses state-of-the-art performance on three benchmark image datasets.
MemeTector: Enforcing deep focus for meme detection
May 2022 (pre-print), Link
Image memes, and specifically their widely known variation, image macros, are a special new media type that combines text with images and is used in social media to playfully or subtly express humour, irony, sarcasm and even hate. It is important to accurately retrieve image memes from social media to better capture the cultural and social aspects of online phenomena and detect potential issues (hate-speech, disinformation). Essentially, the background image of an image macro is a regular image that humans easily recognize as such but that a machine struggles to recognize, because its feature maps are similar to those of the complete image macro. Hence, accumulating suitable feature maps in such cases can lead to deep understanding of the notion of image memes. To this end, we propose a methodology that utilizes the visual part of image memes as instances of the regular image class and the initial image memes as instances of the image meme class to force the model to concentrate on the critical parts that characterize an image meme. Additionally, we employ a trainable attention mechanism on top of a standard ViT architecture to enhance the model’s ability to focus on these critical parts and make the predictions interpretable. Several training and test scenarios involving web-scraped regular images of controlled text presence are considered in terms of model robustness and accuracy. The findings indicate that light visual part utilization combined with sufficient text presence during training provides the best and most robust model, surpassing state of the art.
Leveraging Selective Prediction for Reliable Image Geolocation
March 2022, Link
Apostolos Panagiotopoulos, Giorgos Kordopatis-Zilos, and Symeon Papadopoulos
Reliable image geolocation is crucial for several applications, ranging from social media geo-tagging to media verification. State-of-the-art geolocation methods surpass human performance on the task of geolocation estimation from images. However, no method assesses the suitability of an image for this task, which results in unreliable and erroneous estimations for images containing ambiguous or no geolocation clues. In this paper, we define the task of image localizability, i.e. suitability of an image for geolocation, and propose a selective prediction methodology to address the task. In particular, we propose two novel selection functions that leverage the output probability distributions of geolocation models to infer localizability at different scales. Our selection functions are benchmarked against the most widely used selective prediction baselines, outperforming them in all cases. By abstaining from predicting non-localizable images, we improve geolocation accuracy from 27.8% to 70.5% at the city-scale, and thus make current geolocation models reliable for real-world applications.
Panagiotopoulos, A., Kordopatis-Zilos, G., Papadopoulos, S. (2022). Leveraging Selective Prediction for Reliable Image Geolocation. In: MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham. https://doi.org/10.1007/978-3-030-98355-0_31
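To make the idea of selective prediction in the abstract above more concrete, here is a minimal sketch of a common baseline: abstain whenever the highest output probability falls below a threshold. This is not the pair of selection functions proposed in the paper. The function names, the threshold value and the toy probabilities are assumptions made purely for illustration.

```python
def select_prediction(probs, threshold=0.5):
    """Return the index of the most probable cell, or None to abstain.

    Hypothetical baseline: the maximum output probability serves as the
    confidence score. The paper's own selection functions are not shown here.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


def selective_accuracy(prob_rows, labels, threshold=0.5):
    """Return (accuracy on accepted samples, coverage) for a given threshold."""
    accepted = 0
    correct = 0
    for probs, label in zip(prob_rows, labels):
        pred = select_prediction(probs, threshold)
        if pred is None:
            continue  # abstain: the image is treated as non-localizable
        accepted += 1
        correct += int(pred == label)
    coverage = accepted / len(labels) if labels else 0.0
    accuracy = correct / accepted if accepted else 0.0
    return accuracy, coverage


# Toy data (invented): three images, three candidate cells each.
toy_probs = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.1, 0.8, 0.1]]
toy_labels = [0, 1, 1]
print(selective_accuracy(toy_probs, toy_labels, threshold=0.5))
```

Raising the threshold trades coverage for accuracy: fewer images receive a prediction, but the ones that do are more likely to be correct.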
A Graph Diffusion Scheme for Decentralized Content Search based on Personalized PageRank
April 2022, Link
Nikolaos Giatsoglou, Emmanouil Krasanakis, Symeon Papadopoulos, Ioannis Kompatsiaris
Decentralization is emerging as a key feature of the future Internet. However, effective algorithms for search are missing from state-of-the-art decentralized technologies, such as distributed hash tables and blockchain. This is surprising, since decentralized search has been studied extensively in earlier peer-to-peer (P2P) literature. In this work, we adopt a fresh outlook for decentralized search in P2P networks that is inspired by advancements in dense information retrieval and graph signal processing. In particular, we generate latent representations of P2P nodes based on their stored documents and diffuse them to the rest of the network with graph filters, such as personalized PageRank. We then use the diffused representations to guide search queries towards relevant content. Our preliminary approach is successful in locating relevant documents in nearby nodes but the accuracy declines sharply with the number of stored documents, highlighting the need for more sophisticated techniques.
Giatsoglou, N., Krasanakis, E., Papadopoulos, S., Kompatsiaris, I.: A Graph Diffusion Scheme for Decentralized Content Search based on Personalized PageRank; to be published in the Decentralized Internet, Networks, Protocols, and Systems (DINPS 2022) workshop hosted at the 42nd IEEE International Conference on Distributed Computing Systems (ICDCS 2022); cite as: arXiv:2204.12902
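The diffusion step described in the abstract above can be sketched roughly as follows. This is an illustrative approximation, not the authors' implementation: it assumes a random-walk-normalised personalized PageRank iteration (each node mixes its own document embedding with the mean of its neighbours' current embeddings) and a greedy forwarding rule based on the dot product. All names, the teleport parameter `alpha` and the toy graph are hypothetical.

```python
def personalized_pagerank_diffusion(neighbors, embeddings, alpha=0.5, iterations=50):
    """Diffuse node embeddings over a graph with a PPR-style filter.

    Update rule (assumed variant): E <- alpha * E0 + (1 - alpha) * mean of
    the neighbours' current E, repeated for a fixed number of iterations.
    """
    current = {node: list(vec) for node, vec in embeddings.items()}
    for _ in range(iterations):
        updated = {}
        for node, vec0 in embeddings.items():
            nbrs = neighbors[node]
            dim = len(vec0)
            avg = [0.0] * dim
            for nbr in nbrs:
                for k in range(dim):
                    avg[k] += current[nbr][k] / len(nbrs)
            updated[node] = [alpha * vec0[k] + (1 - alpha) * avg[k] for k in range(dim)]
        current = updated
    return current


def next_hop(query, node, neighbors, diffused):
    """Forward the query to the neighbour whose diffused embedding matches best."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(neighbors[node], key=lambda nbr: dot(query, diffused[nbr]))


# Toy line graph a - b - c - d; only the two end nodes store documents.
toy_neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
toy_embeddings = {"a": [0.0, 1.0], "b": [0.0, 0.0], "c": [0.0, 0.0], "d": [1.0, 0.0]}
toy_diffused = personalized_pagerank_diffusion(toy_neighbors, toy_embeddings)
print(next_hop([1.0, 0.0], "b", toy_neighbors, toy_diffused))
```

In this toy setting a query resembling the document held by node d is forwarded from b towards c, because c's diffused embedding carries more of d's signal than a's does.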
Translation, Accessibility and Minorities
September 2021, Paper form
Audiovisual media in the 21st century is ubiquitous. Audiovisual media is technology-driven, and its development elicits exponential complexity at all stages: creation, distribution, and consumption. Having access to media content on all devices and in all formats has been a Human Right for years now, since we live in the Information Society. The study of translation in this technology-driven society needs to be approached from a complex multidisciplinary collaboration since it is difficult to separate the embedded media content from its technological creation tools. The discipline studying communication for all citizens in this new audiovisual media access context is Media Accessibility. The chapter first proposes a new hierarchy for Translation Studies in the Information Society era, where the consumer is at the center of research, hence user-centric approaches are the methodological context. The second part focuses on accessibility in general and media accessibility in particular. The chapter finishes by revisiting the concept of minorities, beyond languages, cultures, and people, since artificial intelligence will soon offer a new research context from which society may be reorganized, fairly or not.
Orero, Pilar (2021) Translation, Accessibility and Minorities. In E. Bielsa (ed.) The Routledge Handbook of Translation and Media. London: Taylor & Francis. pp. 384-399.
Making Media Accessible to All
November 2021, Link
Andy Quested, Pradipta Biswas, Masahito Kawamori, Pilar Orero
Selecting an accessible media service is often a binary option – either on or off, where one option is supplied to all no matter the degree or need. Audience requirements are very different and range from 100% loss of a sense to occasional need for assistance. Furthermore, accessible media services continue to only address sight and sound assistance, which does not help participation for those with reduced motor functions or with understanding or learning difficulties – often more than one condition is present, leading to co-morbidity.
Developers need to understand and incorporate the wide range of requirements for people with a range of abilities. A ‘one-size-fits-all’ approach can be the easiest option to implement, rather than developing different options for the same website, application or audiovisual media, for people with a range of abilities. Solutions are often not scalable when applied to platforms with a range of accessibility options.
The role of the ITU Audio Visual Accessibility Group is to investigate and suggest options and solutions that can be applied to any form of media no matter how produced, distributed, or reproduced.
Quested, A., Biswas, P., Kawamori, M., Orero, P.: Making Media Accessible to All. In: Proceedings of “Shape the Future: Research and Development Questions in Digital Accessibility”. Online Research Symposium, 10 November 2021
Testing Times: Introduction
October 2021, Link
Pilar Orero, David Hernández Falagán
That COVID-19 touched all walks of life is an understatement. With the risk of sounding frivolous, compared with other impacts, COVID-19 had direct implications in research, and particularly in funded research activities with a strict schedule. Luckily, in the field of audiovisual translation we do not require any live samples or animals to be fed while in lockdown. Still, experimental programmed tests with people required alternative approaches. This special issue presents the social distancing challenges faced in user-centric research methodologies when human interaction is required.
Orero, P. & D. Hernández Falagán (2021). Testing Times: Introduction. Journal of Audiovisual Translation, 4(2), 1–4.
Culture meets immersive environments: a new media landscape across Europe
October 2021, Link
The traditional media landscape is in the middle of a monumental shift: the new prosumer profile, the need for faster and more efficient communication, and the search for more user-driven and accessible multimedia experiences. New technologies (and more specifically, immersive environments) can provide great opportunities in the entertainment sector, and also in communication, learning, arts and culture. These technologies are gaining popularity due to the COVID-19 crisis as they enable interactive, hyper-personalised and engaging experiences anytime and anywhere. The EU-funded projects TRACTION (870610) and MEDIAVERSE (957252) are embracing new technologies in order to establish an effective participatory production workflow and are exploring novel audio-visual art representation formats. TRACTION will provide a bridge between opera professionals and specific communities at risk of exclusion based on trials, understood as experimental attempts at fostering an effective community dialogue between diverse individuals at risk of exclusion in three different situations, across three countries: Ireland, Portugal and Spain. MEDIAVERSE will enable the creation of a decentralised network of intelligent, automated, and accessible services, tools, and authoring platforms for digital asset management; legal and monetisable discovery and distribution of verified content, and barrier-free usage and integration in target media and platforms.
Audio Description Personalisation
September 2021, Paper format
Technology is opening new and fascinating opportunities along the audio description workflow: from production to consumption. For years now academics have been looking at alternative ways of offering audio descriptions aimed at meeting end user needs and expectations. In the past, technology didn’t allow for personalization beyond the sound volume or colour contrast. The shift from analog to digital opened the possibility to enjoy simultaneously different audio description styles and languages (Orero et al 2014). Nowadays most audio description components may be altered with a view to achieving a higher level of interaction with the audience, the venue requirements, the media genres and the different age groups or cultural background of users (Mazur and Chmiel 2016). This personalization was studied in the Pear Tree Stories (Orero 2008) from a narratological point of view (Mazur and Kruger 2012). Audio description research has also focused on delivery where alternative voicing strategies have been studied (Szarkowska 2011; Caruso 2012; Fernández-Torné & Matamala 2015). Making the content easier to understand has also been under analysis when modifying degrees of information explicitation, intonation, or speed (Cabeza-Cáceres 2013). How to make audio description easier to understand has also been studied through the impact on information recall (Orero 2012; Fresno 2014; Fresno et al 2014; Bernabé & Orero 2019).
Technology is now enabling the delivery and user personalization of the many traits that impact audio description reception with a view to heightening its enjoyment and understanding – or vice versa? The chapter departs from the concept of personalisation and then describes the possible personalization features available today, pointing towards new research avenues, to conclude with a state of the art bibliography. The chapter has avoided technical terminology as much as possible.
Orero, Pilar (2021) Audio Description Personalisation. In Christopher Taylor and Elisa Perego (Eds) The Routledge Handbook of Audio Description. London: Taylor & Francis, pp. 121-134.
Holistic Requirements Analysis For Specifying New Systems For 3D Media Production and Promotion
April 2021, Link
Christos Mouzakis, Dimitrios Ververidis, Luis Miguel Girao, Nicolas Patz, Spiros Nikolopoulos and Ioannis Kompatsiaris
This paper presents a requirements specification analysis for driving the design of new systems that will allow 3D media creators to further promote and monetize their work. The provided requirements analysis is based on the IEEE 830 standard for requirements specification. It allows us to elucidate system requirements through existing (AS-IS) and envisioned (TO-BE) scenarios affected by the latest trends in design methodologies and content promotion in social media. A total of 30 tools for content creation, promotion and monetization are reviewed. The target groups, i.e. creator groups, are divided into 10 types according to their role in 3D media production. Based on this division, 10 candidate TO-BE scenarios have been identified, and out of these 10 scenarios we have selected 6 for validation by media creators. The validation was performed through a survey of 24 statements on a 5-point Likert scale by 47 individuals from the domains of Media, Fine arts, Architecture, and Informatics. The evaluation results and comments collected can inform future systems design.
Mouzakis, C., Ververidis, D., Girao, L. M., Patz, N., Nikolopoulos, S., & Kompatsiaris, I. (2021). Holistic Requirements Analysis for Specifying New Systems for 3D Media Production and Promotion. Sustainability, 13(15), 8155. doi:10.3390/su13158155 [pdf]
Immersive captioning: developing a framework for evaluating user needs
December 2020, Link
Hughes, C.J., Zapata, M.B., Johnston, M. and Orero, P.
This article focuses on captioning for immersive environments and the research aims to identify how to display them for an optimal viewing experience. This work began four years ago with some partial findings. This second stage of research, built from the lessons learnt, focuses on the design requirements cornerstone: prototyping. A tool has been developed towards quick and realistic prototyping and testing. The framework integrates methods used in existing solutions. Given how easy it is to contrast and compare, the need to further the first framework was obvious. A second improved solution was developed, almost as a showcase on how ideas can quickly be implemented for user testing. After an overview on captions in immersive environments, the article describes its implementation, based on web technologies opening for any device with a web browser. This includes desktop computers, mobile devices and head mounted displays. The article finishes with a description of the new caption modes leading to improved methods, hoping to be a useful tool towards testing and standardisation.
Hughes, CJ, Zapata, MB, Johnston, M and Orero, P 2020, Immersive captioning: developing a framework for evaluating user needs, in: IEEE AIVR 2020: 3rd International Conference on Artificial Intelligence & Virtual Reality 2020, 14th-18th December 2020, Online.
Evaluating subtitle readability in media immersive environments
02 December 2020, Link
Pilar Orero, Marta Brescia-Zapata, Chris Hughes
The advances in VR technology have led to immersive videos rapidly gaining popularity. Accessibility to immersive media should be offered and subtitles are the most popular accessibility service. Research on subtitle readability has led to guidelines and standards (W3C, ISO/IEC/ITU 20071-23:2018). More research into subtitle presentation modes in 360º is needed in order to move towards understanding optimum readability. Evaluating readability for subtitles in immersive media environments requires a flexible and user-friendly framework for both creating the subtitles and presenting the generated subtitle file in a fully functional immersive video player, in order to understand the final view in the environment and assess its quality. This article starts by looking at the readability recommendations in W3C and ISO/IEC/ITU. The second part will describe the new features required in immersive subtitle presentations. The final section will describe the new web-based framework that allows the generation of immersive subtitles where readability may be tested. The framework has adopted a contrast and comparison approach towards instant readability evaluation.
Orero, P., Brescia-Zapata, M., Hughes, C. (2020). Evaluating subtitle readability in media immersive environments. In: DSAI 2020: 9th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion (December 2020) Pages 51–54. https://doi.org/10.1145/3439231.3440602
Easy to Read Standardisation: Some steps towards an international standard
December 2020, Link
Pilar Orero, Clara Delgado, Anna Matamala
Reading is a means to make Human Rights effective, mainly those related to full participation in society under equal conditions. Literacy is not natural but acquired, and it depends on many factors, from personal capabilities to access to education from a geographical or financial point of view. Even in more developed countries where education is compulsory until adolescence, a growing number of children do not fully develop their reading skills. This fact makes reading a universal barrier towards equal opportunities. Learning to read is one solution to the problem; generating texts which are easier to read may also help. This article presents the existing Easy to Read standards, and describes some further standardisation requirements such as terminology, intended audience, workflows, formats, and languages that should be taken into consideration towards a 21st century Easy to Read recommendation.
Orero, P., Delgado, C., Matamala, A.: Easy to Read Standardisation: Some steps towards an international standard. In: DSAI 2020: 9th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion (December 2020) Pages 44–46. https://doi.org/10.1145/3439231.3440605
Let's put standardisation in practice: accessibility services and interaction
October 2020, Link
Estel·la Oncins and Pilar Orero
This article looks at existing standards related to accessibility and media communication. The first part of the article looks at different standardisation agencies and the need to produce harmonised standards for accessibility at IEC, ITU, ISO and W3C. The second part of the article outlines how standards are produced and implemented at a European level by the European Standardisation Organisations (CEN, CENELEC and ETSI). It then lists existing standards for each media accessibility service: subtitling, audio description, audio subtitling and sign language. Mention is made of Easy to Read as a new emerging accessibility modality. The final part of the article will provide conclusions and directions for further research.
Oncins, E., & Orero, P. (2021). Let’s put standardisation in practice: Accessibility services and interaction. Hikma, 20(1), 71-90. https://doi.org/10.21071/HIKMA.V20I1.12886 [pdf]