arxiv:2108.03353

Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning

Published on Aug 7, 2021

Abstract

Screen2Words is a multi-modal approach for summarizing mobile screens into coherent language phrases, using deep models and a large-scale annotated dataset.

AI-generated summary

Mobile User Interface Summarization generates succinct language descriptions of mobile screens for conveying important contents and functionalities of the screen, which can be useful for many language-based application scenarios. We present Screen2Words, a novel screen summarization approach that automatically encapsulates essential information of a UI screen into a coherent language phrase. Summarizing mobile screens requires a holistic understanding of the multi-modal data of mobile UIs, including text, image, structures as well as UI semantics, motivating our multi-modal learning approach. We collected and analyzed a large-scale screen summarization dataset annotated by human workers. Our dataset contains more than 112k language summaries across ~22k unique UI screens. We then experimented with a set of deep models with different configurations. Our evaluation of these models with both automatic accuracy metrics and human rating shows that our approach can generate high-quality summaries for mobile screens. We demonstrate potential use cases of Screen2Words and open-source our dataset and model to lay the foundations for further bridging language and user interfaces.
