W3C Workshop Program:
The Multilingual Web – The Way Ahead
15 - 16 March 2012, Luxembourg
The MultilingualWeb project is looking at best practices and standards related to all aspects of creating, localizing and deploying the Web multilingually. Coordinated by the W3C, the project aims to raise the visibility of existing best practices and standards and identify gaps. This fourth workshop, held in Luxembourg, was hosted by the Directorate-General for Translation (DGT) of the European Commission.
After the keynote speech, each main session on the first day contained a series of 15-minute talks followed by some time for questions and answers. On the second day, the workshop ran for the morning only and was dedicated to an Open Space discussion forum, where participants could discuss the themes of the workshop in breakout sessions. This was facilitated by TAUS. All attendees participated in all sessions.
The IRC log is the raw scribe log, which has not undergone careful post-editing and may contain errors or omissions. It should be read with that in mind. It constitutes the best efforts of the scribes to capture the gist of the talks, and of the discussions that followed, in real time. IRC was used not only to capture notes on the talks; it could also be followed in real time by remote participants or participants with accessibility problems. People following on IRC could also add contributions to the flow of text themselves.
We recommend that you follow the links to video recordings. They are short, but convey the speaker's full talk. In some cases video links are unavailable because speakers have not given us permission to post the video. You can also find links to all videos on the VideoLectures workshop page. Thanks to VideoLectures for hosting the videos.
Related links: Workshop report • About W3C
15 March
Richard Ishida
W3C Internationalization Activity Lead & MultilingualWeb Project Coordinator
Workshop logistics
Piet Verleysen
European Commission - Resources Director, responsible for IT at the Directorate-General for Translation
Brief welcome address
Kimmo Rossi
European Commission - DG INFSO E1
EC language programs – the challenge for the future
Iván Herman
Semantic Web Activity Lead, W3C
What is Happening in the W3C Semantic Web Activity?
abstract This talk gives a short overview of the current work done at the W3C related to the Semantic Web, Linked Data, and related technical issues. The goal is not to give a detailed technical account but, rather, to give a general overview and use this as a basis for further discussions on how that particular technology can be used for the general issue of the Multilingual Web.
Jan Anders Nelson
Microsoft
Support for Multilingual Windows Web Apps (WWA) and HTML 5 in Windows 8
abstract This talk looks at support for web apps running on Windows 8. It steps through the workflow a developer can follow to create multilingual applications efficiently: organizing app projects to support translation, and using the related tools that make multilingual apps easy to create for anyone considering shipping in more than one market, perhaps for the first time.
Tony Graham
Mentea
XSL-FO meets the Tower of Babel
abstract XSL Formatting Objects (XSL-FO) from the W3C is an XML vocabulary for specifying formatting semantics. While it shares many properties with CSS, it is most frequently used with paged media, such as formatting XML documents as PDF. XSL-FO 2.0 is currently under development, and one of its top-level requirements is for further improved non-Western language support. However, the requirement for improved support in XSL-FO 2.0 is actually less specific than the 1998 requirements for XSL 1.0 since we recognised that we didn't have the knowledge and expertise to match our ambitions. For that, we would need more help -- either from individual experts or from the W3C forming more task forces along the lines of the Japanese Layout Task Force to capture and distill expertise for use by all of the W3C and beyond.
Richard Ishida
W3C
Jirka Kosek
University of Economics, Prague
HTML5 i18n: A report from the front line
abstract This talk will brief attendees on developments related to some key markup changes in HTML5. In addition to new markup to support bidirectional text (e.g. in scripts such as Arabic, Hebrew and Thaana), the Internationalization Working Group at the W3C has been proposing changes to the current HTML5 ruby markup model. Ruby annotations are used in Japanese and Chinese to help readers recognize and understand ideographic characters. They are commonly used in educational material and manga, and can help with accessibility. The Working Group has also proposed the addition of a flag to indicate when text in a page should not be translated. This talk will deliver an up-to-the-minute status on the progress made in these areas.
[Chair, Reinhard Schäler • Scribe, Felix Sasaki]
Brian Teeman
JoomlaShack University
Building Multi-Lingual Web Sites with Joomla! the leading open source CMS
abstract Joomla is the leading Open Source CMS, used by over 2.8% of the web and by over 3000 government web sites. With the release in January 2012 of the latest version (2.5), building truly multi-lingual web sites has never been easier. This presentation will show how easy it is to build real multi-lingual web sites without relying on automated translation tools.
Loïc Dufresne de Virel
Intel Corp.
How standards (could) support a more efficient web localization process by making CMS - TMS integrations less complicated
abstract Intel has just deployed a new Web Content Management system, which we integrated into our TMS. In doing so we had to deal with multiple challenges, and also faced a great deal of complexity and customization. In this 15-minute talk, I will look into what we did well, what we did wrong, what we could have done better, and will try to put a dollar figure on the cost of ignoring a few standards...
Gerard Meijssen
Wikimedia Foundation
Translation and localisation in 300+ languages ... with volunteers. The best practices.
abstract There are over 270 Wikipedias, and over 30 languages are requesting one. These languages represent most scripts and both small and large populations. The WMF enables the visibility of text with web fonts and supports input methods; there is a big multi-application localisation platform at translatewiki.net; and we are implementing translation tools for our "pages" for documentation and communication to our communities. To do this, we rely on standards, which gain relevance as they are implemented in more and more places in our software. Some standards do not support all the languages that we support.
[Chair, Jan Nelson • Scribe, Jirka Kosek]
Spyridon Pilos
Directorate General for Translation - European Commission
The Machine Translation Service of the European Commission
abstract Since October 2010, the Directorate General for Translation (DGT) has been developing a new data-driven machine translation service for the European Commission. MT@EC should be operational in the second semester of 2013. One of the key requirements is for the service to be flexible and open: it should enable, on the one hand, the use of any type of language resources and any type of MT technology and, on the other, easy access by any client (individual or service). The speaker will present the approach taken and highlight problems identified, as pointers to broader needs that should be addressed.
Matjaž Horvat
Mozilla
Live website localization
abstract Pontoon is a live website localization tool developed at Mozilla. Instead of extracting website strings and then merging translated strings back, Pontoon can turn any website into editable mode. This enables localizers to translate websites in-place and also provides context and spatial limitations.
David Lewis
Centre for Next Generation Localisation at Trinity College Dublin
Meta-data interoperability between CMS, localisation and machine translation: Use Cases and Technical Challenges
abstract January 2012 saw the kick off of the MLW-LT ("Multilingual Web - Language Related Technologies") Working Group at the W3C as part of the Internationalization Activity. This WG will define meta-data for web content (mainly HTML5) and "deep Web" content (CMS or XML files from which HTML pages are generated) that facilitates content interaction with multilingual language technologies such as machine translation, and localization processes. The WG brings together localisation and content management companies with content and language meta-data research expertise, including strong representation from the Centre for Next Generation Localisation. This talk will present three concrete business use cases that span CMS, localisation and machine translation functions. It discusses the challenges in addressing these cases with existing metadata (e.g. ITS tags) and the technical requirements for additional standardised metadata. This talk will be complemented by a breakout session to allow attendees to voice their comments and requirements in more detail, in order to better inform the working group.
[Chair, Arle Lommel • Scribe, Charles McCathieNevile]
Peter Schmitz
Publications Office
Common Access to EU Information based on semantic technology
abstract The Publications Office is setting up a common repository to make all metadata and digital content related to public official EU information (law and publications) available in a single place, in a harmonised and standardised way. The aim is twofold: to guarantee citizens better access to the law and publications of the European Union, and to encourage and facilitate reuse of content and metadata by professionals and experts. The common repository is based on semantic technology. The system supports at least all official languages of the EU, making it a practical example of a multilingual system accessible through the Web.
Paul Buitelaar
DERI - National University of Ireland, Galway
Ontology Lexicalisation and Localisation for the Multilingual Semantic Web
abstract Although knowledge processing on the Semantic Web is inherently language-independent, human interaction with semantically structured and linked data will be text or speech based – in multiple languages. Semantic Web development is therefore increasingly concerned with issues in multilingual querying and rendering of web knowledge and linked data. The Monnet project on 'Multilingual Ontologies for Networked Knowledge' provides solutions for this by offering methods for lexicalising and translating knowledge structures, such as ontologies and linked data vocabularies. The talk will discuss challenges and solutions in ontology lexicalisation and translation (localisation) by way of several use cases that are under development in the context of the Monnet project.
Tadej Štajner
Jožef Stefan Institute
Cross-lingual named entity disambiguation for concept translation
abstract The talk will focus on our experience in developing an integrated natural language processing pipeline, consisting of several distinct components, operating across multiple languages. We will demonstrate a cross-language information retrieval method that enables reuse of the same language resources across languages, by using a knowledge base in one language to disambiguate named entities in text written in another language, as developed in the Enrycher system (enrycher.ijs.si). We discuss the architectural implications of this ability for development practices and its prospects as a tool for automated translation of specific concepts and phrases in a content management system.
[Chair, Felix Sasaki • Scribe, Reinhard Schäler]
Annette Marino
European Commission
Web translation, public service & participation
abstract For most Europeans, the internet provides the only chance they have for direct contact with the EU. But how can we possibly inform, communicate and interact with the public if we don't speak their language on the web? With the recent launch of the European Citizens' Initiative website in 23 languages, there's no doubting the role of web translation in participatory democracy, or the Commission's commitment to a multilingual web presence. But as well as enthusiasm, we need understanding – of how people use websites and social media, and what they want from us – so that we can make best use of translation resources to serve real needs. As the internet evolves, the Commission is on a steep learning curve, working to keep up with the possibilities - and pitfalls - of web communication in a wide range of languages.
Murhaf Hossari
University College Dublin
Localizing to right-to-left languages: main issues and best practices
abstract Internationalization and localization efforts need to take extra care with right-to-left languages because of their specific features, and many localization issues are specific to them. The talk will categorize the issues localizers face when dealing with right-to-left languages, with a special focus on text direction and handling bidirectionality. The talk will also mention best practices and areas for improvement.
Nicoletta Calzolari
CNR-ILC
The Multilingual Language Library
abstract The Language Library is a fairly new initiative – started with LREC 2012 – conceived as a facility for gathering and making available, through simple functionalities, all the linguistic knowledge the field is able to produce, putting in place new ways of collaboration within the Language Technology community. Its main characteristic is that it is collaboratively built, with the entire community providing and enriching language resources by annotating and processing language data, and freely using them. We exploit today's trend towards sharing to initiate a collective movement that also works towards creating synergies and harmonisation among different annotation efforts that are now dispersed. The Language Library could be considered the beginning of a big Genome project for languages, where the community will collectively deposit and create increasingly rich and multi-layered linguistic resources, enabling a deeper understanding of the complex relations between different annotation layers.
Fernando Serván
Food and Agriculture Organization of the United Nations
Repurposing language resources for multilingual websites
abstract This presentation will address lessons learned regarding the reuse of language resources (translation memories in TMX, terminology databases in TBX) to improve content and language versions of webpages. It will address the need for better integration between existing standards, the need for interoperability, and the areas where standards and best practices could help organizations with a multilingual mandate.
[Chair, Tadej Štajner • Scribe, Arle Lommel]
At the Hotel Parc Belle-Vue
details In order to provide additional opportunities for networking, a light finger buffet was held at the Hotel Parc Belle-Vue in Luxembourg.
16 March
Jaap van der Meer
TAUS
Explanation of the format for the morning, and selection of discussion topics. Topics are suggested by participants, and the most popular are allocated to breakout groups. A chair is chosen for each group from volunteers.
Break-out discussions
Various locations are available for breakout groups. Participants can join whichever group they find interesting, and can switch groups at any point. Group chairs facilitate the discussion and ensure that notes are taken to support the summary to be given to the plenary.
Group reports and discussion
Everyone meets again in the main conference area and each breakout group presents their findings. Other participants can comment and ask questions. What follows are the presentations by breakout group chairs.
Dave Lewis
MultilingualWeb-LT Conclusions
Timo Honkela
Semantic Resources and Machine Learning for Quality, Efficiency and Personalisation of Accessing Relevant Information over Language Borders
Anabela Barreiro
Speech Technologies for the Multilingual Web
Christian Lieske
MultilingualWeb, Linked Open Data & EC "Connecting Europe Facility"
Elena Rudeshko
Tools: issues, needs, trends
Manuel Tomas Carrasco Benitez
Multilingual Web Sites
Konrad Fuhrman
Language policy on multilingual websites