2012 Luxembourg Workshop Program

W3C Workshop Program:
The Multilingual Web – The Way Ahead
15 - 16 March 2012, Luxembourg

Endorsed by IIT

The MultilingualWeb project is looking at best practices and standards related to all aspects of creating, localizing and deploying the Web multilingually. Coordinated by the W3C, the project aims to raise the visibility of existing best practices and standards and identify gaps. This fourth workshop, held in Luxembourg, was hosted by the Directorate-General for Translation (DGT) of the European Commission.

After the keynote speech, each main session on the first day contained a series of 15-minute talks followed by some time for questions and answers. On the second day, the workshop lasted for the morning only and was dedicated to an Open Space discussion forum, facilitated by TAUS, where participants could discuss the themes of the workshop in breakout sessions. All attendees participated in all sessions.

The IRC log is the raw scribe log, which has not undergone careful post-editing and may contain errors or omissions. It should be read with that in mind. It constitutes the best efforts of the scribes to capture the gist of the talks and the discussions that followed, in real time. IRC was used not only to capture notes on the talks, but also so that remote participants, or participants with accessibility problems, could follow the workshop in real time. People following on IRC could also add contributions to the flow of text themselves.

We recommend that you follow the links to video recordings. They are short, but convey the speaker's full talk. In some cases video links are unavailable because speakers have not given us permission to post the video. You can also find links to all videos on the VideoLectures workshop page. Thanks to VideoLectures for hosting the videos.

Related links: Workshop report • About W3C

15 March

0900

Welcome

Richard Ishida

W3C Internationalization Activity Lead & MultilingualWeb Project Coordinator

Workshop logistics

Piet Verleysen

European Commission - Resources Director, responsible for IT at the Directorate-General for Translation

Brief welcome address

Kimmo Rossi

European Commission - DG INFSO E1

EC language programs – the challenge for the future

Slides

IRC

0930

Keynote

Iván Herman

Semantic Web Activity Lead, W3C

What is Happening in the W3C Semantic Web Activity?

abstract

This talk gives a short overview of the current work done at the W3C related to the Semantic Web, Linked Data, and related technical issues. The goal is not to give a detailed technical account but, rather, to give a general overview and use this as a basis for further discussions on how that particular technology can be used for the Multilingual Web.

Slides

IRC

Video

1015

Break

1045

Developers

Jan Anders Nelson

Microsoft

Support for Multilingual Windows Web Apps (WWA) and HTML 5 in Windows 8

abstract

This talk looks at support for web apps running on Windows 8 and steps through the workflow a developer can follow to create multilingual applications efficiently: organizing app projects to support translation, and using the related tools that make creating multilingual apps easy for anyone considering shipping in more than one market, perhaps for the first time.

Slides

IRC

Video

Tony Graham

Mentea

XSL-FO meets the Tower of Babel

abstract

XSL Formatting Objects (XSL-FO) from the W3C is an XML vocabulary for specifying formatting semantics. While it shares many properties with CSS, it is most frequently used with paged media, such as formatting XML documents as PDF. XSL-FO 2.0 is currently under development, and one of its top-level requirements is for further improved non-Western language support. However, the requirement for improved support in XSL-FO 2.0 is actually less specific than the 1998 requirements for XSL 1.0 since we recognised that we didn't have the knowledge and expertise to match our ambitions. For that, we would need more help -- either from individual experts or from the W3C forming more task forces along the lines of the Japanese Layout Task Force to capture and distill expertise for use by all of the W3C and beyond.

Slides

IRC

Video

Richard Ishida

W3C

Jirka Kosek

University of Economics, Prague

HTML5 i18n: A report from the front line

abstract

This talk will brief attendees on developments related to some key markup changes in HTML5. In addition to new markup to support bidirectional text (e.g. in scripts such as Arabic, Hebrew and Thaana), the Internationalization Working Group at the W3C has been proposing changes to the current HTML5 ruby markup model. Ruby annotations are used in Japanese and Chinese to help readers recognize and understand ideographic characters. They are commonly used in educational material and manga, and can help with accessibility. The Working Group has also proposed the addition of a flag to indicate when text in a page should not be translated. This talk will deliver an up-to-the-minute status on the progress made in these areas.

Slides

IRC

Video

[Chair, Reinhard Schäler • Scribe, Felix Sasaki]

1130

Q&A

IRC

Video

1145

Creators

Brian Teeman

JoomlaShack University

Building Multi-Lingual Web Sites with Joomla! the leading open source CMS

abstract

Joomla is the leading Open Source CMS, used by over 2.8% of the web and by over 3000 government web sites. With the release in Jan 2012 of the latest version (2.5), building truly multi-lingual web sites has never been easier. This presentation will show how easy it is to build real multi-lingual web sites without relying on automated translation tools.

Slides

IRC

Video

Loïc Dufresne de Virel

Intel Corp.

How standards (could) support a more efficient web localization process by making CMS - TMS integrations less complicated

abstract

As Intel deployed a new Web Content Management system and integrated it into our TMS, we had to deal with multiple challenges and faced a great deal of complexity and customization. In this 15-minute talk, I will look into what we did well, what we did wrong, what we could have done better, and will try to put a dollar figure on the cost of ignoring a few standards...

Slides

IRC

Video

Gerard Meijssen

Wikimedia Foundation

Translation and localisation in 300+ languages ... with volunteers. The best practices.

abstract

There are over 270 Wikipedias, and over 30 more languages are requesting one. These languages cover most scripts and represent both small and large populations. The WMF makes text visible with web fonts and supports input methods; there is a large multi-application localisation platform at translatewiki.net; and we are implementing translation tools for the "pages" we use for documentation and communication with our communities. To do this, we rely on standards, which gain relevance as they are implemented in more and more places in our software. Some standards, however, do not support all the languages that we support.

Slides

IRC

Video

[Chair, Jan Nelson • Scribe, Jirka Kosek]

1230

Q&A

IRC

Video

1300

Lunch

1400

Localizers

Spyridon Pilos

Directorate General for Translation - European Commission

The Machine Translation Service of the European Commission

abstract

The Directorate General for Translation (DGT) has been developing a new data-driven machine translation service for the European Commission since October 2010. MT@EC should be operational in the second semester of 2013. One of the key requirements is for the service to be flexible and open: it should enable, on the one hand, the use of any type of language resources and any type of MT technology and, on the other, easy access by any client (individual or service). The speaker will present the approach taken and highlight problems identified, as pointers to broader needs that should be addressed.

Slides

IRC

Video pending

Matjaž Horvat

Mozilla

Live website localization

abstract

Pontoon is a live website localization tool developed at Mozilla. Instead of extracting website strings and then merging translated strings back, Pontoon can switch any website into an editable mode. This enables localizers to translate websites in place, and also shows them the context and spatial limitations of each string.

Slides

IRC

Video

David Lewis

Centre for Next Generation Localisation at Trinity College Dublin

Meta-data interoperability between CMS, localisation and machine translation: Use Cases and Technical Challenges

abstract

January 2012 saw the kick off of the MLW-LT ("Multilingual Web - Language Related Technologies") Working Group at the W3C as part of the Internationalization Activity. This WG will define meta-data for web content (mainly HTML5) and "deep Web" content (CMS or XML files from which HTML pages are generated) that facilitates content interaction with multilingual language technologies such as machine translation, and localization processes. The WG brings together localisation and content management companies with content and language meta-data research expertise, including strong representation from the Centre for Next Generation Localisation. This talk will present three concrete business use cases that span CMS, localisation and machine translation functions. It discusses the challenges in addressing these cases with existing metadata (e.g. ITS tags) and the technical requirements for additional standardised metadata. This talk will be complemented by a breakout session to allow attendees to voice their comments and requirements in more detail, in order to better inform the working group.

Slides

IRC

Video

[Chair, Arle Lommel • Scribe, Charles McCathieNevile]

1445

Q&A

IRC

Video

1500

Machines

Peter Schmitz

Publications Office

Common Access to EU Information based on semantic technology

abstract

The Publications Office is setting up a common repository to make all metadata and digital content related to public official EU information (law and publications) available in a single place, in a harmonised and standardised way, in order to guarantee citizens better access to the law and publications of the European Union, and to encourage and facilitate reuse of content and metadata by professionals and experts. The common repository is based on semantic technology. All official languages of the EU, at least, are supported by the system, which is thus a practical example of a multilingual system accessible through the Web.

Slides

IRC

Video

Paul Buitelaar

DERI - National University of Ireland, Galway

Ontology Lexicalisation and Localisation for the Multilingual Semantic Web

abstract

Although knowledge processing on the Semantic Web is inherently language-independent, human interaction with semantically structured and linked data will be text or speech based – in multiple languages. Semantic Web development is therefore increasingly concerned with issues in multilingual querying and rendering of web knowledge and linked data. The Monnet project on 'Multilingual Ontologies for Networked Knowledge' provides solutions for this by offering methods for lexicalising and translating knowledge structures, such as ontologies and linked data vocabularies. The talk will discuss challenges and solutions in ontology lexicalisation and translation (localisation) by way of several use cases that are under development in the context of the Monnet project.

Slides

IRC

Video

Tadej Štajner

Jožef Stefan Institute

Cross-lingual named entity disambiguation for concept translation

abstract

The talk will focus on our experience in developing an integrated natural language processing pipeline, consisting of several distinct components, operating across multiple languages. We will demonstrate a cross-language information retrieval method that enables reuse of the same language resources across languages, by using a knowledge base in one language to disambiguate named entities in text, written in another language, as developed in the Enrycher system (enrycher.ijs.si). We discuss the architectural implications of this ability on the development practices and its prospects as a tool for automated translation of specific concepts and phrases in a content management system.

Slides

IRC

Video

[Chair, Felix Sasaki • Scribe, Reinhard Schäler]

1545

Q&A

IRC

Video

1600

Break

1630

Users

Annette Marino

European Commission

Web translation, public service & participation

abstract

For most Europeans, the internet provides the only chance they have for direct contact with the EU. But how can we possibly inform, communicate and interact with the public if we don't speak their language on the web? With the recent launch of the European Citizens' Initiative website in 23 languages, there's no doubting the role of web translation in participatory democracy, or the Commission's commitment to a multilingual web presence. But as well as enthusiasm, we need understanding – of how people use websites and social media, and what they want from us – so that we can make best use of translation resources to serve real needs. As the internet evolves, the Commission is on a steep learning curve, working to keep up with the possibilities - and pitfalls - of web communication in a wide range of languages.

Slides

IRC

Video pending

Murhaf Hossari

University College Dublin

Localizing to right-to-left languages: main issues and best practices

abstract

Internationalization and localization efforts need to take extra care when dealing with right-to-left languages, because of features specific to those languages. Many localization issues arise only for right-to-left languages. The talk will try to categorize the issues that localizers face when dealing with these languages, with a special focus on text direction and the handling of bidirectionality. The talk will also mention best practices and areas for improvement.

Slides

IRC

Video

Nicoletta Calzolari

CNR-ILC

The Multilingual Language Library

abstract

The Language Library is a fairly new initiative – started with LREC 2012 – conceived as a facility for gathering and making available, through simple functionalities, all the linguistic knowledge the field is able to produce, putting in place new ways of collaboration within the Language Technology community. Its main characteristic is that it is built collaboratively, with the entire community providing and enriching language resources by annotating and processing language data, and freely using them. We exploit today's trend towards sharing to initiate a collective movement that also works towards creating synergies and harmonisation among different annotation efforts that are now dispersed. The Language Library could be considered the beginning of a big Genome project for languages, where the community will collectively deposit and create increasingly rich and multi-layered linguistic resources, enabling a deeper understanding of the complex relations between different annotation layers.

Slides

IRC

Video

Fernando Serván

Food and Agriculture Organization of the United Nations

Repurposing language resources for multilingual websites

abstract

This presentation will address lessons learned regarding the reuse of language resources (translation memories in TMX, terminology databases in TBX) to improve content and language versions of webpages. It will address the need for better integration between existing standards, the need for interoperability, and the areas where standards and best practices could help organizations with a multilingual mandate.

Slides

IRC

Video

[Chair, Tadej Štajner • Scribe, Arle Lommel]

1730

Q&A

IRC

Video

1745

End

1930

Evening reception

At the Hotel Parc Belle-Vue

details

In order to provide additional opportunities for networking, a light finger buffet was held at the Hotel Parc Belle-Vue in Luxembourg.

16 March

0900

Set up

Jaap van der Meer

TAUS

Explanation of the format for the morning, and selection of discussion topics. Topics are suggested by participants, and the most popular are allocated to breakout groups. A chair is chosen for each group from volunteers.

0930

Open space

Break-out discussions

Various locations are available for breakout groups. Participants can join whichever group they find interesting, and can switch groups at any point. Group chairs facilitate the discussion and ensure that notes are taken to support the summary to be given to the plenary.

1100

Break

1130

Open space

Group reports and discussion

Everyone meets again in the main conference area and each breakout group presents their findings. Other participants can comment and ask questions. What follows are the presentations by breakout group chairs.