Today, the World Wide Web is fundamental to communication in all walks of life. As the share of English web pages decreases and that of other languages increases, it is vitally important to ensure the multilingual success of the World Wide Web.
The MultilingualWeb project is looking at best practices and standards related to all aspects of creating, localizing and deploying the Web multilingually. The project aims to raise the visibility of existing best practices and standards and to identify gaps. The core vehicle for this is a series of four events planned over a two-year period.
On 21-22 September 2011 the W3C ran the third workshop in the series, in Limerick, entitled "A Local Focus for the Multilingual Web". The Limerick workshop was hosted by the University of Limerick. Kieran Hodnett, Dean of the Faculty of Science and Engineering, gave a brief welcome address.
As for the previous workshops, the aim of this workshop was to survey, and introduce people to, currently available best practices and standards that are aimed at helping content creators, localizers, tools developers, and others meet the challenges of the multilingual Web. The key objective was to share information about existing initiatives and begin to identify gaps.
Unlike the previous workshops, due to co-location with the 16th LRC Conference, this event ran for one and a half days. In another departure, the final half day was dedicated to an Open Space discussion forum, in breakout sessions, rather than to presentations. Participants pooled ideas for discussion groups at the beginning of the morning, split into six breakout areas, and reported back in a plenary session at the end of the morning. Participants could join whichever group they found interesting, and could switch groups at any point. During the reporting session, participants in other groups could ask questions about or comment on each group's findings. This proved to be an extremely popular part of the workshop, and several participants wanted to continue working after the event on the things they had discussed, with a view to meeting again for further discussion during the next workshop. The final attendance count for the event was 85.
As for previous workshops, we video-recorded the presenters and, with the assistance of VideoLectures, made the videos available on the Web. We were unable to stream the content live over the Web as we did in Pisa. We also once more provided live IRC scribing to help people follow the workshop remotely and to assist participants in the workshop itself. As before, numerous people tweeted about the conference and the speakers during the event, and you can see these tweets linked from the program page.
The program and attendees continued to reflect the same wide range of interests and subject areas as in previous workshops and we once again had good representation from industry (content and localization related) as well as research.
In what follows, after a short summary of key highlights and recommendations, you will find a short summary of each talk accompanied by a selection of key messages in bulleted list form. Links are also provided to the IRC transcript (taken by scribes during the meeting), video recordings of the talk (where available), and the talk slides. All talks lasted 15 minutes. Finally, there are summaries of the breakout session findings, most of which are provided by the participants themselves.
The next workshop will take place in Luxembourg, on 15-16 March, 2012.
Contents: Summary • Welcome • Developers • Creators • Localizers • Machines • Users • Policy • Breakouts
What follows is an analysis and synthesis of ideas brought out during the workshop. It is very high level, and you should watch the individual speakers' talks to get a better understanding of the points made.
Our keynote speaker, Daniel Glazman, reviewed the progress made by CSS and HTML in terms of internationalization support. He exhorted content authors to always write documents in UTF-8 and to tag them with language information. He also encouraged the use of content negotiation to help people reach resources in their own language more easily. This latter point was also taken up later in the Creators session, and in one of the discussion groups.
He called for browser implementers to quickly implement the new HTML5 bdi element, to support 'start' and 'end' values instead of 'left' and 'right', and to address other problems with mixed direction content in forms. He warned, however, that more pressure is needed from users, especially Japanese and Chinese, to encourage browser developers to support their language features.
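By way of illustration only (a minimal sketch, not taken from the keynote, with invented content), the practices mentioned above look roughly like this in an HTML5 page:

```html
<!-- Minimal sketch: UTF-8 declaration, language tagging, the HTML5 bdi
     element, and the logical 'start' value in CSS. Content is invented. -->
<!DOCTYPE html>
<html lang="ar" dir="rtl">
<head>
  <meta charset="utf-8">
  <title>مثال</title> <!-- "Example" -->
  <style>
    /* 'start' follows the writing direction; 'left' does not */
    p { text-align: start; }
  </style>
</head>
<body>
  <!-- bdi isolates user-supplied text of unknown direction, so the Latin-script
       user name does not disrupt the surrounding Arabic sentence -->
  <p>أضاف <bdi>user1234</bdi> تعليقًا جديدًا.</p> <!-- "user1234 added a new comment." -->
</body>
</html>
```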
He also said that programming languages don't pay enough attention to internationalization issues at an early enough stage, and that JavaScript's internationalization support is poor.
Finally, he intimated that ePub and related standards are likely to be an important way of packaging documents in the future.
In the Developers session we heard from Christian Lieske how ITS (the W3C's Internationalization Tag Set) is being applied today, but also how there is a need for support of new data categories and for application of ITS to additional usage scenarios, such as HTML5. The MultilingualWeb-LT project, introduced by David Filip, plans to address these and other concerns in its work to improve, and provide reference implementations for, metadata through the authoring and localization process chain.
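For readers new to ITS, here is a minimal sketch (not from the talk) of what ITS 1.0 metadata looks like; only the its:* markup is standard, while the host elements and content (doc, p, code, span, the product name) are invented for illustration:

```xml
<!-- Minimal ITS 1.0 sketch: a global rule plus local markup flagging
     content that must not be translated. Host vocabulary is invented. -->
<doc xmlns:its="http://www.w3.org/2005/11/its">
  <its:rules version="1.0">
    <!-- Global rule: the content of any <code> element is not translatable -->
    <its:translateRule selector="//code" translate="no"/>
  </its:rules>
  <p>Declare the encoding with <code>meta charset="utf-8"</code> and leave the
     product name <span its:translate="no">WidgetPro</span> untranslated.</p>
</doc>
```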
We heard some more about various useful new features for script support in CSS, as well as some pitfalls to avoid, from Gunnar Bittersmann. He proposed that there was a need for a :script selector in CSS, although this was disputed during the Q&A that followed.
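As a hedged illustration of what is already possible without a :script selector (these rules are not from the talk), the existing :lang() pseudo-class lets style sheets react to the declared language:

```css
/* Minimal sketch of language-sensitive styling with the existing :lang()
   pseudo-class; the rules are illustrative, not taken from the slides. */
:lang(de) { hyphens: auto; }               /* long German compounds benefit from
                                              hyphenation (prefixed in 2011 browsers) */
q:lang(fr) { quotes: "« " " »"; }          /* French guillemets */
:lang(ar), :lang(he) { direction: rtl; }   /* right-to-left scripts */
/* There is no :script() selector; script-specific rules currently have to go
   via language tags, e.g. :lang(zh-Hans) versus :lang(zh-Hant). */
```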
During the Creators session, Moritz Hellwig spoke about his experiences with Content Management Systems (CMS) and how the lack of a simple method to reformat translated text content and integrate media into translated text could cause projects to be abandoned. He called for the gap between CMS and Language Service Provider (LSP) communities to be closed. He said that we need a metadata standard that covers issues such as terminology and domain information (content domains and audience domains), and widely adopted open definitions for content import and export functions. The translation workflow should also allow shipping of other information, over and above text.
Danielle Boßlet took up the topic of user navigation again, calling on content developers to focus on users and their needs when creating web sites. She called for all information (including site maps and indexes) to be available in all languages offered, and for more attention to be paid to links, so that users can clearly see what information is available in their own language.
Lise Bissonnette Janody concluded the session with a call for better processes to ensure the development of effective content, and a number of ideas about how to achieve that.
The Localizers session began with an analysis of how to improve the efficiency of the translator's workbench from Matthias Heyn. He ended his talk by predicting that reviewer productivity (in addition to translator productivity) would soon become an important topic, and that another emerging theme would be automation of the broader production chain.
Asanka Wasala demonstrated a crowd-sourced approach to enabling more content for the linguistic long tail of the Web, where many languages are still underserved, leaving people without access to information. For him, a key standardization issue is the divergent ways in which browsers and operating systems handle the rendering of complex scripts. He also called for more standardization with regard to extension development across browsers.
From Sukumar Munshi we had a review of the importance of interoperability, and a plea for an initiative to help people collaborate and to facilitate work on standards.
In the Machines session we had overviews of ongoing best practices in news sentiment analysis, NLP interchange formats and Web services for accessing lexical resources.
Thomas Dohmen said that there is a need for a standard API that is easy to integrate and rich enough for domain-specific machine translation, and that standard markup is needed to describe the quality of the data used to train machine translation systems.
Sebastian Hellmann and Yoshihiko Hayashi showed how standards had been used in their projects.
The Users session included a talk about personalization from Alexander O'Connor and an overview of best practices in content development and community building from Olaf-Michael Stefanov.
Alex proposed that there is an enormous opportunity and a great need to include globalization and internationalization in the personalized future of the web.
Olaf-Michael showed some clever ways of tracking changes in a wiki where there was no specific source language, and highlighted the Linport project, which defines a universal way to put everything needed for a translation project into one electronic package for transfer among stakeholders. This type of standard was later discussed in a breakout session.
Kicking off the Policy session, Gerhard Budin said that we need a modelling approach to terminological semantics that includes the modelling of cultural differences (only some of which are expressed linguistically), and an integrative model of semantic interoperability in order to (semi-)automate cross-lingual linkages, inferences, translations and other operations. He also called for more networking among stakeholders to better link the many diverse projects together and bring forward successful elements into sustainable structures once a project ends.
Georg Rehm followed with an overview of the status of the META-NET project, and in particular drew attention to a number of white papers in development, each of which describes the state of one European language in the digital age, including the level of support it receives from language technology.
In the final presentation, Arle Lommel reviewed the state of GALA, and called for localization standards to be written in a more user-friendly way, rather than by geeks for geeks. He also called for more coordination to avoid incompatible standards, and for more guidance about how to use standards and the business cases they address.
The second day was devoted to Open Space breakout sessions. The session topics were: Standardization; Translation Container Standards; LT-Web; Multilingual Social Media; and User Focused Best Practices.
Among the proposals arising from these discussions was a way of achieving automation and interoperability in translation container standards that goes beyond Linport in terms of scope. Other recommendations included a stronger coupling between CMS and translation memory systems, and an endorsement of the value of backing standardization with implementations and test suites, such as those the MultilingualWeb-LT project will deliver.
The best practices group documented a number of recommendations. These include producing a simplified version of existing WCAG (Web Content Accessibility Guidelines) 2.0 documents, and involving translators, localizers and designers more in the design and training related to web accessibility guidelines. There were also recommendations about how to adapt Web content for educational users, and a call to protect endangered languages by providing more localized content. As previously mentioned, there was also a call for more focus on linking strategies, to help people better understand what is and is not available to them in their language.
Another group discussed making a list of currently available standards, and making that list available to the public as a way of raising awareness about such standards, and informing future work on those standards.
Daniel Glazman, of Disruptive Innovations and W3C CSS Working Group Co-Chair, gave the keynote speech, entitled "Babel 2012 on the Web". In his talk Daniel said that while Open Web and Internet standards were mostly Western-centric in the early years, things have changed drastically: English is no longer the most common language on the net, and the various standards bodies have improved support for the languages and scripts of the world. The new cool kids on the block in 2012 will be HTML5, CSS3 and EPUB3, and the talk showed how these standards are paving the way for the Multilingual Web in those areas.
The Developers session was chaired by Tadej Štajner of the Institut Jožef Stefan.
Dr. David Filip, Senior Researcher at the Localisation Research Centre (LRC) and the Centre for Next Generation Localisation (CNGL), started off the Developers session with a talk entitled "MultilingualWeb-LT: Meta-data interoperability between Web CMS, Localization tools and Language Technologies at the W3C". MultilingualWeb-LT, an FP7-funded coordination action, is going to set up a W3C Working Group to standardize metadata exchange between Web Content Management Systems, Localization Tools and Language Technologies. This session opened the public discussion of the working group's charter and encouraged participation in the working group from outside of the initial EC-funded consortium. Key points:
Christian Lieske, Knowledge Architect at SAP AG, talked about "The Journey of the W3C Internationalization Tag Set – Current Location and Possible Itinerary". His coauthors were Felix Sasaki of DFKI, Yves Savourel of Enlaso, Jirka Kosek of the University of Economics in Prague, and Richard Ishida of the W3C. Two questions were addressed: Where is the W3C Internationalization Tag Set (ITS) today? And where may ITS be heading? In addition, a brief introduction to ITS was provided; it highlighted that ITS is a set of internationalization-related concepts that may be applicable beyond the XML world. Key points:
Gunnar Bittersmann, Web Developer at brands4friends, presented "CSS & i18n: dos and don'ts when styling multilingual Web sites". The talk covered best practices and pitfalls when dealing with languages that create large compound words (like German), languages with special capitalization rules (again, like German), or languages written in right-to-left scripts. This included things like box sizes, box shadows and corners, image replacement, etc. It also covered the benefits that new CSS3 properties and values offer in terms of internationalization, a discussion about whether the :lang pseudo-class selector meets all needs or if there is more to wish for, and how to implement style sheets for various languages and scripts (all rules in a single file or spread over multiple files?). The talk was of a practical rather than theoretical nature. Key points:
The Developers session on the first day ended with a Q&A question about language negotiation, and a suggestion that it should be possible to identify the original language in which a page was written.
This session was chaired by Charles McCathieNevile of Opera Software.
Moritz Hellwig, Consultant at Cocomore, gave the first talk of the Creators session, "CMS and Localisation – Challenges in Multilingual Web Content Management". Because he was unable to make the trip, this talk was delivered as a pre-recorded video. Content Management Systems (CMS) have come to be widely used to provide and manage content on the Web. As such, CMS are increasingly used for multilingual content, which presents new challenges to developers and content providers. This presentation explored these challenges and showed how and why a closer alignment of CMS developers and LSPs can improve translation management, workflows and quality. Key points:
Danielle Boßlet, Translator, spoke on "Multilinguality on Health Care Websites - Local Multi-Cultural Challenges". Global health care organizations like the World Health Organization have to present their websites in a variety of languages to make sure that as many people as possible can benefit from their online offering. The same applies to the European Union, which publishes its official documents in 23 languages and therefore has to guarantee that its websites are equally multilingual. Because Germany is a country with a large number of immigrants, the government and other official institutions would do well to present their websites not only in German or English, but also in other languages, such as Turkish or Russian. The websites of the WHO, the EU and some German institutions were checked for their multilingual offerings and possible shortcomings of the different language versions. The most severe and most frequent shortcomings and their consequences for users were highlighted in this talk. Key points:
Lise Bissonnette Janody, Content Consultant at Dot-Connection, presented "Balance and Compromise: Issues in Content Localization". Web content managers need to make choices with respect to the content they translate and localize on their websites. What guides these decisions? When in the process should they be made? What are their impacts? This talk provided a high-level overview of these choices, and how they fit into the overall content strategy cycle. Key points:
The Q&A part of the Creators session began with questions for Danielle about how she gathered her data, and a couple of audience members contributed additional information. One mentioned that a significant factor that hinders large organizations from meeting user needs is fear of imperfection; another is resources. Crowd-sourced resources may be able to help. Another question asked to what extent standards are being applied. There is a time lag for adoption, and the proliferation of content on the Web is driving issues. W3C recommendations for page design are commonly followed, but the WHO doesn't offer content negotiation. For more details, see the related links.
This session was chaired by Christian Lieske of SAP.
Matthias Heyn, VP Global Solutions at SDL, talked about "Efficient translation production for the Multilingual Web". The translation editor has seen major technological advances over the last few years. Compared to classic translation memory applications, current systems allow expert users to double, if not triple, the number of words translated. Whereas the key technology advances are in the areas of sub-segment reuse and statistical machine translation (SMT), the actual productivity gains relate to the ergonomics of how systems allow users to interact with, control and automate the various data sources. This presentation reviewed key capabilities at the document, segment and sub-segment levels: SMT, TrustScore, dynamic routing and dynamic preview (document level); match-type differentiation, auto-propagation, SMT integration and SMT configurations, SMT trust scores and feedback cycles (segment level); and auto-suggest dictionary and phrase completions (sub-segment level). The capabilities discussed were put into the perspective of how the vast amount of multilingual online content is affected by such innovation. Key points:
Asanka Wasala, PhD student at the Centre for Next Generation Localisation (CNGL) and the Localisation Research Centre (LRC), talked about "A Micro Crowdsourcing Architecture to Localize Web Content for Less-Resourced Languages". He reported on a novel browser-extension-based, client-server architecture using open standards that allows localization of web content using the power of the crowd. The talk addressed issues related to MT-based solutions and proposed an alternative approach based on translation memories (TMs). The approach is inspired by Exton et al. (2009) on real-time localization of desktop software using the crowd and Wasala and Weerasinghe (2008) on browser-based pop-up dictionary extensions. The architectural approach chosen enables in-context, real-time localization of web content supported by the crowd. To the best of his knowledge, this is the only practical web content localization methodology currently being proposed that incorporates translation memories. The approach also supports the building of resources such as parallel corpora – resources that are still not available for many languages, especially under-served ones. Key points:
Sukumar Munshi, Corporate Development Officer at Across Systems, spoke about "Interoperability standards in the localization industry – Status today and opportunities for the future". Unable to attend the workshop at the last minute, Sukumar provided a pre-recorded video presentation. Interoperability and related standards are topics that are still frequently and controversially discussed. While standards such as TMX and TBX are established within the industry, others, such as XLIFF, are rated differently and not as widely implemented. This presentation covered the current status of interoperability in the localization and translation industry, its historical development, understandings of interoperability, related business requirements, effects on delivery models, interoperability between tools, open standards, current challenges and opportunities for the future. Key points:
The Q&A dwelt briefly on crowd sourcing considerations. A comment was also made that initiatives, such as Interoperability Now, should be sure to talk with standards bodies at some point. It was mentioned that the W3C has started up a Community Group program to enable people to discuss and develop ideas easily, and then easily take them on to standardization if it is felt that it is appropriate. For details, see the related links.
This session was chaired by Felix Sasaki of DFKI.
Thomas Dohmen, Project Manager at SemLab, talked about "The use of SMT in financial news sentiment analysis". Statistical Machine Translation systems are a welcome development for news analytics. They enable topic-specific translation services, but are not without problems. The SMT system being developed for the Let'sMT (FP7) project is trained and used to translate financial news for SemLab's news sentiment analysis platform. This talk gave an example of the benefits and problems of integrating such systems. Key points:
Sebastian Hellmann, Researcher at the University of Leipzig, spoke about the "NLP Interchange Format (NIF)". NIF is an RDF/OWL-based format that allows developers to combine and chain several NLP tools in a flexible, lightweight way. The core of NIF consists of a vocabulary which can represent strings as RDF resources. A special URI design is used to pinpoint annotations to a part of a document. These URIs can then be used to attach arbitrary annotations to the respective character sequence, and on the basis of these URIs, annotations can be interchanged between different NLP tools. Although NLP tools are abundantly available at all linguistic levels for the English language, this is often not the case for languages with fewer speakers, so a format that allows the integration and interoperability of NLP tools becomes especially necessary. With respect to multilinguality, two use cases come to mind: first, an existing English software system that uses an English NLP tool needs to be ported to another language, but the NLP tool for the other language is not compatible with the system because there is no common interface (for example, a CMS with keyword extraction); second, paragraphs in different kinds of documents can be annotated in RDF with multilingual translations that can potentially remain stable over the lifetime of a document. In particular, the proposed URI recipe (context hash) has advantageous properties that withstand comparison with other URI naming approaches. Key points:
Yoshihiko Hayashi, Professor at Osaka University, gave a talk about "LMF-aware Web services for accessing lexical resources". This talk demonstrated that the Lexical Markup Framework (LMF), the ISO standard for modeling and representing lexicons, can be nicely applied to the design and implementation of lexicon-access Web services, in particular when the service is designed in a so-called RESTful style. As the implemented prototype service provides access to bilingual/multilingual semantic resources, in addition to standard WordNets, slight revisions to the LMF specification were also proposed. Key points:
Topics discussed during the Q&A session included whether XPointer is an alternative to the RDF-based string annotation described by Sebastian, and why SPARQL can't be used to access the LMF data. There was a suggestion that moving Sebastian's data into XML and overlaying it with XML:TM would allow for translation memory at a segment level. For the details, follow the related links.
This session was chaired by Reza Keschawarz of the LTC.
Alexander O'Connor, Research Fellow (DCM) at Trinity College Dublin and the Centre for Next Generation Localisation (CNGL), started the Users session with a talk entitled "Digital Content Management Standards for the Personalized Multilingual Web". The World Wide Web is at a critical phase in its evolution right now. The user experience is no longer limited to a single offering in a single language. Localisation has offered a web of many languages to users, and this is now becoming a hyper-focused tailoring that makes each web experience different for each user. The need to address the key requirements of a web which is real-time, personal and in the right language is paramount to the future of how information is consumed. This talk discussed the key trends in personalization, with particular focus on work being undertaken in the Digital Content Management track of the CNGL, and provided an insight into current and future trends, both in research and in the living web. Key points:
Olaf-Michael Stefanov, Co-administrator of the Multilingual Website at JIAMCATT, talked about how "An Open Source tool helps a global community of professionals shift from traditional contacts and annual meetings to continuous interaction on the web". He described the challenges of maintaining and developing a multilingual web site with open-source software tools and crowd-sourced translations for a community of professional translators and terminologists working for international organizations and multilateral bodies, a community that has no budget and depends on members' contributions in kind, yet has continued to grow since 1987, using an Open Source tool which supports multilingualism to provide a complex support site for an international working group on language issues. The talk explored how use of the Tiki CMS Wiki Groupware software made it possible to provide an ongoing interactive support site for JIAMCATT, helping convert the "International Annual Meeting on Computer-Assisted Translation and Terminology" into an ongoing year-round affair. The site, which is run without a budget and on the spare time of members, is nevertheless fully bilingual in English and French, with parts in Arabic, Chinese, Russian and Spanish (all official languages of the United Nations), as well as some German. Key points:
The Q&A began with a request for more information from Alex about the implications for localization. He said that good localization needs additional metadata, including metadata related to identification and personalization. There was also a comment that the clash between the business and academic worlds with regard to semantic web technologies is not so harsh. Alex replied that it is more an issue of adoption.
This session was chaired by Jörg Schütz of bioloom group.
Gerhard Budin of the University of Vienna presented the first talk in the Policy session, "Terminologies for the Multilingual Semantic Web - Evaluation of Standards in the light of current and emerging needs". In recent years several standards have emerged or come of age in the field of terminology management (such as ISO 30042 (TBX), ISO 26162, ISO 12620, etc.). Different user communities in the language industry (including translation and localization), language technology research, industrial engineering and other domain communities are increasingly interested in using such standards in their local application contexts. This is exactly where problems more often than not arise, in the natural need to adapt global, and sometimes abstract, heavy-weight standards specifications to local situations that differ from each other. Thus the way standards are prepared needs to be adapted so that different requirements from user groups and from local situations can be processed and taken into account appropriately and efficiently. The talk discussed innovative (web-service-oriented) approaches to standards creation in the field of terminology management in relation to different web-based user groups and semantic-web application contexts, integrating vocabulary-oriented W3C recommendations such as SKOS. The speaker drew on his experiences in the strategic contexts of FlareNet, CLARIN and ISO/TC 37, and in concrete user communities, e.g. in legal and administrative terminologies (the "LISE" project) and in risk terminologies (the "MGRM" project). Key points:
Georg Rehm, META-NET Network Manager at DFKI GmbH, presented "META-NET: Towards a Strategic Research Agenda for Multilingual Europe". META-NET is a Network of Excellence consisting of 47 research centres in 31 countries, dedicated to fostering the technological foundations of a multilingual European information society. A continent-wide effort in Language Technology (LT) research and engineering is needed to realize applications that enable automatic translation, multilingual information and knowledge management, and content production across all European languages. The META-NET Language White Paper series "Languages in the European Information Society" reports on the state of each European language with respect to LT and explains the most urgent risks and opportunities. The series covers all official and several unofficial and regional European languages. After a brief introduction to META-NET, the talk presented key results of the 30 Language White Papers, which provide valuable insights concerning the technological, research-related and standards-related gaps of a multilingual Europe realized with the help of language technology. These insights are an important piece of input for the Strategic Research Agenda for Multilingual Europe, which will be finalized by the beginning of 2012. Key points:
Arle Lommel, Standards Coordinator for the Globalization and Localization Association (GALA), gave a talk entitled "Beyond Specifications: Looking at the Big Picture of Standards". In the localization industry, standardization has been seen primarily as a technical activity: the development of technical specifications. As a result there are many technical standards that have failed to achieve widespread adoption. The GALA Standards Initiative, an open, non-profit effort, is attempting to address the areas that surround standards development – education, promotion, coordination of development activities, development of useful guidelines and business cases, and non-technical, business-oriented standards – to help achieve an environment in which the needs of various user groups will help drive greater adoption of standards. Key points:
During the Q&A, it was suggested that it would be useful to have a summary of all the standards mentioned at the workshop – a glossary of the alphabet soup that had been talked about. (This was developed further in the discussion sessions on the following day.) There was also some discussion about whether there are too many standards, and whether we can find a way to merge things to make life simpler. A final set of questions focused on MT support and roadmaps related to the presentations.
This session was chaired by Jaap van der Meer of TAUS.
Workshop participants were asked to suggest topics for discussion on small pieces of paper that were then stuck on a wall. Jaap then led the group in grouping the ideas and selecting a number of topics for breakout sessions. People voted for the discussion they wanted to participate in, and a chair was chosen to facilitate each group. The participants then separated into breakout areas for the discussion and, near the end of the workshop, met together again in plenary to discuss the findings of each group. Participants were able to move between breakout groups.
At Limerick we split into the following groups:
Summaries of the findings of these groups are provided below, some of which have been contributed by the breakout group chair. A number of groups were keen to renew discussion on these topics at the next workshop in Luxembourg.
Discussions in this group centred around the idea that, in practice, translation seems to include a lot of files being e-mailed around, and translation tools create packages, which are not always interoperable – so what can be done to achieve automation and interoperability?
Many issues were discussed. Some were addressed in Linport.
There are a couple of key workflow issues to consider. First, can I send you something and can you immediately use it? This was the focus of Linport. However, there is often a focus on the containers and not on the concrete content formats, and standards are often too narrowly focused in their use cases. Interoperability Now! was slightly broader, and Linport even more so. Secondly, can we merge all of these efforts into one? At the very least, we want to avoid overlapping development of the same functionality. We can't just focus on being a translation-focused project (Linport); we need to look for broader scope.
A question arose about independent translators, who don't have the bandwidth to participate in these initiatives.
David Filip (University of Limerick) introduced the consortium of the MultilingualWeb-LT project. This is an upcoming EU project (known internally as LT-Web) with 13 initial partners. The main topic of MultilingualWeb-LT will be metadata for the multilingual Web. The participants funded by LT-Web will concentrate on three implementation scenarios that make use of the metadata: 1) the deep web and the localization chain, 2) the surface web and real-time machine translation, and 3) the deep web and machine translation training. The discussion in the session focused on the scope of the MultilingualWeb-LT group: what kind of metadata ("data categories") should be defined? Should it be broad, or rather focused on what is relevant for the above implementation scenarios? In addition to deep Web content from specific areas (e.g. DITA applied to the area of medicine), what kind of Web content should MultilingualWeb-LT focus on (HTML, CSS, maybe also JavaScript)? The audience supported the approach of backing standardization with implementations and test suites. Test suites might also take input from other efforts such as linked open data. In particular, participants from the localization industry emphasized that providers and clients in the language industry already have their own data categories. Often these are proprietary, but they are in real use in many connectors between systems. Instead of trying to "save the world", new data categories should be developed as little as possible. Rather, documenting best practices and processing requirements, and demonstrating these in real open-source implementations, could benefit various communities. Key points:
Summary: best practices, test suites and implementations that demonstrate the usage of the best practices may be needed more than "solving world hunger". Open questions: which data categories to tackle, and how they relate to ITS.
One important topic discussed was crowdsourcing for translation, emphasizing the Facebook use case, where half a million people are participating in choices of terminology for 75 languages. Important criteria are speed, cost, quality and trustworthiness. Lionbridge said that a lot of trustworthiness comes from the fact that crowd-sourced content has a local feel.
People have many motivations to participate: the chance to make decisions, peer recognition, to see contributions made visible, national pride, etc.
Hybrid approaches of professional plus crowd-sourced localization are also feasible and practical.
There are also two cases where cultural differences matter: the design of the product, and interacting with user-generated content. For instance, family relationships are not always directly translatable, since they mean different things in different cultures. Multilingual mining of social media can be very useful for marketing analytics.
During the plenary session, the question was asked how to control quality assurance in the face of contributions from so many diverse sources. Timo responded that if there are many people in a local community, mistakes get discovered sooner or later. On the other hand, professional translators make correct translations, but may not adapt to the market completely. A rule of thumb: if a use of a term is shared by a large number of people, it is preferred over the official translation.
Throughout the whole session, this breakout group had a strong focus on users, evaluating their current needs and expectations regarding multilingual websites.
The Web Accessibility Initiative (WAI) of the World Wide Web Consortium (W3C) released the Web Content Accessibility Guidelines (WCAG) 2.0 in December 2008. These guidelines were designed to be testable, addressing one of the main criticisms of the previous version, WCAG 1.0 (May 1999). However, up to the present, website accessibility validation has not been an easy task for web designers and developers to accomplish, since they find the guidelines not only fairly difficult to implement, but also too restrictive and time-consuming. Studies have shown that success criteria are often met only up to the first level (A), because accessibility issues are sometimes not seen as being as important as other web design parameters. This is partially due to:
One of the suggestions proposed was to come up with a simplified version of the existing WCAG 2.0 documents.
Note from Richard: There is some work currently being done on this.
When it comes to dealing with multilingual websites, implementation of WCAG 2.0 becomes even more complicated. Some of the accessibility concerns observed by the WAI should also be taken into account during the localization process (subtitling, audio description, alternative content, focus order…). Sign language was a particularly interesting issue for debate: when localizing websites and videos, should we not include a sign-language interpretation in embedded videos too? After all, sign languages are often the native languages of deaf people. We pointed out, though, that such interpretations are usually rejected by customers because they are "too expensive". All these things considered, one might even think about web accessibility as a localizable element of the web: the accessibility level achieved in the original product should be maintained in the target languages. Moreover, negotiations could take place with the client to improve the level of accessibility of the localized websites, even if the original source website was not 100% accessible. However, little attention is currently paid to web accessibility when performing localization tasks. In order to improve this situation, it was suggested that communication be enhanced across the whole web development cycle. That is, there is a need for further training, as well as for bridging experts in different fields, getting developers, translators, localizers and designers involved in the implementation of web accessibility guidelines.
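As a small hypothetical sketch of this idea (not an example used in the group), several accessibility features are themselves language-specific content that the localization process needs to carry along, for instance alternative text and subtitle or audio-description tracks:

```html
<!-- Hypothetical sketch: accessibility content that must itself be localized.
     File names and text strings are invented. -->

<!-- Alternative text is part of the translatable content -->
<img src="ward-map.png" lang="de" alt="Lageplan der Kinderstation"> <!-- "Map of the children's ward" -->

<video controls>
  <source src="intro.mp4" type="video/mp4">
  <!-- Subtitle and description tracks can be supplied per target language -->
  <track kind="subtitles"    srclang="de" src="intro.de.vtt"      label="Deutsch">
  <track kind="subtitles"    srclang="tr" src="intro.tr.vtt"      label="Türkçe">
  <track kind="descriptions" srclang="de" src="intro.desc.de.vtt" label="Audiodeskription">
</video>
```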
From the web localization industry perspective, it is also important to make customers understand the need to observe these web accessibility guidelines. The example of a "rainbow menu bar" popped up: the idea might seem attractive, but it may restrict the accessibility level of the website, thus leaving some communities of users aside. It is hard to find a compromise between what the client wants and what is technically and ethically accessible. In this sense, one of the participants said that it is useful to have a session with customers, letting them "experience" what restricted accessibility actually means. Accessibility workshops for developers are very helpful, too. Most companies are not aware of what is required in creating a multilingual/localized website, let alone of any accessibility issues, and in a lot of cases their web developers do not have sufficient knowledge in this area either. Therefore, it takes some "education" sessions with customers to explain to them that, for instance, using heavy imagery is not advisable for countries where Internet connection speeds are relatively slow (e.g. China), nor is using Flash on the home page for countries where most people access the Internet from their mobile phones (e.g. Russia). Even simpler aspects need to be taken into account, such as the color scheme: in Japan, for example, white is the color of death.
Answers from the audience:
Within the educational domain, (multilingual) websites should be even more adapted to the user, although sometimes more attention is paid to the purpose or function of the website than to the end user. The content, as well as the user interface, should be adapted to an audience made up mostly of children and/or young students (it is often the case that teachers implement eLearning platforms suitable for themselves, but not for their students, in terms of clarity and simplicity of the content and organizational design). Among others, these are the recommendations that were proposed when thinking about how an education-oriented website should look: larger fonts, simple/simplified content, the right selection of images (yes, here the rainbow would fit), appropriate vocabulary complexity, etc.
Note: There was another W3C workshop, on plain language issues, in Berlin, Germany, on 19 September, which could add helpful insights: http://www.xinnovations.de/w3c-tag/articles/w3c-tag-2011.html – only accessible to people with knowledge of German, though.
In general, international organizations and institutions tend to ignore users' needs in terms of web usability and, in particular, as regards language choice. It is often the case that they create their websites thinking first about themselves as an entity (image of the organization, internal communication, document repository…), but not about the end users. As an example, we recalled one of the presentations (Danielle's) on international health care institutions and their multilingual portals. For instance, the user support on the EU health portal is considered poor because it is only offered in English. During the Q&A session, people seemed to justify this by explaining the technical challenges (human resources, funding…) behind such an implementation. However, users often do not know about internal problems: the only thing that matters to them is having the information they need available in their language. Organizations should therefore at least get a handle on their accessibility and localization issues, so that they can make improvements as budget allows.
Generally speaking, the main problems that we have encountered when looking at institutional websites are the following:
Answers from the audience: