The Unicode Blog

Thursday, August 20, 2020

Tableaux des caractères Unicode 13.0 désormais disponibles en langue française

Les tableaux des caractères Unicode 13.0 en langue française sont désormais disponibles sur le site web d’Unicode. Après un long travail de traduction réalisé par des experts francophones (du Canada, de France et de Belgique), une grande partie du système proposé aux locuteurs anglophones pour l’accès en ligne aux tableaux de caractères (https://www.unicode.org/charts/) a été reproduite en langue française pour les utilisateurs francophones et est disponible sous ce lien : https://www.unicode.org/charts/fr/. Cette page du site propose un accès aux différents blocs définis dans les tableaux des caractères Unicode 13.0, rangés par catégorie (écritures, symboles, ponctuation, etc.). La recherche par code hexadécimal d’un caractère est également proposée sur cette page. Et une recherche par nom de caractère est possible sur cette autre page : https://www.unicode.org/charts/fr/charindex.html (un clic sur le lien intitulé « Index des noms » vous y conduira directement).

Les tableaux des caractères Unicode 13.0 en langue française sont également disponibles sous la forme d’un fichier unique à cette adresse : https://www.unicode.org/Public/13.0.0/charts/fr/ ; il n’est toutefois pas prévu de fournir des tableaux en langue française mettant en lumière les caractères ajoutés au répertoire de la version actuelle (c’est-à-dire des fichiers équivalents à ceux que l’on trouve sous ce lien : https://www.unicode.org/charts/PDF/Unicode-13.0/).

Ces tableaux sont également accessibles depuis : https://www.unicode.org/versions/Unicode13.0.0/#Code_Charts.

Marc Lodewijck a été le principal contributeur à la réalisation des tableaux de caractères en langue française pour la version 13.0 d’Unicode, un travail auquel ont largement participé, en particulier, Patrick Andries, Alain LaBonté, Michel Suignard et François Yergeau, ainsi que quelques autres personnes.

Avertissement : la fourniture des tableaux des caractères Unicode 13.0 en langue française n’implique nullement que le Consortium Unicode créera de tels tableaux (en français ou dans d’autres langues que l’anglais) pour les versions à venir du standard Unicode. Contrairement aux noms des caractères Unicode en langue anglaise, leurs équivalents en langue française ne constituent pas un élément normatif du standard Unicode.

Unicode 13.0 code charts now available in French

The Unicode 13.0 code charts are now also available in French on the Unicode web site. Following extensive translation work by French-speaking experts (from Canada, France, and Belgium), a large part of the online code chart mechanism available to English speakers at https://www.unicode.org/charts/ has been duplicated in French at https://www.unicode.org/charts/fr/. That page gives access to the various blocks defined in the Unicode 13.0 code charts, grouped by category (scripts, symbols, punctuation, etc.). Search by hex code is also available on the same page, and an index of character names is available at https://www.unicode.org/charts/fr/charindex.html (clicking on the link labeled “Index des noms” will take you straight to it).

The Unicode 13.0 version of the French-language archival code charts (single file) is also available at https://www.unicode.org/Public/13.0.0/charts/fr/; however, there is no plan to provide a French version of the delta code charts (equivalent to https://www.unicode.org/charts/PDF/Unicode-13.0/).

These code charts are also accessible from: https://www.unicode.org/versions/Unicode13.0.0/#Code_Charts.

Marc Lodewijck was the main contributor to the creation of the French-language Unicode 13.0 code charts. Others helped make this possible, including Patrick Andries, Alain LaBonté, Michel Suignard, François Yergeau, and a few other people.

Disclaimer: Providing these French-language code charts for Unicode 13.0 does not imply that the Unicode Consortium will create such code charts (in French or in any language other than English) for future versions of the Unicode Standard. Unlike Unicode character names in English, their French-language equivalents are not a normative part of the Unicode Standard.

Over 140,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages

Thursday, June 18, 2020

Unicode Regular Expressions v21 Released

Regular expressions are a powerful tool for using patterns to search and modify text, and are vital in many programs, programming languages, databases, and spreadsheets.

Starting in 1999, UTS #18: Unicode Regular Expressions has supplied guidelines and conformance levels for supporting Unicode in regular expressions. The new version 21 broadens the scope of properties for regular expressions (regex) to allow for properties of strings (such as for emoji sequences). For example, the following matches all emoji flags except the French flag:

/[\p{RGI_Emoji_Flag_Sequence}--\q{🇫🇷}]/
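Python's built-in re module does not implement the UTS #18 properties of strings or the \q{...} syntax, so the following is only a rough sketch of the same idea under a stated assumption: it treats any pair of Regional Indicator symbols as a flag, whereas the real RGI_Emoji_Flag_Sequence property is a curated subset of such pairs. The helper names flag and flags_except are hypothetical.

```python
import re

# U+1F1E6 is REGIONAL INDICATOR SYMBOL LETTER A; the block runs through U+1F1FF (Z).
REGIONAL_INDICATOR_A = 0x1F1E6

def flag(region_code):
    """Build a flag sequence from a two-letter region code such as "FR"."""
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in region_code.upper()
    )

# Assumption: any two consecutive Regional Indicator symbols count as a flag.
FLAG_PAIR = re.compile("[\U0001F1E6-\U0001F1FF]{2}")

def flags_except(text, excluded):
    """Return every flag-like pair in text, minus the excluded sequence."""
    return [m.group() for m in FLAG_PAIR.finditer(text) if m.group() != excluded]

sample = "Paris " + flag("FR") + ", Berlin " + flag("DE") + ", Tokyo " + flag("JP")
others = flags_except(sample, flag("FR"))
```

Here the set difference written as -- in the pattern above is emulated by filtering matches after the fact; an engine that supports version 21 of UTS #18 could express it inside the character class itself.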

Among the improvements, the new version:

Provides a new Annex D: Resolving Character Classes with Strings for handling negations of sets of strings.
Updates the full property list to include the latest UCD properties, plus Emoji properties and UTS #39 properties.
Removes obsolete text passages, and makes editorial changes for clarity.

Unicode Consortium Announces New Additions to Leadership Team

We are pleased to announce the following leadership additions at the Unicode Consortium. “Each of these individuals brings deep expertise in their field,” said Mark Davis, president of the Consortium. “They have already made significant improvements in their new roles.”

Unicode Emoji Subcommittee

Chair: Jennifer Daniel

Jennifer Daniel’s first contribution to Unicode was standardizing gender inclusive representations in emoji. As a designer, author and former graphics editor at the New York Times, she now explores communication and messaging through verbal, written, auditory and visual expression at a small ad company called Google. Jennifer is a co-author and illustrator of a number of graphics books including How to Be Human, Space!, and the Origins of Almost Everything. Her work has been recognized by the Walker Art Museum, Society of Illustrators and published in the New Yorker, The Washington Post, and Time Magazine to name a few. She has had the honor to serve as a judge for the Society of News Design, Online News Association, Society of Illustrators, American Illustration, Data is Beautiful and the Art Director's Club. She lives in Berkeley, California but also in cyberspace.

Vice Chair: Ned Holbrook

Ned Holbrook is a typographic engineer at Apple, specializing in text layout and fonts. He was one of the participants in the industry-wide effort to standardize variable font technology in OpenType. He previously worked on wireless networking, virtualization, digital audio, embedded graphics, and remote filesystems.

Unicode CLDR Committee

Vice Chair: Kristi Lee

Kristi Lee is the vice-chair of the CLDR technical committee, where she represents Microsoft. She joined Microsoft in 1997 and has worked in a number of different divisions and product development groups. Her focus has been delivering localization and internationalization solutions to international customers. She holds a mathematics degree from the University of Washington. Currently, she is in the Corporate division at Microsoft and works with engineering groups across the company, including Windows, .NET, Office, and others, on topics relating to CLDR and i18n.

Executive Officer

General Counsel: Anne Gundelfinger

Anne is an experienced legal executive with 30 years in private practice and in-house legal roles. From 2013 to 2019 she served as vice president for global intellectual property for Swarovski, a global fashion jewelry brand based in central Europe. Before that she held various positions over a decade in the Intel legal department, including vice president for global public policy, vice president for global sales & marketing legal affairs, and director of trademarks & brands. Early in her career she was an associate at Fenwick & West and director of trademarks at Sun Microsystems. Since retiring from Swarovski, Anne has been a consultant and has served as a World Intellectual Property Organization domain name panelist under the Uniform Dispute Resolution Policy of ICANN. Anne has long been a leader in the global IP bar. She served on the Board of Directors of the International Trademark Association for nearly a decade and served as the Association’s president in 2005.

Mark Davis, the former chair of the emoji subcommittee, will continue to contribute to the emoji subcommittee and serve as president of the Unicode Consortium. “I’d also like to thank John Emmons for his many years of service as chair and vice chair of the CLDR technical committee,” said Davis. “Especially for his work in promoting support for digitally disadvantaged languages.”

Friday, June 12, 2020

Unicode 13.0 Paperback Available

The Unicode 13.0 core specification is now available in paperback book form with a new, original cover design by Huijun Shan. This edition consists of a pair of modestly priced print-on-demand volumes containing the complete text of the core specification of Version 13.0 of the Unicode Standard.

Each of the two volumes comes in a compact 6×9 inch US trade paperback size. The two volumes may be purchased separately or together, although they are intended as a set. Please visit the separate description pages for Volume 1 and Volume 2 to order each volume in the set. The cost for the pair is US $29.58, plus shipping and taxes (if applicable).

Note that these volumes do not include the Version 13.0 code charts, nor do they include the Version 13.0 Standard Annexes and Unicode Character Database, which are all freely available on the Unicode website.

Purchase The Unicode Standard, Version 13.0 - Core Specification Volume 1 and Volume 2

Wednesday, June 10, 2020

PRI #418: Registration of additional sequences in the MSARG collection

The Unicode Consortium has posted a new issue for public review and comment.

Public Review Issue #418: A submission for the “Registration of additional sequences in the MSARG collection” has been received by the IVD registrar.

This submission is currently under review according to the procedures of UTS #37, Unicode Ideographic Variation Database, with an expected close date of 2020-09-11. Please see the submission page for details and instructions on how to review this issue and provide comments:

https://www.unicode.org/ivd/pri/pri418/

The IVD (Ideographic Variation Database) establishes a registry for collections of unique, and sometimes shared, variation sequences for ideographs, which enables standardized interchange in plain text, in accordance with UTS #37.
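As a small illustration of what such a sequence looks like in plain text, the sketch below scans a string for a character followed by one of the variation selectors U+E0100 through U+E01EF, the range used for ideographic variation sequences. It is a simplification: it does not check that the base character is actually an ideograph, nor whether the sequence is registered in the IVD. The function name find_variation_sequences is hypothetical.

```python
# Variation selectors VS17..VS256, used in ideographic variation sequences.
IVS_SELECTOR_FIRST = 0xE0100
IVS_SELECTOR_LAST = 0xE01EF

def find_variation_sequences(text):
    """Return (base character, selector code point) pairs found in text.

    Simplification: the base is whatever character precedes the selector;
    no check is made against the registered sequences in the IVD.
    """
    found = []
    for index in range(1, len(text)):
        code_point = ord(text[index])
        if IVS_SELECTOR_FIRST <= code_point <= IVS_SELECTOR_LAST:
            found.append((text[index - 1], code_point))
    return found

# U+845B followed by the first selector in the range, then a plain ideograph.
sample = "\u845B\U000E0100\u57CE"
pairs = find_variation_sequences(sample)
```

Because the selector is an ordinary code point in the text stream, the sequence survives plain-text interchange even when the receiving system cannot render the specific glyph variant.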

Friday, April 24, 2020

ICU 67 Released

Unicode® ICU 67 has just been released. ICU 67 updates to CLDR 37 locale data with many additions and corrections. This release also includes the updates to Unicode 13, subsuming the special CLDR 36.1 and ICU 66 releases. ICU 67 includes many bug fixes for date and number formatting, including enhanced support for user preferences in the locale identifier. The LocaleMatcher code and data are improved, and number skeletons have a new “concise” form that can be used in MessageFormat strings.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

For details, please see http://site.icu-project.org/download/67.

Thursday, April 23, 2020

Unicode Locale Data v37 released!

The final version of Unicode CLDR version 37 is now available. It focuses on adding new locales, enhancing support for units of measurement, adding annotations (names and search keywords) for symbols, and adding annotations for Emoji v13.

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

Expanded locale preferences for units of measurement. The new unit preference and conversion data allows formatting functions to pick the right measurement units for the locale and usage, and accurately convert input measurement into those units.

Emoji 13.0. The emoji annotations (names and search keywords) for the new Unicode 13.0 emoji are added. The collation sequences are updated for new Unicode 13.0, and for emoji.

Annotations (names and keywords) expanded to cover more than emoji. This release includes a small set of Unicode symbols (arrow, math, punctuation, currency, alphanum, and geometric) with more to be added in future releases. For example, see v37/annotations/romance.html.

New locales. New languages at Basic coverage: Fulah (Adlam), Maithili, Manipuri, Santali, Sindhi (Devanagari), Sundanese. New languages at Modern coverage: Nigerian Pidgin. See Locale Coverage Data for the coverage per locale, for both new and old locales.

Grammatical features added. Grammatical features are added for many languages, a first step toward allowing programmers to format units according to grammatical context (e.g., the dative version of "3 kilometers").

Updates to code sets. In particular, the EU is updated (removing GB).
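To make the unit preference and conversion item above more concrete, here is a minimal sketch of the idea: pick a unit for a given region and usage, then convert the input measurement into it. The preference table is hand-written and hypothetical, covering only road distances for a couple of regions; it is not the CLDR data itself, and the names ROAD_DISTANCE_UNIT and road_distance are assumptions made for this example.

```python
# Hypothetical, hand-written preference table for illustration only;
# it is not extracted from CLDR and covers a single usage (road distance).
ROAD_DISTANCE_UNIT = {"US": "mile", "GB": "mile"}
DEFAULT_ROAD_DISTANCE_UNIT = "kilometer"

# Conversion factors to the base unit (meters); the international mile
# is defined as exactly 1609.344 meters.
METERS_PER_UNIT = {"kilometer": 1000.0, "mile": 1609.344}

def road_distance(meters, region):
    """Convert a distance in meters to the unit preferred in the region."""
    unit = ROAD_DISTANCE_UNIT.get(region, DEFAULT_ROAD_DISTANCE_UNIT)
    return meters / METERS_PER_UNIT[unit], unit

us_value, us_unit = road_distance(1609.344, "US")
fr_value, fr_unit = road_distance(5000, "FR")
```

A real implementation would read both the preferences and the conversion factors from the CLDR data files rather than hard-coding them, and would hand the result to a locale-aware number and unit formatter.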

For more details, access to the data and charts, and important notes for smoothly migrating implementations, see Unicode CLDR Version 37.

Friday, April 10, 2020

Technical Alert: Unicode Technical Website Down

TECHNICAL ALERT: the Unicode Consortium's technical website is hosted in a data center that has experienced a catastrophic failure. We are working to get back online, but this may take a couple of weeks. We apologize for the inconvenience. BTW: this failure occurred after we announced we are delaying the release of Unicode 14.0.
