Today: Fun with Unicode, Regex and Java.

November 3, 2009 by Michael

Some would say, i have 3 problems 😉

private final static Pattern placeholder = Pattern.compile("#\\{(\\w+?)\\}");

won’t match “Mot#{ö}rhead” for example.

To replace the word character \w you either need the list of possible unicodeblocks like [\p{InLatin}|\p{InEtc}] (you get the codes for the blocks through “Character.UnicodeBlock.forName” or you’re lazy like me and just use the dot:

private final static Pattern placeholder = Pattern.compile("#\\{(.+?)\\}");

Oh what a day… :/

No comments yet

Post a Comment

Your email is never published. We need your name and email address only for verifying a legitimate comment. For more information, a copy of your saved data or a request to delete any data under this address, please send a short notice to michael@simons.ac from the address you used to comment on this entry.
By entering and submitting a comment, wether with or without name or email address, you'll agree that all data you have entered including your IP address will be checked and stored for a limited time by Automattic Inc., 60 29th Street #343, San Francisco, CA 94110-4929, USA. only for the purpose of avoiding spam. You can deny further storage of your data by sending an email to support@wordpress.com, with subject “Deletion of Data stored by Akismet”.
Required fields are marked *