Today: Fun with Unicode, Regex and Java.

November 3, 2009 by Michael

Some would say I have 3 problems 😉

private final static Pattern placeholder = Pattern.compile("#\\{(\\w+?)\\}");

won’t match “Mot#{ö}rhead”, for example, because by default Java’s \w only covers [a-zA-Z_0-9].

To replace the word character \w, you either need a list of the Unicode blocks you want to allow, like [\p{InBasicLatin}\p{InLatin-1Supplement}] (the block names are the ones “Character.UnicodeBlock.forName” accepts; note that a | inside a character class is a literal pipe, not an alternation), or you’re lazy like me and just use the dot:

private final static Pattern placeholder = Pattern.compile("#\\{(.+?)\\}");
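
For comparison, here is a small sketch that runs the original pattern and the dot variant against the same input. The third pattern, built on the Unicode categories \p{L} and \p{N}, is my own assumption and not something the post above uses; the class and method names (PlaceholderDemo, firstName) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderDemo {
    // The original pattern: \w is ASCII-only by default, i.e. [a-zA-Z_0-9].
    static final Pattern ASCII_WORD = Pattern.compile("#\\{(\\w+?)\\}");

    // The lazy fix: any characters, matched reluctantly up to the next '}'.
    static final Pattern DOT = Pattern.compile("#\\{(.+?)\\}");

    // Assumption, not from the post: Unicode letters, digits and underscore.
    static final Pattern UNICODE_WORD = Pattern.compile("#\\{([\\p{L}\\p{N}_]+?)\\}");

    /** Returns the first placeholder name found in the input, or null if there is none. */
    static String firstName(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // \u00f6 is the o-umlaut, escaped so the source file encoding does not matter.
        String input = "Mot#{\u00f6}rhead";
        System.out.println("ASCII \\w: " + firstName(ASCII_WORD, input));
        System.out.println("dot:      " + firstName(DOT, input));
        System.out.println("\\p{L}:    " + firstName(UNICODE_WORD, input));
    }
}
```

The trade-off: the dot also accepts spaces and punctuation inside the braces, so "#{a b}" counts as a placeholder, while the \p{L} variant rejects it.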

Oh what a day… :/
