Today: Fun with Unicode, Regex and Java.
Some would say I have 3 problems:
private final static Pattern placeholder = Pattern.compile("#\\{(\\w+?)\\}");
won’t match “Mot#{ö}rhead”, for example, because in Java \w only stands for [a-zA-Z_0-9].
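Here is a minimal sketch that reproduces the problem. The class name WordCharDemo and the helper firstPlaceholder are made up for illustration. It also tries Pattern.UNICODE_CHARACTER_CLASS, a flag that only arrived with Java 7 (after this post was written) and makes \w Unicode-aware. On older JDKs that part won't compile.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCharDemo {
    // The pattern from the post; by default \w in Java means [a-zA-Z_0-9]
    static final Pattern ASCII_WORD = Pattern.compile("#\\{(\\w+?)\\}");
    // Same pattern with UNICODE_CHARACTER_CLASS (Java 7+), so \w also covers non-ASCII letters
    static final Pattern UNICODE_WORD =
            Pattern.compile("#\\{(\\w+?)\\}", Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helper: returns the first placeholder name, or null if none is found
    static String firstPlaceholder(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // U+00F6 is LATIN SMALL LETTER O WITH DIAERESIS, escaped to avoid source-encoding trouble
        String input = "Mot#{\u00f6}rhead";
        System.out.println(firstPlaceholder(ASCII_WORD, input));                     // prints null
        System.out.println("\u00f6".equals(firstPlaceholder(UNICODE_WORD, input))); // prints true
    }
}
```

With the default flags, \w simply has no way to match the umlaut, so the whole placeholder is skipped rather than partially matched.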
To replace the word character \w, you either need a list of the Unicode blocks you want to allow, like [\p{InBasicLatin}\p{InLatin_1_Supplement}] and so on (valid block names are the ones “Character.UnicodeBlock.forName” accepts), or you’re lazy like me and just use the dot:
private final static Pattern placeholder = Pattern.compile("#\\{(.+?)\\}");
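To compare the two fixes, here is a sketch (class and helper names are again hypothetical) that runs the block-list variant, the lazy dot, and a third variant that is not from the post: [\p{L}\p{N}_], which uses Unicode general categories (letters, numbers, underscore) instead of blocks. Note that the block list has no '|' between its entries, since inside [...] a '|' is just a literal pipe.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderVariants {
    // Block list: accept any character from the listed Unicode blocks (extend as needed)
    static final Pattern BLOCKS =
            Pattern.compile("#\\{([\\p{InBasicLatin}\\p{InLatin_1_Supplement}]+?)\\}");
    // The lazy variant from the post: any character except a line terminator
    static final Pattern DOT = Pattern.compile("#\\{(.+?)\\}");
    // Assumed alternative: letters, numbers and underscore via Unicode general categories
    static final Pattern CATEGORIES = Pattern.compile("#\\{([\\p{L}\\p{N}_]+?)\\}");

    // Hypothetical helper: returns the first placeholder name, or null if none is found
    static String firstPlaceholder(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String umlaut = "Mot#{\u00f6}rhead"; // U+00F6, the o with diaeresis
        String spaced = "#{two words}";
        for (Pattern p : new Pattern[] {BLOCKS, DOT, CATEGORIES}) {
            // Print whether each variant finds a placeholder in both inputs
            System.out.println(p.pattern()
                    + " -> " + (firstPlaceholder(p, umlaut) != null)
                    + ", " + (firstPlaceholder(p, spaced) != null));
        }
    }
}
```

All three find the placeholder in “Mot#{ö}rhead”. The difference shows up on “#{two words}”: the dot accepts the space, and so does the block list (the space lives in Basic Latin), while the category class rejects it. So the dot is the quickest fix, but if placeholder names are meant to be word-like, a category-based class is probably closer to what \w was doing.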
Oh, what a day… :/
This entry was posted on Tuesday, November 3, 2009, at 4:32 pm by Michael, tagged with Code Snippets and categorized in English posts, Java.