Java has a built-in java.text.Normalizer class that transforms Unicode text into an equivalent composed or decomposed form. Say what? The letter ‘Á’ can be represented in a composed form, U+00C1 LATIN CAPITAL LETTER A WITH ACUTE, and in a decomposed form, U+0041 LATIN CAPITAL LETTER A followed by U+0301 COMBINING ACUTE ACCENT. Normalizer handles this for you: import java.text.Normalizer; […]
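To make the composed versus decomposed difference concrete, here is a minimal sketch using java.text.Normalizer with the NFC and NFD forms. The class name NormalizerSketch and the helper methods compose and decompose are made up for this illustration and are not taken from the original post.

```java
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
public class NormalizerSketch {

    // NFC: canonical decomposition followed by canonical composition.
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // NFD: canonical decomposition.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String composed = "\u00C1";    // LATIN CAPITAL LETTER A WITH ACUTE
        String decomposed = "A\u0301"; // LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT

        // Both look the same on screen, but they are different char sequences.
        System.out.println(composed.equals(decomposed));            // false
        System.out.println(decompose(composed).equals(decomposed)); // true
        System.out.println(compose(decomposed).equals(composed));   // true
        System.out.println(decompose(composed).length());           // 2
    }
}
```

Normalizing both sides to the same form before comparing strings is usually what you want when the text comes from different sources.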
UTF-8 has always been a multi-byte encoding, but you probably only had to handle characters from the Basic Multilingual Plane, which fit into a single 16-bit Java char and take at most 3 bytes in UTF-8. With the rise of emojis, 4-byte characters rose as well, so handling 4-byte UTF-8 characters is of interest not only for exotic languages but also for the needs of average users who want […]
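As a rough illustration of where the 4-byte case shows up in Java, the sketch below counts UTF-8 bytes, UTF-16 chars, and code points for a few sample strings. The class name Utf8LengthSketch and its helper methods are hypothetical, and the code assumes Java 8 or newer (for String.codePoints()).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class Utf8LengthSketch {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the string contains a code point above U+FFFF, i.e. one that
    // needs 4 bytes in UTF-8 and a surrogate pair (2 chars) in a Java String.
    static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String ascii = "A";               // U+0041
        String accented = "\u00C1";       // U+00C1
        String euro = "\u20AC";           // U+20AC EURO SIGN
        String emoji = "\uD83D\uDE00";    // U+1F600 GRINNING FACE as a surrogate pair

        System.out.println(utf8Length(ascii));     // 1
        System.out.println(utf8Length(accented));  // 2
        System.out.println(utf8Length(euro));      // 3
        System.out.println(utf8Length(emoji));     // 4

        // The emoji is one code point but two Java chars.
        System.out.println(emoji.length());                          // 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
        System.out.println(hasSupplementary(emoji));                 // true
    }
}
```

Anything that iterates over a String char by char, or stores text in a column limited to 3-byte UTF-8, is a likely place for such characters to break.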
It is harder than I thought to create a simple zip archive from within Java that contains entries with Unicode names. I’m actually too lazy to read all the specs, but they say something along the lines that the entry names in a zip archive are encoded using “Cp437”. The built-in Java compression API has nothing to […]
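For comparison, here is a minimal sketch of writing and reading back a Unicode entry name, assuming Java 7 or newer, where java.util.zip.ZipOutputStream and ZipInputStream have constructors that take a Charset. This may not match the Java version or the approach the post itself uses. The class name ZipUnicodeSketch and its methods are hypothetical, and whether other zip tools display the name correctly is not verified here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
public class ZipUnicodeSketch {

    // Builds an in-memory zip with one entry whose name is encoded as UTF-8.
    static byte[] zipSingleEntry(String entryName, String content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the name of the first entry back, decoding entry names as UTF-8.
    // Returns null if the archive has no entries.
    static String firstEntryName(byte[] zipBytes) {
        try (ZipInputStream zip =
                 new ZipInputStream(new ByteArrayInputStream(zipBytes), StandardCharsets.UTF_8)) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "gr\u00FC\u00DFe.txt"; // "grüße.txt"
        byte[] archive = zipSingleEntry(name, "hello");
        System.out.println(name.equals(firstEntryName(archive))); // true
    }
}
```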