Java Cleanly Decode UTF-8

08.18.2010
Clean a string of characters that cannot be encoded as UTF-8 (in a Java String, that means unpaired surrogates) using NIO madness!

    import java.nio.CharBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CharsetEncoder;

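    // The encoder and decoder are shared and are not thread-safe;
    // guard calls to utf8() if several threads use it.
    public static Charset charset = Charset.forName("UTF-8");
    public static CharsetEncoder encoder = charset.newEncoder();
    public static CharsetDecoder decoder = charset.newDecoder();

    static {
       // The encoder reports errors by default, so an unpaired surrogate would
       // make encode() throw. Tell it to drop bad input, like the decoder.
       encoder.onMalformedInput(CodingErrorAction.IGNORE);
       encoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
       decoder.onMalformedInput(CodingErrorAction.IGNORE);
       decoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
    }

    // Round-trips the input through UTF-8 bytes and back to a String.
    public static String utf8( String input ) throws CharacterCodingException {
        return decoder.decode( encoder.encode( CharBuffer.wrap( input ) ) ).toString();
    }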
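
To see the round trip in action, here is a minimal self-contained sketch of the same idea. It is not the original poster's code: the class name `Utf8Cleaner` and the method name `clean` are hypothetical, and it assumes Java 7 or later for `StandardCharsets`. It builds a fresh encoder and decoder on every call (so no state is shared between threads) and sets both to `CodingErrorAction.IGNORE`.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is an assumption made for this sketch.
final class Utf8Cleaner {

    private Utf8Cleaner() { }

    // Returns the input with every char that cannot be encoded as UTF-8 removed.
    static String clean(String input) {
        // Fresh coders per call: CharsetEncoder and CharsetDecoder are not thread-safe.
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            // Encoding silently skips unpaired surrogates; decoding restores a String.
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(input));
            return decoder.decode(bytes).toString();
        } catch (CharacterCodingException e) {
            // Not expected: both coders ignore errors instead of reporting them.
            throw new IllegalStateException(e);
        }
    }
}
```

With this sketch, a string holding a lone high surrogate between "ab" and "cd" should come back as "abcd", while a valid surrogate pair (an emoji, for example) passes through unchanged.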