Friday, June 29, 2007

FileWriter encoding problem

In my last post I mentioned Apache POI. Today I tried to parse an XLS file containing Chinese characters and write its content to a tab-separated file. Somehow, the resulting file was not readable at all. I assumed there was a problem with how Apache POI handles Unicode, but after a little research I realized that Apache POI should fully support Unicode. So what was the problem?
After setting up a few test cases, I solved the mystery. Java's FileWriter class gives you NO way to choose the character encoding: it always writes with the platform's default encoding, which on many systems cannot represent Chinese characters. A quick fix is to use OutputStreamWriter and specify the encoding explicitly:

OutputStreamWriter writer = new OutputStreamWriter(new FileOutputStream(file),"UTF-8");
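
A minimal sketch of the fix, written for Java 8 or later and assuming the rows have already been pulled out of the spreadsheet as plain strings (the Apache POI part is left out; the class `TabWriter`, its methods and the sample data are hypothetical). It writes tab-separated lines through an OutputStreamWriter with an explicit UTF-8 encoding, reads the bytes back with the same encoding to check that nothing was lost, and shows what happens when text goes through a charset that cannot represent Chinese, such as ISO-8859-1: each character comes back as '?'. The Chinese text is written as \u escapes so the example compiles the same way regardless of the source file's encoding.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

class TabWriter {

    // Writes each row as one tab-separated line, encoded explicitly as UTF-8
    // instead of the platform default that FileWriter would use.
    static void writeRows(File file, List<String[]> rows) {
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            for (String[] row : rows) {
                out.write(String.join("\t", row));
                out.write("\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file back, decoding it with the given charset.
    static String readAll(File file, Charset charset) {
        try {
            return new String(Files.readAllBytes(file.toPath()), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Encodes text with a charset and decodes it again; characters the
    // charset cannot represent are replaced (with '?' for ISO-8859-1).
    static String throughCharset(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("rows", ".tsv");
        file.deleteOnExit();
        List<String[]> rows = Arrays.asList(
                new String[] {"name", "city"},
                new String[] {"\u4e2d\u6587", "\u5317\u4eac"}); // 中文, 北京
        writeRows(file, rows);
        String expected = "name\tcity\n\u4e2d\u6587\t\u5317\u4eac\n";
        // prints "UTF-8 round trip intact: true"
        System.out.println("UTF-8 round trip intact: "
                + expected.equals(readAll(file, StandardCharsets.UTF_8)));
        // prints "Through ISO-8859-1: ??"
        System.out.println("Through ISO-8859-1: "
                + throughCharset("\u5317\u4eac", StandardCharsets.ISO_8859_1));
    }
}
```

The same '?' replacement is what you get when FileWriter's default encoding cannot map a character, which is why the output file looked like garbage.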

5 comments:

Rodrigo said...
This comment has been removed by a blog administrator.
Anulka&Wis said...

Thanks a lot, this one line of code solved my problems, too!

Anonymous said...

And here I am finding this two years later, but there is a fatal flaw with using the stream: you can't append data to the file! The append simply does not work. My code looks like this:

bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(writefile),"UTF8"));

bw.append(data);
bw.newLine();
bw.close();

This fails to actually "append": it overwrites the content of my target file every time.

So FileWriter, which appends fine, can't be made to use a specific encoding (I saw this discussed in JIRA as a problem by the JDK guys: http://www.mail-archive.com/issues@commons.apache.org/msg02179.html), and if you use the stream, you can encode to UTF-8 but you can't append! Awesome!! (sarcasm)

I am still trying to find a solution to an issue that would not exist if FileWriter's designers had thought of the obvious: a constructor or method for setting the write encoding. Oh well.
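
For anyone landing here with the same problem: the overwriting comes from the single-argument FileOutputStream constructor, which truncates an existing file, not from OutputStreamWriter. BufferedWriter.append() (inherited from Writer) only appends to the stream, not to what is already on disk. FileOutputStream also has a constructor that takes a boolean append flag, so encoding and appending can be combined. A minimal sketch for Java 8 or later (the class `AppendUtf8` and method `appendLine` are hypothetical names):

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class AppendUtf8 {

    // Appends one line of text to the end of the file, encoded as UTF-8.
    // Passing true as the second FileOutputStream argument opens the file
    // in append mode instead of truncating it.
    static void appendLine(File file, String data) {
        try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file, true), StandardCharsets.UTF_8))) {
            bw.write(data);
            bw.newLine(); // writes the platform line separator
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("append", ".txt");
        file.deleteOnExit();
        appendLine(file, "first \u4e2d\u6587");
        appendLine(file, "second \u5317\u4eac");
        // Both lines survive because the second call appended instead of
        // overwriting; prints 2.
        System.out.println(Files.readAllLines(file.toPath(), StandardCharsets.UTF_8).size());
    }
}
```

On Java 7 and later, Files.newBufferedWriter(path, StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND) does the same in one call, and Java 11 added FileWriter constructors that take a Charset.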

billbrown said...

Thanks, this solved my issue on Windows XP.

Anonymous said...

That was very helpful, man.