How to Save a Unicode Text File That Excel Can Read

December 11, 2011


The other day I had to create a .CSV file with some funky Unicode characters. Not only that, but Excel had to be able to open and edit it. The catch: when you create a .NET StreamWriter without specifying an encoding, it writes UTF-8 without a BOM (byte order mark), which Excel can't read.

Well, actually, it can read it, but it doesn't realize that this is a Unicode file, so special characters (such as this lovely one: Ž) tend to look like someone just puked a letter on the screen.
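To see what that puked letter looks like at the byte level, here is a small sketch. It assumes .NET 5 or later (for top-level statements and `Encoding.Latin1`) and uses Latin-1 as a stand-in for the legacy ANSI code page that Excel most likely falls back to when there is no BOM. The actual code page depends on the system's regional settings; Windows-1252 maps these particular bytes the same way.

```csharp
using System;
using System.Text;

// "Ž" is U+017D; UTF-8 encodes it as the two bytes C5 BD.
string original = "\u017D";
byte[] utf8Bytes = Encoding.UTF8.GetBytes(original); // GetBytes never adds a BOM

// A single-byte legacy encoding turns each of those bytes into its own character.
// Assumption: Latin-1 stands in for the ANSI code page (Windows-1252 maps C5 and BD identically).
string misread = Encoding.Latin1.GetString(utf8Bytes);

Console.WriteLine(BitConverter.ToString(utf8Bytes)); // C5-BD
Console.WriteLine(misread == "\u00C5\u00BD");         // True: "Å½" instead of "Ž"
```

One character becomes two, which is exactly the garbage that shows up in the cell.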

The solution is quite simple. Use the StreamWriter constructor overload that accepts an Encoding, and pass it a UTF-8 encoding that emits a BOM. You can create one by passing true (the encoderShouldEmitUTF8Identifier parameter) to the UTF8Encoding constructor. The whole thing looks something like this:

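    using System.IO;
    using System.Text;

    public class CsvExample // hypothetical containing class
    {
        // UTF-8 encoding that writes the byte order mark (EF BB BF) at the start of the file
        private static readonly Encoding Utf8WithBom = new UTF8Encoding(true);

        public void WriteSomething()
        {
            // append: true creates the file if needed; the BOM is only written when the file starts out empty
            using (var streamWriter = new StreamWriter(@"c:\hello.csv", true, Utf8WithBom))
            {
                streamWriter.WriteLine("hello,world");
            }
        }
    }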
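If you want to confirm that the BOM really ends up in the output, a quick check is to write into a MemoryStream and look at the first bytes. This is a minimal sketch assuming .NET 5 or later for top-level statements; the helper `WriteCsvLine` is hypothetical and only exists for this comparison.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: writes one CSV line into an in-memory stream using the
// StreamWriter produced by makeWriter, and returns the raw bytes that were written.
static byte[] WriteCsvLine(Func<Stream, StreamWriter> makeWriter)
{
    var stream = new MemoryStream();
    using (var writer = makeWriter(stream))
    {
        writer.WriteLine("\u017D,world"); // "Ž,world"
    }
    // ToArray still works after the writer has closed the MemoryStream.
    return stream.ToArray();
}

// No encoding argument: StreamWriter uses UTF-8 without a BOM.
byte[] noBom = WriteCsvLine(s => new StreamWriter(s));

// UTF8Encoding(true): the writer emits the EF BB BF preamble before the text.
byte[] withBom = WriteCsvLine(s => new StreamWriter(s, new UTF8Encoding(true)));

// Encoding.UTF8 is also configured to emit a BOM, so it produces the same bytes.
byte[] viaEncodingUtf8 = WriteCsvLine(s => new StreamWriter(s, Encoding.UTF8));

Console.WriteLine(BitConverter.ToString(noBom, 0, 2));     // C5-BD: the text starts immediately
Console.WriteLine(BitConverter.ToString(withBom, 0, 3));   // EF-BB-BF: the BOM comes first
Console.WriteLine(withBom.SequenceEqual(viaEncodingUtf8)); // True
```

Using a MemoryStream keeps the check free of file-system side effects; point a FileStream at a real file instead if you want to inspect what ends up on disk.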

By the way, this is more of an issue with Excel than with .NET. As the Wikipedia article on the byte order mark states, the UTF-8 specification doesn't require a BOM, and most applications handle UTF-8 just fine without one.
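.NET itself is one of those applications: a StreamReader reads UTF-8 correctly with or without the BOM. The sketch below assumes .NET 5 or later for top-level statements; `ReadAll` is a hypothetical helper. It relies on the StreamReader(Stream) constructor defaulting to UTF-8 with byte-order-mark detection turned on.

```csharp
using System;
using System.IO;
using System.Text;

// The same line, encoded once without and once with a UTF-8 BOM.
byte[] body = Encoding.UTF8.GetBytes("\u017D,world");   // "Ž,world"; GetBytes never adds a BOM
byte[] bom = new UTF8Encoding(true).GetPreamble();       // EF BB BF
byte[] withBom = new byte[bom.Length + body.Length];
bom.CopyTo(withBom, 0);
body.CopyTo(withBom, bom.Length);

// Hypothetical helper: StreamReader(Stream) defaults to UTF-8 and detects byte order marks,
// so a leading BOM is consumed rather than returned as part of the text.
static string ReadAll(byte[] bytes)
{
    using var reader = new StreamReader(new MemoryStream(bytes));
    return reader.ReadToEnd();
}

string plain = ReadAll(body);
string marked = ReadAll(withBom);
Console.WriteLine(plain == marked); // True
```

So adding the BOM costs three bytes and doesn't hurt BOM-aware readers, while it is what Excel needs to pick the right encoding.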

