Compiler Error: Invalid Character in the Given Encoding

December 14, 2012


Recently, a colleague of mine received the above compilation error, together with hundreds of other, less helpful errors, when he tried to compile what seemed to be perfectly fine XAML code.

The problems started after he removed all the comments from the source files with a program that he created. The program simply loads the source file’s content, removes the comments, and then writes the updated content back to the file. Here is a naive example to illustrate the problem:

    public void ProcessFile(string path)
    {
        string originalContent = File.ReadAllText(path);
        string processedContent = Process(originalContent);
        File.WriteAllText(path, processedContent, Encoding.Default);
    }

To understand the problem, you should first understand that a compiler is basically just a program that receives text, processes it, and outputs a result. The C# compiler is no different: it receives text that supposedly represents a C# program, validates it, and outputs either a binary representation of the program or a list of errors and warnings.

In order for the compiler to interpret the given text correctly, the encoding of that text must be agreed upon. The following paragraph is taken from ECMA-334 – C# Language Specification:

A conforming implementation of C# shall interpret characters in conformance with the Unicode Standard, Version 4.0, and ISO/IEC 10646-1. Conforming implementations must accept Unicode source files encoded with the UTF-8 encoding form.

The Visual Studio C# compiler extends the standard and is able to compile source files that are encoded in ASCII, UTF-8 or UTF-32 (those are the ones I checked), but you have to make sure that the source code can be represented in the chosen encoding.

C# code is usually written in English*, so it can be encoded correctly with any of the above encodings. XAML code, on the other hand, is more likely to contain text in another language. If it does, encoding it with the ASCII character set, for example, will cause the compiler to read gibberish.
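To make the loss concrete, here is a minimal sketch that encodes a string to bytes and decodes it back with the same encoding. The class and method names (AsciiLossDemo, RoundTrip) are made up for illustration and are not part of the original program.

```csharp
using System.Text;

public static class AsciiLossDemo
{
    // Encodes the text to bytes with the given encoding, then decodes
    // those bytes back to a string with the same encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }
}
```

With this sketch, RoundTrip("caf\u00E9", Encoding.ASCII) returns "caf?", because ASCII has no representation for the accented character and substitutes a question mark. The same call with Encoding.UTF8 returns the original string unchanged.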

That is exactly what happened in the above scenario: the encoding passed to the WriteAllText method is Encoding.Default, which on the .NET Framework returns an encoding for the operating system’s current ANSI code page. An ANSI code page can represent only a limited set of characters, so it is usually not enough if the source code contains non-English characters.
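The sketch below shows one plausible way such a file ends up rejected. It assumes the reader of the file expects UTF-8 (the default for XML without an encoding declaration) and uses ISO-8859-1 as a stand-in for an ANSI code page, since the real code page depends on the machine. The names EncodingMismatchDemo, EncodeAsLatin1 and IsValidUtf8 are hypothetical.

```csharp
using System.Text;

public static class EncodingMismatchDemo
{
    // Stand-in for writing text with a single-byte ANSI code page.
    // ISO-8859-1 is an assumption; Encoding.Default varies per machine.
    public static byte[] EncodeAsLatin1(string text)
    {
        return Encoding.GetEncoding("iso-8859-1").GetBytes(text);
    }

    // Returns true if the bytes can be decoded as strict UTF-8.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes the decoder throw on malformed
        // input instead of silently substituting U+FFFD.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

Here IsValidUtf8(EncodeAsLatin1("caf\u00E9")) returns false: the accented character becomes the single byte 0xE9, which is not a complete UTF-8 sequence. Text that contains only ASCII characters passes the check either way, which is why the problem stays hidden until a non-English character appears.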

When you are working with text files, make sure to respect the encoding: write a file with an encoding that can represent all of its content. In the above case, using Encoding.UTF8 fixed the errors.
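A corrected version of the earlier example might look like the sketch below. It assumes the source files are UTF-8 to begin with. The SourceFileProcessor class is hypothetical, and Process is only a placeholder for the comment-removal step, which the original program implements differently.

```csharp
using System.IO;
using System.Text;

public static class SourceFileProcessor
{
    // Hypothetical placeholder for the comment-removal step;
    // it returns the content unchanged.
    public static string Process(string content)
    {
        return content;
    }

    public static void ProcessFile(string path)
    {
        // Read with an explicit encoding instead of relying on defaults.
        string originalContent = File.ReadAllText(path, Encoding.UTF8);
        string processedContent = Process(originalContent);

        // Write with UTF-8, which can represent every Unicode character.
        File.WriteAllText(path, processedContent, Encoding.UTF8);
    }
}
```

If the files may use different encodings, a more careful tool would detect each file's encoding when reading and write the result back with that same encoding rather than hard-coding UTF-8.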

*You can write perfectly valid C# programs in your own language. Try it.

Cross-Posted from http://www.programmingtidbits.com/post/2012/12/13/Compiler-Error-Invalid-Character-in-the-Given-Encoding.aspx
