Natalie Reznik

"It's not a bug, it's a feature !" Unknown Programmer

Convert Hebrew Unicode text to old EBCDIC (code page 803)

Foreword:

We .NET programmers don't deal much with character encodings. The Framework's Encoding class solves most of our issues, but not all of them. Some very old and rarely used encodings are not natively supported by the .NET Framework, and because they are so rare there may be little or no public information about them. It looks like a trivial task, then you google it and find no solution.

This happened to me a couple of months ago: I needed to convert normal Unicode strings to EBCDIC (the encoding used on mainframes). The problem was that, for some reason, the mainframes in my organization use an old code page for Hebrew, code page 803, which, unlike the newer code page 424, is not supported by the Framework. Remapping characters from one code page to the other was not such a big deal, though there was an issue with mapping 'alef'. I relied on this resource:

 http://www.tachyonsoft.com/cp00803.htm

But the biggest problem is not the mapping; it is combining a right-to-left language with a left-to-right language within one string. To save RTL text on the MF you need to reverse the order of the characters (I'm not really sure why, but I was told this is to support printing from the terminal and so that text is rendered properly in MF emulators), while LTR text should not be reversed. So once you have combined text like:

יונדאי יוצאת במבצע תחת הסלוגן " I WANT" בו יוענקו הנחות והטבות שונות לכל דגמי יונדאי 2011.

in order to reverse only the Hebrew, but not the English, numbers or special characters, you need to at least recognize Hebrew text as such. Naturally, reading from the MF requires the opposite process.
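Here is a minimal sketch of that idea on plain Unicode strings, with no EBCDIC involved: reverse the whole string, then put every run of Latin letters and digits back into reading order. The class and method names are made up for this example, and the run pattern is an assumption you would tune to your data; it is a simplification, not the full code below.

```csharp
using System;
using System.Text.RegularExpressions;

static class MixedTextSketch
{
    // Hypothetical helper: returns the characters of s in reverse order.
    static string Reverse(string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    // Reverse everything, then re-reverse each Latin/digit run, so that
    // only the Hebrew (and the separators around it) ends up reversed.
    public static string ReverseKeepingLatin(string text)
    {
        string reversed = Reverse(text);
        return Regex.Replace(reversed, "[0-9A-Za-z]+",
            delegate(Match m) { return Reverse(m.Value); });
    }

    static void Main()
    {
        // \u05E9\u05DC\u05D5\u05DD is the Hebrew word "shalom" in logical order.
        Console.WriteLine(ReverseKeepingLatin("\u05E9\u05DC\u05D5\u05DD ABC 123"));
    }
}
```

The Latin and digit runs come out readable ("123 ABC" followed by the reversed Hebrew word), which is the storage order described above.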

To make a long story short, take a look at the code I came up with in the end. It handles English, Hebrew and combined (English inside Hebrew) text.

WARNING: this code is not fully generic. It was developed for the humble needs of our department, and we usually work with rather short strings, no longer than a couple of sentences, so I allowed myself to use regular expressions, as they won't seriously affect our applications' performance. For longer strings you might need to optimize, otherwise it might be nastily slow. Note also that English letters are converted to upper case (see the comment in the code). The good part is that this code doesn't use any third-party components, which could be a problem in financial organizations (where MFs are used); it uses only the most basic libraries of the .NET Framework.

There are basically two public functions:

  • ConvertToOldEbcdic - takes a Unicode string and converts it to EBCDIC (code page 803)
  • ConvertFromEbcdic - receives old EBCDIC text and returns a normal Unicode string

All other functions are helpers used by the two above.

The code is provided "AS IS"; you use it at your own responsibility.

----------------------------------

using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
 
namespace SomeNameSpace
{
    class EbcdicAdapter
    {
        /// <summary>
        /// converts a unicode string to ebcdic encoded string
        /// </summary>
        /// <param name="normalString"></param>
        /// <returns></returns>
        public static string ConvertToOldEbcdic(string normalString)
        {
            if (Regex.IsMatch(normalString, "([א-ת])"))//only if includes hebrew needs to be arranged.
            {
                normalString = SwapBrackets(normalString);
            }
            // Lowercase English letters cannot be handled correctly: in EBCDIC their
            // byte values are the ones the old code page uses for Hebrew letters.
            string inputString = normalString.ToUpper();

            Encoding ebcEnc = Encoding.GetEncoding(20424);
            byte[] bytes = ebcEnc.GetBytes(inputString);

            // Remap Hebrew letters from code page 424 to the old code page.
            for (int i = 0; i < bytes.Length; i++)
            {
                bytes[i] = AdaptToOldEbcdic(bytes[i]);
            }

            // Reverse everything, then put the English/number runs back in order.
            Array.Reverse(bytes);
            string oldEbcdic = ebcEnc.GetString(bytes);
            oldEbcdic = ArrangeEnHeString(oldEbcdic);
 
            return oldEbcdic.Trim();
        }
 
        /// <summary>
        /// converts an ebcdic encoded string to a unicode string
        /// </summary>
        /// <param name="oldEbcdicString"></param>
        /// <returns></returns>
        public static string ConvertFromEbcdic(string oldEbcdicString)
        {
            string inputString = oldEbcdicString; // e.g. "wipfx ilhp xear zixara my"
            Encoding ebcEnc = Encoding.GetEncoding(20424);
            byte[] bytes = ebcEnc.GetBytes(inputString);

            // Remap Hebrew letters from the old code page to code page 424.
            for (int i = 0; i < bytes.Length; i++)
            {
                bytes[i] = AdaptToNewEbcdic(bytes[i]);
            }

            Array.Reverse(bytes);
            string outputString = ebcEnc.GetString(bytes);
            if (Regex.IsMatch(outputString, "([א-ת])")) // only if it includes hebrew does it need to be arranged
            {
                outputString = ArrangeEnHeString(outputString);
                outputString = SwapBrackets(outputString);
            }
            else // if not, just reverse the string back
            {
                char[] chars = outputString.Trim().ToCharArray();
                Array.Reverse(chars);
                outputString = new string(chars);
            }
 
            return outputString.Trim();
        }
 
        /// <summary>
        /// reverses only english text within the hebrew text.
        /// numbers are treated as english text.
        /// punctuation as hebrew text.
        /// </summary>
        /// <returns></returns>
        private static string ArrangeEnHeString(string inputStr)
        {
            // e.g. "זה טקסט לבדיקה עם ENGLISH ומספרים 123456"
            // English letters, digits, whitespace (\\s) and a few punctuation marks.
            // This pattern string is what you might want to adjust to your needs.
            string pattern = "[0-9A-Z\\s\\.\\,\\%]+";

            // Reverse each matched run where it stands. Replacing by match position
            // (rather than by value with string.Replace) means a run that occurs
            // twice, or inside another run, is not reversed twice.
            return Regex.Replace(inputStr, pattern, delegate(Match match)
            {
                string value = match.Value;
                string core = value.Trim();
                if (core.Length == 0)
                {
                    return value; // whitespace only - nothing to reverse
                }

                // retain leading and trailing whitespace
                int lead = value.Length - value.TrimStart().Length;
                char[] chars = core.ToCharArray();
                Array.Reverse(chars);

                return value.Substring(0, lead) + new string(chars) + value.Substring(lead + core.Length);
            });
        }
 
        public static string SwapBrackets(string outputString)
        {
            // Mirror every bracket so that it still reads correctly
            // once the surrounding text has been reversed.
            char[] chars = outputString.ToCharArray();
            for (int i = 0; i < chars.Length; i++)
            {
                switch (chars[i])
                {
                    case ')': chars[i] = '('; break;
                    case '(': chars[i] = ')'; break;
                    case '[': chars[i] = ']'; break;
                    case ']': chars[i] = '['; break;
                    case '}': chars[i] = '{'; break;
                    case '{': chars[i] = '}'; break;
                }
            }
            return new string(chars);
        }
 
        private static byte AdaptToNewEbcdic(byte oldEbcdic)
        {
            byte newEbcdic = 0;
            byte[,] dictionary = GetEbcdicDictionary();
 
            for (int i = 0; i < 27; i++)
            {
                if (dictionary[i, 0].Equals(oldEbcdic))
                {
                    newEbcdic = dictionary[i, 1];
                    break;
                }
            }
 
            if (newEbcdic.Equals(0)) // if not hebrew letter keep existing mapping
            {
                newEbcdic = oldEbcdic;
            }
 
            return newEbcdic;
        }
        private static byte AdaptToOldEbcdic(byte newEbcdic)
        {
            byte oldEbcdic = 0;
 
            byte[,] dictionary = GetEbcdicDictionary();
 
            for (int i = 0; i < 27; i++)
            {
                if (dictionary[i, 1].Equals(newEbcdic))
                {
                    oldEbcdic = dictionary[i, 0];
                    break;
                }
            }
 
            if (oldEbcdic.Equals(0)) // if not hebrew letter keep existing mapping
            {
                oldEbcdic = newEbcdic;
            }
 
            return oldEbcdic;
        }
 
        /// <summary>
        /// map hebrew letters from 424 codepage to (supposedly) 803 codepage.
        /// all letters except for א fit the 803 codepage.
        /// </summary>
        /// <returns></returns>
        private static byte[,] GetEbcdicDictionary()
        {
            byte[,] dictionary = new byte[,] {
                                    {121,65},  // א // different from 803 codepage
                                    {129,66},  // ב
                                    {130,067}, // ג
                                    {131,068}, // ד
                                    {132,069}, // ה
                                    {133,070}, // ו
                                    {134,071}, // ז
                                    {135,072}, // ח
                                    {136,073}, // ט
                                    {137,081}, // י
                                    {145,082}, // ך
                                    {146,083}, // כ
                                    {147,084}, // ל
                                    {148,085}, // ם
                                    {149,086}, // מ
                                    {150,087}, // ן
                                    {151,088}, // נ
                                    {152,089}, // ס
                                    {153,098}, // ע
                                    {162,099}, // ף
                                    {163,100}, // פ
                                    {164,101}, // ץ
                                    {165,102}, // צ
                                    {166,103}, // ק
                                    {167,104}, // ר
                                    {168,105}, // ש
                                    {169,113}  // ת 
                                      };
            return dictionary;
        }
 
    }
}
------------------------------------------------------------- 

Obviously, before I wrote this utility I searched the web for a solution, which I didn't find, but pieces of code from different sites gave me tips. Unfortunately, I cannot recall where I saw the code that I integrated into my solution. So if you feel that I used a piece of code that you wrote and you deserve credit, please write me a message and I will gladly add it.

Parsing HTML with RegEx

Many times web developers have to use data that comes from a not very conventional source: HTML. For example, a client orders a mobile version of a site but is unwilling to give access to the database, provide a web service, or even RSS. "Take it all from our site", he says. You are not always in a position to insist on a normal data source and explain the enormous disadvantages and performance problems that will inevitably arise in this scenario. Your project manager comes to you and says that he understands everything, but you should get the client off his back: "But you can do it, right? So let's just do it fast and forget about it; we have other urgent issues to attend to, so let this one be quick and dirty....".
And so you have to admit that there's no other option but to find your regex cheat sheet. Regex is something you struggle so much to remember when you need it, and then forget almost totally after a couple of days of not using it. For many developers regex is like going to the dentist: you hate it, you're afraid of it, you do it only if there's no other choice, and you are thankful when it's over. So if you can't find your cheat sheet, here's a helpful link:

http://www.mikesdotnetting.com/Article.aspx?ArticleID=46

What next? "I should make it flexible enough not to fail even if there are slight changes in the source", you say.
Well, sure, that's important, but don't be too optimistic. Remember Murphy's law: if anything can go wrong, it will. HTML will change, and your regex will certainly fail from time to time.... So it is more important to make your regex easily fixable, and it is certainly best to keep it in a file or DB rather than hardcoded in your app. Keep your regex as short and simple as possible. Long and complex regexes with numerous conditions are hardly human-readable, and their maintenance is a black hole where tons of development hours disappear, which will certainly make your project manager very unhappy.

On the one hand, you don't want to run your string through a regex too many times, as it will affect performance; on the other hand, it might be unavoidable to break your regex into a few short parts and run them in a loop. The good news is that this makes your regex maintainable, and the next time your client changes his site's HTML you'll be able to adjust your regex within minutes. So it's up to you to find the golden mean between the two extremes. Now that the inevitable is taken care of, you can focus on making it happen as rarely as possible.
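A rough sketch of the "few short parts in a loop" approach could look like this. The two patterns, the HTML fragment and all the names are invented for the illustration; the patterns are hardcoded only to keep the sample self-contained, whereas in a real app they would be loaded from a file or DB as suggested above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ShortPatternsSketch
{
    // Short patterns with one named group each (hypothetical examples).
    static readonly string[] Patterns =
    {
        @"<h1>(?<Title>.*?)</h1>",
        @"<span class=""price"">(?<Price>.*?)</span>"
    };

    public static Dictionary<string, string> Extract(string html)
    {
        Dictionary<string, string> result = new Dictionary<string, string>();
        foreach (string pattern in Patterns)
        {
            Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
            Match match = regex.Match(html);
            if (!match.Success)
            {
                continue; // one outdated pattern does not break the others
            }

            foreach (string name in regex.GetGroupNames())
            {
                int ignored;
                if (int.TryParse(name, out ignored))
                {
                    continue; // skip numbered groups such as "0" (the whole match)
                }
                result[name] = match.Groups[name].Value;
            }
        }
        return result;
    }

    static void Main()
    {
        string html = "<h1>Phone X</h1>\r\n<span class=\"price\">99</span>";
        foreach (KeyValuePair<string, string> pair in Extract(html))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

When the site's HTML changes, only the one short pattern that stopped matching needs attention, and the rest keep delivering data in the meantime.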

So what are the most frequent changes on HTML pages? First of all, all sorts of whitespace like \r \n \t between the tags. For example, if you have a regex like this:

       <td\ style="padding-right:\ 10px;"\ width="60">some title:</td>\r\n\t\t\t\t\t\t\t\t<td><b>(?<SomeTitle>.*?)</b></td>

it will work until the whitespace between the <td>s changes, so it is best to take care of that right away:

       <td\ style="padding-right:\ 10px;"\ width="60">some title:</td>\s*<td><b>(?<SomeTitle>.*?)</b></td>

Then the problematic part is the tag's attributes, like padding and so on. While the structure of the entire page doesn't change very often, minor changes in the page's design do happen pretty often, and you don't want your regex to fail every time the width of a cell changes by a couple of pixels:

       <td\ style="padding-right:\ \d+px;"\ width="\d+">some title:</td>\s*<td><b>(?<SomeTitle>.*?)</b></td>

Or you can make an even more drastic change and resolve all possible issues with attributes:

       <td.{0,70}>some title:</td>\s*<td><b>(?<SomeTitle>.*?)</b></td>

Why limit the number of characters? Because . does not stop at the end of a tag: if there are many such elements on the page and you want to extract them all, an unlimited .* could start at an earlier <td and run across several cells before it reaches "some title:", so the match would swallow more than you wanted. In your case that may not be an issue. Certainly these are mere examples, and in real life you might write better regexes altogether, but you don't always have time to bring them to perfection; somehow such things are considered very easy and not time consuming by both clients and not very technical project managers. So the best solution is to write them as fast as you can and focus on making them readable and on solving the typical potential problems, rather than sit and think how to make them perfect.
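To show how the last pattern would be used from C#, here is a small sketch that pulls every SomeTitle value out of a page. The HTML fragment, the class name and the method name are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TitleExtractorSketch
{
    // The final pattern from the text: any attributes (up to 70 characters),
    // any whitespace between the cells, and a named group for the value.
    const string Pattern =
        @"<td.{0,70}>some title:</td>\s*<td><b>(?<SomeTitle>.*?)</b></td>";

    public static List<string> ExtractTitles(string html)
    {
        List<string> titles = new List<string>();
        foreach (Match match in Regex.Matches(html, Pattern))
        {
            titles.Add(match.Groups["SomeTitle"].Value);
        }
        return titles;
    }

    static void Main()
    {
        // Two rows with different attributes and different whitespace.
        string html =
            "<tr><td style=\"padding-right: 12px;\" width=\"64\">some title:</td>\r\n\t<td><b>First</b></td></tr>\n" +
            "<tr><td width=\"70\">some title:</td><td><b>Second</b></td></tr>";

        foreach (string title in ExtractTitles(html))
        {
            Console.WriteLine(title); // prints "First", then "Second"
        }
    }
}
```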

 

 

Silverlight and Nokia - what's the fuss?

It's all over the internet: Silverlight will be on Nokia's S60 and S40 devices. You're probably asking yourself why on earth everyone is so damn excited. Well, it's probably hard to understand for someone who hasn't developed for the mobile internet.

Development for the mobile internet can be very, very frustrating.

As a .NET programmer you have wonderful tools and powerful libraries to accomplish amazing results, but the front end of your apps, the pages, is simply disappointing because of browsers that are weak in functionality. Small displays, funny problems with right-to-left languages that take like 30% of your development time: this is the reality for us mobile web developers.

Nokia's S60 and S40 devices are basically all 3rd generation and most 2nd generation devices! They make up about half of all devices in use in these categories in the world. They are your favorite N95 and E65 and 6280 and many, many others.

Visually appealing sites are pretty much unrealistic when your biggest concern is to make pages look alike (not even the same) across tens of browsers. And what about Flash? Well, it's not much in use on the mobile web. What you have here is Flash Lite, a very "light" (read "weak") version of Flash. And besides, Flash is for designers; this is how we are used to thinking of Flash: maximum visuals, minimum functionality. It doesn't have to be that way, but design and functionality just don't connect when it comes to Flash. Mobile devices don't even support ordinary SWF files.

So when they say that Nokia will support Silverlight on almost all normal phones, well, I lose my breath. This means that in the near future I will be able to build a site and show it proudly to my mom and dad, and they will say "WOW!" instead of the usual "what can you do here?".

Silverlight has a significant advantage over Flash, and that is the programming part. All of .NET is at my command. It can be very powerful behind the curtains (code-behind), which will deliver amazing functionality along with Flash-like visual beauty. Pages won't seem as static as they are now. Microsoft provides tools for better cooperation between designers and programmers on Silverlight. I can continue working with Visual Studio!

I hope it is just a matter of time, and hopefully not too much time, before other companies like Sony Ericsson and Samsung also adopt Silverlight for their 3G phones.

SQL - how to convert a date stored as nvarchar into the datetime type and another format

This is unbelievable! I was afraid that 10 wins in a row is kinda not real and might make the players believe they are unbeatable. This had to stop some time, but in the cup final? After we led by 22 points??? Ouch, it hurts...

I cannot sleep tonight, so I decided to bring something positive out of this "bizayon" - I cannot find another word.

Today I had an interesting task: there was a table where one field was a date but saved as nvarchar, in the American style MM/dd/yyyy h:mm:ss AM/PM, which I needed to insert into another field, but as datetime and in a format appropriate for Israel (day first, 24-hour clock; I used style 113).

 tbl_SomeTable
|---------------------------|-----------------|
|    nvarchar_timefield     |  DateTimeField  |
|---------------------------|-----------------|
|  "11/26/2005 5:45:30 PM"  |       ?         |
|---------------------------|-----------------|

My first thought was: ah, this is going to be hard, with dateparts and things like that. Thank god, I first tried something easier that I thought might work. Laziness is the engine of progress, that's what my grandfather always said, 'cause a hardworking person would never come up with a washing machine! So take a look, isn't it nice?

UPDATE    tbl_SomeTable
SET       DateTimeField = Convert(datetime,Convert(datetime, nvarchar_timefield ),113)
Where
nvarchar_timefield is not null

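What basically happens here: the inner Convert turns nvarchar_timefield into the datetime type, and the outer Convert is given style 113. One caveat: a datetime column stores the value itself, not a display format, so the style number does not change what ends up in DateTimeField; styles take effect only when converting to or from character types, e.g. Convert(nvarchar(30), DateTimeField, 113) when you read the value back for display. That's it! Hope it helps someone.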
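If the same conversion has to happen in application code rather than in SQL, a sketch in C# could look like the following. The class and method names are made up, and the format strings are my reading of the two formats described in this post, not something taken from the table itself.

```csharp
using System;
using System.Globalization;

static class DateStringSketch
{
    // Parses an American-style string such as "11/26/2005 5:45:30 PM"
    // and formats it day-first with a 24-hour clock.
    public static string UsToIsraeli(string usDate)
    {
        DateTime parsed = DateTime.ParseExact(
            usDate, "M/d/yyyy h:mm:ss tt", CultureInfo.InvariantCulture);
        return parsed.ToString("dd/MM/yyyy HH:mm:ss", CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(UsToIsraeli("11/26/2005 5:45:30 PM")); // 26/11/2005 17:45:30
    }
}
```

As in SQL, the DateTime value in the middle has no format of its own; the format exists only in the strings on either side.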

And congrats to hapoel and their fans, well, you deserve it.

SQL select on several different tables that are in different databases and on different servers that are not in the same subnet

Recently I needed to run a select on 5 different tables that sat on different servers spread all over the internet. At a certain point I almost gave up, as I thought it couldn't be done. But I believed in the abilities of MS SQL Server 2005, which has helped me out so many times, and kept searching and trying all sorts of tricks. Here's the final result:

1. When you want to connect from one server to another, make sure that SQL Server Agent is running (shown with a green arrow) on the server you connect to.

2. Go to Server Objects ---> Linked Servers and add a new linked server.

When you add a new linked server you add it by its name (and not by IP), which would be just fine if the servers were in the same subnet. But in my case they were not, and because of that they couldn't find each other.

3. To overcome this problem you need to "tell" the system where to look for that server, and you do that in the file named "hosts", which can be found at:

C:\WINDOWS\system32\drivers\etc

This is a text file, and all you need to do is add at the bottom a "registration" for your server, like this:

123.45.6.789  MyServer

where 123.45.6.789 is the IP of the server and "MyServer" is its name.
After that the servers start to find each other, even by name.
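Once the linked server is reachable by name, its tables are addressed with a four-part name (server.database.schema.table). The small sketch below only builds such a name and the select text; "SomeDb", "dbo" and "tbl_SomeTable" are placeholders, not names from my servers, and the helper itself is hypothetical.

```csharp
using System;

static class LinkedServerSketch
{
    // Wraps an identifier in square brackets, doubling any closing bracket.
    static string Quote(string identifier)
    {
        return "[" + identifier.Replace("]", "]]") + "]";
    }

    // Builds the four-part name used to address a table on a linked server.
    public static string FourPartName(string server, string database, string schema, string table)
    {
        return Quote(server) + "." + Quote(database) + "." + Quote(schema) + "." + Quote(table);
    }

    static void Main()
    {
        // "MyServer" is the name registered in the hosts file above.
        string query = "SELECT * FROM " + FourPartName("MyServer", "SomeDb", "dbo", "tbl_SomeTable");
        Console.WriteLine(query); // SELECT * FROM [MyServer].[SomeDb].[dbo].[tbl_SomeTable]
    }
}
```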

I suppose this is a pretty trivial thing for someone who knows SQL Server well, but for me and my friends it was something new, so I hope it will help someone.

And of course a lot of credit goes to my genius boyfriend Adlai Mashiach, whom many of you know and love. Without his help I wouldn't have solved this. Don't forget to come to his lecture on LINQ the day after tomorrow. I've seen the lecture - it's really good!

Hello world!

Well good day all!

And it's really a nice day, as I found out at today's Dev Academy what LINQ is, and this thing will make my life a lot easier starting from tomorrow.

But first of all let me introduce myself.

My name is Natalie Reznik (Natasha) and I am a web developer. Currently I work at Unisurf, a mobile web development company. We write code in C# .NET 2.0, we build apps that work with SQL Server 2005 or XML and, of course, web services. Among our clients are such giants of the Israeli web as Walla! and Telesport. As to the mobile web, well, at this stage the cash cow is the "sex sites", but we are trying to move away from it (pretty successfully, I must say) and to make money from other things. For example, Unisurf even has its own search engine, 1212.co.il.
During my work at Unisurf, besides building web sites, I had the honor of building our company's intranet portal with SharePoint Services (WSS 3.0) and a reporting system using SQL Server Reporting Services.

Well, this is pretty much it. See you around.