life = code + sleep

June 2008 - Posts

RE:yield

Replying to this post about the use of yield, here is an everyday practical example: splitting a string into lines of text (on newlines).

(I do like comparisons, so here’s the Python version too):

C#(xUnit)

    static IEnumerable<string> SplitLines(string input)
    {
        while (true)
        {
            // Ordinal: match the separator exactly, with no culture rules.
            int idx = input.IndexOf(Environment.NewLine, StringComparison.Ordinal);

            // No separator left: the remainder is the last line.
            if (idx < 0)
            {
                yield return input;
                break;
            }

            yield return input.Substring(0, idx);
            // Drop the line just returned, plus its separator.
            input = input.Substring(idx + Environment.NewLine.Length);
        }
    }

Python

    def split(text):
        while True:
            idx = text.find('\n')
            if idx < 0:
                # No newline left: the remainder is the last line.
                yield text
                break

            yield text[:idx]
            # Drop the line just yielded, plus its newline.
            text = text[idx + 1:]

Sorry, no Ruby version this time: Ruby’s idiom of yielding is a bit twisted. Its yield calls the block passed to the method; it does not create a generator.
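What the generator buys you is laziness: the caller can stop early and the rest of the input is never scanned. Here is a minimal sketch of that, in Python. The name split_lines is mine, and unlike the version above it tracks an offset instead of re-slicing the string, so treat it as a variant rather than the same code.

```python
from itertools import islice


def split_lines(text):
    # Yield the lines of `text` one at a time, splitting on newline.
    start = 0
    while True:
        idx = text.find('\n', start)
        if idx < 0:
            # No newline left: the remainder is the last line.
            yield text[start:]
            return
        yield text[start:idx]
        start = idx + 1


# Only the first two lines are ever produced; the generator is
# abandoned before it looks at "three" or "four".
first_two = list(islice(split_lines("one\ntwo\nthree\nfour"), 2))
print(first_two)  # ['one', 'two']
```

Like the original, a trailing newline gives you one final empty line.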

Back From The Dead

Recovered from a major HDD crash and data loss.

Anyway, here’s a quick fun snippet, looking at the idioms and code size of Python vs Ruby: computing the character-frequency histogram of an English text. You can try it on a text of your choice.

Python

    import sys, re, collections
    from operator import itemgetter

    ct = 0
    expr = re.compile(r'\w')  # raw string, so \w reaches the regex engine intact
    counts = collections.defaultdict(int)
    with open(sys.argv[1]) as f_obj:
        for line in f_obj:
            for match in expr.finditer(line):
                ct += 1
                counts[match.group(0)] += 1

    # Most frequent characters first.
    ranked = sorted(counts.items(), key=itemgetter(1), reverse=True)
    for (letter, value) in ranked:
        print("%s\t%f%%\t[%d/%d]" % (letter, value * 100.0 / ct, value, ct))

Ruby

    dict = {}
    dict.default = 0
    ct = 0
    open(ARGV[0]) do |f|
      while c = f.read(1) do
        if c =~ /\w/
          ct += 1
          dict[c] += 1
        end
      end
    end
    dict.sort { |a, b| b[1] <=> a[1] }.each { |key, value|
      print "%s\t%f%%\t[%d/%d]\n" % [key, value * 100.0 / ct, value, ct]
    }
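For what it’s worth, the Python side can get shorter still if you lean on collections.Counter from the standard library (assuming Python 2.7 or later). A rough sketch, not a rewrite of the script above: char_histogram is a name I made up, and it takes a string rather than a file name so it is easy to try.

```python
import re
from collections import Counter


def char_histogram(text):
    # Count every character matched by \w (letters, digits, underscore).
    counts = Counter(re.findall(r'\w', text))
    # most_common() gives (char, count) pairs, highest count first.
    return counts.most_common(), sum(counts.values())


pairs, total = char_histogram("hello world")
for ch, n in pairs:
    print("%s\t%f%%\t[%d/%d]" % (ch, n * 100.0 / total, n, total))
```

To run it on a file, pass in the file’s contents, e.g. open(path).read().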

So who won?