CsvReader and Linq - Gady Elkarif's Blog

CsvReader and Linq

In the previous post, My new Innovative project, I introduced a personal finance project that I plan to develop.

The first problem I want to solve is how to read CSV files, since all the data is stored in CSV files.

I developed a simple generic CsvReader that helps me read this data before I display it in a graph.

The CSV file looks something like this (I downloaded the S&P data for the last 5 years from Yahoo):

[Image: Yahoo Finance CSV data]


The CSV file can be downloaded from here

CsvReader Usage:

 First define an interface IMarketEntity that the generic reader will fill automatically:

    public interface IMarketEntity
    {
        DateTime Date { get; set; }
        Double Open { get; set; }
        Double High { get; set; }
        Double Low { get; set; }
        Double Close { get; set; }
        Double Volume { get; set; }
        Double AdjClose { get; set; }
    }

Then the usage of the CsvReader is very simple:

    String fileName = "S_and_P_2000_2008_Daily.csv";

    CsvReader<MarketEntity> reader = new CsvReader<MarketEntity>(fileName);

    ICollection<MarketEntity> data = reader.Parse();

The class MarketEntity is the implementation of the interface IMarketEntity.
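The post does not list MarketEntity itself. A minimal sketch of what it could look like, assuming plain auto-properties and nothing else (the actual class in the downloadable source may differ):

```csharp
using System;

// The interface from the post, repeated so the sketch compiles on its own.
public interface IMarketEntity
{
    DateTime Date { get; set; }
    Double Open { get; set; }
    Double High { get; set; }
    Double Low { get; set; }
    Double Close { get; set; }
    Double Volume { get; set; }
    Double AdjClose { get; set; }
}

// Assumed implementation: public auto-properties plus the implicit
// parameterless constructor that the reader's "where T : new()"
// constraint requires.
public class MarketEntity : IMarketEntity
{
    public DateTime Date { get; set; }
    public Double Open { get; set; }
    public Double High { get; set; }
    public Double Low { get; set; }
    public Double Close { get; set; }
    public Double Volume { get; set; }
    public Double AdjClose { get; set; }
}
```

The property names matter: the reader looks up each CSV column name with its spaces removed, so the "Adj Close" column lands in AdjClose.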

 

The CsvReader source code:

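    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.IO;
    using System.Reflection;

    public class CsvReader<T> where T : new()
    {
        private String m_path;

        public CsvReader(String path)
        {
            m_path = path;
        }

        // Reads the whole file, maps the header columns to properties of T
        // and parses the remaining rows. Returns null if the file does not exist.
        public ICollection<T> Parse()
        {
            if (File.Exists(m_path))
            {
                using (StreamReader reader = new StreamReader(m_path))
                {
                    String str = reader.ReadToEnd();

                    // The first line holds the column names. TrimEnd drops the
                    // '\r' of a Windows line ending without cutting a character
                    // when the file uses plain '\n'.
                    Int32 idx = str.IndexOf("\n");
                    String header = str.Substring(0, idx).TrimEnd('\r');
                    String[] headers = header.Split(new Char[] { ',' });

                    // Match each column to a property by name, ignoring spaces
                    // ("Adj Close" -> AdjClose). Unknown columns give null.
                    List<PropertyInfo> properties = new List<PropertyInfo>();
                    foreach (String h in headers)
                    {
                        PropertyInfo propertyInfo = typeof(T).GetProperty(h.Replace(" ", ""));
                        properties.Add(propertyInfo);
                    }

                    String data = str.Substring(idx + 1);

                    return Parse(properties, data);
                }
            }

            return null;
        }

        private ICollection<T> Parse(List<PropertyInfo> properties, String str)
        {
            ICollection<T> result = new List<T>();

            String[] data = str.Split(new Char[] { '\n' });

            foreach (String row in data)
            {
                // Strip the '\r' left over from Windows line endings.
                String line = row.TrimEnd('\r');

                // Stop at the first empty line (normally the end of the file).
                if (String.IsNullOrEmpty(line))
                {
                    break;
                }

                T item = new T();

                String[] rowData = line.Split(new Char[] { ',' });

                Int32 ii = 0;
                foreach (PropertyInfo propertyInfo in properties)
                {
                    Object obj = rowData[ii++];

                    // Skip columns that have no matching property on T.
                    if (propertyInfo == null)
                    {
                        continue;
                    }

                    // Convert the text to the property type. The invariant
                    // culture keeps "1416.25" a valid number on any machine.
                    switch (propertyInfo.PropertyType.ToString())
                    {
                        case "System.DateTime":
                            obj = DateTime.Parse((String)obj, CultureInfo.InvariantCulture);
                            break;

                        case "System.Double":
                            obj = Double.Parse((String)obj, CultureInfo.InvariantCulture);
                            break;
                    }

                    propertyInfo.SetValue(item, obj, null);
                }

                result.Add(item);
            }

            return result;
        }
    }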

 And now for the Linq part:

After the CsvReader returns a collection of MarketEntity, I can use Linq to query this collection. Here are some samples:

1) Get all the days whose low is above 1375 and whose high is below 1416

    var res1 =
        from e in data
        where (e.Low > 1375 && e.High < 1416)
        select new { Low = e.Low, High = e.High };

2) Get the days with volume > 5700000000

    var res2 =
        from e in data
        where e.Volume > 5700000000
        select e;

3) Get the days with volume > 5700000000 (the same query as 2, written with extension methods)

    IEnumerable<MarketEntity> res3 = data.Select(e => e).Where(e => e.Volume > 5700000000);

4) Get the days with volume > 5700000000 - Return only the volume

    var res4 = data.Select(e => new { Volume = e.Volume }).Where(e => e.Volume > 5700000000);

5) Group the days whose high is above 1500 by high value / 100

    var res5 =
        from e in data
        where e.High > 1500
        group e.High by e.High / 100 into g
        select new { High = g.Key, Numbers = g };
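One thing to keep in mind with sample 5: High is a Double, so e.High / 100 is not an integer division, and two days only share a group when their highs are exactly equal. If the intent is 100-point buckets, one option is to floor the key. A minimal sketch, assuming a hypothetical helper class HighBuckets that works on a plain list of highs (neither the class nor the method comes from the original source):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class HighBuckets
{
    // Groups the highs above the threshold into 100-point buckets.
    // Math.Floor turns 1523.4 / 100 = 15.234 into the bucket key 15.
    public static Dictionary<Double, List<Double>> Group(IEnumerable<Double> highs, Double threshold)
    {
        var query =
            from h in highs
            where h > threshold
            group h by Math.Floor(h / 100) into g
            select new { High = g.Key, Numbers = g };

        return query.ToDictionary(x => x.High, x => x.Numbers.ToList());
    }
}
```

For example, with the made-up values 1450.0, 1510.5, 1523.4, 1560.0 and 1601.2 and a threshold of 1500, the key 15 holds 1510.5, 1523.4 and 1560.0, and the key 16 holds 1601.2.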


The source code can be found here.


 

Published: Feb 06 2008, 02:17 PM by egady | with 3 comment(s)

Comments

sccom wrote:

Looks great. I have a question about this part:

    using (StreamReader reader = new StreamReader(m_path))
    {
        String str = reader.ReadToEnd();

Isn't that the Achilles Heel of your code? If multiple importers each get multimegabyte files, a lot of RAM will be consumed... Would it make more sense to do some kind of line-by-line interpretation?

# April 6, 2008 7:37 AM

egady wrote:

Yes, you're actually right. Handling big files wasn't the goal, but it can be done by reading the data in chunks, which is always better.

A better implementation would read a chunk of bytes (4k...) from the file each time and watch out for lines that are cut in the middle.

Thanks,

Gady.

# April 6, 2008 3:17 PM

PS wrote:

Can't some sort of yield return be used instead of ReadToEnd?

# October 12, 2009 11:17 AM
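The line-by-line and yield return ideas from the comments above could be combined roughly as follows. This is a minimal sketch, not the original implementation: StreamingCsvReader<T> is a hypothetical name, and the sketch assumes simple CSV data with no quoted fields or embedded commas.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Reflection;

// Hypothetical variant of CsvReader<T> (not from the post): it reads one
// line at a time and yields each item as soon as it has been parsed.
public class StreamingCsvReader<T> where T : new()
{
    private readonly String m_path;

    public StreamingCsvReader(String path)
    {
        m_path = path;
    }

    public IEnumerable<T> Parse()
    {
        using (StreamReader reader = new StreamReader(m_path))
        {
            String header = reader.ReadLine();
            if (header == null)
            {
                yield break; // empty file
            }

            // Map each column name (spaces removed) to a property of T.
            List<PropertyInfo> properties = new List<PropertyInfo>();
            foreach (String h in header.Split(','))
            {
                properties.Add(typeof(T).GetProperty(h.Replace(" ", "")));
            }

            String line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0)
                {
                    continue; // skip blank lines
                }

                String[] rowData = line.Split(',');
                T item = new T();
                for (Int32 i = 0; i < properties.Count && i < rowData.Length; i++)
                {
                    PropertyInfo property = properties[i];
                    if (property == null)
                    {
                        continue; // column without a matching property
                    }

                    Object value = rowData[i];
                    if (property.PropertyType == typeof(DateTime))
                    {
                        value = DateTime.Parse(rowData[i], CultureInfo.InvariantCulture);
                    }
                    else if (property.PropertyType == typeof(Double))
                    {
                        value = Double.Parse(rowData[i], CultureInfo.InvariantCulture);
                    }
                    property.SetValue(item, value, null);
                }

                yield return item;
            }
        }
    }
}
```

StreamReader already buffers the file internally, so ReadLine takes care of lines that would otherwise be cut at a chunk boundary. Because Parse is lazy here, only one row is held in memory at a time, and the file stays open for as long as the sequence is being enumerated.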