Will it Python? Machine Learning for Hackers, Chapter 1, Part 1: Loading data

UPDATE 1/15/2014: This blog is no longer in service.

This post is now located at: http://slendermeans.org/ml4h-ch1-p1.html

Thanks,
-c.


9 Responses to Will it Python? Machine Learning for Hackers, Chapter 1, Part 1: Loading data

  1. Pingback: Will it Python? Machine Learning for Hackers, Chapter 1, Part 2: Cleaning date and location data | Slender Means

  2. Pingback: Will it Python? Machine Learning for Hackers, Chapter 1, Part 3: Simple summaries and plots. | Slender Means

  3. Carl, in the upcoming pandas release, there will be a more helpful error message for malformed text files like the one here:

    In [1]: read_table('ufo_awesome.tsv', sep='\t', header=None)
    ---------------------------------------------------------------------------
    ValueError Traceback (most recent call last)
    /home/wesm/code/repos/mlfh/01-Introduction/data/ufo/ in ()
    ----> 1 read_table('ufo_awesome.tsv', sep='\t', header=None)

    /home/wesm/code/pandas/pandas/io/parsers.pyc in read_table(filepath_or_buffer, sep, header, index_col, names, skiprows, na_values, thousands, comment, parse_dates, dayfirst, date_parser, nrows, iterator, chunksize, skip_footer, converters, verbose, delimiter, encoding)
    235 kwds['encoding'] = None
    236
    --> 237 return _read(TextParser, filepath_or_buffer, kwds)
    238
    239 @Appender(_read_fwf_doc)

    /home/wesm/code/pandas/pandas/io/parsers.pyc in _read(cls, filepath_or_buffer, kwds)
    172 return parser
    173
    --> 174 return parser.get_chunk()
    175
    176 @Appender(_read_csv_doc)

    /home/wesm/code/pandas/pandas/io/parsers.pyc in get_chunk(self, rows)
    727 msg = ('Expecting %d columns, got %d in row %d' %
    728 (col_len, zip_len, row_num))
    --> 729 raise ValueError(msg)
    730
    731 data = dict((k, v) for k, v in izip(self.columns, zipped_content))

    ValueError: Expecting 6 columns, got 123 in row 754

    • Carl says:

      Wes,
      Awesome. I actually thought “too many columns” is about as helpful an error message as I expect to get with I/O functions. Curious as to how read_table thinks there are 123 columns in row 754, whereas the read/split shows there are 7. But that it alerts me to the type of problem (extra cols) and finds the first offending row is really helpful. Thanks!
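
The read/split check mentioned in this reply could look roughly like the sketch below. It is not the code from the original post: the helper name `find_malformed_rows` is hypothetical, and the three-row sample is made-up data standing in for `ufo_awesome.tsv`. It only counts tab-separated fields per line and reports rows that do not have the expected number, using zero-based row numbers.

```python
import io


def find_malformed_rows(lines, expected_fields, sep="\t"):
    """Return (row_number, field_count) for every row with an unexpected field count.

    Row numbers are zero-based. This is a plain string split, so it does not
    honor quoting rules the way a full CSV/TSV parser might.
    """
    bad = []
    for row_num, line in enumerate(lines):
        # Strip the line terminator before splitting so it does not end up in the last field.
        n_fields = len(line.rstrip("\r\n").split(sep))
        if n_fields != expected_fields:
            bad.append((row_num, n_fields))
    return bad


# Made-up sample (not real data): the middle row has one extra tab-separated field.
sample = io.StringIO(
    "a\tb\tc\td\te\tf\n"
    "a\tb\tc\td\te\tf\tg\n"
    "a\tb\tc\td\te\tf\n"
)

bad_rows = find_malformed_rows(sample, expected_fields=6)
print(bad_rows)
```

For a real file, the same helper can be called on an open file handle, for example `with open('ufo_awesome.tsv') as f: find_malformed_rows(f, 6)`. Because a plain split and a parser can disagree about where fields begin and end, the counts it reports need not match the ones in a parser's error message.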

  4. Aman says:

    Carl, thanks for this great series of posts on working through MLFH with Pandas. I stumbled across it fortuitously, but the timing couldn’t be better.

  5. Archit says:

    Really, really helpful!! Thanks a lot for all these posts!! I have been studying ML for Hackers using R, but this gives a different perspective on the problems.

  6. Kyle says:

    Bit late to the party, but loving it, Carl – thank you.

  7. Milla says:

    Same here – thanks Carl!
