<?xml version="1.0" encoding="UTF-8"?>

<rss version="2.0"
 xmlns:blogChannel="http://backend.userland.com/blogChannelModule"
>

<channel>
<title><![CDATA[Dobrica Pavlinušić's random unstructured stuff: javascript]]></title>
<link>https://saturn.ffzg.hr/rot13/index.cgi?action=weblog_display;category=javascript</link>
<description></description>
<pubDate>Fri, 24 Aug 2007 09:07:17 -0000</pubDate>
<webMaster>root@saturn.ffzg.hr</webMaster>
<generator>Socialtext Workspace v2.19.0.2</generator>

<item>
<title><![CDATA[Exhibit facet browsing]]></title>
<link>https://saturn.ffzg.hr/rot13/index.cgi?exhibit_facet_browsing</link>
<description><![CDATA[<div>Creator: Dobrica Pavlinušić</div><hr/><div>Tags: csv, javascript, perl</div><hr/><div class="wiki">
<p>
We have a few mp3 players which no longer work but are still under warranty, so the idea was to pick another device (which will hopefully last longer). However, on-line shops leave a lot to be desired if you just want to do some quick filtering of the data.</p>
<p>
By a very fortunate accident, I stumbled upon <a target="_blank" title="(external link)" href="http://simile.mit.edu/exhibit/">Exhibit<!-- wiki-renamed-hyperlink "Exhibit"<http://simile.mit.edu/exhibit/> --></a> from the <a target="_blank" title="(external link)" href="http://simile.mit.edu/">SIMILE<!-- wiki-renamed-hyperlink "SIMILE"<http://simile.mit.edu/> --></a> project at MIT, which brought us such nice tools as <a target="_blank" title="(external link)" href="http://simile.mit.edu/timeline/">Timeline<!-- wiki-renamed-hyperlink "Timeline"<http://simile.mit.edu/timeline/> --></a> and <a target="_blank" title="(external link)" href="http://simile.mit.edu/potluck/">Potluck<!-- wiki-renamed-hyperlink "Potluck"<http://simile.mit.edu/potluck/> --></a>.</p>
<p>
So, I scraped the web pages, converted them to CSV and tried to do something with the result. In the process I once again revisited the problem of semi-structured data: while the data is separated into columns, one column holds a generic description, the player name and all of its characteristics.</p>
<p>
So, what did I do? Well, I started with CPAN, and a few hours later I had a <a target="_blank" title="(external link)" href="http://svn.rot13.org/index.cgi/simile/view/links/csv2js.pl">script which is rather good at parsing semi-structured CSV files<!-- wiki-renamed-hyperlink "script which is rather good at parsing semi=-structured CSV files"<http://svn.rot13.org/index.cgi/simile/view/links/csv2js.pl> --></a>. It supports the following:</p>
<ul>
<li>guess the CSV delimiter on its own (using <a target="_blank" title="(external link)" href="http://search.cpan.org/~enell/Text-CSV-Separator/"><tt>Text::CSV::Separator</tt><!-- wiki-renamed-hyperlink "`Text::CSV::Separator`"<http://search.cpan.org/~enell/Text=-CSV=-Separator/> --></a>)</li>
<li>recognize 10 Kb and similar sizes and normalize them (using <a target="_blank" title="(external link)" href="http://search.cpan.org/~ferreira/Number-Bytes-Human/"><tt>Number::Bytes::Human</tt><!-- wiki-renamed-hyperlink "`Number::Bytes::Human`"<http://search.cpan.org/~ferreira/Number=-Bytes=-Human/> --></a>)</li>
<li>split comma (<tt>,</tt>) separated values within a single field</li>
<li>strip a common prefix from all values in one column</li>
<li>group values and produce additional properties in the data</li>
<li>generate a specified number of groups for numeric data, useful for price ranges</li>
<li>produce JSON output for Exhibit using <a target="_blank" title="(external link)" href="http://search.cpan.org/~audreyt/YAML-Syck/"><tt>JSON::Syck</tt><!-- wiki-renamed-hyperlink "`JSON::Syck`"<http://search.cpan.org/~audreyt/YAML=-Syck/> --></a></li>
</ul>
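<p>
To make the list above a bit more concrete, here is a rough sketch of the same steps. It is written in Python rather than Perl and is not the actual <tt>csv2js.pl</tt>: the sample rows, the column names and every helper function are made up for illustration, and the <tt>items</tt> / <tt>label</tt> layout is assumed to be the minimum Exhibit needs in a JSON data file.</p>
<pre>
```python
import csv
import io
import json
import os
import re

# Hypothetical sample; the real scraped shop data is not shown in the post.
SAMPLE = (
    "name;capacity;formats;price\n"
    "Player A;512 MB;mp3, ogg;40\n"
    "Player B;1 GB;mp3, wma;70\n"
    "Player C;2 GB;mp3, ogg, flac;120\n"
)

# Binary multipliers keyed by the lower-cased unit letter.
UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Turn '10 Kb' or '1 GB' into a number of bytes (None if no match)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmg]?)b?\s*", text, re.IGNORECASE)
    if match is None:
        return None
    return int(float(match.group(1)) * UNITS[match.group(2).lower()])


def strip_common_prefix(values):
    """Remove the prefix shared by every value in a column."""
    prefix = os.path.commonprefix(values)
    return [value[len(prefix):] for value in values]


def bucket(value, low, high, groups):
    """Label the equal-width range (one of `groups`) that contains value."""
    width = (high - low) / groups or 1  # avoid a zero width when all values match
    index = min(int((value - low) / width), groups - 1)
    start = low + index * width
    return "%g-%g" % (start, start + width)


def csv_to_exhibit(text, groups=2):
    """Convert semi-structured CSV text into Exhibit-style JSON."""
    # Guess the delimiter from a short list of candidates.
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t|")
    rows = list(csv.DictReader(io.StringIO(text), dialect=dialect))
    labels = strip_common_prefix([row["name"] for row in rows])
    prices = [float(row["price"]) for row in rows]
    items = []
    for row, label, price in zip(rows, labels, prices):
        items.append({
            "label": label,
            "capacity": parse_size(row["capacity"]),
            # One field holding several values becomes a list.
            "formats": [part.strip() for part in row["formats"].split(",")],
            "price": price,
            # Extra property derived from the numeric column.
            "price_range": bucket(price, min(prices), max(prices), groups),
        })
    return json.dumps({"items": items}, indent=2)


if __name__ == "__main__":
    print(csv_to_exhibit(SAMPLE))
```
</pre>
<p>
With the sample above the three prices fall into two ranges, so the third player ends up with <tt>price_range</tt> set to <tt>80-120</tt>, which is the kind of derived property a facet can then filter on.</p>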
<p>
<a target="_blank" title="(external link)" href="http://blog.rot13.org/demo/links/links.html">So how does it look?</a></p>
<p>
In the end, it is very similar to the way <a target="_blank" title="(external link)" href="http://www.dabbledb.com/">Dabble DB<!-- wiki-renamed-hyperlink "Dabble DB"<http://www.dabbledb.com/> --></a> parses your input. However, I never actually had any luck importing data into Dabble DB, so this one works better for me <tt>:-)</tt></p>
<p>
This will probably evolve into a universal munger from CSV to an arbitrary hash structure. What would be a good name? <tt>Text::CSV::Mungler</tt>?</p>
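<p>
As a thought experiment on what "CSV to arbitrary hash structure" could mean, here is a small Python sketch (again not Perl, and not an existing module). The dotted-path mapping, the function names and the sample row are all hypothetical, just one possible way to describe where each column should land in a nested structure.</p>
<pre>
```python
import csv
import io


def set_path(target, path, value):
    """Store value in nested dicts, following a dotted path such as 'a.b.c'."""
    keys = path.split(".")
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = value


def munge(text, mapping):
    """Map each CSV row to a nested dict; mapping is column name to path."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {}
        for column, path in mapping.items():
            set_path(record, path, row[column])
        records.append(record)
    return records


# Hypothetical columns and target paths, for illustration only.
SAMPLE = "name,price,capacity\nPlayer A,40,512 MB\n"
MAPPING = {
    "name": "label",
    "price": "offer.price",
    "capacity": "specs.capacity",
}

if __name__ == "__main__":
    print(munge(SAMPLE, MAPPING))
```
</pre>
<p>
Keeping the mapping as plain data means the same code could serve different CSV files by swapping only the mapping.</p>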
<p>
This is the first in a series of posts which will cover one hack a week on my blog. This will (hopefully) force me to write at least one post a week, and also provide a historical trace of my work for later.</p>
</div>
]]></description>
<author>Dobrica Pavlinu&#x161;i&#x107;</author>
<category>csv, javascript, perl</category>
<guid isPermaLink="true">https://saturn.ffzg.hr/rot13/index.cgi?exhibit_facet_browsing</guid>
<pubDate>Fri, 24 Aug 2007 09:07:17 -0000</pubDate>
</item>
</channel>
</rss>