This past weekend (11/2) I participated in the "Hacking, Are You Intuit?" hackathon at UCSD. I was part of a three-person team (myself, Vivek Iyer, and Abhijith Chitlur). We built an application for the Android platform named BookBroker (Abhijith suggested the name).
The Application:
BookBroker was born from the idea that a student should only have to press a button to sell a textbook. I had observed that students tended to sell books in Facebook groups (such as UCSD's Free and For Sale), a phenomenon that can also be seen at other universities. Selling books on Facebook has its advantages: money changes hands quickly between students on the same campus, removing hassles like shipping. Using Facebook organically also removes the need for a separate login client, so the only pain felt by the user is having to download an Android application.
The first screen of the application is a barcode scanner, so a student can grab one of their textbooks and scan it. The second screen shows augmented data about the book scanned on the first screen, along with a button that lets the student post to a Facebook group.
The Technology:
The book-scanning feature on the first screen is powered by the Scandit API, a scanner API that reads a barcode and returns an ISBN. Vivek was responsible for this part, so I only have a cursory understanding of what happened here.
The second screen's Facebook connect feature is powered by Facebook's Open Graph API, and carried with it the problem of needing to authenticate with tokens. This is handled through the Facebook API and by remembering to wrap all outgoing requests in an AsyncTask. The reason is that the Android OS does not allow blocking network calls on its UI thread; without an AsyncTask, the GET request would wait for a response on the UI thread, and the OS would throw a NetworkOnMainThreadException. Careful use of the debugger was necessary to catch this error at around 3 AM. The GET request to augment the data was fairly simple after that: the server would respond with a string of information that could be parsed as JSON using the Jackson library, and in this manner the data could be accessed easily.
I worked on the data-augmentation section, where the application queries a server with the ISBN and receives a set of information back. This was implemented with the Express framework (node.js), chosen because it is quick to code in and our use case was simple enough to get started fast. Express was used primarily as a router: the Android application hit a specific route with an ISBN parameter, and Express was responsible for formulating the corresponding queries against the data store. The data store was Apache SOLR, chosen primarily for its out-of-the-box HTTP REST interface and minimal configuration requirements. It probably isn't the most efficient data store, but it provides an excellent catch-all with rich Lucene querying capability without writing a SQL query, which makes it easy to explain and relatively easy to use (a simple GET request per query). That makes it an ideal data store for a hackathon: all these features are open source, come straight out of the box, and reduce the coding challenge to providing a middle server that makes and serves HTTP requests (i.e., no dealing with an ORM).
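The route-to-SOLR flow can be sketched roughly as follows. The route path (`/book/:isbn`), SOLR core name (`books`), and field name (`isbn`) are my placeholder assumptions here, not the project's actual values, and the Express handler is left as a comment so the sketch runs without dependencies:

```javascript
// SOLR exposes search as a plain HTTP GET. Build the query URL for an
// ISBN lookup; URLSearchParams percent-encodes the Lucene query for us.
function buildSolrQuery(isbn) {
  const params = new URLSearchParams({ q: 'isbn:' + isbn, wt: 'json' });
  return 'http://localhost:8983/solr/books/select?' + params.toString();
}

// With Express, the router part is a single handler (commented out so
// the sketch stays dependency-free):
//
//   app.get('/book/:isbn', (req, res) => {
//     // GET buildSolrQuery(req.params.isbn), parse the JSON response,
//     // and forward the matching document to the phone.
//   });

console.log(buildSolrQuery('9780131103627'));
```

Because SOLR answers a GET like this with JSON, the middle server's whole job is URL construction plus forwarding, which is why it stayed so small.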
The scraper that fed data into SOLR was also implemented in node.js using the cheerio module, which allows quick and easy parsing of HTML pages. Cheerio is a CSS-selector-style parser in the same family as JSoup and BeautifulSoup. The scraping itself was relatively trivial; the main trouble was finding a site endpoint that allowed searches by ISBN. Since this is not a common use case, I had to get a little creative: I first scraped the UCSD site for course section IDs, then used those IDs to hit the bookstore website for ISBNs. In this manner, the data was made available for scraping. Since node.js is async-with-callbacks by nature, the quick hacky solution was to commit after every scrape (to make sure the data was persisted), but this causes a very significant slowdown. As we only had ~4k documents to index, this was acceptable, but any further scaling of the application would need a different solution. An alternative is to add all documents as they are scraped, then send a single commit signal to the server at the end. Both methods end with the same result: a properly indexed SOLR instance waiting for connections.
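The batched-commit alternative can be sketched like this. The document shape and the SOLR update endpoint's core name (`books`) are assumptions, and the HTTP calls are stubbed as console logs so the sketch runs standalone:

```javascript
// Buffer documents as the scraper produces them, instead of forcing a
// SOLR commit (flush + searcher reopen) after every single scrape.
const docs = [];

function onScraped(doc) {
  docs.push(doc); // no network round-trip per scrape
}

function flushToSolr() {
  // SOLR's JSON update handler accepts an array of documents, so all
  // adds can go in one POST, followed by a single explicit commit.
  console.log('POST /solr/books/update        body: ' + JSON.stringify(docs));
  console.log('POST /solr/books/update?commit=true');
}

onScraped({ isbn: '9780131103627', title: 'Example Title A' });
onScraped({ isbn: '9780596517748', title: 'Example Title B' });
flushToSolr();
```

With ~4k documents this turns ~4k commits into one, which is where the slowdown in the per-scrape approach was coming from.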