Coding with a LEAP Motion — Greylock Hackfest 2014

Over the weekend I attended Greylock Hackfest (http://greylocku.com/hackfest/), and built LeapPad with my teammates Robert Ying, Aditya Majumdar, and Kevin Heh.

Disclaimer: This post reflects my views only.

About Greylock:


Greylock Hackfest was an amazingly well-run hackathon, hosted at Medium, a Greylock portfolio company with a very nice office. The organizers set a great atmosphere for the hackers who attended and didn't feel compelled to crowd the space, so there was plenty of room to stretch out. There is something to be said for the feel of a 1000+ person hackathon, but I think Greylock struck an excellent size and vibe with only around 160 hackers in attendance (they rejected roughly two thirds of applicants). I was also particularly excited to see DJ Patil (@dpatil) in person, since I had been following his Twitter musings for a while. He was as amusing as expected, serving as the MC for the top-ten presentations and the award ceremony. The top ten hacks were all impressive, and I think the top three were well deserved, which is a testament to the quality of the selection and judging. I would definitely encourage anyone eligible to apply for Greylock Hackfest next year, and to go if selected.

About LEAP:


In this post I hope to document a few of the issues we ran into while developing for LEAP, with the hope that someday I can look back and appreciate just how much hands-free technology will have improved.

We built LeapPad, a Mac OS X interaction suite, using the Leap Motion controller. The GUI was built in PyQt4, with the Mac OS X interactions enabled via automac and BetterTouchTool. We used the LEAP Motion SDK's Python bindings to talk to the device, developing against the beta 2.0 skeletal tracking API, which comes with a neat set of bone abstractions that I will mention later.

LEAP is an extremely compelling device. It is almost magical to wave your fingers in front of a computer screen and have the motion tracked; the underlying technology is nothing short of amazing. The API provides nice abstractions for the developer: fingertip and joint positions, built-in gestures such as circles and swipes, and an easy way to reason about the interaction zone above the device, which is abstracted into a box that maps the zone onto a coordinate system running from 0 to 1 on each axis. Velocity is provided in many cases as well, which eases detection, and smoothed ("stabilized") variants are attached to many of the hand and finger position APIs, giving a developer even more options. With so many tracking options available, the LEAP API is a powerful tool capable of enabling impressive visual demos, such as the visualizer that ships with the LEAP as a debugging tool.
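To make that concrete, here is a minimal sketch using the SDK 2.x Python bindings of a listener that maps each fingertip into that normalized box. The listener class name and what it does with each fingertip are mine; treat it as an illustration, not our actual code.

import sys
import Leap


class FingerListener(Leap.Listener):
    def on_frame(self, controller):
        frame = controller.frame()
        ibox = frame.interaction_box
        for finger in frame.fingers:
            # Map the smoothed fingertip into the box's 0..1 coordinate system.
            tip = ibox.normalize_point(finger.stabilized_tip_position, True)
            print("%d %.2f %.2f %.2f" % (finger.id, tip.x, tip.y, tip.z))


listener = FingerListener()
controller = Leap.Controller()
controller.add_listener(listener)

try:
    sys.stdin.readline()  # keep the process alive so frames keep arriving
finally:
    controller.remove_listener(listener)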

The first problem we ran into was that the Python sample application didn't run on my computer (OS X 10.9.2). After a good deal of debugging, we noticed a flag that controls whether applications receive LEAP events while not in focus. Since I was running the sample from a console that was also running tmux, my application was, in fact, NOT in focus, and therefore no events from the LEAP Motion were being sent to it. Adding this line to the initialization code fixed the issue:

controller.set_policy_flags(Leap.Controller.POLICY_BACKGROUND_FRAMES)

After enabling the correct policy flag, the LEAP finally began sending frame data to my program. I don't think this problem would have arisen if I had used the stock Mac OS X Terminal, but it was quite surprising to find myself with a non-functional demo, and I spent about thirty minutes tracking down the reason.
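A reasonable place for that call is the listener's on_connect callback, so the policy is requested as soon as the controller connects. A sketch (listener name is mine):

import Leap


class BackgroundListener(Leap.Listener):
    def on_connect(self, controller):
        # Keep receiving frames even when this process is not the focused
        # application (e.g. when running inside a tmux session).
        controller.set_policy_flags(Leap.Controller.POLICY_BACKGROUND_FRAMES)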

The next problem was more application-specific. Our intent was to create a virtual QWERTY keyboard, in spirit at least, which meant dealing with finger tracking. LEAP Motion data is delivered to the consumer in frames: the controller fires a frame event whenever it has tracked a new frame's worth of motion, and you then have one frame's worth of processing time to commit actions before the next frame arrives. Each frame contains hand data, finger data, and their associated properties. LEAP captures fingertip data impressively well, so we initially designed our keyboard around tracking fingertips. However, due to the biomechanics of how we move our fingers, it turns out that tracking fingertips for key presses is a suboptimal plan.

The problem lies in deciding what counts as a key press and in debouncing multiple presses. One issue that still plagues our application is the spray of spurious key presses that occurs whenever the LEAP enters an undefined state. In our case that happens when it loses track of your fingers, which is easy to do: the bounding box in which the LEAP can actually see your hands is unmarked (it is virtual space), so it is natural to drift in a direction that takes your fingers out of view.

We attempted to resolve this with a multi-phase key press tracker. First, we determined whether a key press had occurred in the frame at all. Next, we determined which finger did the pressing. Finally, that finger set a flag indicating it had pressed a key, and the flag was reset on the first frame in which that finger was no longer the one considered 'pressed'. The trade-off is that continuous presses with a repeat rate are not supported; a repeat-rate timer would mitigate this, but we did not implement one. For the first determination, we used an empirically determined threshold on the fingertip velocity, helpfully provided by the field:

pointable.tip_velocity

To determine which key was pressed, we compared the positions of the fingers and took the lowest finger as the one doing the pressing. The trade-off here was that we lost thumb functionality and had to map the thumb to the spacebar, since detection would often default to the thumb when other key presses were intended.
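Putting the stages together, a stripped-down version of the detector looked roughly like this. The threshold value is a placeholder and the reset logic is simplified compared to what we actually shipped:

PRESS_VELOCITY = -200.0  # mm/s downward; the real threshold was tuned empirically

pressed_finger_id = None  # debounce flag: the finger currently considered 'pressed'


def detect_keypress(frame):
    global pressed_finger_id
    # Stage 1: did any fingertip move downward fast enough this frame?
    candidates = [f for f in frame.fingers if f.tip_velocity.y < PRESS_VELOCITY]
    if not candidates:
        pressed_finger_id = None  # clear the flag once nothing is pressing
        return None
    # Stage 2: attribute the press to the lowest fingertip (y points up in the
    # LEAP coordinate system, so the lowest tip has the smallest y).
    finger = min(candidates, key=lambda f: f.tip_position.y)
    if finger.id == pressed_finger_id:
        return None  # same finger still travelling down; suppress repeats
    pressed_finger_id = finger.id
    return finger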

Another problem with the keyboard is that if we draw the indicator markers at the fingertip location, the tip is also where the user expects the click to land. However, the tip moves sharply during a tap, so the location reported on the frame where a key press is detected is unreliable. One solution would be to cache prior frames, but we felt our mitigation was superior.

We mitigated the problem by tracking the joint between the metacarpal bone and the proximal bone (the knuckle) instead of the fingertip, reasoning that this joint is far more likely to stay put during a keypress. Luckily, LEAP exposes the following function:

joint_position(jointIx)

This allowed for more stable finger position tracking, at the cost of no longer accurately tracking finger separation. That is a fairly major UX hit: it isn't clear to the user what is being tracked, and indicator dots that don't fully reflect your movement can be frustrating. We made the trade-off anyway, because stable, intentional keypresses win out over a somewhat chaotic user experience.
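Concretely, the substitution looked something like the sketch below. The joint index constant comes from the 2.0 beta bindings as I remember them, so treat the exact name as illustrative:

import Leap


def stable_finger_position(finger, ibox):
    # Track the knuckle (the metacarpal/proximal joint) instead of the fingertip,
    # since it barely moves during a key press, then map it into the interaction box.
    knuckle = finger.joint_position(Leap.Finger.JOINT_MCP)
    return ibox.normalize_point(knuckle, True)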

These are still relatively unsolved problems, since we only managed to get a mostly functional demonstration, but I think it definitely is possible to make a stable keyboard, and am very excited to see one come out in the near future.

Thank you to PJ Loury, who unknowingly (kinda) lent out his LEAP for hacking this weekend. I’ll get it back to you soon, I promise.

Container-Ception

Motivation:

I started reading PaaS Under the Hood (henceforth 'the book') a few days ago. It's a collection of posts by dotCloud (now Docker, Inc.) engineering staff about the Linux kernel features involved in their PaaS offering. Platforms like dotCloud and Heroku have fascinated me for some time, and I was really impressed with the explanations in this book. To further my understanding of these concepts, I decided to write a series of blog posts documenting my explorations. You'll find my logs below.



Step 0: Setup

I felt that the only way to fully understand what the book was talking about was to grab an instance and start messing around with it. So I got the cheapest instance I could from DigitalOcean, selected Ubuntu 12.04, created a new (non-root) user using instructions located here, added my SSH keys using instructions located here, and considered that my starting point. After a quick round of setup, I'm ready to begin!

One thing to address before I jump off into the land of Linux Containers is why we need virtualization to begin with. In essence, the idea of platform-as-a-service is to run multiple applications together on shared infrastructure, possibly from the same users or clients and possibly from different ones, which means those applications need to be isolated from one another.


Step 1: LinuX Containers (LXC)

In Episode One of the book, almost on the first line, the authors mention something called a Linux Container (LXC). I had no idea what that was, so I looked it up.

Linux Containers are, as described here, a lightweight virtualization technology. The natural follow-up question is: lightweight compared to what? The answer seems to be virtual machines. A nice explanation in video form is located here, and I have documented a bit of background research and a summary below:

A bit of background information for the video:
In virtual-machine land, there is a thing called a hypervisor, which is something (hardware, software, or firmware) that creates and runs virtual machines. There are two main kinds, known as Type 1 and Type 2. Type 1 hypervisors run directly on the box ('bare metal'), whereas Type 2 hypervisors run on top of an OS that runs on the box ('hosted'). One would expect Type 2 hypervisors to perform worse, but this article suggests the trade-off is not that bad.

Summary:
We are trying to solve the problem of process isolation, since we want to run apps from different people on the same machine. The apps should be unaware of each other; ideally they shouldn't even know the other apps exist. There are two options to consider: virtual machines and containers.

Virtual machines (VMs) run their own OS, and apps run inside that OS. This gives full isolation from other apps, which makes VMs one way to solve the problem of running multiple isolated apps from different users: you spawn one VM instance per app for full separation.

Containers are built on Linux kernel features and isolate processes on the same OS. The idea is that you spawn one container per application, and all the containers share a single operating system, which enforces the isolation between applications.

A major drawback of using the VM approach is the duplication of OS and software common to all applications, since a copy must be included in every VM.

The main drawback of containers is that you can't run apps that need different operating systems (or different OS versions) on one box, since every container shares the host OS. Another drawback is that the application must be Linux-based, since it will be running on a Linux OS.


To get a full sense of what the authors of the book are talking about, I decided to try running LXC on the DigitalOcean box.

When running containers (LXC) on a DigitalOcean instance, the box ends up using both virtual machines and containers. DigitalOcean runs on KVM, a Type 1 hypervisor. Since DigitalOcean provides a VM with Ubuntu 12.04 loaded on it, and the Linux containers run on top of that OS, we have the following stack:

Stack


Naturally, since we are only one of DigitalOcean's many clients, there may be many other people's VMs running on the same KVM hypervisor on the same bare-metal box. We cannot access or even see those other VMs from ours, which is exactly the isolation a VM is supposed to provide.
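For concreteness, creating and starting a first container on that Ubuntu VM looks roughly like the sketch below. This assumes the python3-lxc bindings are installed (the lxc-create/lxc-start command-line tools are the equivalent route); the container and template names are hypothetical, not a record of exactly what I ran.

import lxc

container = lxc.Container("test0")   # hypothetical container name
container.create("ubuntu")           # build a root filesystem from the ubuntu template
container.start()
print(container.state, container.get_ips())  # IPs may be empty until networking is up
container.stop()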

Explorations with Cardboard! (and Android Studio)

So it turns out that getting the Cardboard sample to compile is a lot easier than I thought. Who knew that things you fetch from GitHub tend to work on the first try? (Or rather, the second…)

I became really interested in Google's new Cardboard project after attending Google I/O this past week. After all, having no experience with computer imaging or rendering, and none with virtual reality either, makes me the best target candidate for this project, right?

The project developers state that Cardboard is intended to bring virtual reality to the masses. The goggles are made of their namesake material, cardboard; the most complex parts are a pair of lenses (used to focus the view), your smartphone, and a set of magnets used as input to the phone. This makes the device far more approachable than the futuristic creations from Oculus VR and similar competitors in the virtual reality space. I hope the devices gain popularity among developers, especially curious individuals who have never developed for this kind of platform (like me!). I can't wait to download some amazing applications.

Enough speculation. Onto the run!

I haven't messed with Android for a few months, and I was quite surprised to see that Google has released Android Studio, so I decided to give it a whirl. That, combined with the new Gradle-based build system the sample project uses for dependency management, caused a slight hiccup that I will document here.

Upon downloading Android Studio and opening it for the first time, I was pleasantly surprised to find that it has GitHub integration. Importing the project was a breeze: I simply entered my GitHub credentials, selected the repository I wanted to pull from, and voilà, the repository was cloned.

 Screen Shot 2014-06-30 at 12.14.28 AM

After cloning the project, I unfortunately fell into the trap of using the old Eclipse ADT habit of running the project with the Run command. A more astute individual would have noticed that this is a Gradle project, and that we should therefore run it the Gradle way. There are two choices: the Gradle sidebar (red arrows) or the Gradle console (blue arrow). Since I don't know the console commands, I'll use the sidebar.

Screen Shot 2014-06-30 at 12.20.50 AM

After clicking the Gradle sidebar, I noticed that there are a lot of build tasks. I chose installDebug, which builds the Android project, signs it with a debug key, and installs it onto my development device.

Screen Shot 2014-06-30 at 12.24.51 AM


Oh no! Looks like I got an error. For some reason, the build does not like the 'L' support library, as described in this helpful StackOverflow post. Guess I'll have to tweak the build configuration.

Screen Shot 2014-06-30 at 12.27.31 AM


Per the solutions in the StackOverflow post, I changed compile 'com.android.support:support-v4:+' to compile 'com.android.support:support-v4:19.1.0'. I think this problem only occurs if the Android build tools aren't updated to a version that knows about the 'L' codename, since older build tools don't support that codename, though I haven't tested this. In any case, the change resolved the build problem.

Screen Shot 2014-06-30 at 12.30.32 AM


Yep, happy build.

Screen Shot 2014-06-30 at 12.40.13 AM

I'm debugging on an HTC One with USB debugging enabled, and the app is now on my phone. Mission success!

Screenshot_2014-06-30-00-55-37


Hooray for Cardboard!

Microsoft AppHack

Yesterday, 11/9/2013, was the Microsoft Hackathon. Since it came just one week after the Intuit hackathon, I can easily compare and contrast my experiences. I'll start with the nontechnical differences between the two (organization and so on), and then move on to the differences between my projects. I did not present at the Microsoft Hackathon.

Nontechnical:

Food-wise, the Microsoft Hackathon was slightly better. Intuit ran out of dinner, but Microsoft managed to keep it somewhat decent, even with certain individuals taking a lot of food back to their tables (thus causing a resource-allocation problem!).

Theme-wise, the Intuit Hackathon did a better job of communicating what they wanted. Their theme was 'As a student, I wish I could…', which was fairly restrictive. At the Microsoft Hackathon, the only direction we got was that the result should be a Windows app. This is merely an observation, not a criticism; there are benefits to both approaches. The first allows for more direct competition (comparing apples to apples) and a comparison of different approaches to similar problems. The second allows full creative license and the opportunity to see some truly outrageous things.

As a side note, Microsoft's restriction of teams to one or two people really hurt this hackathon in my opinion. People at UCSD are very nice and willing to offer advice, but larger teams (3-4 people) let you combine people of different backgrounds (perhaps one person you know and one you don't) and make for more interesting groups. The restriction to Windows platform hacks makes sense for a Microsoft AppHack, but it caused real problems early on, when even the provided Samsung tablets were misbehaving and the staff seemed rather confused about the whole situation. Even though Microsoft had more technology problems overall, they handled them well, and that impressed me.


Intuit Hackathon

This past weekend (11/2) I participated in the 'Hacking, Are You Intuit?' hackathon at UCSD. I was part of a three-person team (myself, Vivek Iyer, and Abhijith Chitlur). We built an Android application named BookBroker (Abhijith suggested the name).

The Application:

BookBroker was born from the idea that a student should only have to press a button to sell a textbook. I had observed that students tend to sell books through Facebook groups (such as UCSD's Free and For Sale), a phenomenon you can see at other universities as well. Selling books over Facebook has its advantages, such as quick, in-person money transfers between students on campus, which removes hassles like shipping. Using Facebook organically also removes the need for a separate login system, so the only pain for the user is downloading an Android application.

The first screen of the application is a scanner, so a student can grab one of their textbooks and scan it. The second screen shows augmented data for the scanned book and has a button that lets the user post it to a Facebook group.

The Technology:

The book scanning on the first screen is powered by the Scandit API, a scanner API that lets you scan a barcode and get back an ISBN. Vivek was responsible for this part, so I only have a cursory understanding of what happened here.

The second screen's Facebook connect feature is powered by Facebook's Open Graph API, which brings with it the problem of authenticating with tokens. This is handled through the Facebook API, and by remembering to wrap every outgoing request in an AsyncTask. The reason is that Android does not allow blocking network calls on its UI thread; without an AsyncTask, the GET request would wait for a response on the UI thread and the OS would refuse it with an exception. Careful use of the debugger was needed to catch this at around 3 AM. The GET request to augment the data was fairly simple after that: the server responds with a string that can be parsed as JSON using the Jackson API, making the data easy to access.

I worked on the data augmentation piece, where the application queries a server with the ISBN and receives a set of information back. This was implemented with the express framework (node.js), chosen because it is quick to code and was simple to get started with for our use case. Express was used primarily as a router: the Android application hits a specific route with an ISBN parameter, and express formulates the corresponding queries against the data store. The data store was Apache Solr, chosen primarily for its out-of-the-box HTTP REST interface and minimal configuration. It probably isn't the most efficient data store, but it provides rich Lucene query capability without writing SQL, which makes it easy to explain and easy to use (a simple GET request per query). That makes it an ideal data store for a hackathon: everything is open source, works straight out of the box, and reduces the coding to a middle server that makes and serves HTTP requests (i.e., no dealing with an ORM).
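The query side really is just a parameterized GET against Solr's select handler. Below is a minimal sketch of that lookup, written in Python with requests for brevity (the actual project routed it through an express endpoint); the host, core name, and field name are hypothetical.

import requests

SOLR = "http://localhost:8983/solr/books"  # hypothetical host and core name


def lookup_isbn(isbn):
    # Solr queries are plain HTTP GETs against the select handler; no SQL, no ORM.
    resp = requests.get(SOLR + "/select",
                        params={"q": "isbn:%s" % isbn, "wt": "json"})
    docs = resp.json()["response"]["docs"]
    return docs[0] if docs else None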

The scraper that fed data into Solr was also implemented in node.js using the cheerio module, which makes parsing HTML quick and easy. Cheerio is a CSS-selector-style parser in the same family as JSoup and BeautifulSoup. The scraping itself was fairly trivial; the main trouble was finding a site endpoint that could be searched by ISBN. Since that is not a common use case, I had to get a little creative: I first scraped the UCSD site for section IDs, then used those IDs to hit the bookstore website for ISBNs, which made the data available for scraping. Since node.js is asynchronous and callback-driven by nature, the quick, hacky solution was to commit after every scrape (to make sure data was persisted), but this causes a very significant slowdown. With only ~4k documents to index this was acceptable, but to scale the application further a different approach is needed: add documents as they are scraped, then send a single commit to the server at the end. Both methods end with the same result, a properly indexed Solr core waiting for connections.
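The batched alternative amounts to posting documents as they are scraped and issuing one commit at the very end. A sketch of that flow, again in Python for brevity (the real scraper was node.js); the URL, core name, and document fields are hypothetical, and it assumes a Solr version whose /update handler accepts JSON documents.

import json
import requests

SOLR_UPDATE = "http://localhost:8983/solr/books/update"  # hypothetical core name


def index_documents(docs):
    # Add every scraped document without committing...
    requests.post(SOLR_UPDATE, data=json.dumps(docs),
                  headers={"Content-Type": "application/json"})
    # ...then make everything searchable with a single commit at the end.
    requests.get(SOLR_UPDATE, params={"commit": "true"})


index_documents([{"id": "9780131103627", "title": "The C Programming Language"}])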

Adventures in Machine Learning

Recently I've been taking the Web Intelligence and Big Data course on Coursera (www.coursera.com), offered by IIT Delhi and taught by Professor Gautam Shroff. In the course, students are introduced to the basic machine learning algorithm naive Bayes and asked to calculate likelihood ratios by hand. The algorithm is presented as a classification method that can discern sentiment from text by computing the likelihood that certain words appear (or don't appear) in the query, relative to the 'trained' model.

I tried to apply this to the Yelp competition dataset, found at http://www.yelp.com/dataset_challenge/ .

I wanted to determine, using the reviews dataset, whether an arbitrary review was 'good' or 'bad'. So I took the reviews and split them into two buckets: three stars and above counted as 'good', leaving one- and two-star reviews as 'bad'.

I used roughly 5k reviews as the training set and validated on a held-out set of roughly 1k reviews, yielding an accuracy of 70.07%.
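For reference, here is a minimal sketch of this kind of experiment using scikit-learn's MultinomialNB. This is not the code I ran at the time; the file name is hypothetical, and the field names assume the Yelp reviews JSON with one review object per line carrying 'text' and 'stars' keys.

import json

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

reviews = [json.loads(line) for line in open("yelp_reviews.json")]
texts = [r["text"] for r in reviews]
labels = [1 if r["stars"] >= 3 else 0 for r in reviews]  # 3+ stars counts as 'good'

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.2)

vectorizer = CountVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(X_train), y_train)
print("validation accuracy:", model.score(vectorizer.transform(X_test), y_test))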


Video Search

Disclaimer: The following musings are by someone who is potentially ignorant and confused (although very willing to be criticized and corrected!)

I recently read an article on video search, a problem that I originally (due to my naiveté) thought was impossible to solve. I won’t pretend to be the most informed individual in this field, but I found the author’s claimed capabilities quite intriguing.

Perhaps the most compelling claim is a way to distinguish (in their words, 'encode subtle meaning') between sentences that are superficially similar; their example is the difference between 'the person rode the horse' and 'the horse rode the person'.

More on this later.

Source:

arXiv:1309.5174 (https://arxiv.org/abs/1309.5174)

Embedded JavaScript (ejs) Tag Family: <%, <%=, <%- and -%>

The following should be true up to ejs 0.8.3:

<% %> tags (or [% %], if you configure custom delimiters) are used to execute JavaScript code. Something like

<% log = "hi" %>

will assign the string 'hi' to the (global) JavaScript variable log.

<%= %> (or [%= %]) is used to execute code and then print the toString of the result to the page. Thus, something like

<%= log %>

will print 'hi' to the page. Note that HTML is escaped when using this tag, so <%= '<img src="blah.png">' %> prints the literal text <img src="blah.png"> rather than rendering an image.

<%- %> allows 'unescaped buffering', which essentially means the HTML is NOT escaped, so something like <%- '<img src="blah.png">' %> will actually render an image on the page.

The -%> closing tag slurps the newline that follows the expression. Thus, something like

var x = 'inbetween';

var str = 'me <%= x -%>\n here';

ejs.render(str); // note: x must be supplied to the template as a local (how depends on your ejs version)

should return 'me inbetween here' (the newline after the closing tag is slurped).

For those interested, the slurp tag was added to ejs in this GitHub pull request:

https://github.com/visionmedia/ejs/pull/45