Making robots happy

January 18, 6:07 p.m. | OpenWeb

Machine learning is super-cool. I've spent some free time over the last year trying to come up with a good way to analyze sentiment on Twitter. I started by picking out positive and negative words, but that didn't work too well, especially without a corpus of positive and negative words to go on. So, before winter break last semester, I wrote a tool that uses a naive Bayesian classifier to do the sentiment analysis.

Which sounds pretty intense if you're not into CS, but it really isn't.

Basically, it learns from tweets that people have already categorized as positive or negative. It then makes a guess for a new tweet based on how similar it is, overall, to those old tweets. There's a lot more to it than that, of course (for the technically inclined, I took a lot of ideas from this guy's suggestions, which were AWESOME), but that's basically how it works.
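For the technically inclined, here's a minimal sketch of the general idea in Python. It is not the actual tool: the function names, the whitespace tokenizer, the add-one smoothing and the tiny hand-made training set are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(labeled_tweets):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_tweets:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        # Prior: how common this label is among the training tweets.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words don't zero out a label.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training data, purely for illustration.
training = [
    ("i love this great idea", "pos"),
    ("what a wonderful happy day", "pos"),
    ("i hate this terrible bill", "neg"),
    ("sopa is awful and bad", "neg"),
]
model = train(training)
print(classify("this bill is terrible", *model))
print(classify("what a great day", *model))
```

A real classifier would need thousands of labeled tweets and a smarter tokenizer, but the counting and scoring work the same way.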

So, here's an example. It gets weird data points from time to time, but it's mostly pretty accurate. I chose to look at SOPA sentiment on Twitter, so I expected pretty negative results.







Anona-what?

January 13, 8:52 p.m. | OpenWeb

Yet another person has been harassed by a sleazy business owner after posting a negative review on Yelp:

http://boingboing.net/2012/01/13/lawsuit-store-owner-tried-to.html

I'm sick of hearing people claim that we need full accountability for the things we say online. While that's an admirable goal in a perfect world, the reality is that we often post things that may be unintentionally dangerous. Whether it's a company trying to destroy a woman's online reputation or someone posting critiques of the Chinese government, our actions and words online often have consequences that go far beyond what we expect when we post them.

Some people may be comfortable with authenticating themselves online. That's great. But any service that relies on some degree of personally identifiable information being shared should also offer an option to hide that information, or at least disguise it via a pseudonym. And when authentication is served up via an API, the API itself should offer the same ability to hide information. 

Google+ drew this issue into the spotlight a few months ago when it forbade pseudonyms on the service. However, Facebook is an even worse offender here. When you use a Facebook platform application (like Facebook comments or any game or site that uses Facebook auth), you're automatically sharing your identity with that application. No matter what. Anything you post using that application can easily share that identity with the public.

I'm not sure if the person here used Facebook to connect with Yelp for her account, but I know I do. And that worries me. I know I also use it to post comments on certain sites. To play certain games. Etc. If I ever posted something that could be dangerous on a site that required Facebook authentication, I would have no way to disguise that post. Which is a problem. I'd suggest that Facebook implement MUCH more granular authentication permissions (something I happened to do a semester research project on recently). The users I talked to want this kind of control. Something like this Chrome extension would work.

Even if your site isn't using the Facebook API, the solution is simple (even simpler, probably). Sites can continue asking people for their names, but allow an option to hide that from the public. Or do what Twitter does and just let the user post under a handle. Not hard. If your site DOES use the Facebook API (which is totally fine, lots of mine do), then let the user know what information you have, why you have it, and how they can control what you do with it. Give them a "hide this from the public" option too.

I'm sick of people being punished for things they say online. The web lets us be more open and honest with businesses, governments and each other than anything has ever done before; let's not let that go to waste.





Fun with data

September 24, 8:13 p.m. | OpenWeb

So, I've been wanting to experiment with data viz for quite a while, and I figured why not play around with it some today, while watching OSU almost lose to (the traitorous) Texas A&M.


I wrote a Python script to query the Facebook and Twitter APIs for a sample of my statuses, then break them down, for each service, by the percentage of the total posted on each day of the week. I then used gRaphael to graph it. Turns out there actually IS a difference: over the course of the week, my Twitter and Facebook use shifts. Interesting stuff!
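If you're curious about the day-of-week math, here's a rough sketch of just that step. It is not the script itself: it skips the API calls entirely and assumes you already have each status's timestamp as a Python datetime. The function name and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def weekday_percentages(timestamps):
    """Return {day name: percent of all statuses posted on that day}."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    counts = Counter(ts.weekday() for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        return {day: 0.0 for day in DAYS}
    return {day: 100.0 * counts[i] / total for i, day in enumerate(DAYS)}


# Made-up timestamps standing in for one service's statuses.
sample = [
    datetime(2011, 9, 24, 20, 13),  # a Saturday
    datetime(2011, 9, 25, 9, 0),    # a Sunday
    datetime(2011, 9, 26, 8, 30),   # a Monday
    datetime(2011, 9, 26, 17, 45),  # the same Monday
]
for day, pct in weekday_percentages(sample).items():
    print(f"{day}: {pct:.1f}%")
```

Run it once per service and you get two sets of seven percentages that can go straight into a charting library.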

Facebook:

Twitter:



Here's the script.