Friday, November 23, 2007

Is Britney Spears Spam?

From my old blog (21st August 2007):
In the last post, I was raving on about ..., uhm, probably about trust bootstrapping, right? :-) I went from a definition of cold reading to a very personal interpretation of Posner's review of Blink. Now, in the same vein (i.e., keeping on being delirious), I move on to this nice paper, which carries out (in a way) not offender profiling but MySpace user profiling.
Title: Is Britney Spears Spam? (pdf)

Problem: On social networking websites (e.g., MySpace), users decide whether to accept invitations to connect by manually examining the senders' profiles. However, that may be time-consuming!

Existing Solutions: One may automate the acceptance of invitations by having users run trust propagation algorithms.

Complication: The authors write that current trust propagation algorithms may be less than desirable, since trust decays with the number of hops and is usually one-dimensional.
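
To see why decay with hops is a problem, here is a minimal sketch. It assumes a simple multiplicative model in which trust along a path is the product of the per-hop trust values; that model and the name `propagated_trust` are my own illustration, not necessarily what the paper or any specific propagation algorithm uses.

```python
def propagated_trust(edge_trusts):
    """Multiply per-hop trust values along one path (assumed model)."""
    trust = 1.0
    for t in edge_trusts:
        trust *= t
    return trust

# Every hop is trusted at 0.9, yet trust in the far end of the
# path shrinks as the path gets longer.
for hops in (1, 3, 6):
    print(hops, round(propagated_trust([0.9] * hops), 3))
```

Note also that the result is a single number: it cannot say whether a profile is, for instance, friendly but pushy, which is the one-dimensional part of the complaint.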

Proposal: Use machine learning techniques to classify user profiles. The classification describes a profile along two dimensions: sociability and promotion. Based on a profile's values for these two dimensions, users then decide whether to accept the invitation from that profile's owner. To build a dataset on which to evaluate their algorithm, the authors randomly select MySpace users and rate them by hand.
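
The idea can be sketched roughly as follows. Everything specific here is an assumption of mine: the two features (comments per friend, fraction of commercial links), the hand-rated sample, the nearest-centroid classifier, and the `accept` rule are hypothetical stand-ins, not the features or the learning technique used in the paper.

```python
from statistics import mean

def centroid(rows):
    """Component-wise mean of a list of feature tuples."""
    return tuple(mean(col) for col in zip(*rows))

def train(samples):
    """samples: list of (features, label). Returns {label: centroid}."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the features."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(model, key=lambda label: dist2(model[label]))

# Hypothetical hand-rated profiles:
# (comments_per_friend, fraction_of_commercial_links) -> (sociability, promotion)
rated = [
    ((0.80, 0.05), ("high", "low")),
    ((0.70, 0.10), ("high", "low")),
    ((0.10, 0.90), ("low", "high")),
    ((0.05, 0.80), ("low", "high")),
    ((0.75, 0.85), ("high", "high")),
    ((0.10, 0.05), ("low", "low")),
]

# One classifier per dimension, trained on the same hand-rated profiles.
sociability_model = train([(f, labels[0]) for f, labels in rated])
promotion_model = train([(f, labels[1]) for f, labels in rated])

def profile_scores(features):
    """Describe a profile along the two dimensions."""
    return predict(sociability_model, features), predict(promotion_model, features)

def accept(features):
    """One possible user policy: accept sociable, non-promotional profiles."""
    sociability, promotion = profile_scores(features)
    return sociability == "high" and promotion == "low"

print(profile_scores((0.90, 0.00)), accept((0.90, 0.00)))
print(profile_scores((0.05, 0.95)), accept((0.05, 0.95)))
```

Keeping the two dimensions separate leaves the final decision to each user: a fan might happily accept a highly promotional band profile that someone else would treat as spam.
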
Future: I would:
> Apply a new trust propagation algorithm (pdf) to avoid trust decay and apply TRULLO (pdf) to handle multi-dimensional trust.
> Look at literature on criminal profiling. In UCL's main library, I noticed many books about criminal profiling. I wonder whether those books could inform a (future) paper titled "On profiling (not only criminals but) Web 2.0 users" ;-)
> Look at literature on statistical discrimination (previous post) and on customer profiling (mining customer data).
> Consider Tim Finin's comments: "It would be interesting to see how well various measures of the network structure around false and true profiles serve as features. I think this is very similar to the problem of recognizing spam blogs (splogs). In our work, we’ve found that local features work well, but splogs can also be recognized by looking at the network structure as well."
