Friday, March 7, 2008

Seeing Our Signals: Combining location traces and web-based models for personal discovery

Here is the paper.

By combining web applications and mobile phones, the authors are developing algorithms for implementing a Personal Environmental Impact Report (PEIR). PEIR is a mechanism for longitudinal documentation of both impact (what an individual does to the environment) and exposure (what the environment does to the individual).

Eigen-Trend: Trend Analysis in the Blogosphere based on Singular Value Decompositions

Here is the paper.

Situation: The blogosphere provides free, large-scale information sources from which businesses can quickly learn trends (e.g., opinions and complaints from their customers). How? For example, by tracking how often relevant keywords are mentioned across blogs (summing the keyword occurrences).
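The simple counting approach can be sketched as follows. This is only an illustration: the `(day, blog, text)` post format and the function name `keyword_trend` are assumptions made here, not something taken from the paper, and tokenization is a plain whitespace split.

```python
from collections import Counter


def keyword_trend(posts, keyword):
    """Sum keyword mentions per day across all blogs.

    posts: iterable of (day, blog, text) tuples (hypothetical format).
    Returns a dict mapping day -> total mentions, ordered by day.
    """
    keyword = keyword.lower()
    counts = Counter()
    for day, _blog, text in posts:
        # Naive tokenization: lowercase and split on whitespace.
        # Punctuation attached to a word would prevent a match.
        counts[day] += text.lower().split().count(keyword)
    return dict(sorted(counts.items()))


posts = [
    ("2008-03-01", "blog-a", "The new phone is great"),
    ("2008-03-01", "blog-b", "phone phone battery"),
    ("2008-03-02", "blog-a", "nothing relevant here"),
]
print(keyword_trend(posts, "phone"))
```

Every blog counts equally in this sum, which is exactly the limitation the paper goes on to address.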

Complication: This simple way of extracting trends has three problems:
1) Different blogs contribute to the trend differently.
2) For the same keyword, different groups of blogs may have different interests.
3) The blogosphere can be considered a blog graph in which the nodes are blogs and the links reflect endorsements and interactions among blogs. This graph changes over time as a result of internal relationships developing (e.g., interactions among blogs) and external events (e.g., breaking news). Can we directly study and extract meaningful trends from such a dynamically changing graph structure?

Solution: The authors propose eigen-trends, temporal indicators derived through singular value decomposition, that take differences among individual blogs into account. The key idea is to represent the observed data as a combination of information that captures temporal changes of the underlying data (i.e., eigen-trends) and information that captures the characteristics of individual bloggers (e.g., authority).
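To make the decomposition idea concrete, here is a minimal sketch, assuming the observed data is a blogs-by-time matrix of keyword mentions. It extracts only the leading singular triplet by power iteration; the paper's actual construction may differ, and the names `leading_singular_triplet` and `mentions` are made up for this example. The left singular vector plays the role of per-blog weights and the right singular vector the role of the first eigen-trend.

```python
import math


def leading_singular_triplet(matrix, iterations=200):
    """Return (sigma, u, v) for the largest singular value of `matrix`.

    matrix: list of rows (one row per blog, one column per time step).
    Assumes a nonzero matrix with nonnegative entries, so the norms
    below never vanish and the vectors keep a positive sign.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # start from a uniform trend
    sigma, u = 0.0, [0.0] * n_rows
    for _ in range(iterations):
        # u <- X v, normalized: how strongly each blog follows the trend
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # v <- X^T u, normalized: the temporal pattern weighted by blog
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows))
             for j in range(n_cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma, u, v


# Three blogs share one temporal pattern (1, 2, 4, 2) at different scales.
mentions = [
    [1, 2, 4, 2],
    [2, 4, 8, 4],
    [3, 6, 12, 6],
]
sigma, blog_weights, eigen_trend = leading_singular_triplet(mentions)
print(eigen_trend)   # shared temporal pattern, unit length
print(blog_weights)  # relative contribution of each blog
```

In this toy matrix every blog follows the same pattern, so a single component reproduces the data exactly; real data would need further components, which a full SVD routine provides.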

Boosting Topic-Based Publish-Subscribe Systems with Dynamic Clustering

Great paper.

Problem: The maintenance overhead of a topic-based pub-sub system becomes dominant when the system supports a large number of topics with moderate event frequency (e.g., in a news syndication scenario).

Fact: One can typically detect correlations between user subscriptions, which can be used to group topics and reduce the overall maintenance cost. For instance, users who subscribe to updates for a given piece of software (say the free bitmap image editor GIMP [25]) are likely to also subscribe to updates of other software on which it depends (e.g. GTK+, libart and Pango [9]).

Proposal: A dynamic distributed clustering algorithm that

+ utilizes correlations between user subscriptions to dynamically group topics into virtual topics (called topic-clusters).

+ continuously adapts the topic-clusters and, correspondingly, the user subscriptions to the changing system state by employing local cluster updates. Each local update is performed only when it is estimated to be (globally) cost effective. Furthermore, to minimize the overhead involved in gain estimations, a probabilistic component is employed to guarantee that (with high probability) gain estimations are computed only for updates that are likely to be beneficial.
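The grouping idea can be illustrated with a much simpler stand-in than the paper's algorithm. The sketch below is centralized and static, with no cost estimation and no probabilistic component: it merges topics whose subscriber sets overlap strongly, using Jaccard similarity and a threshold that are both assumptions chosen for this example. The function names and sample data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_topics(subscriptions, threshold=0.5):
    """Group topics whose subscriber sets are similar.

    subscriptions: dict mapping user -> set of topics.
    Returns a list of frozensets, each one a topic-cluster.
    Merging is transitive (single-link), a simplification.
    """
    # Invert the mapping: topic -> set of subscribed users.
    subscribers = {}
    for user, topics in subscriptions.items():
        for topic in topics:
            subscribers.setdefault(topic, set()).add(user)

    # Union-find over topics.
    parent = {topic: topic for topic in subscribers}

    def find(topic):
        while parent[topic] != topic:
            parent[topic] = parent[parent[topic]]  # path halving
            topic = parent[topic]
        return topic

    for a, b in combinations(sorted(subscribers), 2):
        if jaccard(subscribers[a], subscribers[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for topic in subscribers:
        clusters.setdefault(find(topic), set()).add(topic)
    return sorted((frozenset(c) for c in clusters.values()), key=sorted)


subscriptions = {
    "alice": {"gimp", "gtk+", "pango"},
    "bob": {"gimp", "gtk+"},
    "carol": {"gimp", "gtk+", "pango"},
    "dave": {"weather"},
}
print(cluster_topics(subscriptions))
```

With the sample data, the GIMP-related topics end up in one topic-cluster and the unrelated topic stays on its own, so a system maintaining one structure per cluster would maintain two structures instead of four.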