Friday, March 7, 2008

Boosting Topic-Based Publish-Subscribe Systems with Dynamic Clustering

Great paper.

Problem: The maintenance overhead of a topic-based pub-sub system becomes particularly dominating when the system supports a large number of topics with moderate event frequency (e.g., news syndication scene).

Fact: One can typically detect correlations between users subscriptions, which can be used to group topics and reduce the overall maintenance cost. For instance, users that are subscribed to updates for a given piece of software (say the free bitmap image editor GIMP [25]) are likely to be also subscribed to updates of other software pieces on which the given software depends (e.g. GTK+, libart and Pango [9]).

Proposal: A dynamic distributed clustering algorithm that

+ utilizes correlations between user subscriptions to dynamically group topics together, into virtual topics (called topic-clusters).

+ continuously adapts the topic-clusters and, resp., the user subscriptions, to the changing system state by employing local cluster updates. Each local update is performed only when it is estimated to be (globally) cost effective. Furthermore, to minimize the overhead involved in gain estimations, a probabilistic component is employed to guarantee that (with high probability) gain estimation are computed only for updates that are likely to be beneficial.

No comments: