Thursday, November 12, 2009

Patterns of File-Sharing in an Enterprise: Authors, Contributors, Collectors, and Lurkers

Great work by Michael Muller of IBM

We describe Cattail, an experimental enterprise file-sharing service in IBM. Over the past several years, 17985 Cattail users have uploaded 132041 files, which have been used by 115538 users. In addition 15240 people have shared 75951 of the files with other users, and, 5444 people have created 12461 collections comprising 60476 of the files. We use this rich set of data to characterize file-sharing in the enterprise. This talk will describe Cattail, and the factors that lead to a file being of use to other people, analyzed in two timeframes: over the lifetime of a file, and within the microstructure of a user's session. We will also explore emergent roles within the file-sharing system, and we will conclude with a look at the work of lurkers in the enterprise.

Friday, February 27, 2009

Importance Algorithms by Jung

>>> BaryCenter(someGraph)
A simple node importance ranker based on the total shortest path of the node. More central nodes in a connected component will have smaller overall shortest paths, and 'peripheral' nodes on the network will have larger overall shortest paths. Runing this ranker on a graph with more than one connected component will arbitarily mix nodes from both components. For this reason you should probably run this ranker on one
component only (but that goes for all rankers).
A simple example of usage is:
BaryCenter ranker = new BaryCenter(someGraph);
ranker.evaluate();
ranker.printRankings();


>>>BetweennessCentrality(someGraph)

Computes betweenness centrality for each vertex and edge in the graph. The result is that each vertex and edge has a UserData element of type MutableDouble whose key is 'centrality.BetweennessCentrality'.
Note: Many social network researchers like to normalize the betweenness values by dividing the values by (n-1)(n-2)/2. The values given here are unnormalized.
A simple example of usage is:
BetweennessCentrality ranker = new BetweennessCentrality(someGraph);
ranker.evaluate();
ranker.printRankings();


>>> DegreeDistributionRanker(someGraph)

A simple node importance ranker based on the degree of the node. The user can specify whether s/he wants to use the indegree or the outdegree as the metric. If the graph is undirected this option is effectively ignored. So for example, if the graph is directed and the user chooses to use in-degree, nodes with the highest in-degree will be ranked highest and similarly nodes with the lowest in-degree will be ranked lowest.
A simple example of usage is:
DegreeDistributionRanker ranker = new DegreeDistributionRanker(someGraph);
ranker.evaluate();
ranker.printRankings();


>>> HITS(someGraph)
Calculates the "hubs-and-authorities" importance measures for each node in a graph. These measures are defined recursively as follows:
The *hubness* of a node is the degree to which a node links to other important authorities
The *authoritativeness* of a node is the degree to which a node is pointed to by important hubs
Note: This algorithm uses the same key as HITSWithPriors for storing rank sccores.
A simple example of usage is:
HITS ranker = new HITS(someGraph);
ranker.evaluate();
ranker.printRankings();


>>> HITSWithPriors(someGraph,0.3,rootSet)

Algorithm that extends the HITS algorithm by incorporating root nodes (priors). Whereas in HITS the importance of a node is implicitly computed relative to all nodes in the graph, now importance is computed relative to the specified root nodes.
A simple example of usage is:
HITSWithPriors ranker = new HITSWithPriors(someGraph,0.3,rootSet);
ranker.evaluate();
ranker.printRankings();


>>> KStepMarkov(someGraph,rootSet,6,null)
Algorithm variant of PageRankWithPriors that computes the importance of a node based upon taking fixed-length random walks out from the root set and then computing the stationary probability of being at each node. Specifically, it computes the relative probability that the markov chain will spend at any particular node, given that it start in the root set and ends after k steps.
A simple example of usage is:
KStepMarkov ranker = new KStepMarkov(someGraph,rootSet,6,null);
ranker.evaluate();
ranker.printRankings();


>>> MarkovCentrality


>>> PageRank(someGraph,0.15)
This algorithm measures the importance of a node in terms of the fraction of time spent at that node relative to all other nodes. This fraction is measured by first transforming the graph into a first-order Markov chain where the transition probability of going from node u to node v is equal to (1-alpha)*[1/outdegree(u)] + alpha*(1/|V|) where |V| is the # of vertices in the graph and alpha is a parameter typically set to be between 0.1 and 0.2 (according to the authors). If u has no out-edges in the original graph then 0 is used instead of 1/outdegree(v). Once the markov chain is created, the stationary probability of being at each node (state) is computed using an iterative update method that is guaranteed to converge if the markov chain is ergodic.
A simple example of usage is:
PageRank ranker = new PageRank(someGraph,0.15);
ranker.evaluate();
ranker.printRankings();


>>> PageRankWithPriors(someGraph,0.3,1,rootSet,null)
Algorithm that extends the PageRank algorithm by incorporating root nodes (priors). Whereas in PageRank the importance of a node is implicitly computed relative to all nodes in the graph now importance is computed relative to the specified root nodes.
Note: This algorithm uses the same key as PageRank for storing rank sccores
A simple example of usage is:
PageRankWithPriors ranker = new PageRankWithPriors(someGraph,0.3,1,rootSet,null);
ranker.evaluate();
ranker.printRankings();


>>> RandomWalkBetweenness(someGraph) !!! undirected

Computes betweenness centrality for each vertex in the graph. The betweenness values in this case are based on random walks, measuring the expected number of times a node is traversed by a random walk averaged over all pairs of nodes. The result is that each vertex has a UserData element of type
MutableDouble whose key is 'centrality.RandomWalkBetweennessCentrality'
A simple example of usage is:
RandomWalkBetweenness ranker = new RandomWalkBetweenness(someGraph);
ranker.evaluate();
ranker.printRankings();


>>> RandomWalkBetweenness(someGraph,someSource,someTarget)
Computes s-t betweenness centrality for each vertex in the graph. The betweenness values in this case are based on random walks, measuring the expected number of times a node is traversed by a random walk from s to t. The result is that each vertex has a UserData element of type
MutableDouble whose key is 'centrality.RandomWalkBetweennessCentrality'
A simple example of usage is:
RandomWalkSTBetweenness ranker = new RandomWalkBetweenness(someGraph,someSource,someTarget);
ranker.evaluate();
ranker.printRankings();


>>> VoltageRanker
Ranks vertices in a graph according to their 'voltage' in an approximate solution to the Kirchoff equations. This is accomplished by tying "source" vertices to specified positive voltages, "sink" vertices to 0 V, and iteratively updating the voltage of each other vertex to the (weighted) average of the voltages of its neighbors. The resultant voltages will all be in the range [0, max] where max is the largest voltage of any source vertex (in the absence of negative source voltages; see below). A few notes about this algorithm's interpretation of the graph data: Higher edge weights are interpreted as indicative of greater influence/effect than lower edge weights. Negative edge weights (and negative "source" voltages) invalidate the interpretation of the resultant values as voltages. However, this algorithm will not reject graphs with negative edge weights or source voltages.Parallel edges are equivalent to a single edge whose weight is the sum of the weights on the parallel edges. Current flows along undirected edges in both directions, but only flows along directed edges in the direction of the edge.


>>> WeightedNIPaths(someGraph,2.0,6,rootSet)

This algorithm measures the importance of nodes based upon both the number and length of disjoint paths that lead to a given node from each of the nodes in the root set. Specifically the formula for measuring the importance of a node is given by: I(t|R) = sum_i=1_|P(r,t)|_{alpha^|p_i|} where alpha is the path decay coefficient, p_i is path i and P(r,t) is a set of maximum-sized node-disjoint paths from r to t.
This algorithm uses heuristic breadth-first search to try and find the maximum-sized set of node-disjoint paths between two nodes. As such, it is not guaranteed to give exact answers.
A simple example of usage is:
WeightedNIPaths ranker = new WeightedNIPaths(someGraph,2.0,6,rootSet);
ranker.evaluate();
ranker.printRankings();


Tuesday, February 17, 2009

Napoleon Dynamite Effect

they can't seem to quite get to this 10% mark because of the effect of a the ratings on a certain subset of indie movies, the flash point of which is "Napoleon Dynamite."

The other polarizing indie-minded gems the programmers have noticed this same glitch with are: "Fahrenheit 9/11," "I Heart Huckabees," "Lost in Translation," "Kill Bill: Volume 1" (note, not volume 2!), "Sideways," and "The Life Aquatic with Steve Zissou."

here

Surviving the exaflood

From The Economist: The internet: Predictions that an “exaflood” of traffic will overload the internet have been doing the rounds. But will it really happen?

Some notes on collaboration

here

Wednesday, January 21, 2009

Less is More

The Economist

Constant improvements mean that more features can be added to these products each year without increasing the price.

But now things are changing, partly because the industry is maturing, and partly because of the recession. Suddenly there is much more interest in products that apply the flip side of Moore’s law: instead of providing ever-increasing performance at a particular price, they provide a particular level of performance at an ever-lower price.

People do good to look good. Plus, don't pay them!

The Economist

Question: what motivates people to give money to charities or donate blood, acts which are costly to the doer and primarily benefit others.

Answer: people do good in part because it makes them look good to those whose opinions they care about. Economists call this image motivation. Dan Ariely, Anat Bracha & Stephan Meier tested the importance of image motivation. if people do good to look good, introducing monetary or other rewards into the mix might be counterproductive (An observer who sees someone getting paid for donating blood, for example, would find it hard to differentiate between the donor’s intrinsic goodness and his greed).

Conclusion (for charities):
Suppose, for example, that rewards were used to encourage people to support a certain cause with a minimum donation. If that cause then publicised those who were generous well beyond the minimum required of them, it would show that they were not just "in it for the money".

Weird (research)parallel: how to draw out more participation to human experiments by exploiting image motivation