Friday, December 21, 2007

Strong Social Networks

A well-written article (pdf) reports on the work by Oxford folks (previous post). Plus, it says that a "pair of Oxford physicists, Neil Johnson and Sean Gourley, have teamed up with social scientists at the Conflict Analysis Resource Center (CERAC), based in Bogotá"

"When the researchers graphed all the attacks within a given conflict, with the number of attacks plotted against the number killed in each, it produces a fat-tailed exponential curve. And the exponent of the function, which determines the curve’s shape, is nearly always the same. “Terrorism and guerrilla warfare everywhere in the world has a signature of about 2.5,” says Gourley. Plotting the distribution of these events over time produces another, distinctive signature."
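A quick sketch of how a "signature" like 2.5 could be estimated, assuming the curve is read as a power law p(x) ~ x^(-alpha) over the number killed per attack. The estimator below is the standard continuous maximum-likelihood formula; the function names, the cut-off `xmin` and the synthetic data are my own assumptions, not the researchers' code or data.

```python
import math

def powerlaw_exponent_mle(values, xmin):
    """Continuous maximum-likelihood estimate of alpha for p(x) ~ x^(-alpha), x >= xmin."""
    tail = [x for x in values if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("all observations equal xmin; the exponent is undefined")
    return 1.0 + len(tail) / log_sum

def synthetic_powerlaw(alpha, xmin, n):
    """Hypothetical stand-in for real data: inverse-CDF values at evenly spaced quantiles."""
    return [xmin * (1 - (i + 0.5) / n) ** (-1.0 / (alpha - 1)) for i in range(n)]

# Synthetic "casualties per attack" generated with exponent 2.5 (not real conflict data).
casualties = synthetic_powerlaw(alpha=2.5, xmin=1.0, n=10000)
alpha_hat = powerlaw_exponent_mle(casualties, xmin=1.0)
print(round(alpha_hat, 2))
```

On real data the choice of `xmin` matters a lot, and a fit alone does not show that a power law is the right model.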

Thursday, December 20, 2007

At 71, physics professor is a Web star

Professor Walter H. G. Lewin "delivers his lectures [ocw] with the panache of Julia Child bringing French cooking to amateurs and the zany theatricality of YouTube's greatest hits. He is part of a new generation of academic stars who hold forth in cyberspace on their college Web sites and even, without charge, on iTunes U, which went up in May on Apple's iTunes Store.

In his lectures at ocw.mit.edu, Professor Lewin beats a student with cat fur to demonstrate electrostatics. Wearing shorts, sandals with socks and a pith helmet — nerd safari garb — he fires a cannon loaded with a golf ball at a stuffed monkey wearing a bulletproof vest to demonstrate the trajectories of objects in free fall.

He rides a fire-extinguisher-propelled tricycle across his classroom to show how a rocket lifts off." Source

The stars this week included Hubert Dreyfus, a philosophy professor at the University of California, Berkeley, and Leonard Susskind, a professor of quantum mechanics at Stanford.

Last week, Yale put some of its most popular undergraduate courses and professors online free. The list includes Controversies in Astrophysics with Charles Bailyn, Modern Poetry with Langdon Hammer and Introduction to the Old Testament with Christine Hayes.

Wednesday, December 19, 2007

The weakness of weak ties

This is a very interesting paper: Structure and tie strengths in mobile communication networks. It studies the communication patterns of millions of mobile phone users by arranging them in a big weighted social network. The weight between two individuals corresponds to the aggregated duration of calls between them.
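To make the construction concrete, here is a minimal sketch of how call records could be aggregated into such a weighted network. The record format `(caller, callee, seconds)` and the names are assumptions for illustration; the paper's actual data pipeline is not described here.

```python
from collections import defaultdict

def build_call_network(call_records):
    """Aggregate (caller, callee, seconds) records into undirected edge weights.

    The weight of an edge is the total duration of all calls between the two
    people, regardless of who placed the call.
    """
    weights = defaultdict(float)
    for caller, callee, seconds in call_records:
        if caller == callee:
            continue  # ignore self-calls
        edge = tuple(sorted((caller, callee)))
        weights[edge] += seconds
    return dict(weights)

# Hypothetical call log.
calls = [
    ("ann", "bob", 120),
    ("bob", "ann", 60),
    ("ann", "cat", 30),
    ("cat", "dan", 300),
]
network = build_call_network(calls)
print(network[("ann", "bob")])
```

Ties can then be ranked by weight to separate weak, intermediate and strong ones.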

Findings:
"Weak ties appear to be crucial for maintaining the network’s structural integrity, but strong ties play an important role in maintaining local communities. Both weak and strong ties are ineffective, however, when it comes to information transfer, given that most news in the real simulations reaches an individual for the first time through ties of intermediate strength." ..."The speed of spread then depended on the strength of each link. The results suggest that information spreads most quickly via links of intermediate strength, or medium length calls. This is because information spreads slowly through weaker links, or shorter calls, and stronger links tend to bind only a limited number of people."

Consequence:
To enhance the spreading of information, one needs to intentionally force it through the intermediate- to weak-strength ties (while avoiding hubs!)

Tuesday, December 18, 2007

You, poor? Out!

Switzerland welcomes rich people with its banks and deports poor ones with this new initiative:

Switzerland has started a television campaign in African countries to keep potential illegal migrants from trying to immigrate to Switzerland.

Monday, December 17, 2007

Robustness of community structure in networks

"The discovery of community structure is a common challenge in the analysis of network data. Many methods have been proposed for finding community structure, but few have been proposed for determining whether the structure found is statistically significant or whether, conversely, it could have arisen purely as a result of chance. In this paper we show that the significance of community structure can be effectively quantified by measuring its robustness to small perturbations in network structure. We propose a suitable method for perturbing networks and a measure of the resulting change in community structure and use them to assess the significance of community structure in a variety of networks, both real and computer generated." Source
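The abstract needs "a measure of the resulting change in community structure". One standard candidate for comparing two partitions of the same nodes is the variation of information; I am assuming here, without checking the paper's details, that something of this kind is what gets measured. A small sketch, with partitions given as hypothetical `node -> community label` dictionaries:

```python
import math
from collections import Counter

def variation_of_information(partition_a, partition_b):
    """Variation of information H(A|B) + H(B|A) between two partitions.

    Each partition maps every node to a community label. The result is 0 when
    the partitions are identical and grows as they diverge.
    """
    if set(partition_a) != set(partition_b):
        raise ValueError("both partitions must cover the same nodes")
    nodes = list(partition_a)
    n = len(nodes)
    count_a = Counter(partition_a[v] for v in nodes)
    count_b = Counter(partition_b[v] for v in nodes)
    count_ab = Counter((partition_a[v], partition_b[v]) for v in nodes)
    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        vi -= p_ab * (math.log(p_ab / (count_a[a] / n)) + math.log(p_ab / (count_b[b] / n)))
    return vi

original = {1: "x", 2: "x", 3: "y", 4: "y"}
after_perturbation = {1: "x", 2: "y", 3: "x", 4: "y"}
print(variation_of_information(original, original))
print(variation_of_information(original, after_perturbation))
```

The idea would be to perturb the network a little, rerun community detection, and watch how quickly this distance grows compared with a random graph.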

Does your data follow a power-law distribution?

Power-law distributions in empirical data (htm)

Science paper (pdf)

How brain catalogs info. PageRank-style?

"Human Brain Cloud: a multiplayer word association game that started with a single word ("volcano") and has since taken on a life of its own. Players are given a word, which is culled from the database of previously entered words, and asked to enter the first thing that comes to mind. As people interact with the game it collects data about word associations that can be formed into a giant network (the cloud)."

"Researchers at the University of California recently conducted a study in which they found evidence to suggest that our brains catalog and rate the relevance of information by forming connections between data. The researchers compared the brain's system to Google's PageRank algorithm." Source

"The investigators found that a word’s “PageRank” was a good predictor of how often it would show up when people were asked to think of words that start with A, with B, and so on.

When it came to predicting these results, “PageRank” beat two other seemingly reasonable ranking systems: tallies of how often words show up in ordinary writing; and a simple count of direct “links” to a word that doesn’t consider how many words, in turn, link to those linking words.

In the PageRank formula, a page gains “importance” based on how many other pages link to it. But links from pages that are themselves “important,” confer more importance than those that aren’t. Thus, importance can be thought of as flowing through the Web’s link network toward the most highly “linked-in” sites.

One explanation for the new findings, wrote Griffiths and colleagues, could be that connections among brain cells work similarly to Web links. Cells that are targets of many connections might become more active than others, in the same way that highly linked-in websites are deemed more important." Source
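For readers who have not seen it, the "importance flowing through links" idea can be sketched as a power iteration. This is a textbook-style version with a damping factor of 0.85 (my assumption), run on a made-up word-association graph; it is not the study's data or code.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank. `links` maps each node to the nodes it points to."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                # A node passes its importance on in equal shares to its targets.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A node without outgoing links spreads its importance evenly.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical associations: "volcano" makes people think of "lava" and "fire", etc.
associations = {
    "volcano": ["lava", "fire"],
    "lava": ["fire", "hot"],
    "hot": ["fire"],
    "fire": ["hot"],
}
scores = pagerank(associations)
print(max(scores, key=scores.get))
```

Note that "lava" outranks "volcano" even though each has at most one incoming link: it is a link from elsewhere that counts, not the number of words a word points to.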

Sunday, December 16, 2007

Recommenders Everywhere - WikiLens

Here is the talk. "Suppose you have a passion for items of a certain type, and you wish to start a recommender system around those items. You want a system like Amazon or Epinions, but for cookie recipes, local theater, or microbrew beer. How can you set up your recommender system without assembling complicated algorithms, large software infrastructure, a large community of contributors, or even a full catalog of items?

WikiLens is open source software that enables anyone, anywhere to start a community-maintained recommender around any type of item. We introduce five principles for community-maintained recommenders that address the two key issues: (1) community contribution of items and associated information; and (2) finding items of interest. Since all recommender communities start small, we look at feasibility and utility in the small world, one with few users, few items, few ratings. We describe the features of WikiLens, which are based on our principles, and give lessons learned from two years of experience running wikilens.org."

A mobile phone that can buy clothes for you


"A revolutionary new mobile phone will soon be able to let shoppers snatch a photo of clothes they want before ordering them online. Nokia is currently developing the device which lets you buy clothes, furniture or holidays in the High Street — without going into a store. Buyers can now avoid queuing at the checkout and even buy clothes simply by looking through a shop window and taking a photo of the window display. The phone then uses image recognition software to find the same object on the Internet." Source.

Monday, December 10, 2007

What's on CIO wish lists?

Number 5: Collaboration Technologies "Web 2.0 and social networking may be becoming candidates for the mainstream, although some CIOs have their reservations. Bob Worrall, for example, CIO of Sun Microsystems, reckons to have talked to well over 100 of his contemporaries over the past year and believes that social networking represents a new threat. “There is a lot of information out there on blogs and wiki, but there is no easy way to harvest that information and make it available to the organisation,” he says. Sun, however, has created a virtual Californian building in cyberspace and is experimenting with its use as a meeting place for remote staff... RM, the supplier of IT to UK schools, places collaboration and mobility at the top of the list... One of the biggest challenges is to evaluate Web 2.0 opportunities and select those which will add real value to the business" FT

Eight business technology trends to watch
(McKinsey)

The Natural Pattern Behind our Votes

From 30 years of elections around the world: "The most important factor determining a candidate’s success compared with his rivals in the same party turns out to be his or her personal ability to connect with the public."

How opinions form?

Person-to-person process is enough to explain the data! "In their model, they supposed that each candidate starts out trying to convince others to vote in their favour. Those he or she convinces, then try to convince others. These influences percolate through the social net until everyone has made a decision."
Consequence:
Candidates should focus on WHO they contact - influential people may easily convince others.
More on this pdf.
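The person-to-person process quoted above can be sketched as a spreading process on a contact network. The version below is a deliberately deterministic simplification (everyone is convinced by whoever reaches them first, and candidates take turns in list order); the real model presumably involves randomness. The network and names are made up.

```python
from collections import Counter, deque

def spread_votes(contacts, candidates):
    """Each candidate starts convincing their contacts; the convinced convince others.

    `contacts` maps a person to the people they talk to. A person sticks with
    the first candidate whose influence reaches them.
    """
    choice = {c: c for c in candidates}
    queue = deque(candidates)
    while queue:
        person = queue.popleft()
        for friend in contacts.get(person, []):
            if friend not in choice:
                choice[friend] = choice[person]
                queue.append(friend)
    return choice

# Candidate A knows one well-connected person ("hub"); candidate B knows a peripheral one.
contacts = {
    "A": ["hub"],
    "B": ["q1"],
    "hub": ["A", "p1", "p2", "p3"],
    "q1": ["B", "p3"],
    "p1": ["hub"],
    "p2": ["hub"],
    "p3": ["hub", "q1"],
}
votes = Counter(spread_votes(contacts, ["A", "B"]).values())
print(votes)
```

Even in this toy network, whom a candidate contacts first decides the outcome: one link to the hub is worth more than one link to the periphery.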

Sunday, December 9, 2007

Lightweight Distributed Trust Propagation

I just finished presenting our work at ICDM. Here are the slides (also in ppt).






Trust Bootstrapping: TRULLO @ Mobiquitous

At Mobiquitous 2007, we presented TRULLO. Here are the slides (some animations do not work properly in slideshare, sorry). A brief description follows.



Situation: Using mobile devices, such as smart phones, people may create and distribute different types of digital content (e.g., photos, videos). One of the problems is that digital content, being easy to create and replicate, may likely swamp users rather than informing them. To avoid that, users may run trust models on their mobile devices. A trust model is a piece of software that keeps track of who provides quality content and who does not.

Problem: Devices should be able to set their initial trust for other devices. One way of doing so is for devices to learn from their own past experiences. To see how, consider the following quotes about human trust: "We may initially trust or not trust those involved on our projects based on past experience", and "If your boyfriend is unfaithful, you won't initially trust the next man you date" :-) Algorithms that model human trust on pervasive devices, one might say, ought to do the same thing - they should assign their initial trust upon 'similar' past experiences.

Existing Solutions: Existing solutions usually require an ontology upon which they decide which past experiences are similar, and, in so doing, they require both that the same ontology is shared by all users (which is hardly the case in reality) and that users agree on that ontology for good (i.e., the ontology is not supposed to change over time) :-(

Proposal: TRULLO gathers ratings of past experiences in a matrix, learns statistical "features" from that matrix, and combines those features to set initial trust values. It works quite well in a simulated antique market and its implementation is reasonably fast on a Nokia mobile phone.

Future: TRULLO does not work if one does not have past experiences. That is why we will propose a distributed trust propagation algorithm (pdf).

Monday, November 26, 2007

The wireless epidemic

The wireless epidemic by Jon Kleinberg

At one end are network models that reflect strong spatial effects, with nodes at fixed positions in two dimensions, each connected to a small number of other nodes a short distance away [9]. At the other end are ‘scale-free’ networks, which are essentially unconstrained by physical proximity, and in which the number of contacts per node are widely spread [14]. Models based on human travel data occupy an intermediate position in this spectrum of spatial constraints. The different network structures lead in turn to qualitative differences in the way epidemics spread: whereas epidemics can persist at arbitrarily low levels of virulence in scale-free networks [14,15], epidemics in simple two-dimensional models need a minimum level of virulence to prevent them from dying out quickly [9].

Bluetooth ...is disrupting this dichotomy by making possible computer-virus outbreaks whose progress closely tracks human mobility patterns. These types of wireless worm are designed to infect mobile devices such as cell phones, and then to continuously scan for other devices within a few tens of metres or less, looking for new targets. A computer virus thus becomes something you catch not necessarily from a compromised computer halfway around the world, but possibly from the person sitting next to you on a bus, or at a nearby table in a restaurant.

9. Durrett, R. SIAM Rev. 41, 677–718 (1999).
14. Pastor-Satorras, R. & Vespignani, A. Phys. Rev. Lett. 86, 3200–3203 (2001).
15. Berger, N., Borgs, C., Chayes, J. T. & Saberi, A. I. Proc. 16th ACM Symp. Discr. Algor. 301–310 (ACM, New York, 2005).
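The contrast in the excerpt (a minimum virulence on lattices, none on scale-free networks) can be illustrated with the mean-field estimate of the epidemic threshold, ⟨k⟩/⟨k²⟩, computed from a degree sequence. This is only a rough sketch: the estimate is crude for a real lattice, and the heavy-tailed degree sequence is generated by a hypothetical helper with an assumed exponent of 2.5.

```python
def mean_field_threshold(degrees):
    """Mean-field estimate of the minimum spreading rate: <k> / <k^2>."""
    mean_k = sum(degrees) / len(degrees)
    mean_k2 = sum(k * k for k in degrees) / len(degrees)
    return mean_k / mean_k2

def heavy_tailed_degrees(n, exponent=2.5, kmin=2):
    """Hypothetical degree sequence with a power-law tail (evenly spaced quantiles)."""
    return [
        min(n - 1, int(kmin * (1 - (i + 0.5) / n) ** (-1.0 / (exponent - 1))))
        for i in range(n)
    ]

lattice = [4] * 10000                  # every node has exactly four neighbours
small = heavy_tailed_degrees(100)      # small heavy-tailed network
large = heavy_tailed_degrees(10000)    # larger one, with bigger hubs

print(mean_field_threshold(lattice))
print(mean_field_threshold(small), mean_field_threshold(large))
```

For the fixed-degree network the estimate stays put however large the network gets, while for the heavy-tailed one it keeps shrinking as hubs grow, which is the sense in which even weak viruses can persist.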

The impact of social structure on economic outcomes

Some extracts from The impact of social structure on economic outcomes.

4 core principles:

1) Norms and Network Density. ... the denser a network, the more unique paths along which information, ideas and influence can travel between any two nodes. Thus, greater density makes ideas about proper behavior more likely to be encountered repeatedly, discussed and fixed; it also renders deviance from resulting norms harder to hide and, thus, more likely to be punished. ... larger groups will have lower network density because people have cognitive, emotional, spatial and temporal limits on how many social ties they can sustain.

2) The Strength of Weak Ties. More novel information flows to individuals through weak than through strong ties. Because our close friends tend to move in the same circles that we do, the information they receive overlaps considerably with what we already know. ...This is so even though close friends may be more interested than acquaintances in helping us; social structure can dominate motivation. This is one aspect of what I have called “the strength of weak ties” (Granovetter, 1973, 1983). ... if cliques are connected to one another, it is mainly by weak ties. This implies that such ties determine the extent of information diffusion in large-scale social structures. One outcome is that in scientific fields, new information and ideas are more efficiently diffused through weak ties.

3) The Importance of “Structural Holes.” Burt (1992) extended and reformulated the “weak ties” argument by emphasizing that ... the strategic advantage that may be enjoyed by individuals with ties into multiple networks that are largely separated from one another. Insofar as they constitute the only route through which information or other resources may flow from one network sector to another, they can be said to exploit “structural holes” in the network. ... One reason resources may be unconnected is that they reside in separated networks of individuals or transactions. Thus, the actor who sits astride structural holes in networks (as described in Burt, 1992) is well placed to innovate.
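The claim in principle 1 that larger groups have lower density follows from simple arithmetic: density is the fraction of possible ties that actually exist, and possible ties grow quadratically with group size while sustainable ties per person do not. A small sketch, where the per-person tie budget of 9 is an arbitrary assumption:

```python
def network_density(num_people, num_ties):
    """Fraction of all possible undirected ties that are actually present."""
    if num_people < 2:
        return 0.0
    return 2 * num_ties / (num_people * (num_people - 1))

def density_with_tie_budget(num_people, ties_per_person):
    """Best-case density when nobody can sustain more than `ties_per_person` ties."""
    ties_each = min(ties_per_person, num_people - 1)
    num_ties = num_people * ties_each // 2
    return network_density(num_people, num_ties)

for size in (10, 100, 1000):
    print(size, density_with_tie_budget(size, ties_per_person=9))
```

With that budget a group of 10 can be fully connected, while a group of 100 cannot exceed a density of about 0.09.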

Prospective employers and employees prefer to learn about one another from personal sources whose information they trust. This is an example of what has been called “social capital” (Lin, 2001). ... for goods where assessment is difficult, such as used cars, legal advice and home repairs, one-quarter to one-half of purchases in the United States are made through personal networks.

Studies of peasant markets often suggest that “clientelization,” defined as dealing exclusively with known buyers and sellers, raises prices above their competitive level.

Social relations are also closely linked to productivity. Economic models attribute productivity to personal traits, modifiable by learning. But one’s position in a social group can also be a central influence on productivity, for several reasons. One is that many tasks cannot be accomplished without serious cooperation from others; another is that many tasks are too complex and subtle to be done “by the book” (which is why the “rulebook slowdown” is a potent labor weapon) and require the exercise of “tacit knowledge” appropriable only through interaction with knowledgeable others.

“loyalty systems”—attempts to elicit cooperation from workers deriving not only from incentives but also from identification with the firm or with some set of individuals that encourages high standards and productivity.

Your personal data? If you can't take it out, don't put it in

Many companies are going open. Are they? Take OpenSocial (APIs by Google), Open Handset Alliance (alliance of 34 companies led by Google), and Open Media (an initiative announced by Bebo). What is common to all three initiatives, apart from the use of the word "open", is that none is directly aimed at benefiting the user (here).

Google is clearly "not responding to consumer needs. The applications it has demonstrated using Android are readily available on existing phones and operating systems. Users are not crying out for yet another interface for their phones".

Tim O'Reilly: "We don't want to have the same application on multiple social networks, we want applications that can use data from multiple social networks".


In another FT article:
"the technology industry has little financial incentive to reduce switching costs. While users are free to switch from one service to another at any time, the critical question is: can they take their data with them? Can they take their photos, their videos, their e-mails? And how easy is that? Data are often stored in proprietary file formats, which are protected by patents, and those are controlled by software and service vendors."

"Which raises the question: Do you actually own your own data?" The answer is unfortunately a qualified no! A very interesting research direction, right?

Sunday, November 25, 2007

Cold Reading, Statistical Discrimination and Initial Trust

"Cold reading is a technique used to convince another person that the reader knows much more about a subject than they actually do. Even without prior knowledge of a person, a practiced cold reader can still quickly obtain a great deal of information about the subject by carefully analyzing the person's body language, clothing or fashion, hairstyle, gender, sexual orientation, religion, race or ethnicity, level of education, manner of speech, place of origin, etc. This technique is also called offender profiling. Cold readers commonly employ high probability guesses about the subject, quickly picking up on signals from their subjects as to whether their guesses are in the right direction or not, and then emphasizing and reinforcing any chance connections the subjects acknowledge while quickly moving on from missed guesses".

This definition of cold reading reminded me of Posner's (harsh) review of Blink - Blinkered (html). There are two points from Posner's essay that may suggest how person A may set her initial trust in B. The first point may suggest that A does so based not only on B's behavior but also on the (social) group(s) to which B belongs. The second point may remind us that: asking for recommendations about B may be costly; and that Bayesian reasoning may help in rationally deciding whether to trust. Here are the two (by now-coveted) points:

(1) "If two groups happen to differ on average, even though there is considerable overlap between the groups, it may be sensible to ascribe the group's average characteristics to each member of the group, even though one knows that many members deviate from the average. An individual's characteristics may be difficult to determine in a brief encounter, and a salesman cannot afford to waste his time in a protracted one, and so he may quote a high price to every black shopper even though he knows that some blacks are just as shrewd and experienced car shoppers as the average white, or more so. Economists use the term "statistical discrimination" to describe this behavior. It is a better label than stereotyping for what is going on in the auto-dealer case, because it is more precise and lacks the distracting negative connotation of stereotype, defined by Gladwell as "a rigid and unyielding system." But is it? Think of how stereotypes of professional women, Asians, and homosexuals have changed in recent years. Statistical discrimination erodes as the average characteristics of different groups converge."

(2) "Such pratfalls, together with the inaptness of the stories that constitute the entirety of the book, make me wonder how far Gladwell has actually delved into the literatures that bear on his subject, which is not a new one. These include a philosophical literature illustrated by the work of Michael Polanyi on tacit knowledge and on "know how" versus "know that"; a psychological literature on cognitive capabilities and distortions; a literature in both philosophy and psychology that explores the cognitive role of the emotions; a literature in evolutionary biology that relates some of these distortions to conditions in the "ancestral environment" (the environment in which the human brain reached approximately its current level of development); a psychiatric literature on autism and other cognitive disturbances; an economic literature on the costs of acquiring and absorbing information; a literature at the intersection of philosophy, statistics, and economics that explores the rationality of basing decisions on subjective estimates of probability (Bayes's Theorem); and a literature in neuroscience that relates cognitive and emotional states to specific parts of and neuronal activities in the brain."
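One way the two points might combine for trust bootstrapping, purely as my own illustration and not something taken from Posner's essay: start from the group average as a Bayesian prior, then let B's own behavior override it as evidence accumulates. The sketch uses a Beta prior; `prior_strength` (how many observations the group average is "worth") is an assumed parameter.

```python
def initial_trust(group_mean, prior_strength=4.0):
    """Turn a group's average trustworthiness into Beta prior pseudo-counts."""
    alpha = group_mean * prior_strength
    beta = (1.0 - group_mean) * prior_strength
    return alpha, beta

def update_trust(alpha, beta, good, bad):
    """Bayesian update after observing `good` and `bad` interactions with B."""
    return alpha + good, beta + bad

def expected_trust(alpha, beta):
    """Mean of the Beta distribution: the probability that the next interaction is good."""
    return alpha / (alpha + beta)

prior = initial_trust(group_mean=0.3)                  # A knows only B's group
posterior = update_trust(*prior, good=8, bad=0)        # then B behaves well eight times
print(expected_trust(*prior), expected_trust(*posterior))
```

The smaller `prior_strength` is, the faster individual evidence erodes the group-based guess, which mirrors the remark that statistical discrimination is not "rigid and unyielding".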

Friday, November 23, 2007

Is Britney Spears Spam?

From my old blog (21st August 2007):
In the last post, I was raving on about ..., uhm, probably about trust bootstrapping, right? :-) I went from a definition of cold reading to a very personal interpretation of Posner's review of Blink. Now, in the same vein (i.e., keeping on being delirious), I move on to this nice paper, which carries out (in a way) not offender profiling but MySpace user profiling.
Title: Is Britney Spears Spam? (pdf)

Problem: In social network websites (e.g., MySpace), to decide whether to accept invitations to connect, users manually examine the senders' profiles. However, that may be time consuming!

Existing Solutions: One may automate the acceptance of invitations by having users run trust propagation algorithms.

Complication: The authors write that using current trust propagation algorithms may be less than desirable since trust both decays with the number of hops and is usually one-dimensional.

Proposal: Use machine learning techniques to classify user profiles. The classification describes a profile across two dimensions: sociability and promotion. Based on these dimensions' values for a profile, users then decide whether to accept the invitation of that profile's user. To come up with a dataset on which to evaluate their algorithm, the authors randomly select and rate by hand MySpace users.

Future: I would:
> Apply a new trust propagation algorithm (pdf) to avoid trust decay and apply TRULLO (pdf) to handle multi-dimensional trust.
> Look at literature on criminal profiling. In UCL's main library, I noticed many books about criminal profiling. I wonder whether those books could inform a (future) paper titled "On profiling (not only criminals but) Web 2.0 users" ;-)
> Look at literature on statistical discrimination (previous post) and on customer profiling (mining customer data).
> Consider Tim Finin's comments: "It would be interesting to see how well various measures of the network structure around false and true profiles serve as features. I think this is very similar to the problem of recognizing spam blogs (splogs). In our work, we’ve found that local features work well, but splogs can also be recognized by looking at the network structure as well."

Thursday, November 22, 2007

Efficient and Decentralized PageRank Approximation

I read this very well-written paper. The authors set out to design a way to compute (an approximate) PageRank in a distributed and efficient way.

"Starting with the local graph G of a peer, the peer first extends G by adding a special node W, called world node since its role is to represent all pages in the network that do not belong to G. An initial JXP score for local pages and the world node is obtained by running the PR algorithm in the extended local graph G' = G+W.
...
we take all the links from local pages to external pages and make them point to the world node. ... as the peer learns about external links that point to one of the local pages, we assign these links to the world node."
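Here is how I picture the extended local graph G' = G + W from the quote, as a simplified sketch. Links are unweighted, the world node gets no self-loop and the damping factor is assumed to be 0.85, so this is not the JXP algorithm itself; names such as `extend_with_world_node` are mine.

```python
WORLD = "W"

def extend_with_world_node(local_links, local_pages, known_inlinks=()):
    """Build G' = G + W: external targets collapse into W, known in-links leave W."""
    local = set(local_pages)
    extended = {page: [] for page in sorted(local)}
    extended[WORLD] = []
    for page in sorted(local):
        for target in local_links.get(page, []):
            # Links to pages outside the local graph are redirected to the world node.
            extended[page].append(target if target in local else WORLD)
    for _external_source, local_target in known_inlinks:
        # External links the peer has learned about are attributed to the world node.
        if local_target in local and local_target not in extended[WORLD]:
            extended[WORLD].append(local_target)
    return extended

def pagerank(links, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over a dict node -> list of targets."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links[v] or nodes  # a node without links spreads evenly
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank

# Hypothetical peer holding pages "a" and "b".
local_links = {"a": ["b", "x.example/1"], "b": ["a"]}
extended = extend_with_world_node(local_links, ["a", "b"], known_inlinks=[("y.example/2", "b")])
scores = pagerank(extended)
print(extended)
print(scores)
```

The score that ends up on W stands for everything the peer does not hold locally; as the peer meets others and learns more in-links, that estimate should get better.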


Optimized Merging

"At a peer meeting, instead of merging the graphs and world nodes, we could simply add relevant information received from the other peer into the local world node, and perform the PR computation on the extended local graph and still the JXP scores converge to the global PR scores."

Fortune companies don't blog

"Just 6 percent of the Fortune 500 companies have one [blog], according to Socialtext's Fortune 500 Blogging Wiki.

Why is this? If blogging is still not widespread among companies, it is basically because they are afraid to lose control of their messages, they fear the transparency effect and are not altogether convinced of the legal limits of this medium."

http://www.iese.edu/en/files/6_34173.pdf

Monday, November 19, 2007

How innovation happens in Silicon Valley

NESTA - How innovation happens in Silicon Valley, London 20/11/07
George Osborne MP will deliver a keynote address. Reid Hoffman, Megan Smith and James Slavet will take part in an interactive panel discussion to consider why Silicon Valley has been so successful, and what lessons can be learned for the UK.

Sunday, November 18, 2007

The death of mass advertising?

Facebook Tries 'Social Advertising'. ..."a Facebook user who rents a movie on Blockbuster.com will be asked if he would like to have his movie choice broadcast out to all his friends on Facebook. And those friends would have no choice but to receive that movie message, along with an ad from Blockbuster."

MySpace reveals 'targeted' ads - "a pilot scheme that allows it to sell advertisements targeted to the individual tastes and interests of its millions of users...[It] will give advertisers the ability to drill down into 100 different user segments. This will allow them to differentiate between fans of romantic comedy films and action films, for example."