[Cryptography] graph theory and sybil attack

jamesd at echeque.com jamesd at echeque.com
Wed Jul 3 21:02:45 EDT 2019


The Sybil attack is a graph-theory problem, not a cryptography problem, 
and its application is in distributed systems.

But distributed systems are a cryptography problem, so, if the 
moderators permit, I would like to ask about this problem here:

Yacy is an open-source distributed search engine for publishing, but the 
hard problem is the Sybil attack, the problem that Google refers to as a 
link farm – a bunch of fake sites that link to each other to give each 
other fake reputation. It is not clear how Yacy addresses the Sybil 
attack, and I suspect that if usage of Yacy increases, it will be 
subject to such attacks.

Similarly, systems for giving people reputation for supplying goods are 
subject to the Sybil attack with fake customers. This is a big problem 
for dark-web sites that attempt to run an eBay-style marketplace on Tor. 
Distributed systems tend to be subject to Sybil attacks.

Resolving the Sybil attack is a problem of partitioning the graph to see 
whether a group of nodes that are well connected to each other is, 
considered as a single node, well connected to the rest of the network. 
PageRank and analogous measures, such as seller reputation, measure 
connection to the graph as a whole, and the Sybil attack fakes good 
connection to the graph as a whole by creating a seemingly large 
subgraph that is well connected to itself but poorly connected to the 
rest of the graph.
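One way to make "poorly connected to the rest of the graph" precise is 
conductance: the number of edges leaving a candidate set, divided by the 
smaller of the set's edge volume and the remainder's. A low-conductance 
set is exactly a Sybil-region candidate. A minimal pure-Python sketch; 
the graph and node names are my own illustrative assumptions:

```python
def conductance(adj, group):
    """adj: dict node -> set of neighbours (undirected graph);
    group: candidate set S.  Returns cut(S, V-S) / min(vol(S), vol(V-S))."""
    group = set(group)
    # Edges with exactly one endpoint inside the group.
    cut = sum(1 for u in group for v in adj[u] if v not in group)
    # Volume = total degree on each side of the partition.
    vol_s = sum(len(adj[u]) for u in group)
    vol_rest = sum(len(adj[u]) for u in adj if u not in group)
    denom = min(vol_s, vol_rest)
    return cut / denom if denom else 0.0

# A tightly knit "link farm" {a, b, c} with only one edge to the outside:
adj = {
    "a": {"b", "c", "x"}, "b": {"a", "c"}, "c": {"a", "b"},
    "x": {"a", "y"}, "y": {"x", "z"}, "z": {"y"},
}
print(conductance(adj, {"a", "b", "c"}))  # 0.2: one cut edge over min volume 5
```

Honest neighbourhoods of comparable size typically have much higher 
conductance, so thresholding on it is one graph-theoretic handle on the 
partitioning question above.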

Considering the Sybil attack as a graph-theory problem: we have some 
ranking criterion that is, in some sense or in some part, a measure of 
connection to the graph as a whole, such as PageRank, seller reputation, 
or a node's provision of services (we want to penalize leecher nodes and 
reward seeder nodes), and we want the ranking criterion to exclude 
relatively isolated large subgraphs from having inappropriately high 
rank.

Assume for example that a pseudonymous purchaser can publish a review 
that is demonstrably linked to a payment actually made (but which might 
have been made by his left hand to his right hand), and we want to rank 
sellers on the basis of reviews.  But we want to give high impact to 
unfavorable reviews from purchasers that have issued lots of favorable 
reviews to real sellers, and to give favorable reviews impact only if 
the purchaser's nym has reviewed lots of real sellers.
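The mutually recursive criterion here (a review counts when the reviewer 
has reviewed many real sellers; a seller is real when credible reviewers 
rank it well) can be sketched as a HITS-style fixed point. This is only 
an illustration of the recursion, not a proposed scheme; the data 
layout, normalization, and iteration count are my assumptions:

```python
def rank_sellers(reviews, iters=50):
    """reviews: list of (buyer, seller, score) with score in [-1, +1].
    Returns (seller_rank, buyer_weight) dicts, each normalized to sum 1."""
    buyers = {b for b, _, _ in reviews}
    sellers = {s for _, s, _ in reviews}
    weight = {b: 1.0 for b in buyers}   # buyer credibility
    rank = {s: 1.0 for s in sellers}    # seller reputation
    for _ in range(iters):
        # A buyer is credible in proportion to the total rank of the
        # sellers it has reviewed: a clique of fake sellers reviewing
        # each other contributes little rank from outside the clique.
        weight = {b: sum(rank[s] for bb, s, _ in reviews if bb == b)
                  for b in buyers}
        norm = sum(weight.values()) or 1.0
        weight = {b: w / norm for b, w in weight.items()}
        # A seller's rank is the credibility-weighted sum of scores,
        # so an unfavorable review from a credible buyer hits hard.
        rank = {s: max(0.0, sum(weight[b] * sc
                                for b, ss, sc in reviews if ss == s))
                for s in sellers}
        norm = sum(rank.values()) or 1.0
        rank = {s: r / norm for s, r in rank.items()}
    return rank, weight
```

With two buyers each reviewing two real sellers and one sock puppet 
reviewing only its own fake seller, the fake seller's rank decays toward 
zero across iterations, because its only reviewer never accumulates 
credibility from the rest of the graph.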

This is similar to the Google PageRank problem.  A site has high rank if 
it has lots of links from real sites that themselves have high rank.  So 
what is a real site?
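For concreteness, the PageRank recursion itself is a short power 
iteration. The toy graph and the conventional damping factor 0.85 are 
illustrative assumptions, and dangling nodes are ignored for brevity:

```python
def pagerank(links, damping=0.85, iters=100):
    """links: dict node -> list of nodes it links to."""
    nodes = set(links) | {v for vs in links.values() for v in vs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Teleport term: every node gets a small base rank.
        new = {u: (1 - damping) / n for u in nodes}
        # Each node splits its current rank among its out-links.
        for u, outs in links.items():
            if not outs:
                continue
            share = damping * rank[u] / len(outs)
            for v in outs:
                new[v] += share
        rank = new
    return rank

r = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
# "a", which is linked by both "b" and "c", ends up ranked above "c".
```

Note that this plain recursion is exactly what a link farm games: a 
closed clique keeps recirculating its own rank plus the teleport mass it 
receives, which is why "what is a real site?" remains the hard question.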

It would be nice to be able to state this problem as a graph-theoretic 
problem and to relate it to well-known graph-traversal algorithms.

Excluding the Sybil attack, and excluding the link farm, means 
identifying a subgraph that is strongly internally connected and then 
ranking that subgraph as a single node.  We want a rank value for nodes 
that is unaffected, or not strongly affected, by collapsing strongly 
internally connected subgraphs into single nodes.
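The collapse operation itself is easy to state in code: contract a 
strongly internally connected set to one supernode and drop its internal 
edges. A rank function with the invariance described above would then 
give the supernode roughly the rank of the original group. A minimal 
sketch; the directed dict-of-sets representation is my assumption:

```python
def collapse(links, group, label):
    """links: dict node -> set of successors.  Replace the nodes in
    `group` by a single node `label`, discarding edges internal to
    the group (they become self-loops, which we drop)."""
    group = set(group)
    new = {}
    for u, outs in links.items():
        src = label if u in group else u
        tgts = {label if v in group else v for v in outs}
        tgts.discard(src)  # drop self-loops created by internal edges
        new.setdefault(src, set()).update(tgts)
    new.setdefault(label, set())
    return new

links = {"a": {"b"}, "b": {"a", "x"}, "x": {"a"}}
print(collapse(links, {"a", "b"}, "AB"))  # {'AB': {'x'}, 'x': {'AB'}}
```

Comparing the rank of `AB` in the collapsed graph against the combined 
rank of `a` and `b` in the original is one concrete test of whether a 
ranking scheme is fooled by the internal edges.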

This is also related to the DuckDuckGo algorithm, where a search query 
is identified as a search on a topic, and a topic is associated with a 
small, human-identified and human-generated set of subsets of the whole 
network.

It is also related to the optimization of running one's ranking 
algorithm on a model of the graph that is substantially smaller than the 
actual graph, in which groups of nodes are collapsed into single nodes. 
A ranking algorithm optimized in this manner is likely to be inherently 
able to deal with the Sybil attack.

A distributed search engine like Yacy gets groupings contributed by an 
inherently large number of humans, but such human-generated data is hard 
to work with.

DuckDuckGo is apt to generate politically incorrect search results, and 
making AI engage in crimestop is currently a major research issue. 
DuckDuckGo resolves this problem as part of its topic-identification 
algorithm, by passing searches likely to generate politically incorrect 
results to Bing, which is of course not part of the graph-theory 
question that I am asking.  What I am asking is: how do we abstract 
these messy particular problems into graph-theory problems of treating 
groups of nodes with strong internal connection as a single node?

Google does this analysis with respect to politically incorrect groups 
of nodes. I don't know how Bing does it; I suspect they rely primarily 
on humans identifying major politically incorrect nodes, and then 
down-ranking nodes that link to them.






