The frequency with which users choose particular links in a hypertext network makes it possible to let the network learn from the implicit semantic knowledge of its users, and to reorganize itself in order to better fulfill their expectations. We propose a restructuring algorithm based on the
following ideas and processes:
1) Frequency: the frequency of transition from one node to another
indicates the strength of the semantic relation between the two nodes.
Although this does not hold for some nodes like the home page and indexes,
it is a very plausible rule of thumb for most nodes. In fact, the rule
applies only to those links that connect nodes which represent concepts:
"conceptual nodes". Nodes that provide an oversight of available nodes or
bundle several concepts cannot be regarded as "conceptual".
Every link between two conceptual nodes is thus assigned a frequency
value that indicates the strength of the semantic relation between the two
nodes or concepts it connects. This frequency is measured over a certain
period of time and is then used to modify the network's structure (a code
sketch combining all four rules follows this list).
2) Transitivity: consider a node A that is connected to a node B, and
assume that the connection between A and B is strong, i.e. many people use
the link A->B. Now imagine a node C for which a strong connection B->C
exists. The transitivity rule then implies that a new link A->C should be
constructed.
What happens can be described as the network facilitating access to a
certain node from another by bypassing the two links that were otherwise
necessary to reach it. We expect this rule to lead to what we describe as
the dripping effect, in analogy with drops of water sliding down a window:
a drop of rain starts to slide downwards and takes other drops along on its
way down, forming a small channel of water. From the moment a new
connection between A and C exists, node C is more likely to be consulted,
and so are the nodes that receive links from C. These links are thus likely
to be used more often, and after some time it might turn out to be
necessary to replace the links A->C, C->X with A->X, and so on.
After a while all related concepts should be connected by this simple rule.
3) Degradation of existing links: links whose frequency values indicate a
weak semantic relation between two nodes or concepts should be removed from
the network. This can happen to links that were constructed by the
transitivity rule for the wrong reasons, or to existing links that have
become obsolete. Rule three can never eliminate the last link to or from a
node, so that no node will ever be disconnected from the web.
4) Noise: to ensure the web's "creativity", random links are constructed
now and then. This has obvious advantages: it prevents the network from
settling into a state it cannot get out of, and it creates unexpected but
perhaps useful connections. In the worst case it will do no harm, because
the degradation of weak links ensures that such links disappear after a
short while.
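
To make the interplay of these four rules concrete, here is a minimal
sketch in Python. It assumes the network is stored as a dictionary mapping
(source, target) pairs of node identifiers to traversal frequencies; the
function names and the threshold and rate values are our own illustrative
choices, not part of the proposal itself.

    import random

    TRANSITIVITY_THRESHOLD = 50  # min. frequency for A->B and B->C to spawn A->C
    DEGRADATION_THRESHOLD = 5    # links weaker than this may be removed
    NOISE_RATE = 0.01            # chance of creating a random link per pass

    def record_transition(links, source, target):
        """Rule 1: every traversal of a link raises its frequency value."""
        links[(source, target)] = links.get((source, target), 0) + 1

    def apply_transitivity(links):
        """Rule 2: if A->B and B->C are both strong, construct a new link A->C."""
        new_links = {}
        for (a, b), freq_ab in links.items():
            for (b2, c), freq_bc in links.items():
                if b == b2 and a != c and (a, c) not in links:
                    if (freq_ab >= TRANSITIVITY_THRESHOLD
                            and freq_bc >= TRANSITIVITY_THRESHOLD):
                        new_links[(a, c)] = 0  # starts unused; it must prove itself
        links.update(new_links)

    def degrade_weak_links(links):
        """Rule 3: remove weak links, but never the last link touching a node."""
        for (a, b) in sorted(links, key=links.get):
            if links[(a, b)] < DEGRADATION_THRESHOLD:
                degree_a = sum(1 for pair in links if a in pair)
                degree_b = sum(1 for pair in links if b in pair)
                if degree_a > 1 and degree_b > 1:  # keep every node connected
                    del links[(a, b)]

    def add_noise(links, nodes):
        """Rule 4: occasionally create a random link to keep the web 'creative'.

        'nodes' is a list of all node identifiers in the network.
        """
        if random.random() < NOISE_RATE:
            a, b = random.sample(nodes, 2)
            links.setdefault((a, b), 0)

A restructuring pass at the end of each measurement period would then call
apply_transitivity, degrade_weak_links and add_noise on the frequencies
accumulated by record_transition.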
What do we expect from such a self-restructuring network?
Firstly, we expect the network to eventually assimilate the common
semantics of its users by constantly restructuring itself. The problem here
is that the restructuring algorithm works only within the web itself. We
are still figuring out how new information should be integrated. On the
other hand, once a certain concept is connected to the network at any
position, it will eventually be semantically integrated if people need the
information and retrieve it often enough. Perhaps the newly developed
forms-software will enable us to integrate this feature.
Secondly, we expect the acquired semantic structure of the network to
enhance retrieval of information from the network for human browsers as
well as automated search algorithms.
Human browsers will find a network structured according to the way they
themselves have used it, and will probably retrieve information faster and
with greater ease. This is a presumption that can be verified empirically.
Some research has already been done on the advantage of semantically
structured hypertext networks over other structures in terms of retrieval
times (see "Hypertext: A Psychological Perspective", ed. C. McKnight,
A. Dillon & J. Richardson, Ellis Horwood).
Automated retrieval of information in a semantically organised hypertext
network could be achieved with the principle of spreading activation. One
could, for example, treat the network, with its links and associated
frequencies, as a connectionist network in which, after activation of
certain concepts, the activation of each connected concept is calculated as
the sum of the products of the activations of its neighbouring concepts and
the values of the links connecting them to it. Concepts or nodes with an
above-threshold activation could pop up and be presented to the searcher.
This kind of search algorithm would not only provide what one is searching
for, but could also provide unsolicited yet related material.
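
As a minimal sketch of such a spreading-activation search, again assuming
the (source, target) -> frequency dictionary from the earlier sketch, one
could normalise each node's outgoing frequencies into link weights and
propagate activation for a few steps; the normalisation, the number of
steps and the threshold are illustrative choices.

    def spread_activation(links, seeds, steps=2, threshold=0.1):
        """Activate the seed concepts, then let activation flow along links.

        A concept's new activation is the sum of the products of its
        neighbours' activations and the (normalised) weights of the links
        from those neighbours, as described above.
        """
        # Normalise outgoing frequencies into weights between 0 and 1.
        out_totals = {}
        for (a, b), f in links.items():
            out_totals[a] = out_totals.get(a, 0) + f
        weights = {(a, b): f / out_totals[a]
                   for (a, b), f in links.items() if out_totals[a]}

        activation = {node: 1.0 for node in seeds}
        for _ in range(steps):
            incoming = {}
            for (a, b), w in weights.items():
                if a in activation:
                    incoming[b] = incoming.get(b, 0.0) + activation[a] * w
            activation = incoming  # activation flows strictly forward
        # Concepts whose activation exceeds the threshold "pop up".
        return {node: act for node, act in activation.items()
                if act >= threshold}

Seeding this function with the concepts of a query would return not only
those concepts' strongest associates but also the unsolicited yet related
material mentioned above.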
Thirdly, and this is very hypothetical, if we can assume that the final
structure of the network resembles the semantic structure of its users, we
could use the structure of the network as a tool for idea-identification.
We could derive the semantic structure of a certain piece of knowledge or
idea and compare it to the network to see where it fits in. Variations of
meaning could be identified by comparing the links and nodes of an idea to
similar nodes and their links in the network. This assumes, of course, that
our network covers the knowledge and concepts expressed in these ideas: one
could, for example, not compare the structure of an idea from the field of
agriculture to the structure of our semantically organised Principia
Cybernetica network.
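
Under those assumptions, idea-identification might amount to little more
than a structural overlap score. A very tentative sketch, where an "idea"
is represented as a small set of (source, target) concept links:

    def fit_score(idea_links, network_links):
        """Fraction of the idea's links that also occur in the network.

        Missing or extra links relative to similar nodes in the network
        would point at variations of meaning.
        """
        if not idea_links:
            return 0.0
        shared = sum(1 for link in idea_links if link in network_links)
        return shared / len(idea_links)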
In a first stage, we are planning to construct a test network that links
100-200 English nouns. Initially all links would be randomised; then we
would start the network and let people log in to it. After a while,
connections in the network should be re-routed and the noun network should
organise itself so that semantically related nouns are indeed linked. We
could then use the network to investigate how it performs in browsing and
spreading-activation search tasks.
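
A sketch of how such a first-stage test network might be initialised, with
a handful of stand-in nouns in place of the planned 100-200 and an
arbitrary number of random links per noun:

    import random

    def build_random_noun_network(nouns, links_per_noun=3):
        """Give every noun a few random outgoing links, all at frequency 0."""
        links = {}
        for noun in nouns:
            candidates = [n for n in nouns if n != noun]
            for target in random.sample(candidates, links_per_noun):
                links[(noun, target)] = 0
        return links

    nouns = ["dog", "cat", "water", "rain", "window"]  # stand-ins for the real list
    links = build_random_noun_network(nouns)
    # Users browse the network while record_transition logs their choices;
    # periodic restructuring passes then apply the four rules above.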
If this proves to be successful, we might in a second stage consider
porting the system to a network that does not consist of nouns but of nodes
of text and bundled meanings.
In a third and final stage we could perhaps implement the system on the
PCP-web and see what happens there. This, of course, depends on our success
in the previous experiments.