The Future of Cyberspace
The Webspace and the Noosphere[1]
A search for “noosphere” in Altavista returned 30,136 pages at 2:22 PM
Eastern Time (USA and Canada) on Thursday, 12 April 2001. The word is still
rather strange to many people and has not yet earned an entry in the
Merriam-Webster online dictionary. We do, however, know, use and enjoy
Cyberspace, a concept that at nearly the same time returned as many as 777,290
entries in Altavista and that has had an entry in Merriam-Webster since 1986,
with the following meaning: the on-line world of computer networks. Webspace is
another neologism not yet included in that dictionary, although it returns
485,805 entries in Altavista.
The Webspace grows at a fantastic
pace and today holds nearly half a billion documents, ranging from Virtual
Libraries and virtual reference e-books dealing with the Major Subjects of
human knowledge to ephemeral news and trivial virtual flyers generated “on
the fly” at any moment. On the Web we may find documents belonging
to any of the three major Internet resources or categories: Information,
Knowledge and Entertainment.

In the above figure the black crown
represents the Webspace and the green circle the users. The gray crown
represents an intermediate net, to be built in the near future, of intelligent
summaries of Human Knowledge that point to its basic documents and e-books.
One user is shown extracting a “cone” of what he/she needs in terms of
information and knowledge. The intelligent summaries must be engineered
to be good enough to serve as introductory guides or tutorials, each with a set
of essential hyperlinks inside. If the user wants more detail, he/she then goes
directly to the right sources within the black region. Depending on the Major
Subject dealt with, the user may go from summary to summary, or jump to
higher-level guides inside the gray region, going to the black region only to
look for specific themes.
Another user goes directly to the black
region guided by classical search engines, as now. The black region will
always be necessary and will grow fast in volume as time passes. The gray
region, on the contrary, will fluctuate around a medium volume, growing at a
very low rate. In effect, Human Knowledge is almost bounded: its content
changes, but always around the same set of Major Subjects. The
growth of the gray region is extremely low in comparison with that of the black
region. Some Major Subjects die and others are born, but slowly.
As a science fiction exercise we
invite you to make some calculations resembling some Isaac Asimov stories.
Suppose present Human Knowledge is bounded to, let’s say, 250 Major Subjects or
Disciplines, and that for each of them we define a Virtual Library with 2,000
non-redundant e-books on average; we will then have a volume of 500,000
e-books. Now we could design a methodology to synthesize an intelligent summary
for each e-book in no more than 2,000 characters on average, totaling
1,000 MB or 1 GB when one character is stored in a single byte. That would
be the volume of the gray region: not too much, really!
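The arithmetic above can be checked with a few lines of Python. This is only a sketch: the figures are the ones quoted in the text, the variable names are illustrative, and a decimal megabyte (1,000,000 bytes) is assumed because the round figures imply it.

```python
# Figures quoted in the text; variable names are illustrative only.
MAJOR_SUBJECTS = 250          # Major Subjects or Disciplines
EBOOKS_PER_SUBJECT = 2_000    # non-redundant e-books per Virtual Library (average)
CHARS_PER_SUMMARY = 2_000     # upper bound for one intelligent summary (average)
BYTES_PER_CHAR = 1            # one character stored in a single byte
BYTES_PER_MB = 1_000_000      # assumption: decimal megabytes

total_ebooks = MAJOR_SUBJECTS * EBOOKS_PER_SUBJECT
gray_bytes = total_ebooks * CHARS_PER_SUMMARY * BYTES_PER_CHAR
gray_mb = gray_bytes / BYTES_PER_MB

print(f"e-books in all Virtual Libraries: {total_ebooks:,}")
print(f"volume of the gray region: {gray_mb:,.0f} MB")
```

Changing any of the three assumed averages scales the result linearly, so even a tenfold error in one of them would leave the gray region at only a few gigabytes.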
Let’s then compare this volume with
the volume of the black region and with the volume of the resources of Human
Knowledge. Once upon a time, there was a Webspace with half a billion
documents with an average volume estimated at 2.5 MB (we have documents ranging
from 10 KB and less to 100 MB and more; to get that figure we supposed the
following arbitrary size series, 1, 10, 100, 1,000, 10,000, 100,000 in KB, and
we assigned to each term the following arbitrary weights: .64, .32, .16, .08,
.04, .02 respectively). Then we have a volume of nearly 1,250,000,000 MB!
Within that giant space float, dispersed, the basic e-books, the resources of
Human Knowledge, with an estimated volume of nearly 500,000 MB, assigning 1 MB
to each one: half a million characters of text and 100 images of 5 KB on average.
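The comparison can be reproduced as follows. This is a sketch under stated assumptions: the weights are taken as the halving series .64, .32, .16, .08, .04, .02 (for the last two terms this is an assumption, chosen because it is the reading that reproduces the 2.5 MB average), the weighted sum is not normalized, and decimal units are used throughout.

```python
# Size series in KB and the arbitrary weights (last two weights are an assumption).
sizes_kb = [1, 10, 100, 1_000, 10_000, 100_000]
weights = [0.64, 0.32, 0.16, 0.08, 0.04, 0.02]

# Plain weighted sum; the weights add up to 1.26 and are not normalized.
avg_kb = sum(size * weight for size, weight in zip(sizes_kb, weights))
avg_mb = round(avg_kb / 1_000, 1)          # about 2,499.84 KB, taken as 2.5 MB

documents = 500_000_000                    # half a billion documents
black_mb = documents * avg_mb              # volume of the black region

# One basic e-book: half a million characters plus 100 images of 5 KB.
ebook_kb = 500_000 / 1_000 + 100 * 5
knowledge_mb = 500_000 * ebook_kb / 1_000  # 500,000 e-books

gray_mb = 1_000                            # gray region, from the earlier estimate

print(f"black region:       {black_mb:,.0f} MB")
print(f"basic e-books:      {knowledge_mb:,.0f} MB")
print(f"black / e-books:    {black_mb / knowledge_mb:,.0f}")
print(f"e-books / gray:     {knowledge_mb / gray_mb:,.0f}")
```

Under these assumptions the basic e-books occupy one part in 2,500 of the black region, and the gray region is 500 times smaller still.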
An incredible result, which shows
how easy it will be to compile a rather stable HKIS, Human Knowledge
Intelligent Summary, in comparison with the unstable, noisy, bubbling, fizzy
and always growing black region. Once the effort is done, upgrading will be
facilitated by Expert Systems that take from the black region only
the changes.
In the figure above we depict the
present Webspace in black, resembling the physical space of the Universe. No
doubt the information we need as users is up there, but where? That virtual
space is really almost black for us. Some members of Cyberspace that
provide searching services, known as Search Engines and/or World Wide Web
Directories, are like stars that radiate light all over the space to make
sites indirectly visible. Sometimes we may find a few sites with their
own light, like stars, activated by publicity in conventional media, but the
rest are only illuminated by those services on request. Let’s go a little deeper
into the nature of this singular Webspace searching process.
For each resource (body) located in
the space at a URL, which stands for Uniform Resource Locator, the robots of
those lighting services prepare a brief summary, no more than a paragraph, with
some information extracted from it, and then all the information collected goes
to the services’ databases. The summaries have attached to them some keywords
extracted from the resources visited, and consequently each summary is indexed
under as many keywords as it has attached.
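The indexing step just described is what is usually called an inverted index. A minimal sketch in Python follows; the records, URLs and function name are hypothetical, and real search services are of course far more elaborate.

```python
from collections import defaultdict

# Hypothetical records as a robot might store them: URL -> (summary, keywords).
records = {
    "http://example.com/tires": (
        "Tire store with an online catalog.", ["tires", "wheels"]),
    "http://example.org/garage": (
        "Repair shop that sells tires and brakes.", ["tires", "brakes"]),
}

def build_index(records):
    """Index every summary under each of the keywords attached to it."""
    index = defaultdict(list)
    for url, (_summary, keywords) in records.items():
        for keyword in keywords:
            index[keyword.lower()].append(url)
    return index

index = build_index(records)
for url in index["tires"]:
    print(url, "-", records[url][0])
```

A query for a keyword then returns only the summaries filed under it, which is why a resource with no matching keyword stays "dark" for the user.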
The present robots are very “clever”
but extremely primitive compared to human beings. They do their best, but
they also have to work fast, in fractions of a millisecond per resource,
so it would be impractical for them to be more sophisticated, because the time
of “evaluation” grows exponentially with the level of cleverness. To facilitate
the robots’ work, Website programmers and developers have wise tools at hand,
but many of them overuse those facilities so badly that they become unwise. In
fact, with those tools the programmers could communicate to the robots some
essential information that the site owners wish to be known about the site.
Those wise gateways are now noisy
because most people try to deceive the robots by overselling what should be the
essential information. Why do they do that? Because the Search Engines must
present the sites listed hierarchically: the first, the best! Something
similar occurs in the Classified Section of the newspapers: people wishing
to be listed first unethically make nonsense use of the first letter of the
alphabet, so that AAAAAAA Home Services goes before, for instance, AA Home
Services. The Search Engines do not have much room to design a “fair”
methodology to rank the sites with equity.
One trivial criterion would be to
count how many times a keyword is cited within the resource, but that proved to
be misleading because the robots only browse the resource partially, so it is
practically impossible to differentiate a sound academic treatise from a
student’s homework on the same subject. To make things worse,
programmers, developers, and content experts know all those tricks and
consequently overuse the keywords they believe are significant.
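A toy example shows why the raw count is so easy to game. The two texts and the `keyword_count` helper below are hypothetical; the optional `max_words` argument stands in, as an assumption, for a robot that only browses the beginning of a resource.

```python
import re

def keyword_count(text, keyword, max_words=None):
    """Count occurrences of a keyword, optionally reading only the first words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if max_words is not None:
        words = words[:max_words]      # partial browsing of the resource
    return words.count(keyword.lower())

pages = {
    "treatise": "A study of tires under load: how pressure and road surface "
                "change the wear of tires.",
    "stuffed": "tires tires tires cheap tires best tires",
}

# Rank the pages by raw keyword count, highest first.
ranking = sorted(pages, key=lambda name: keyword_count(pages[name], "tires"),
                 reverse=True)
print(ranking)
```

The keyword-stuffed page comes out ahead of the treatise, and with `max_words=3` the treatise would not even register a single occurrence.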
The Search Engines have improved
a great deal over the last two years, but the searching process continues to be
highly inefficient and tends to collapse. To help site owners gain positions
within the lists (in fact, to get more light), ethical and unethical
techniques and programs proliferate, most of them apt to deceive the “enemy”,
namely the Search Engines. Even in a “Bona Fide” utopia it is impossible for a
robot to differentiate between a complex site and a humble site dealing with
the same subject. Complex site architectures could even make the sites
invisible to the robots, because these are only well suited to evaluating flat
and simple sites.
We emphasize again the fact that
the “light” that a Search Engine provides to each URL is indirect, as the Moon
reflects the Sun’s light. Our conclusion, then, is that most of the information
and knowledge is hidden in the darkness of Cyberspace.

Now that we know the meaning of
HK, Human Knowledge, we may define HKIS, the Human Knowledge Intelligent
Summaries, a set of summaries (we will soon explain why we call them
intelligent), and NHKIS, a Network of Human Knowledge Intelligent
Summaries, which corresponds to the gray crown of the above figures. Now we are
going to enter into the problem of the languages and jargons spoken in the
Black Region, in the Gray Region and mainly in the Green Region.
Websites are built to match
users; they are like lighthouses in the darkness, broadcasting information,
knowledge and, in the case of e-Commerce, a kind of information we could call
“opportunities”. What really happens is that at present the Internet is more
the Realm of Mismatch than of Matching. The lighthouse owners cannot find the
users, and the users can neither find the alleged opportunities nor
understand the messages. This mismatching scenario is dramatic in the case of
Portals, huge lighthouses created to attract as many people as possible via
general-interest “attractions”.
Something similar occurs with the
databases that store millions of units of supposedly useful information
such as catalogs, services, manufacturers, professionals, job opportunities,
commercial firms, etc.: users cannot find what they need. When we talk
of mismatch we mean figures well over 95%, with matches in some databases below 0.1%.
In the figure above we depict
this dramatic mismatch. The yellow point is a Website with its offer
represented by the cone emerging from it, let’s say the Offer expressed in its
language and in its particular jargon. A black point within the green circle
represents a user, and the cone emerging from it his/her Demand, also expressed
in his/her language and particular jargon.
What we discovered is that both
sides speak approximately the same language but surely different jargons and,
more than that, they think differently! We have depicted the gray crown because
the portion corresponding to the site’s Major Subject virtually exists: that is
the portion in dark gray within its cone.
Sites have the “truth” expressed in their particular jargon, and sometimes
in the “official”, standard jargon. If the Website were, for instance, a
“Vertical” of the Chemical Industry, its jargon would of course be within
the Chemical Industry Standards and its menu should be expressed in technically
correct terms, resembling the Index of a Manual for that particular Major
Subject: Chemical Industry.
So the conclusion of our research,
carried out over two years studying the causes of mismatch, was that the
lighthouses speak, or intend to speak, official jargons certified by the
establishment of their particular Major Subjects. They are supposed to have the
truth and they think as “teachers”, expressing their truth in their menus,
which are in fact “logical trees”. They may claim to be e-books, and they
behave, think, and look pretty much the same as physical books.
Now let’s analyze how the users act,
express themselves and behave. If a user visits the site to learn, the cones
are bound to converge: the user thinks in terms of the concepts of the menu,
which for him/her resembles a program of study, and we have a match scenario.
If the user visits the site to search for something, that is different. When we
search for something we tend to think in terms of keywords instead, keywords
that belong to our own jargon and, at large, to our own Thesaurus. So, whether
through ignorance or, on the contrary, through expertise, the users’ cones
diverge substantially from the site’s cone. One of the main reasons for this
divergence is that the site owners do not know what their target market needs.
Many of them are migrating from conventional businesses to e-Commerce
approaches and extrapolate their market know-how as it is. They worked hard for
decades to match their markets and to establish agreed jargons, and now they
have to face unknown users coming from virtually all over the world.
Evidently the solution will be to
evolve from mismatch to match in the most efficient way. To accomplish that,
both the Offer and the Demand have to approach each other until both share
a win-win scenario and a common jargon.

In the figure above we depict a mismatch condition where
we may distinguish three zones: the red zone represents the idle and/or
useless Knowledge; the gray zone corresponds to the common section with an
agreed Thesaurus concordance; and the blue zone corresponds to what the users
need and want and which apparently does not exist within the site. So the site
owners and administrators have two lines of action: a) reduce the red zones to
zero, for instance by adapting and/or eliminating supposed “attractions”, and
b) learn as much as possible about the blue zone.
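If the Offer and the Demand are each reduced to a set of keywords (a simplifying assumption of this sketch, with made-up keywords), the three zones are plain set operations. The `match_ratio` at the end is only one possible way to put a number on the match, not a measure defined in the text.

```python
# Hypothetical keyword sets for one site and its users.
offer = {"radial tires", "alloy wheels", "company history", "press releases"}
demand = {"radial tires", "alloy wheels", "snow chains", "tire repair"}

red = offer - demand     # offered but never asked for: idle or useless Knowledge
gray = offer & demand    # common section with an agreed Thesaurus concordance
blue = demand - offer    # asked for but apparently missing from the site

# Illustrative measure: shared keywords over all keywords seen on either side.
match_ratio = len(gray) / len(offer | demand)

print("red: ", sorted(red))
print("gray:", sorted(gray))
print("blue:", sorted(blue))
print(f"match ratio: {match_ratio:.0%}")
```

Action a) shrinks `red`, and action b) turns entries of `blue` into new entries of `offer`; both raise the ratio.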
At this moment the dark green zones are extremely tiny,
less than 5%, the Internet being the Realm of Mismatch between Users’ Demand
and Sites’ Offer. The big effort to be made consists in minimizing costs by
eliminating useless attractions and in learning from unsatisfied Users’ needs.
To accomplish both purposes the site owners need intelligent tools, agents that
warn them about red and blue events.
What Does Intelligent Mean?
Let’s analyze the basic process of user-Internet interaction. A user interacts with a site in only two ways: clicking on a link, or filling a form or a box with some text, for instance to query a database. Site statistics are well prepared to account for clicks, telling what “paths” were browsed by each user, but they are not well suited to account for textual interactions. Of course, you may record the queries and even the answers, but that is not enough to learn from mismatching. To accomplish that we may create intelligent agents that account for the components of each answer, for instance documents, but they then have to do rather heavy accounting.
If we query a commercial database for tires, the answer would be a list of tire stores. To have statistics about how frequently users ask for this specific keyword, we need to account for it; to know about the “presence” of each store as a potential seller, we need to account for that too; and if we want to know about the popularity of each store, we need to go further and account for it as well, and so forth. That accounting process involves a terrific burden, even when done on the site’s server side.
An intelligent approach would be to have all possible counters built into the data to be queried. That is the beginning of the idea: to provide, within the data queried by users, a set of counters, one for each type of statistic. So when a data item is returned in an answer, a counter is activated accounting for its presence; when it is selected by a click, another counter is activated; and when the user, after reading the “intelligent summary” received, decides to click through to the original site or to one of its inner hyperlinks, yet another counter is activated.

Here a typical track of user-site interaction is represented. The user makes a query for “tires”. The I-database (Intelligent Database) answers by sending all the data it has indexed under tire, adding a list of the synonyms and related keywords it has for tire. Each activated I-URL accounts for its presence in that answer by adding one to the corresponding counter in the I-Tags zone. If the user clicks on a specific I-URL, the system presents it to the user and accounts for this preference in another counter of the I-Tags zone.
Finally, if the user decides to access the commented site located in the black crown, he/she makes a click and another counter is activated within the I-Tags zone. At the same time the counter corresponding to the keyword tire is incremented by one, and the same happens if the user activates a synonym or related keyword. If the answer contains zero data it means a mismatch, due either to an error or to a resource that does not exist within the database. In both cases the system has to activate different counters for the wrong or non-existing keyword in order to account for the popularity of this specific mismatch. If the popularity is high, it is a warning signal to the site’s Chief Editor about the potential acceptance of the keyword, either as a synonym or as a related keyword. At the same time, the system may urge a search for additional data within the black region. From time to time the system could suggest a review of the I-URL summaries database in order to assign data to the new keywords as well.
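The track described above can be sketched in Python as a small data structure with its counters built in. Everything here is hypothetical: the class and method names, the counter names ("presence", "selected", "visited"), the example URL, and the choice to increment the keyword counter at query time are assumptions of the sketch, not a specification of the I-database.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class IUrl:
    """One I-URL: an intelligent summary plus its own counters (I-Tags zone)."""
    url: str
    summary: str
    tags: Counter = field(default_factory=Counter)

class IDatabase:
    def __init__(self):
        self.index = {}                  # keyword -> list of IUrl
        self.related = {}                # keyword -> synonyms and related keywords
        self.keyword_hits = Counter()    # popularity of known keywords
        self.keyword_misses = Counter()  # popularity of each mismatch

    def add(self, keyword, iurl):
        self.index.setdefault(keyword, []).append(iurl)

    def query(self, keyword):
        results = self.index.get(keyword, [])
        if not results:                  # zero data: account for the mismatch
            self.keyword_misses[keyword] += 1
            return [], []
        self.keyword_hits[keyword] += 1
        for iurl in results:             # each I-URL accounts for its presence
            iurl.tags["presence"] += 1
        return results, self.related.get(keyword, [])

    def select(self, iurl):
        iurl.tags["selected"] += 1       # user clicked on the I-URL
        return iurl.summary

    def visit(self, iurl):
        iurl.tags["visited"] += 1        # user went on to the original site
        return iurl.url

    def popular_misses(self, threshold):
        """Missing keywords frequent enough to warn the Chief Editor."""
        return [kw for kw, n in self.keyword_misses.items() if n >= threshold]

db = IDatabase()
store = IUrl("http://example.com/tires", "Tire store with an online catalog.")
db.add("tire", store)
db.related["tire"] = ["tyre", "wheel"]

results, related = db.query("tire")
db.select(results[0])
db.visit(results[0])
db.query("snow chains")
db.query("snow chains")

print(dict(store.tags))
print(db.popular_misses(threshold=2))
```

Because the counters travel with the data, no separate accounting pass over logs is needed; each statistic is updated at the moment the interaction happens.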
As part of the intelligent feature we plan to record the IP of each user interaction and the sequence of queries, which is normally related to something not found. The users’ keyword strings are in turn related to specific subjects within the Major Subject of the site. So, statistically, the analysis of keyword strings tells us about the popularity of the current menu items and suggests new items to be considered.
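A rough sketch of such an analysis follows. The log format, the addresses (taken from a documentation range), the menu items and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical query log in arrival order: (ip, keyword, number_of_results).
log = [
    ("192.0.2.1", "tires", 12),
    ("192.0.2.1", "snow chains", 0),
    ("192.0.2.2", "snow chains", 0),
    ("192.0.2.2", "tire chains", 0),
    ("192.0.2.2", "tires", 12),
]
menu_items = {"tires", "wheels"}     # hypothetical current menu of the site

def sessions_by_ip(log):
    """Rebuild each user's keyword string: the ordered queries per IP."""
    sessions = defaultdict(list)
    for ip, keyword, hits in log:
        sessions[ip].append((keyword, hits))
    return sessions

def menu_popularity(log, menu_items):
    """How often each existing menu item is asked for."""
    return Counter(kw for _ip, kw, _hits in log if kw in menu_items)

def candidate_items(log, menu_items):
    """Keywords that found nothing and are not on the menu yet."""
    return Counter(kw for _ip, kw, hits in log
                   if hits == 0 and kw not in menu_items)

sessions = sessions_by_ip(log)
print(sessions["192.0.2.2"])
print(menu_popularity(log, menu_items))
print(candidate_items(log, menu_items).most_common())
```

In this made-up log, “snow chains” fails twice across two users, which is the kind of signal that would suggest a new menu item.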
[1] The
Noosphere is the part of the world of life that is created by man's thought and
culture. Pierre Teilhard de Chardin, Vladimir Ivanovich Vernadsky and Édouard
Le Roy distinguish the noosphere from the geosphere, the non-living world, and
from the biosphere, the living world.