The Future of Cyberspace
The Webspace and the Noosphere
A search for “noosphere” on AltaVista returned 30,136 pages at 2:22 PM Eastern Time (USA and Canada) on Thursday, April 12, 2001. The word is still strange to many people and has not yet earned an entry in the Merriam-Webster online dictionary. By contrast, we all know, use and enjoy Cyberspace, a concept that at nearly the same time returned as many as 777,290 entries on the same AltaVista and has had an entry in Merriam-Webster since 1986, with the following meaning: the on-line world of computer networks. Webspace is another neologism not yet included in that dictionary, yet it returns 485,805 entries on AltaVista.
The Webspace grows at a fantastic pace and today holds nearly half a billion documents, ranging from Virtual Libraries and virtual reference e-books covering the Major Subjects of human knowledge to ephemeral news and trivial virtual flyers generated “on the fly” at every moment. On the Web we may find documents belonging to any of the three major Internet resources or categories: Information, Knowledge and Entertainment.
In the above figure the black crown represents the Webspace and the green circle the users. The gray crown represents an intermediate net to be built in the near future, holding intelligent summaries of Human Knowledge that point to its basic documents and e-books. One user is shown extracting a “cone” of what he/she needs in terms of information and knowledge. The intelligent summaries must be engineered to be good enough as introductory guides or tutorials, with a set of essential hyperlinks inside. A user who wants more detail then goes directly to the right sources within the black region. Depending on the Major Subject dealt with, the user may go from summary to summary or jump to higher-level guides inside the gray region, going to the black region only to look for specific themes.
Another user goes directly to the black region, guided by classical search engines as now. The black region will always be necessary and will grow fast in volume as time passes. The gray region, on the contrary, will fluctuate around a medium volume, growing at a very slow rate. In effect, Human Knowledge is almost bounded: its content changes, but always around the same set of Major Subjects. The growth of the gray region is extremely low in comparison to that of the black region. Some Major Subjects die and others are born, but slowly.
As a science fiction exercise we invite you to make some calculations resembling some Isaac Asimov stories. Suppose current Human Knowledge is bounded to, let’s say, 250 Major Subjects or Disciplines, and that for each of them we define a Virtual Library with 2,000 non-redundant e-books on average; we will then have a volume of 500,000 e-books. Now we could design a methodology to synthesize an intelligent summary for each e-book in no more than 2,000 characters on average, totaling 1,000 MB or 1 GB when one character is stored in a single byte. That would be the volume of the gray region: not too much, really!
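The estimate above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in the text; the constant names are ours, and decimal units (1 MB = 1,000,000 bytes) are assumed because the round numbers imply them.

```python
# Back-of-the-envelope volume of the gray region, using the figures from the text.
MAJOR_SUBJECTS = 250          # Major Subjects or Disciplines
EBOOKS_PER_SUBJECT = 2_000    # non-redundant e-books per Virtual Library (average)
CHARS_PER_SUMMARY = 2_000     # characters per intelligent summary (average)
BYTES_PER_CHAR = 1            # one character stored in a single byte

# Assumption: decimal units, 1 MB = 1,000,000 bytes.
BYTES_PER_MB = 1_000_000

ebooks = MAJOR_SUBJECTS * EBOOKS_PER_SUBJECT
gray_bytes = ebooks * CHARS_PER_SUMMARY * BYTES_PER_CHAR
gray_mb = gray_bytes / BYTES_PER_MB

print(f"e-books in all Virtual Libraries: {ebooks:,}")
print(f"gray region: {gray_mb:,.0f} MB, i.e. {gray_mb / 1000:.0f} GB")
```

Changing any of the four constants shows how sensitive the 1 GB figure is to the assumed number of subjects, e-books and summary length.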
Let’s then compare this volume to the volume of the black region and to the volume of the resources of Human Knowledge. Once upon a time there was a Webspace with half a billion documents with an average volume estimated at 2.5 MB (we have documents ranging from 10 KB and less to 100 MB and more; to get that figure we supposed the arbitrary size series 1, 10, 100, 1,000, 10,000, 100,000 in KB and assigned to each term the arbitrary weights .64, .32, .16, .08, .04, .02 respectively, and the weighted terms add up to about 2,500 KB). Then we have a volume of nearly 1,250,000,000 MB! Within that giant space float, dispersed, the basic e-books, the resources of Human Knowledge, with an estimated volume of nearly 500,000 MB, assigning 1 MB to each one: half a million characters of text and 100 images of 5 KB on average.
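The comparison can be reproduced as follows. This sketch rests on two assumptions of ours: that the weights halve at every step (.64, .32, .16, .08, .04, .02) and that the weighted terms are simply summed without normalizing. That is the reading that reproduces the 2.5 MB average quoted in the text. Decimal units are assumed throughout, and the 1,000 MB gray-region volume is taken from the preceding estimate.

```python
# Average document size: arbitrary size series (KB) and arbitrary weights.
SIZES_KB = [1, 10, 100, 1_000, 10_000, 100_000]
WEIGHTS = [0.64, 0.32, 0.16, 0.08, 0.04, 0.02]  # assumption: each weight halves

# Plain weighted sum (the weights are not normalized to 1).
avg_kb = sum(size * weight for size, weight in zip(SIZES_KB, WEIGHTS))
print(f"weighted document size: {avg_kb:,.2f} KB (about 2.5 MB)")

# Black region: half a billion documents at about 2.5 MB each.
DOCUMENTS = 500_000_000
AVG_DOC_MB = 2.5
black_mb = DOCUMENTS * AVG_DOC_MB

# One basic e-book: half a million characters of text plus 100 images of 5 KB.
ebook_kb = 500_000 / 1000 + 100 * 5      # 500 KB of text + 500 KB of images
hk_mb = 500_000 * (ebook_kb / 1000)      # 500,000 e-books

GRAY_MB = 1_000                          # gray region, from the previous estimate

print(f"black region:        {black_mb:,.0f} MB")
print(f"HK basic e-books:    {hk_mb:,.0f} MB")
print(f"black / gray ratio:  {black_mb / GRAY_MB:,.0f}")
print(f"e-books / gray ratio: {hk_mb / GRAY_MB:,.0f}")
```

Under these assumptions the gray region is roughly a millionth of the black region and a five-hundredth of the basic e-books it summarizes.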
An incredible result, which demonstrates how easy it will be to compile a rather stable HKIS, Human Knowledge Intelligent Summary, compared with the unstable, noisy, bubbling, fizzy and always growing black region. Once the effort is done, upgrades will be facilitated by Expert Systems that take only the changes out of the black region.
In the figure above we depict the current Webspace in black, resembling the physical space of the Universe. No doubt the information we need as users is out there, but where? That virtual space is really almost black for us. Some members of the Cyberspace that provide searching services, titled Search Engines and/or World Wide Web Directories, are like stars that radiate light all over the space to make sites indirectly visible. Sometimes we may find quite a few sites with their own light, like stars, activated by publicity in conventional media, but the rest are only illuminated by those services on request. Let’s look a little deeper into the nature of this singular Webspace searching process.
For each resource (body) located in the space at a URL (Uniform Resource Locator), the robots of those lighting services prepare a brief summary, no more than a paragraph, with some information extracted from it, and all the information collected then goes to the services’ databases. Each summary has attached to it some keywords extracted from the resource visited and is consequently indexed under as many keywords as it has attached.
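The indexing step just described, summaries filed under each of their attached keywords, is what is usually called an inverted index. The sketch below is a minimal illustration and not the design of any particular Search Engine; the URLs, summaries and the function name `build_index` are hypothetical.

```python
from collections import defaultdict


def build_index(summaries):
    """Map each keyword to the set of URLs whose summary carries that keyword."""
    index = defaultdict(set)
    for url, (_summary, keywords) in summaries.items():
        for keyword in keywords:
            index[keyword.lower()].add(url)
    return index


# Hypothetical robot output: URL -> (one-paragraph summary, attached keywords).
summaries = {
    "http://example.com/tires": ("A store selling car tires.", ["tires", "cars"]),
    "http://example.org/bikes": ("Bicycle repair and tires.", ["tires", "bicycles"]),
}

index = build_index(summaries)
print(sorted(index["tires"]))   # both resources are listed under "tires"
print(sorted(index["cars"]))    # only the first one is listed under "cars"
```

A query is then a simple lookup by keyword; nothing in the index says which of the listed resources is the better one, which is the ranking problem discussed next.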
Today’s robots are very “clever” but extremely primitive compared to human beings. They do their best, but they must also work fast, in fractions of a millisecond per resource, so it would be impractical to make them more sophisticated because the time of “evaluation” grows exponentially with the level of cleverness. To facilitate the robots’ work, Website programmers and developers have wise tools at hand, but many of them overuse those facilities so badly that they make them unwise. With those tools, in fact, programmers can communicate to the robots some essential information that the site owners wish to be known about the site.
Those wise gateways are now noisy because most people try to deceive the robots by overselling what should be the essential information. Why do they do that? Because the Search Engines must present the listed sites hierarchically: the first is the best! Something similar occurs in the Classified Section of newspapers: people wishing to be listed first unethically make nonsense use of the first letter of the alphabet, so that AAAAAAA Home Services goes before, for instance, AA Home Services. The Search Engines do not have much room to design a “fair” methodology to rank the sites with equity.
One trivial criterion would be to count how many times a keyword is cited within the resource, but that proved to be misleading: the robots browse the resource only partially, which makes it practically impossible to differentiate a sound academic treatise from a student’s homework on the same subject. To make things worse, programmers, developers, and content experts know all those tricks and consequently overuse the keywords they believe are significant.
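A toy example shows why the keyword-count criterion is so easy to mislead. The two texts and the function name `keyword_count` are invented for illustration; real robots are certainly more elaborate than this sketch.

```python
import re
from collections import Counter


def keyword_count(text, keyword):
    """Count how many times a keyword appears as a whole word in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)[keyword.lower()]


# Hypothetical resources dealing with the same subject.
resources = {
    "treatise": "Tires transmit traction and braking forces; tread compounds determine wear.",
    "stuffed": "tires tires tires cheap tires best tires tires",
}

# Rank by raw keyword count, highest first.
ranked = sorted(resources, key=lambda name: keyword_count(resources[name], "tires"),
                reverse=True)

for name in ranked:
    print(name, keyword_count(resources[name], "tires"))
```

The page that merely repeats the keyword wins the first position, while the informative text, which mentions it once, goes last.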
The Search Engines have improved a great deal over the last two years, but the searching process remains highly inefficient and tends to collapse. To help site owners gain positions within the lists (in fact, to get more light), ethical and unethical techniques and programs proliferate, most of them apt to deceive the “enemy”, namely the Search Engines. Even in a “Bona Fide” utopia it is impossible for a robot to differentiate between a complex site and a humble site dealing with the same subject. Complex site architectures could even make sites invisible to the robots, because robots are only well suited to evaluate flat and simple sites.
We emphasize again that the “light” a Search Engine provides to each URL is indirect, as the Moon reflects the Sun’s light. Our conclusion, then, is that most of the information and knowledge is hidden in the darkness of the Cyberspace.
Now that we know the meaning of HK, Human Knowledge, we may define HKIS, the Human Knowledge Intelligent Summaries, a set of summaries (we will soon explain why we call them intelligent), and NHKIS, a Network of Human Knowledge Intelligent Summaries, which corresponds to the gray crown of the above figures. We now turn to the problem of the languages and jargons spoken in the Black Region, in the Gray Region and mainly in the Green Region.
Websites are built to match users; they are like lighthouses in the darkness, broadcasting information, knowledge and, in the case of e-Commerce, a kind of information we could call “opportunities”. What really happens is that at present the Internet is more the Realm of Mismatch than of Matching. The lighthouse owners cannot find the users, and the users can neither find the alleged opportunities nor understand the messages. This mismatch is dramatic in the case of Portals, huge lighthouses created to attract as many people as possible via general-interest “attractions”.
Something similar occurs with the databases that store millions of units of supposedly useful information, such as catalogs, services, manufacturers, professionals, job opportunities, commercial firms, etc.: users cannot find what they need. When we talk of mismatch we mean figures well over 95%, with the match in some databases at less than 0.1%.
In the figure above we depict this dramatic mismatch. The yellow point is a Website, with its offer represented by the cone emerging from it: let’s say, the Offer expressed in its language and in its particular jargon. A black point within the green circle represents a user, and the cone emerging from it represents his/her Demand, also expressed in his/her language and particular jargon.
What we discovered is that both sides speak approximately the same language but certainly different jargons and, more than that, they think differently! We have depicted the gray crown because the portion corresponding to the site’s Major Subject virtually exists: that is the portion in dark gray within its cone. The sites have the “truth” expressed in their particular jargon, and sometimes in the “official”, standard jargon. If the Website were, for instance, a “Vertical” of the Chemical Industry, its jargon would of course be within the Chemical Industry Standards and its menu should be expressed in technically correct terms, resembling the Index of a Manual for that particular Major Subject: the Chemical Industry.
So the conclusion of our research, carried out over two years of studying the causes of mismatch, was that the lighthouses speak (or intend to speak) official jargons, certified by the establishment of their particular Major Subjects. They are supposed to hold the truth and they think as “teachers”, expressing their truth in menus that are in fact “logical trees”. They may claim to be e-books, and they behave, think, and look pretty much the same as physical books.
Now let’s analyze how users act, express themselves and behave. If a user comes to the site to learn, the cones are obliged to converge: the user thinks in terms of the concepts of the menu, which for him/her resembles a program of study, and we have a match scenario. If the user comes to the site to search for something, things are different. When we search for something we tend to think in terms of keywords instead, keywords that belong to our own jargon and, at large, to our own Thesaurus. So, either through ignorance or, on the contrary, through expertise, the users’ cones diverge substantially from the site’s cone. One of the main reasons for this divergence is that site owners do not know what their target market needs. Many of them are migrating from conventional businesses to e-Commerce approaches and extrapolate their market know-how as it is. They worked hard for decades to match their markets and to establish agreed jargons, and now they have to face unknown users coming from virtually all over the world.
Evidently the solution will be to evolve from mismatch to match in the most efficient way. To accomplish that, the Offer and the Demand have to approach each other until both share a win-win scenario and a common jargon.
In the figure above we depict a mismatch condition in which we may distinguish three zones: the red zone represents the idle and/or useless Knowledge; the gray zone corresponds to the common section, with an agreed Thesaurus concordance; and the blue zone corresponds to what the users need and want but which apparently does not exist within the site. So the site owners and administrators have two lines of action: a) reduce the red zones to zero, for instance by adapting and/or eliminating supposed “attractions”, and b) learn as much as possible about the blue zone.
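If the Offer and the Demand are each reduced to a set of keywords, the three zones become ordinary set operations. The vocabularies below are invented, and the overlap ratio at the end is only one possible way of putting a number on the match; it is our assumption, not a measure defined in the text.

```python
# Hypothetical vocabularies: what the site offers and what its users ask for.
site_offer = {"tires", "rims", "batteries", "horoscope", "weather"}
user_demand = {"tires", "rims", "wheel alignment", "snow chains"}

red_zone = site_offer - user_demand    # offered but never asked for: idle or useless
gray_zone = site_offer & user_demand   # common section: agreed Thesaurus concordance
blue_zone = user_demand - site_offer   # asked for but apparently absent from the site

# Assumption: share of the combined vocabulary on which both sides agree.
match_ratio = len(gray_zone) / len(site_offer | user_demand)

print("red: ", sorted(red_zone))
print("gray:", sorted(gray_zone))
print("blue:", sorted(blue_zone))
print(f"match ratio: {match_ratio:.0%}")
```

Line of action a) shrinks `red_zone`; line of action b) moves keywords from `blue_zone` into `gray_zone`.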
At this moment the dark green zones are extremely tiny, less than 5%, the Internet being the Realm of Mismatch between Users’ Demand and Sites’ Offer. The big effort to be made consists in minimizing costs by eliminating useless attractions and in learning from Users’ unsatisfied needs. To accomplish both purposes the site owners need intelligent tools: agents that warn them about red and blue events.
What Does Intelligent Mean?
Let’s analyze the basic process of user-Internet interaction. A user interacts with a site in only two forms: clicking on a link, or filling a form or a box with some text, for instance to query a database. Site statistics are well prepared to account for clicks, telling what “paths” were browsed by each user, but they are not well suited to account for textual interactions. Of course, you may record the queries and even the answers, but that is not enough to learn from mismatching. To accomplish that we may create intelligent agents that account for the components of each answer, for instance documents, but they then have to do rather heavy accounting.
If we query a commercial database for tires, the answer will be a list of tire stores. To have statistics about how frequently users ask for this specific keyword we need to account for it; to know about the “presence” of each store as a potential seller we need to account for it; and to know about the popularity of each store we need to go further, accounting for it as well, and so forth. That accounting process involves a terrific burden, even when done on the site’s server side.
An intelligent approach would be to have all possible counters built into the data to be queried. That’s the beginning of the idea: to provide, within the data queried by users, a set of counters, one for each type of statistic. So when an item of data is requested, a counter is activated accounting for its presence; when it is selected by a click, another counter is activated; and when the user, after reading the “intelligent summary” received, decides to click through to the original site or to one of its inner hyperlinks, yet another counter is activated.
Here a typical track of user-Site interaction is represented. The user makes a query for “tires”. The I-database (Intelligent Database) answers by sending all the data it has indexed under tire, adding a list of the synonyms and related keywords it has for tire. Each activated I-URL accounts for its presence in that answer by adding one to the corresponding counter in the I-Tags zone. If the user clicks on a specific I-URL, the system presents it to the user, accounting for this preference in another counter of the I-Tags zone.
Finally, if the user decides to access the commented site located in the black crown, he/she makes a click and another counter is activated within the I-Tags zone. At the same time the counter corresponding to the keyword tire is incremented by one, and the same happens if the user activates a synonym or related keyword. If the answer contains no data, it means a mismatch, due either to an error or to a resource that does not exist within the database. In both cases the system has to activate different counters for the wrong or non-existent keyword in order to account for the popularity of this specific mismatch. If the popularity is high, it is a warning signal to the site’s Chief Editor about the potential acceptance of the keyword, either as a synonym or as a related keyword. At the same time, the system may urge a search for additional data within the black region. From time to time the system could also suggest a review of the I-URLs’ summaries database in order to assign data to the new keywords.
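The interaction track described above can be sketched as a small data structure in which every record carries its own counters. Everything here is hypothetical: the class names `IUrl` and `IDatabase`, the counter names (`presence`, `selected`, `visited`), the URLs and the threshold are ours, and counting the keyword at query time is an assumption about when that counter should fire.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IUrl:
    """One intelligent summary with its own counters (the "I-Tags zone")."""
    url: str
    summary: str
    keywords: set
    tags: Counter = field(default_factory=Counter)


class IDatabase:
    def __init__(self, records, related=None):
        self.records = {record.url: record for record in records}
        self.related = related or {}      # keyword -> synonyms and related keywords
        self.keyword_hits = Counter()     # popularity of keywords that matched
        self.keyword_misses = Counter()   # popularity of each specific mismatch

    def query(self, keyword):
        """Return (matching I-URLs, related keywords), updating counters."""
        kw = keyword.lower()
        results = [r for r in self.records.values() if kw in r.keywords]
        if not results:
            self.keyword_misses[kw] += 1  # zero data: account for the mismatch
            return [], []
        self.keyword_hits[kw] += 1
        for record in results:
            record.tags["presence"] += 1  # the I-URL was present in an answer
        return results, self.related.get(kw, [])

    def select(self, url):
        """The user clicks on an I-URL to read its summary."""
        self.records[url].tags["selected"] += 1
        return self.records[url].summary

    def visit(self, url):
        """The user clicks through to the original site in the black region."""
        self.records[url].tags["visited"] += 1

    def popular_misses(self, threshold):
        """Keywords the Chief Editor might accept as synonyms or related keywords."""
        return [kw for kw, n in self.keyword_misses.items() if n >= threshold]


db = IDatabase(
    [IUrl("http://example.com/a", "Tire store A", {"tire"}),
     IUrl("http://example.com/b", "Tire store B", {"tire"})],
    related={"tire": ["wheel", "rim"]},
)
results, related = db.query("tire")
db.select("http://example.com/a")
db.visit("http://example.com/a")
for _ in range(3):
    db.query("tyre")                      # a spelling the database does not know

print(db.records["http://example.com/a"].tags)
print(db.popular_misses(threshold=3))
```

Because the counters live inside each record, the statistics are updated as a side effect of answering the query, and no separate accounting pass over logs is needed.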
Within the intelligent feature we also consider registering the IP of the users’ interactions and the sequence of queries, normally related to something not found. The users’ keyword strings are in turn related to specific subjects within the Major Subject of the site. So, statistically, the analysis of keyword strings tells us about the popularity of the actual menu items and suggests new items to be considered.
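One possible way to analyze such keyword strings is sketched below, assuming a log of (IP, keyword) pairs. The log entries, the menu, the function name `keyword_strings` and the threshold of two users are all hypothetical, and the IP addresses come from a range reserved for documentation.

```python
from collections import Counter, defaultdict


def keyword_strings(log):
    """Group the queried keywords by user IP, preserving the order of the queries."""
    sessions = defaultdict(list)
    for ip, keyword in log:
        sessions[ip].append(keyword.lower())
    return dict(sessions)


# Hypothetical query log: (IP, keyword) in order of arrival.
log = [
    ("192.0.2.1", "tires"),
    ("192.0.2.1", "snow chains"),
    ("192.0.2.2", "snow chains"),
    ("192.0.2.2", "wheel alignment"),
    ("192.0.2.1", "tires"),
]
menu = {"tires", "rims"}   # current menu items of the site

sessions = keyword_strings(log)

# Popularity: number of distinct users who asked for each keyword.
popularity = Counter(kw for kws in sessions.values() for kw in set(kws))

# Keywords outside the menu asked for by at least two users suggest new items.
suggestions = [kw for kw, n in popularity.most_common() if kw not in menu and n >= 2]

print(sessions["192.0.2.1"])
print(suggestions)
```

Counting distinct users rather than raw queries keeps one insistent visitor from inflating the popularity of a keyword.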
The Noosphere is the part of the world of life that is created by man's thought and culture. Pierre Teilhard de Chardin, Vladimir Ivanovich Vernadsky and Édouard Le Roy distinguish the noosphere from the geosphere, the non-living world, and from the biosphere, the living world.