Agents for the Internet

"A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that."

Lewis Carroll, in Through the Looking Glass

1.1 Introducing Internet agents

Soon, Alice will have another choice: to stand still and let someone else run for her. The idea of having a personal runner chasing down information has its appeal to those of us who have been blown sideways by yet another shockwave from this century's information explosions. Imagine Alice with a cadre of runners, all pursuing her interests all of the time and leaving her with nothing to do but to stand and wait for their return. As long as we're imagining, let's include runners with the capacity to focus their efforts based on an analysis of Alice's own likes and dislikes. Finally, let's use the Internet as the course to traverse, and we have imagined one manifestation of the Internet agent--software that acts on your behalf.

Already other agents have found their way into our daily lives, reminding us of appointments, pointing out spelling errors, or periodically dialing in for our email. While some Internet agents can carry out complex information-gathering strategies autonomously, others gather discrete information from a limited set of sites, assembling a digest of topics of interest.

Agents, then, perform a service by either being reactive, responding to changes in their environment, or proactive, seeking to fulfill goals. Further, agents can remain stationary by, say, filtering incoming information, or become mobile, searching for specific information across the Internet and retrieving it. As it performs these actions, an agent ideally follows the first principle of good Internet behavior, "Do no harm." Finally, a key element of an agent's behavior, its autonomy, suggests that once goals are established its behavior is guided by its own capacities for action independent of intervention from its user.

1.2 The present, some examples

Even Alice got more help navigating than the average Web surfer gets when first encountering the tangled Web that's been woven across the world. Two or three years ago Lycos and Yahoo! were the most famous locations for beginning any plunge into the Internet. Now, counts of Internet search engines range from one hundred to more than five hundred. Some of these engines are simply variations on Lycos or Yahoo!, while others are highly specialized. For example, you can use the Hypertext Webster Interface to search for definitions in Webster's Dictionary or SIFT, the Stanford Information Filtering Tool, developed by Tak Yan at Stanford University, which includes two services, one for computer science technical reports and one for USENET news articles.

Yahoo! presents its information hierarchically: users drill down through the layers of categories, refining the search for information with each selection. The success of this system depends upon the accuracy and judgment of both the people who submit their sites and the Yahoo! team that categorizes them. Additionally, Yahoo! uses its own cataloging software to organize itself. This mix of trained and untrained catalogers makes consistent, accurate listings hard to guarantee. To supplement its own hierarchical scheme, Yahoo! also offers the option to search its database using a simple keyword search interface. This option also helps to overcome the silent vagaries of inaccurate cataloging to which Yahoo! is prone.

Lycos introduced brute force indexing of the Internet by using a program often referred to as a spider to search the World Wide Web every day and update its database of indexed sites. In the beginning, Lycos provided a simple search interface; however, when it moved from a university-sponsored research project to a commercial venture, it added other refinements such as a hierarchically organized subject guide and several prepared searches such as an interface for finding stock prices. Similarly, AltaVista, a search engine developed by Digital Equipment Corporation, also catalogs a vast quantity of pages and has recently added LiveTopics to assist users with refining their search by providing a way to select and eliminate search topics.

This effort to simplify searches and lower the inevitable frustration prompts sites like Yahoo! and others to encourage users simply to enter key words describing a topic and take a chance. Such an unstructured approach leads to a lot of hits or nothing at all. A dedicated user can find additional options, generally unique to each search engine's protocol, that claim to refine the search even further. Some sites even offer the chance to select the importance of one term in a search over another. In the end, the various search interfaces all attempt to overcome a basic problem: framing an effective and efficient question for a search requires a complex understanding of how knowledge may be structured within the unstructured universe of the Internet.

While librarians rely on a highly refined cataloging scheme to locate a book in a library, untangling the Web resembles the enigma of the Gordian knot. There simply are no cataloging rules.

The vast amorphous tangle that Lycos, AltaVista, Excite, Web Crawler, or any of the other search engines must confront suggests the magnitude of difficulty and frustration awaiting anyone hoping to use the Web as an efficient source of information.

A search of the Web for information on Internet agents produces a blizzard of responses that include references to real estate agents, insurance agents, theatrical agents, houses for sale, and so on. Often the most needed pages are sorted at the start; then, maddeningly, another burst of relevance appears some twenty to thirty pages of pointless citations later. In short, a simple search is never simple.

Another recent practice in Web page creation has only added to the clutter of pages returned as a response: keywords. In an effort to ensure that search engines generate numerous hits to the same page, Web page creators have begun to frontload the keyword sections of their Web pages, often piling on ten, twenty, even thirty occurrences of the same word so that a search engine will then place the site higher in its rankings. This practice virtually guarantees that some pages will show up at the top of completely unrelated searches, and that users will waste additional time reviewing and paging past these repetitions.

As an answer to the incompleteness of any search engine's database, as well as a method for combating excessive hits, meta-search engines have been developed. These engines often perform some preprocessing of keywords before submitting the search to other services. Some of these systems then add postprocessing of the results in an effort to reduce redundancies and to rank order the hits. Some of these meta-search engines include iFind, MetaCrawler, and SavvySearch.

While these meta-search engines seek to cover more of the available indexed sites and often rank order a query's successful hits, they continue to rely on simple brute force to complete their searches.
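The pre- and postprocessing that a meta-search engine performs can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the result lists stand in for whatever the underlying services return, and the reciprocal-rank scoring is simply one plausible way to rank order merged hits, not the method used by iFind, MetaCrawler, or SavvySearch.

```python
from urllib.parse import urlsplit


def normalize_query(query):
    """Preprocess: lowercase the keywords and drop repeats, keeping order."""
    keywords = []
    for word in query.lower().split():
        if word not in keywords:
            keywords.append(word)
    return keywords


def canonical_url(url):
    """Reduce a URL to a comparison key so duplicate hits can be detected."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    return host + path


def merge_results(result_lists):
    """Postprocess: merge ranked lists, remove redundancies, rank order hits.

    Assumption: each hit earns 1/rank from every engine that returned it,
    so pages placed near the top by several engines rise in the merged list.
    """
    scores = {}
    first_seen = {}
    for results in result_lists:
        counted = set()  # ignore repeats of a page within one engine's list
        for rank, url in enumerate(results, start=1):
            key = canonical_url(url)
            if key in counted:
                continue
            counted.add(key)
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
            first_seen.setdefault(key, url)
    # Highest score first; ties are broken alphabetically by key.
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return [first_seen[k] for k in ordered]
```

A caller would pass the output of normalize_query to each search service, collect one ranked list of URLs per service, and hand those lists to merge_results. Note that the sketch still relies on the underlying engines' brute-force indexes; it only tidies what they return.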

Into this chaos steps the first important generation of Internet agents. Some offer to reduce the results garbled by inefficient searches and the noise generated by duplicated pages. Others keep users abreast of particular sites, maintaining the latest information from that site by checking it regularly and downloading new information. And still others will remember every page you've visited, indexing and cross referencing matching words for later searches. In short, these agents act to filter, retrieve, or index information.

Some of these agents, available commercially, include SurfBot from Surflogic LLC and WebCompass from Quarterdeck Corporation. SurfBot and WebCompass offer users the chance to use their existing search agents, modify their criteria, and schedule their actions.

Beyond these search engines, a growing group of agents is beginning to edge onto the Internet. Some take the form of scripts for existing technology such as mailers; others are personal implementations of agent technology focused on fulfilling personal needs.

1.2.1 Discussion help, a stationary agent

The problem:	To allow topics under discussion throughout a company to be accessible to all employees, using a browser, in an organized fashion.
The solution:	An agent that uses a scripting system and integrated database to allow employees to read and post messages to discussion threads.

Dave Winer of UserLand Software has built such a system and provided it to the Internet community for free. To handle this on our system we have set up a Web server and a copy of Dave's Frontier scripting environment, also provided to the Internet community for free. Then we installed a BBS system Dave wrote (yes, it's free) and everything was ready to go. Anyone who wants a message posted to the bulletin board goes to the BBS's Web page and logs on using his or her email address. The person who maintains the BBS can set up any number of discussion groups and can manage the messages as the system becomes too large to handle. Since the BBS is on a Web server, the new message is now posted to everyone on the net.

With such an agent in place, the history of a topic's discussion is readily available to all. Such a history also allows a team or a project newcomer the chance to catch up on the discussion's progress without having to find someone to reassemble the thread's developing ideas.

Although this agent works in a specific environment and relies on a set of scripts to log, post, and maintain the messages, it manages to keep all of the discussions at our site neatly organized. As an elementary agent, ours satisfies our needs and also points toward what will be a developing use for agents.

Just to burst the bubble, this really isn't an agent. Why not? Many of you may have noticed that this system is really just a bulletin board system. BBSs have been around far longer than the Internet and have always been able to do what Winer's system does. The BBS that Winer is distributing is a step toward agent technology compared to the BBSs with which many of us are more familiar. The use of scripts and a general architecture (Frontier) to build this specialized system are features that agents promise. All the BBSs I used in college were specially written programs that ran on one machine and usually didn't allow you to do anything else (like word processing or email) while it was running. Forget it if you wanted any special features or wanted to change the way the BBS worked. The BBS we use is built on top of Frontier, which wasn't built with the intention of creating a BBS for me to use. Someone looked at all the power Frontier has, and came up with something really cool to do with it.

But there are a number of features that the BBS doesn't (and really can't) provide that I want. The BBS and Frontier were designed as static systems: they run on only one type of computer (machines running MacOS), they reside on only one machine, and their behaviors do not change as time goes by.

This BBS goes a long way in showing us how to build something that helps us organize our conversations using a generic architecture. Removing the static-ness of the BBS lets us see what the next generation will provide. We can do this using the mobile agent systems presented in Parts II and III; each is a generic system but with the added ability of movement.

The agent version has the ability to run on a variety of platforms, whether by using an already transportable language like Java (this is what Aglets and the future Ara and Agent Tcl do) or by sitting on top of a Java-like virtual machine (which is what Telescript, Ara, and Agent Tcl currently do). So now that the agent can run on different machines, it can move around the Internet. Now the BBS begins to take on a whole new meaning.

What is this future BBS? It could evolve into the next version of a software package like IBM's Lotus Notes, an application that helps people share information. It could be used to help children all over the world collaborate on projects by providing connections to other schools, helping the children and teachers understand the context of the other students, or even create friendships among children who will probably never talk to each other face-to-face.

Imagine this scenario: The agent of a small school in the United States finds out from an agent run by a university that a school outside Moscow is discussing similar issues related to pollution in their communities. The U.S. school's agent contacts the Russian school's agent and begins to coordinate information to show to the teachers of the respective schools.

In the morning each of the teachers will come to school and read about the other school's work, its children, and some of the conversations that have been going on. By that afternoon the schools' agents are solving any issues of coordinating each school's BBS. For the rest of the year the children in both countries get to share in the experiences and lives of their schoolmates.

Did I mention that the children and teachers at the two schools don't speak each other's language? The agents found a program at another university that allows the conversations to be translated.

It will happen.

1.2.2 Classified help, an Internet agent

The problem:	Managing a career as a computer contractor/consultant.
The solution:	An agent that automatically checks selected classified categories in the Sunday Washington Post and emails résumés with no intervention required; another agent that checks and saves the total information for selected classified categories; and a third agent, in development, that downloads selected classified sections, and creates a searchable database of job listings.

Charles Crizer (cfcrizer@dyncon.net) is a computer contractor who developed these agents to assist him in his search for new jobs. On a typical Sunday the first agent will go to the Washington Post's classified ad site and download all of the jobs filed under selected categories such as technology, programmer, computers, and so forth. Once the agent has done this, it parses the files for email addresses and stores this information along with the date the ad was downloaded. Crizer then activates the agent to search this information, and it emails his résumé to the email addresses that match his criteria. Sometimes, the agent will pump out as many as a thousand résumés in the five minutes it takes to manage this task, and it does so with no intervention on Crizer's part other than establishing the date range for the ads the agent should access.
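The parsing step described above, pulling email addresses out of downloaded ads and storing them with the download date, might look something like the following sketch. It is not Crizer's code: the function names, the regular expression, and the tuple layout are all assumptions chosen for illustration, and the pattern is deliberately loose rather than a full address validator.

```python
import re
from datetime import date

# Assumption: a simple pattern that is good enough for ad text. It will not
# accept every legal address, and it stops before trailing punctuation.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)


def extract_contacts(ad_text, downloaded_on):
    """Return (address, download date) pairs, one per unique address."""
    contacts = []
    seen = set()
    for match in EMAIL_PATTERN.finditer(ad_text):
        address = match.group(0).lower()
        if address not in seen:
            seen.add(address)
            contacts.append((address, downloaded_on))
    return contacts


def in_date_range(contacts, start, end):
    """Keep only contacts whose ads were downloaded between start and end."""
    return [(addr, day) for addr, day in contacts if start <= day <= end]
```

The date filter corresponds to the one decision the text says Crizer makes by hand: choosing the date range of ads the agent should act on.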

Each Sunday, a second agent downloads the total information from various classified sections and stores it for perusal later. Crizer is developing a variant of this agent which will not only download the classified ads for a section, but will strip out the HTML code, parse each ad, and generate a database that, later, can be searched productively and used to assemble mailing lists for résumés, or a list of available jobs meeting specific criteria.
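As a rough illustration of the third agent's pipeline (strip the HTML, parse each ad, build something searchable), here is a minimal sketch. It makes several assumptions that the text does not confirm: that ads on a page are separated by an `<hr>` tag, that a simple in-memory keyword index is an acceptable stand-in for the "database," and that the function names are ours, not Crizer's.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text between tags, discarding the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(markup):
    """Return the text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def split_ads(page, separator="<hr>"):
    """Assumption: each ad on the page is separated by the given tag."""
    texts = (strip_html(part) for part in page.split(separator))
    return [text for text in texts if text]


def build_index(ads):
    """Map each lowercase word to the set of ad numbers that contain it."""
    index = {}
    for ad_id, text in enumerate(ads):
        for word in re.findall(r"[a-z0-9+#]+", text.lower()):
            index.setdefault(word, set()).add(ad_id)
    return index


def search(index, ads, *keywords):
    """Return the ads that contain every keyword."""
    ids = None
    for keyword in keywords:
        matches = index.get(keyword.lower(), set())
        ids = matches if ids is None else ids & matches
    return [ads[i] for i in sorted(ids or ())]
```

With an index like this, a query such as search(index, ads, "unix", "c++") returns only the ads mentioning both terms, which is the kind of criteria-based job list the paragraph above describes.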

Just as with the BBS example, this isn't really an agent, but this system also provides some of the features that we want from our mobile agents. One of the major features of this system is that it saves Crizer lots of time while performing necessary but mindless tasks (sending out the résumés).

The mobile agents will allow Crizer to build agents that save him more time while increasing the value of the information that he has to review. Right now the script Crizer has written searches only his local newspaper. With the growth of telecommuting, Crizer may soon be able to answer job ads in different parts of the country by telling the mobile agents to search everywhere for the types of jobs he likes. Since the agents will be looking in different parts of the country, they can wander on the 'Net all week long.

One day, Crizer will wake up to find that one of his agents not only found him the best job he has ever seen, but the agent has already sent his résumé to the company. In this scenario, before Crizer even finishes his first cup of coffee, one of his agents reports back that the company sent an agent requesting his schedule for the day (and was responded to instantaneously by another agent) and that a Mr. Smith would like to call at either 1 or 3 o'clock this afternoon to discuss the position. A job search, complete with interview, and Crizer hasn't even had a chance to butter his toast.

. . .