
Sidebar: Artificial intelligence and the smarter search engine

By Linda Rosencrance November 10, 2003 12:00 PM ET

Computerworld - Within three to five years, we could see a very different, next-generation search engine -- one that could extract specific facts, draw inferences and organize those facts based on a few keywords, says Tom Mitchell, former president of the American Association of Artificial Intelligence in Menlo Park, Calif.

"Everybody who uses a search engine like Google knows it's a tremendously useful thing," says Mitchell. "You can type in words and get back Web pages that mention those words." But, he says, there has been a lot of progress in artificial intelligence based on much more sophisticated natural-language analysis by computer, the goal of which is to get computers to read text and understand its meaning.

Mitchell says researchers can now, in the laboratory, develop computer software that, given a Web page or Web site, examines that page or site and finds the names of people, dates and locations. "It can't read text and understand it at the level of detail people can, but already it can read text and can say, 'Oh, this is the name of a person' with about 95% accuracy and, 'Oh, this is a location; this is a date,'" he says.

Researchers have written computer programs that can find the names and job titles of people mentioned on a Web site -- for example, "Jane Smith, vice president of marketing," or "Joe Jones, CEO," according to Mitchell. "We already have in AI a very active and rapidly progressing research effort on automatically extracting really factual information out of the text," he says. "So now think about the search engine you'd really like to have."

Say you're a student looking at Web sites of universities to decide where to go to college, Mitchell continues. You go to a search engine and type in the things you're interested in, such as colleges that have a meteorology department. But what if you want to know what the faculty-to-student ratios are at those colleges?
Well, you can type in a word like meteorology, then go and browse through the Department of Meteorology section of a specific college's Web site, which might have 5,000 pages. You might stumble across a list of faculty members on one Web page, and on a different Web page you might find how many students are in the department. "From that, as a person, I could dig around and maybe figure out that the faculty-to-student ratio is 27 to 300," he says.
But in the future, computers could do that for you. You could go to a search engine and type in, "Show me a list of universities that offer meteorology as a major and rank-order them by student-to-faculty ratio," Mitchell says. To do that, the search engine will have to examine many different Web pages, extract information from those pages, put it together, organize it and present that table of results to the user, he says.

"It will give us a very different kind of search engine," Mitchell says. "Usually, when you use a search engine, it's because you have a question you're trying to answer, and as a person, right now, you turn that question into a bunch of keywords. And then you go search for the answer, but with the next-generation search engine, you're going to be able to ask a specific question."

The user will be able to do that because of technology under development that partially allows a computer to read -- in the sense that it can extract specific facts, draw inferences from those facts and then present them, according to Mitchell. Researchers working on this problem are using machine-learning methods to train the software to do that reading. People train the software with learning algorithms by giving it a Web page, highlighting the names of people in red and their titles in orange, and drawing a link between which name goes with which title. That becomes the training data for the program, which then infers general rules about what sequences of words tend to surround names, what names typically look like and what sequence of words between a name and a title indicates that the title belongs to that person. For example, "Jane Smith comma CEO" usually means that she is currently the CEO, but "Jane Smith was CEO of IBM" means she was the CEO in the past.

There are many different text patterns that indicate this relationship, and one of the reasons there has been so much progress lately is that people have said, "OK, we're not going to figure out by hand what all those patterns are; we're going to give lots of training data to the program and let it find those patterns using algorithms," Mitchell says.

So, he says, both the technology push and the commercial pull exist. "Google is a very successful company," Mitchell says. "They have the best search engine now, but if somebody started a search engine where you could just ask the questions and get the answers in a much richer way, that would quickly become the dominant search engine."
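The "Jane Smith, vice president of marketing" pattern Mitchell describes can be caricatured with a single hand-written rule. Real extractors learn many such patterns from labeled training data; this sketch hard-codes one pattern and a tiny, assumed list of titles purely for illustration:

```python
import re

# One hand-written stand-in for a learned pattern: "Name, title" in running text.
# The name shape and the title list below are illustrative assumptions.
PATTERN = re.compile(
    r"([A-Z][a-z]+ [A-Z][a-z]+),\s+"                   # capitalized first and last name
    r"((?:vice president of \w+|CEO|CTO|president))"   # tiny assumed title vocabulary
)

def extract_people(text):
    """Return (name, title) pairs matched by the pattern."""
    return PATTERN.findall(text)

page = "Contact Jane Smith, vice president of marketing, or Joe Jones, CEO."
print(extract_people(page))
# [('Jane Smith', 'vice president of marketing'), ('Joe Jones', 'CEO')]
```

A learning-based system replaces the hand-written regex with patterns induced from many highlighted examples, which is exactly the shift Mitchell credits for the recent progress.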

Web search engine


A web search engine is designed to search for information on the World Wide Web and on FTP servers. The results are generally presented in a list often referred to as SERPs, or "search engine results pages". The information may consist of web pages, images and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler.
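The core data structure behind keyword search is the inverted index, which maps each word to the documents containing it. A minimal sketch (the documents below are made up):

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query word (AND semantics)."""
    sets = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*sets) if sets else set()

docs = {1: "weather and climate", 2: "climate models", 3: "search engines"}
index = build_index(docs)
print(search(index, "climate"))   # {1, 2}
```

A real engine adds ranking, stemming and crawling on top, but lookups still bottom out in this word-to-documents mapping.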

Artificial intelligence
Artificial intelligence is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable.

Artificial intelligence (AI) is the intelligence of machines and the branch of computer science that aims to create it. AI textbooks define the field as "the study and design of intelligent agents"[1] where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success.[2] John McCarthy, who coined the term in 1956,[3] defines it as "the science and engineering of making intelligent machines."

Intelligence
Intelligence is the computational part of the ability to achieve goals in the world. Varying kinds and degrees of intelligence occur in people, many animals and some machines.

Background
In the vast ocean of online information, people rely on search engines to keep from losing direction and to quickly find what they need. Search engines have also multiplied, and their functions have different focuses: some offer comprehensive search, some business search, some software search, some knowledge search. No single search engine can provide all the information people need, so software and sites emerged that tie the various search engines seamlessly together -- and thus the intelligent search engine came into existence.

Definition
An intelligent search engine is a new-generation search engine that incorporates artificial intelligence technology. It provides not only traditional fast retrieval and relevance ranking, but also user registration, automatic recognition of user roles, semantic understanding of content, and intelligent information filtering and pushing. Its design goal is, for a given user request, to search out the information most valuable to that user from the network's resources. Intelligent search engines make information services more intelligent and human-centered, let users retrieve information in natural language, and so provide more convenient, more precise search. Representative engines include Baidu domestically, and WolframAlpha, Ask Jeeves, Powerset, Google, etc. abroad.

Characteristics
Once the user enters search keywords, a single mouse click switches among different categories or engines, greatly reducing the time spent manually typing a site address, opening the search engine, selecting a category and re-entering the keywords. Intelligent search interfaces are generally similar: the top row is the search categories, the middle is the keyword input box, and the bottom row lists the search engines. Intelligent search offers one-stop searching of web pages, music, games, pictures, movies and shopping, covering all the mainstream resources on the Internet. It differs from a general search engine (such as Baidu or Google) in that it brings the results of various search engines together in one place, making it more convenient to use. Strictly speaking, it is not a search engine itself, but it is more convenient than one.
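The aggregation idea described above can be sketched as merging ranked lists from several engines. This sketch uses reciprocal-rank scoring, one simple merging scheme among many; the engine names and results are stand-ins:

```python
def merge_results(result_lists):
    """Merge ranked result lists from several engines.

    Each list is ordered best-first; a URL's score is the sum of
    reciprocal ranks across engines, so items ranked highly by
    several engines rise to the top of the merged list.
    """
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    return sorted(scores, key=scores.get, reverse=True)

engine_a = ["a.com", "b.com", "c.com"]
engine_b = ["b.com", "d.com", "a.com"]
print(merge_results([engine_a, engine_b]))
# ['b.com', 'a.com', 'd.com', 'c.com']
```

Notice that "b.com", ranked well by both engines, beats "a.com", which only one engine ranked first -- the one-stop convenience the passage describes.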

3) Intelligent search
Probably the most widely used form of AI is intelligent search. We benefit from it when we use the mighty Google search engine, when we find products on Amazon, and when we get driving directions from Mapquest. Intelligent search is everywhere. So many different technologies are employed in intelligent search that it's hard to know where to begin.

Intelligent Search
One popular method records users' interactions to steadily improve the intelligence of search results. If a user clicks on a certain document or page, it is given added weight in terms of its relevance. If web site owners link to one document or page more often than others, that too is taken into account. This, of course, only scratches the surface. Another way intelligence is used in search is to find what a user is looking for as quickly as possible. The search algorithm A* ("A star") is used to drastically reduce the time spent exploring paths to a goal. A* orders its exploration by the function f(n) = g(n) + h(n): the cost of the path traveled so far plus a heuristic estimate of the remaining distance. If you'd like to add intelligent search to your application, we recommend you check out Lucene, a free, open-source and very innovative search library created by Apache. It builds and optimizes indexes for intelligent search, it's easy to use, and you don't have to make drastic changes to your system environment because it's independent of your document file formats.
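A compact sketch of A* over a grid, expanding nodes in order of f(n) = g(n) + h(n) as described above; the 5x5 grid and the Manhattan-distance heuristic are illustrative choices:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* search: always expands the frontier node with the lowest f = g + h.

    g is the cost from start; h is an admissible estimate of the cost
    to goal. Returns the path as a list of nodes, or None if unreachable.
    """
    open_set = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while open_set:
        f, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        for nxt, cost in neighbors(node):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(open_set, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None

# 4-connected 5x5 grid; every step costs 1.
def grid_neighbors(node):
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < 5 and 0 <= y + dy < 5:
            yield (x + dx, y + dy), 1

# Manhattan distance to the goal (4, 4) -- admissible on this grid.
manhattan = lambda n: abs(n[0] - 4) + abs(n[1] - 4)
print(a_star((0, 0), (4, 4), grid_neighbors, manhattan))
```

With an admissible heuristic, the first path A* returns is guaranteed optimal, which is why it dominates plain breadth-first search in routing applications like driving directions.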

4) Document Filtering
Artificial intelligence is important and highly effective in text and document classification. One common example is filtering SPAM out of your email inbox. This is accomplished by using a common classification algorithm such as the Naive Bayes classifier, or by employing an artificial neural network.

When using this kind of technology in your applications, it's not uncommon to see success rates of up to 99.5%. So how does it all work? In a nutshell, the software takes sample data that you give it and learns from it. For example, if an email arrives in your mailbox and you mark it as SPAM, the software parses that email to learn which elements make it SPAM. Then, when the program sees other messages with similar elements, it classifies them as SPAM and deletes them. Document filtering is by no means restricted to SPAM filtration; this kind of technology can be used in countless ways. If you're interested in integrating it into your program, you might want to check out the free open-source library WEKA (Waikato Environment for Knowledge Analysis).
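The learn-from-labeled-mail idea can be sketched with a hand-rolled multinomial Naive Bayes classifier. The training messages below are invented, and libraries such as WEKA implement this far more robustly:

```python
import math
from collections import Counter

class NaiveBayes:
    """Tiny multinomial Naive Bayes for spam/ham classification by word counts."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        scores = {}
        for label in ("spam", "ham"):
            # log prior + sum of log likelihoods with add-one smoothing
            score = math.log(self.doc_counts[label] / total_docs)
            n = sum(self.word_counts[label].values())
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / (n + vocab))
            scores[label] = score
        return max(scores, key=scores.get)

nb = NaiveBayes()
nb.train("cheap pills buy now", "spam")
nb.train("win money now", "spam")
nb.train("meeting agenda attached", "ham")
nb.train("lunch tomorrow", "ham")
print(nb.classify("buy cheap pills"))   # spam
```

Every message you mark adds counts to one class, which is exactly the "parse the email to learn what makes it SPAM" step the paragraph describes.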

5) Using AI In Data Mining


Thanks to the large amount of information available on the internet today - the amount of which doubles every three years - data mining has become an important tool.



The object of data mining is to find and extract patterns from large data sets. Common uses of data mining include marketing, scientific discovery, and even surveillance. A classic example is a company mining its database to find the customers most likely to accept a certain offer. Companies also use data mining extensively to learn more about consumer interests, spending trends, and habits. Data mining is a cost-effective way for large and small companies to get the data they need when they need it. If you'd like to explore this technology further, do some searching on the major task types: classification, segmentation, association, regression and sequence analysis. Decision trees, rules, and neural networks are also utilized in modern data mining.
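One of those task types, association analysis, can be sketched as counting which items co-occur across transactions; the shopping baskets below are invented:

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support):
    """Count item pairs appearing together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "butter"],
]
print(frequent_pairs(baskets, min_support=2))
# {('bread', 'milk'): 2, ('eggs', 'milk'): 2}
```

Algorithms like Apriori refine this brute-force counting to scale to millions of transactions, but the output -- "customers who buy X often buy Y" -- is the same kind of pattern a retailer mines for.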

Artificial Intelligence Tied to Search Future


By Paul Krill, Infoworld Jul 14, 2008 1:30 am

AI (Artificial intelligence) has the potential to enhance Internet searches, but obstacles still must be overcome, a speaker stressed at a technical conference Thursday hosted by IBM. Entitled "The New AI: New Paradigms for Using Computers Workshop," the event at the IBM Almaden Research Center in San Jose, Calif. featured a presentation by Oren Etzioni, director of the Turing Center at the University of Washington. Multiple AI and machine learning projects also were highlighted at the event.

Etzioni emphasized more intelligent Internet searching. "We're going to see in the next five years next-generation search systems based on things like Open IE (Information Extraction)," Etzioni said. Open IE involves techniques for mapping sentences to logical expressions and could apply to arbitrary sentences on the Web, he said.

Etzioni cited work on Softbot intelligent interface technology. But he noted issues, such as a Softbot that might be given the goal of deleting a file but instead deletes an old server log. Using contemporary humor to illustrate another potential problem, he even referred to a Doonesbury cartoon where a search for milk has the Softbot buying luggage along with the milk, against what the searcher desired. But solutions for enhanced search are emerging, including semantic tractability, in which simple sentences can be understood, and the clarifying of dialogs that could have double meanings, said Etzioni. Natural language interfaces have been preferred as the way to talk to Softbots, but these must be reliable, he said. Etzioni also cited work on the KnowItAll project, which is about extracting high-quality information from text on the Web. Another effort, TextRunner, pertains to open information extraction and is meant to serve as a foundation for a massive knowledge base.

An organizer of Thursday's event shied away from the term "artificial intelligence." "[The term] artificial intelligence has fallen out of favor. You're not hearing about expert systems anymore," said Stefan Nusser, senior manager of the IBM User Systems & Experience Research Group. "But right now, there is sort of a re-emergence of some of these methodologies."

The event also showcased several projects in the AI and machine learning spaces. These included:

-- Using AI to Identify Interesting Assertions. With this University of Washington project, machine learning is combined with human computation to identify which assertions extracted from the Internet are more interesting. TextRunner is used in this project, as well as content creation sites like Wikipedia.

-- Data Visualizations and Continuous Interfaces. This Yahoo effort features various applications offering advanced visualizations of data, such as FAA flight paths.

-- Examining Obstacles to Software Developer Adoption of Statistical Machine Learning. This University of Washington and Intel project involves studies to provide the basis for development tools to better support software developers applying statistical machine learning within applications.

-- CueFlik: Interactive Concept Learning in Image Search. Sponsored by the University of Washington and Microsoft, the project provides a Web image search application enabling users to develop rules for re-ranking Web images according to visual characteristics.

-- Towards PR2: A Personalized Robot Platform. This Willow Garage effort features a hardware and software platform for robots that do tasks for humans in human environments. In collaboration with Stanford University, an open-source robot operating system is being developed as well.

-- SparTag.us: A Low Cost Tagging System for Foraging of Web Content. This Palo Alto Research Center (PARC) project features a new tagging system with a "Clik2Tag" technique to provide low-cost tagging of Web content. Users can highlight text snippets and collect tagged or highlighted paragraphs into a system-created notebook that can be browsed and searched.

-- WikiDashboard: Social Transparency and Visualization for Wikipedia. Also a PARC effort, the project features an analysis tool intended to improve social transparency and accountability on Wikipedia articles.

-- Responsive Mirror: An Intelligent Fitting Room Using Multi-Camera Perception. A PARC project involving a system for retail fitting rooms enabling online social fashion comparisons based on multi-camera perceptions.

-- Magitti: Mobile Recommendations for Leisure Activities. This PARC system uses context filtering to narrow down the overload of leisure time offerings in urban areas. The system infers interests and activities from models learned over time based on individual and aggregate user behavior.

-- Intelligent E-mail: Reply and Attachment Prediction. A University of Pennsylvania project that involves enhanced e-mail interfaces intended to reduce the stress of email overload.

-- Model-driven Content Connectors and Web Intelligence. Consider the Source. An IBM approach for making predictions about relevant content and what should be made accessible in an intelligent navigation system. Unified Modeling Language is leveraged to form connectors between user goals, objects, and content types.

-- AALIM: Diagnostic Decision Support for Cardiologists. This is an IBM-developed decision support system to identify similar patient records and aid in diagnostic decision support.

-- CoScripter: Programming the Web by Demonstration. An IBM project involving a system for recording, automating, and sharing processes performed in a Web browser. Repetitive activities are automated. It is an extension to the Firefox browser.

-- Highlight: Mobilizing Existing Web Sites. This IBM project enables users to create mobile versions of existing Web sites that are customized to their own tasks and devices.

-- ShapeWriter: Intelligent Gesture Input. An IBM endeavor involving an advanced mobile text input solution that recognizes a user's intended words through real-time statistical analysis of the user's gesture stroke on the graphical keyboard.

-- CALO (Cognitive Assistant that Learns and Organizes) Express. This is a Windows-based version of SRI International's CALO project to build an intelligent personal assistant. For example, it can figure out RSS feeds and suggest new feeds for the user.
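The Open IE idea Etzioni describes, mapping sentences to relational triples, can be caricatured with one hand-written pattern. Systems like TextRunner learn many extraction patterns from the Web; the verb list and example sentence here are illustrative assumptions:

```python
import re

# One assumed pattern: "<Subject> <relation verb> <Object>" for a few verbs.
TRIPLE = re.compile(
    r"([A-Z][\w ]+?)\s+(is|was|directs|founded)\s+([\w ]+)"
)

def extract_triples(sentence):
    """Return one (subject, relation, object) triple, or None if no match."""
    m = TRIPLE.search(sentence)
    return (m.group(1), m.group(2), m.group(3)) if m else None

print(extract_triples("Oren Etzioni directs the Turing Center"))
# ('Oren Etzioni', 'directs', 'the Turing Center')
```

Accumulating such triples over billions of sentences is what lets an Open IE system serve as the foundation for the massive knowledge base the article mentions.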

Artificial Intelligence and the Web


The web has always been evolving to deliver the best to its users. At one point it had changed so much that people started to use the term Web 2.0. According to Wikipedia[1], the term "Web 2.0" (2004 - present) is commonly associated with web applications that facilitate interactive information sharing, interoperability, user-centered design, and collaboration on the World Wide Web. Examples of Web 2.0 include web-based communities, hosted services, web applications, social-networking sites, video-sharing sites, wikis, blogs, mashups, and folksonomies. The web has always been a great area of application for artificial intelligence: search engines, web usage data analysis, and finding and delivering information are some examples. But with Web 2.0, and then Web 3.0, artificial intelligence is getting new focus and many more applications. In [2], Jay M. Tenenbaum shows new applications of artificial intelligence for Web 2.0. And Perl is very useful for programming such applications, since it has many modules for artificial intelligence, web programming, and text processing.
