
Competitive intelligence and the web

I. INTRODUCTION

The paper provides an overview of how the web can be used for competitive intelligence (CI). After defining CI, it reviews the World Wide Web to provide a basis for understanding how information is organized on the web and the problems related to using it as a source. The techniques that can be used to carry out CI, together with their related problems, are discussed in the sections that follow.

II. COMPETITIVE INTELLIGENCE

The author cites the definition of competitive intelligence given by the Society of Competitive Intelligence Professionals (SCIP): the process of ethically collecting, analyzing and disseminating accurate, relevant, specific, timely, foresighted and actionable intelligence regarding the implications of the business environment, competitors and the organization itself [SCIP, 2003]. The process involves several activities to be carried out by organizations that engage in CI. The CI project should be a continuous cycle with the following steps:
1. Planning and direction
2. Collection of data
3. Analysis of data
4. Dissemination of the intelligence generated
5. Feedback
Steps 1 and 2 are critical to the success of the CI project, so many information resources are consulted to carry them out. Internet resources are frequently used in the CI process for the following reasons:
1. A business website can contain a vast amount of information about the company
2. It is cheap to tap this source of information
3. Access to open sources does not require proprietary software

III. THE WEB STRUCTURE

The structure of the web is determined by the HTTP protocol and the use of Uniform Resource Locators (URLs). It provides a natural retrieval technique for the contents of the web. The logical structure of the web can be understood as a mathematical network of nodes and arcs: the nodes represent web documents, whereas the arcs represent the URLs (links) located within a document. In a simple retrieval technique, one starts with a particular HTML or XML document and follows the links from one document to another. This process of following the links is termed document retrieval or information retrieval. The content of the documents retrieved is evaluated, which may lead to other URLs. The retrieval techniques are graph search algorithms adapted to use a document's links to implement and control the search. An example of a graph search algorithm is a breadth-first search on the links contained in the initial document.
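As an illustration of this retrieval model, the following Python sketch performs a breadth-first traversal of the web graph starting from a seed document, using only the standard library. The seed URL, page limit, and link-extraction rules are illustrative assumptions, not part of the original paper.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects the href attribute of every <a> tag in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl_bfs(seed_url, max_pages=20):
    """Breadth-first traversal of the web graph: pages are nodes, links are arcs."""
    visited, queue = set(), deque([seed_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # unreachable, non-HTML, or non-HTTP resource: skip it
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            queue.append(urljoin(url, link))  # resolve relative URLs against the page
    return pages

# Hypothetical usage:
# pages = crawl_bfs("https://example.com", max_pages=10)
```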

IV. INFORMATION GATHERING ON THE WEB

The use of search engines is the most common way to collect information from the web. Search engines take a user's query and return a set of web pages that correspond to some degree to the user's request. In most cases the set of pages is ranked according to how well each page satisfies the request. A search engine most often includes the following components (a minimal sketch of the indexing and ranking components appears at the end of this section):
1. Web crawlers or spiders, which collect web pages using graph search techniques.
2. An indexing method, which indexes the collected web pages and stores the indices in a database.
3. Retrieval and ranking methods, which retrieve search results from the database and present ranked results to users.
4. A user interface, which allows users to query the database and customize their searches.
A number of domain-specific search engines are also available alongside the general ones, for example search engines dedicated to commercial publications, legal materials, or medical information. Another type is the meta-search engine, which forwards a query to several popular search engines and integrates the results they return; meta-search engines thus rely on the indices created by the search engines being queried. Some search engines now use peer-to-peer (P2P) technology because of the success of this approach: when a computer cannot respond to a request presented to it, it passes the request on to its neighboring computers. An example of this approach is the JXTA search engine.
The size of the web makes the pure graph search approach problematic: it takes a long time to crawl and index all the relevant web pages associated with a query, the information collected may be outdated or incorrect, and the approach does not take into account the continuous updating of web pages. The pages and documents found on the internet can be classified into two basic forms:
- Surface web: pages or documents that are freely available to any user.
- Deep web: dynamic pages, intranet sites, and the content of web-connected proprietary databases. Deep web documents are generally made available only to members of the organizations that produce or purchase them, such as businesses, professional associations, libraries, or universities, and they are not usually indexed by search engines.
One difficulty attached to searching for information on the surface web is that some sites are starting to charge a fee for access to information.
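As a rough sketch of the indexing and retrieval/ranking components listed above, the following Python code builds an inverted index over crawled pages and ranks results by a simple term-frequency score. The tokenizer and scoring scheme are simplifying assumptions; real engines use far more sophisticated ranking.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case word tokens; a real engine would also stem and drop stop words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Inverted index: term -> {url: term frequency in that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] = index[term].get(url, 0) + 1
    return index

def search(index, query, top_k=10):
    """Rank pages by the summed term frequency of the query words (a crude relevance score)."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for url, freq in index.get(term, {}).items():
            scores[url] += freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]

# Hypothetical usage, reusing the crawl_bfs output from the earlier sketch:
# index = build_index(pages)
# print(search(index, "competitive intelligence"))
```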

V. INFORMATION ANALYSIS

To limit the large number of pages generated by an uncontrolled search, it is necessary to control the search by monitoring the graph search technique; this is what is called analysis of the information. The initial form of analysis is referred to as web mining, which can be categorized into three classes:
1. Web content mining: refines the basic technique and can be viewed as:
   - On-line: for example, analyzing the content of the page itself.
   - Off-line: web content mining may be carried out using an unsophisticated search engine that uses keywords to control the graph search algorithm.
   - Text mining: the goal is to perform automated analysis of natural language texts. This analysis leads to the creation of document summaries and determines to what degree a document is relevant to a user's query.
2. Web structure mining: uses the logical network model of the web to determine the importance of a web page. Examples include PageRank and Hyperlink-Induced Topic Search (HITS). This technique, combined with keyword search, is the foundation of the Google search engine (a minimal sketch follows this list).
3. Web usage mining: performs data mining on web logs that contain clickstream data. This data can be analyzed to provide information about the use of the web or the behavior of the client, depending upon which clickstream is being analyzed.
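To make web structure mining concrete, here is a minimal sketch of the iterative PageRank computation over a small link graph. The damping factor and iteration count are conventional defaults, and dangling pages (pages with no outgoing links) are handled in a deliberately simplified way; this is not the production algorithm of any particular engine.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank over a link graph given as {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Base rank every page receives regardless of incoming links.
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, targets in links.items():
            if not targets:
                continue  # simplification: dangling pages do not redistribute their rank
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical usage on a three-page graph:
# print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))
```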

VI. INFORMATION VERIFICATION

Some web search engines evaluate information sources and order them according to the user's request, but they do not improve the precision of that information. To have more confidence in the information retrieved, it is appropriate first of all to know the source of that information (deep web or surface web) and, if possible, to verify it in another, non-web source. Some questions can be asked for this verification:
- Who is the author?
- Who maintains the web site?
- How current is the web page?

VII. INFORMATION SECURITY

An information security system is needed as soon as it is acknowledged that a company can be the target of another through the process of CI. Measures must therefore be taken concerning:
- the privacy and integrity of private information on the network;
- assuring the accuracy of the company's public information, to avoid exploits such as web defacing, web page hijacking, cognitive hacking, and negative information;
- unintentionally revealing information that ought to be private.

Some definitions:
Web defacing: modifying the content of a web page. When it is done in subtle ways, it can also alter the accuracy of the information.
Web page hijacking: occurs when a user is directed to a web page other than the one associated with the URL. The information contained in this other page can be inaccurate.
Cognitive hacking: also called a semantic attack, it acts on the image of the firm and gives the firm a bad image. Generally, it is done by disgruntled customers or employees, by competitors, or as a simple random act of vandalism. There are two types of cognitive hacking: single source and multiple source. Single-source cognitive hacking occurs when a reader sees information but cannot trace its source and therefore cannot verify it. Multiple-source cognitive hacking occurs when several sources are available for a topic but the information is not accurate. Multiple-source cognitive hacking can be divided into two categories of cognitive attacks:
- Overt cognitive attack: the attack is not masked; web defacing is an example.
- Covert cognitive attack: the purpose is to influence readers' decisions through misinformation that is distributed or inserted and appears to be reliable.

Possible countermeasures to cognitive hacking
Countermeasures to cognitive hacking exploits need to be employed by a CI researcher; for instance, competitors may place wrong information on the internet as a counter-CI measure. Countermeasures to single-source cognitive hacking include authentication of the source, information "trajectory" modeling, and Ulam games. Countermeasures to multiple-source cognitive hacking include determining source reliability via collaborative filtering and reliability reporting, detection of collusion by information sources, and the Byzantine Generals Model.

Countermeasures: single source
Due diligence is required of the CI researcher in carrying out the authentication-of-source countermeasure. Implied verification can be used by the researcher; an example is the use of PKI (digital signatures) to verify the source of the information (a minimal sketch appears at the end of this section).
Negative information
A form of cognitive hacking is to build a website that is a repository for negative information about a particular firm; a number of such websites include the word "sucks" as part of the URL. The firm needs to monitor those sites and respond accordingly.
Unintentional disclosure of sensitive information
A firm may reveal information about itself without knowing it.
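As a rough illustration of the PKI-based verification mentioned above, the following Python sketch checks an RSA signature over a retrieved document using the third-party cryptography package. The key file, document, and signature variables are hypothetical placeholders, and how the publisher's public key is obtained and trusted (for example via a certificate chain) is outside the scope of this sketch.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def verify_document(document_bytes, signature, public_key_pem):
    """Return True if `signature` is a valid RSA/SHA-256 signature of the document."""
    public_key = serialization.load_pem_public_key(public_key_pem)
    try:
        public_key.verify(
            signature,
            document_bytes,
            padding.PKCS1v15(),  # assumes the publisher signed with PKCS#1 v1.5 padding
            hashes.SHA256(),
        )
        return True
    except InvalidSignature:
        return False

# Hypothetical usage: key, document, and signature obtained out of band.
# with open("publisher_key.pem", "rb") as key_file:
#     ok = verify_document(page_bytes, sig_bytes, key_file.read())
```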

CONCLUSION

The article presented an overview of issues relating to CI and the web, while outlining the methods and techniques that can be used to get the right information from the internet.
