
H. P. Luhn

A Business Intelligence System

Abstract: An automatic system is being developed to disseminate information to the various sections of any
industrial, scientific or government organization. This intelligence system will utilize data-processing
machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each
of the “action points” in an organization. Both incoming and internally generated documents are automatically
abstracted, characterized by a word pattern, and sent automatically to appropriate action points. This
paper shows the flexibility of such a system in identifying known information, in finding who needs to know
it and in disseminating it efficiently either in abstract form or as a complete document.

Introduction
Efficient communication is a key to progress in all fields of human endeavor. It has become evident in recent years that present communication methods are totally inadequate for future requirements. Information is now being generated and utilized at an ever-increasing rate because of the accelerated pace and scope of human activities and the steady rise in the average level of education. At the same time the growth of organizations and increased specialization and divisionalization have created new barriers to the flow of information. There is also a growing need for more prompt decisions at levels of responsibility far below those customary in the past. Undoubtedly the most formidable communications problem is the sheer bulk of information that has to be dealt with. In view of the present growth trends, automation appears to offer the most efficient methods for retrieval and dissemination of this information.

During the past decade significant progress has been made in applying machines to the processes of information retrieval. Automatic dissemination has so far been given little consideration; however, unless substantial portions of human effort in this area can be replaced by automatic operations, no significant over-all improvement will be achieved. Even the information retrieval processes mechanized so far still require appreciable human effort to organize the information before it is entered into machines.

It is believed that techniques now being developed will greatly contribute to the solution of the problem by extending automatic processes to the preparatory phases of mechanical information-retrieval systems, to the area of dissemination and to associated functions. Ideally, an automatic system is needed which can accept information in its original form, disseminate the data promptly to the proper places and furnish information on demand.

The techniques proposed here to make these things possible are:
1. Auto-abstracting of documents;
2. Auto-encoding of documents;
3. Automatic creation and updating of action-point profiles.

All of these techniques are based on statistical procedures which can be performed on present-day data processing machines. Together with proper communication facilities and input-output equipment a comprehensive system may be assembled to accommodate all information problems of an organization. We call this a Business Intelligence System.

Objectives and principles
Before the system operation is described, the term Business Intelligence System should be defined and the objectives and principles stated.

In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera. The communication facility serving the conduct of a business (in the broad sense) may be referred to as an intelligence system. The notion of intelligence is also defined here, in a more general sense, as “the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.”¹

The term document is used to designate a block of information confined physically in a medium such as a letter, report, paper or book. The term may also include the medium itself.

The objective of the system is to supply suitable information to support specific activities carried out by individuals, groups, departments, divisions, or even larger units. These are the action points previously referred to. To this end the system concerns itself with the admission or acquisition of new information, its dissemination, storage, retrieval and transmittal to the action points it serves. More particularly the object of the system is to perform these functions speedily and efficiently, taking advantage of novel procedures which utilize the inherent capabilities of electronic devices.

One of the most crucial problems in communication is that of channeling a given item of information to those who need to know it. Present methods of accomplishing this are inadequate and the general practice is to disseminate information rather broadly to be on the safe side. Since this method tends to swamp the recipients with paper, the probability of not communicating at all becomes great. The Business Intelligence System provides means for selective dissemination to each of its action points in accordance with their current requirements or desires. This is accomplished by the mechanical creation of profiles reflecting the sphere of interest of each point and by updating these profiles as dictated by changes in the attitude of the respective action points and as recorded by the system on the basis of certain transactions.

Another problem in communication is to discover the person or section within an organization whose interests or activities coincide most closely with a given situation. Presently, the difficulty of finding such relationships often results in improper decisions, wrong actions, inaction, or duplication. An objective of the Business Intelligence System is to identify related interests by use of profiles of action points.

The problem of discovering information which has a bearing on a given situation has probably received the most attention in recent years, and various mechanical systems have been developed and put into operation. This phase of communication is commonly referred to as information retrieval or, more broadly, as the library problem. Information retrieval is necessarily a major function of the Business Intelligence System. Means are provided not only to integrate this function with the rest of the system but also to produce additional useful functions, as will be described later.

The achievement of these objectives is governed by principles essential to effective service and convenience of the user. Some of these are listed below:
1. Information admitted to the system includes communications, addressed to action points individually, which contain information of potential interest to other action points.
2. New information which is pertinent or useful to certain action points is selectively disseminated to such points without delay. A function of the system is to present this information to the action point in such a manner that its existence will be readily recognized.
3. Transmittal of information either as a result of dissemination or of retrieval is to be guided by progressive stages of acceptance by an action point. This procedure saves the recipient’s time by reducing the amount of material to be transmitted and eliminating the non-pertinent material.
4. The system is to provide means for quickly discovering similarity of interests and activities that might exist amongst action points so that subjects and problems of common concern may be discussed and advanced through direct interchange of ideas between such points, if so desired.
5. The system is not to impose conditions on its user which require special training to obtain its services. Instead the system is to be operated by experienced library workers. Thus, in the case of an inquiry, the user will be required only to call the librarian, who will accept the query and will ask for any amplification which, in accordance with his experience, will be most helpful in securing the desired information.
6. Similarly, information lingering at an action point but of potential value to other action points is mobilized for efficient communication through inquiries of skilled reporters.

Description of the Business Intelligence System
The following description is given in rather general terms, and references to any specific type of business have been substantially avoided. Furthermore, the fact that certain devices are being referred to as implementation of the system, should not be interpreted as implying a specific size of the operation.

The description is given in accordance with main functional sections of the system, each illustrated by the diagram. Our assembly of these functional sections into a complete system is shown in Fig. 1.

Document input
Each document entering the system shown in Fig. 1 is assigned a serial number and is photographically reproduced on some medium such as microfilm. In those cases where the document has been addressed specifically to an action point, the original is promptly transmitted to the addressee. In all other cases the original is stored in a file for a reasonably short time and thereafter destroyed, unless there are reasons for preserving it for longer periods.

The microfilm copy of the document is transcribed onto magnetic tape by a human transcriber or a print-reading device. In those cases where the original document is available in machine-readable form, the transcription is done mechanically. The document is now available both as a microfilm copy and a magnetic tape record.

The microfilm copy is then recopied onto the storage medium of a document microcopy storage device. The microfilm record is stored elsewhere to constitute a microfilm master file which may serve to regenerate records in cases of emergency.
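The intake step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `DocumentInput`, `IntakeRecord` and `admit`, and the wording of the disposition strings, are hypothetical and do not come from the paper, which describes photographic and tape equipment rather than software.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class IntakeRecord:
    """Bookkeeping for one admitted document (hypothetical structure)."""
    serial: int                # serial number assigned on entry
    title: str
    addressee: Optional[str]   # action point named on the document, if any
    original_goes_to: str      # where the paper original is sent
    transcription: str         # how the magnetic tape record is produced


class DocumentInput:
    """Assigns serial numbers and decides what happens to the original."""

    def __init__(self) -> None:
        self._serials = count(1)  # assumed: serial numbers start at 1

    def admit(self, title: str, addressee: Optional[str] = None,
              machine_readable: bool = False) -> IntakeRecord:
        return IntakeRecord(
            serial=next(self._serials),
            title=title,
            addressee=addressee,
            # Addressed originals go straight to the addressee; all others
            # are held in a file for a short time.
            original_goes_to=addressee if addressee else "temporary file",
            # Machine-readable originals are transcribed mechanically;
            # others need a human transcriber or a print-reading device.
            transcription=("mechanical" if machine_readable
                           else "transcriber or print reader"),
        )
```

For example, `DocumentInput().admit("Memo", addressee="research")` yields a record with serial number 1 whose original is routed to the action point "research".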

IBM JOURNAL OCTOBER 1958

The magnetic tape record is now introduced into the auto-abstracting and encoding device. This device submits the document to a statistical analysis based on the physical properties of the text, and data are derived on word frequency and distribution. From these data the device then selects certain sentences of the document to produce an auto-abstract.² This is printed out, together with the title, author, and document serial number. This printout is photographically transferred onto the storage medium of the auto-abstract microcopy storage device.

The process of creating auto-abstracts consists of ascertaining the frequency of word occurrences in a document. A predetermined portion of the words of highest frequency is then given the status of significant words and an analysis is made of all the sentences in the text containing such words. A relative value of sentence significance is then established by a formula which reflects the number of significant words contained in a sentence and the proximity of these words to each other within this sentence. Several sentences which rank highest in value of significance are then extracted from the text to constitute the auto-abstract.

As soon as the auto-abstract has been created, the statistical data are further processed to derive an information pattern which characterizes the document. This process of encoding constitutes a further abstraction and involves procedures such as the categorization of words by means of a thesaurus.³

Useful patterns may be derived by listing a given portion of the words of highest frequency together with a selection of specific words. The interrelationship of words may also be indicated and certain frequently occurring combinations of words may be noted. Because of variation of word usage amongst authors the normalization of such words becomes an important function of encoding. Index lookup in a thesaurus-like dictionary will replace words, including those of foreign languages, by a notional family designation. The selection of specific words may also be accomplished by index lookup.

The document pattern derived by the above process is then transferred into a special pattern-storage device together with the title, author, and document serial number. This information is stored in coded form on a medium that may be subjected to serial scanning. As an alternative the resulting pattern may be rearranged and be distributed over a storage array to permit random access according to characteristics.

The tape or film transcript of the document may be stored in a library for reference if it later becomes necessary to change the method or scope of encoding.

Action-point profiles
As indicated earlier, one of the basic requirements of the system is the ability to recognize by mechanical means the sphere of interest and the type of activities that characterize each of the action points the system is to serve. This is accomplished by means of an information pattern similar to that of the documents.

Initially, the creation of these action-point profiles is best accomplished by having each action point create a document describing the various aspects of its activities and enumerating the types of information needed. Such documents are then introduced at the input of the system and are identified by action-point designation. The machine-readable transcripts of these documents are then processed as described in connection with the document input. The resulting patterns are then stored in the Pattern Storage area in a special profile-storage device. Also stored, with each of these profile patterns, is the date of entry.

Selective dissemination of new information
Based on the document-input operation and the creation of profiles, the system is ready to perform the service function of selective dissemination of new information.

As soon as a new document has been entered into the system and its pattern developed, this pattern is set up in a comparison device which has access to all of the action-point profiles. The comparisons are carried out on the basis of degree of similarity, expressed in terms of a fraction, for each of the profile patterns. This fraction is subject to change as time goes on, depending upon conditions to be explained later.

Whenever a profile agrees to a given extent with a given document pattern, the serial number, title, and author of the affected document, together with the action-point profile designation, are transferred and stored in a monitoring device. This procedure is repeated for any subsequent similar occasion. The monitor is substantially a random-access storage device and has the functional capabilities of performing inventory operations. In this capacity it will transmit the serial number, title and author of the document in question to the desk printer at the selected action point and keep a record of this transaction.

Of the various ways in which such an announcement may be transmitted to the affected action points, the most effective one is by means of a printing device at each action-point location. An objective of the system is to command attention of the recipient. The use of individual printing devices is more effective than are centrally located devices serving several action points.

Selective acceptance of disseminated information
The dissemination of information so far has consisted in furnishing the action point with the serial number, title, and author of documents selected for it. This selection, however, is considered to be a provisional one, and the system withholds any further information if the action point can determine, on the basis of information given so far, that certain of the selected subjects are not of sufficient interest. If an announcement is of interest, and more detailed information on the subject is desired, the system will produce such information on demand. This step is initiated when the action point connects itself by telephone to the monitor and dials the serial numbers of the documents affected. Upon receipt of this message the monitor will relay an instruction to the microcopy storage device to produce photoprints of the auto-abstracts of these documents and to mark them with the action-point designation. The auto-abstracts are then transmitted to the action point either in the form of a paper copy or by speedier means, such as Telefax or TV display.

The action point may now peruse the abstracts to determine which of the documents are desired in their entirety. These decisions are then entered into the system in the form of acceptances. An acceptance is made at an action point by dialing the document number, prefixed by a code symbol, whereupon the monitor will instruct the microcopy storage device to produce a photocopy of the complete document, properly marked with the action-point designation. These photocopies are then delivered to the action point.

The monitor will record the incidence of acceptance by modifying the affected records contained in its storage. At the same time the monitor will also instruct the auto-encoding device to transfer copies of the code patterns of the affected documents to the profile section of pattern storage, together with the identification of the action point involved and the date of transferral.

As a result of these operations the profile of a given action point has been updated to reflect interest in a currently communicated subject. As time goes on there is the probability that an increasing number of new documents will be announced to an action point because of possible shift of interests. In order to avoid such cumulative effects, the system is so arranged that the response to past interests is gradually relaxed. This relaxation is related to the date affixed to each new pattern that is superimposed on an action point’s profile. Depending on the age of each of these patterns, an adjustment is made on the fraction of similarity that must be met in the comparison process of new documents. The older the profile pattern, the closer an agreement is needed for selection for dissemination, and consequently the fewer documents are selected. On the other hand those documents selected are more closely related to the original subject.

Information retrieval
This phase of the system concerns itself with the retrieval of those stored documents which might be relevant to a topic under consideration by an action point. The information to be discovered may vary widely and may consist of anything ranging from factual data to an extensive bibliography on a broad subject. Under the supervision of an experienced librarian the process of information retrieval is performed in the following way.

An action point telephones the librarian and states the information wanted. The librarian will then interpret the inquiry and will solicit sufficient background information from the action point in order to provide a document similar in format to that of documents normally entering the system. This query document is transmitted to the auto-encoding device in machine-readable form. An information pattern is then derived from the query document in a manner similar to that used for normal documents.

The resulting query pattern, together with a serial number and designation of the originating action point, is then sent to the queries section of the pattern-storage device. Subsequently, a copy of this query pattern is set up in the comparison device and is compared with all of the document patterns stored in the document-pattern storage device. This operation is similar to the one described in connection with selective dissemination. In the present case, the query pattern replaces the profile pattern.

Whenever similar patterns are detected by this means, the document designation is transmitted to the monitor, where it is registered and then announced to the action point.

Although the service of a librarian is considered a convenience to the action point, in certain cases, means may be provided at the action-point location to permit direct access to the system. This would be justified where many of the inquiries concern lookup-type retrieval of data.

When an action point desires information relative to a given document, the number of the document at hand would be dialed and instructions for search given to the monitor. Thereupon the monitor would select the corresponding pattern from document-pattern storage and provide instruction for use as a query pattern in the ensuing comparison operation.

Selective acceptance of retrieved information
The considerations which prompted the step-by-step acceptance of documents in the dissemination process are also applied to information retrieval. The processes employed, therefore, are identical.

The function of information retrieval, however, differs from that of dissemination in that the choice is not that of accepting or rejecting one document, but rather a selection of one or several from a special group of potentially relevant documents. Although in some cases a first search may have produced satisfactory references, in other cases the material produced may not be satisfactory. The action point must then relay this fact to the librarian and discuss with him how the searching procedure or the query should be modified so as to improve the probability of getting relevant material.

In those cases where pertinent information has been discovered, the acceptance of the complete documents of such information will cause the updating of the action-point profile, as was the case in dissemination. The query pattern will be impressed on the profile as a matter of course, whether or not the inquiry has been satisfied, so that new documents relevant to the subject of the inquiry will be made known subsequently.

Detection of an action point having given characteristics
In the process of transacting business it is often desired to determine who concerns himself with a given subject. The usual type of question asked is: “Who does or knows a certain thing?” A function of the Business Intelligence System is to answer questions of this type.
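The comparison operation that underlies selective dissemination, retrieval and the detection of action points can be sketched as follows. The paper states only that similarity is "expressed in terms of a fraction" and that older profile patterns require closer agreement. Everything else here is an assumption made for illustration: the overlap measure in `similarity`, the linear tightening in `required_fraction`, its default values (`base`, `step`, `period_days`), and all function and class names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProfilePattern:
    """One dated pattern stored in an action point's profile."""
    action_point: str
    terms: frozenset   # normalized words or notional families
    entered: date      # date of entry stored with the pattern


def similarity(candidate_terms: frozenset, profile_terms: frozenset) -> float:
    """Assumed measure: share of the profile's terms found in the candidate."""
    if not profile_terms:
        return 0.0
    return len(candidate_terms & profile_terms) / len(profile_terms)


def required_fraction(entered: date, today: date, base: float = 0.5,
                      step: float = 0.1, period_days: int = 90) -> float:
    """Similarity a pattern must reach; it tightens as the pattern ages."""
    age_periods = max(0, (today - entered).days) // period_days
    return min(1.0, base + step * age_periods)


def select_action_points(document_terms: frozenset, profiles, today: date):
    """Return the action points to which a new document is announced."""
    hits = set()
    for pattern in profiles:
        needed = required_fraction(pattern.entered, today)
        if similarity(document_terms, pattern.terms) >= needed:
            hits.add(pattern.action_point)
    return sorted(hits)
```

Under these assumptions a recently entered pattern is matched at half overlap, while a pattern well over a year old is matched only by a document that covers all of its terms. The same `similarity` function could serve retrieval (a query pattern in place of the profile) or the search for an action point (a synthetic profile compared against the stored ones).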

The manner in which this function is performed by the system is similar to the information retrieval procedure. However, instead of simulating a document pattern, a profile pattern is developed which represents most closely the characteristics of an action point sought. This synthetic profile is then compared with those in the profile storage and when a given degree of similarity is discovered, the identification of the affected action point is transferred to the monitor, together with the identification of the inquirer point. Thereafter the identities are announced by the tape-printing device at the inquiring action point so that personal contacts may be made.

Document output
The functions described so far have concerned themselves with documents admitted or acquired by the system from the outside. The document-output phase deals with internally generated documents. This type of document is essentially the product of action points and may be addressed to other action points within the organization or to external points. An objective of the system is to facilitate selective dissemination and retrieval of such documents in substantially the same way as for outside documents.

When a document has been created at an action point, a copy is produced, preferably in machinable form. This copy is then dispatched for processing to the input point of the system and the original is sent to the addressee.

Since this type of document is an indication of the interest of the originating action point, the information pattern derived by the auto-encoding process is not only stored in document-pattern storage but also is impressed on the profile of its originator, thereby updating it.

In the dissemination process this internally created document is announced to other action points in the same fashion as were outside documents.

Miscellaneous functions of the system
The comprehensive system for the various functions so far described is illustrated by Fig. 1. A number of additional useful functions which may be derived from the system are briefly described here.

It might be desirable to check each new document for duplication by comparing it with all of the documents in storage. Similarly a list of related documents may be prepared to serve as references applying to a new document.

When retrieving information it might be found advantageous to compare a query first with all the queries stored, in order to discover whether similar queries have been submitted in the past. If a list of the documents retrieved is available, the process of retrieval may be greatly simplified. This method may also be used to bring together the respective inquirers to furnish an opportunity to discuss the problems which apparently brought about similar inquiries. Periodic analysis of the profiles may also furnish valuable information on trends and possible overlapping of activities or interests.

Since a history of the usage of the system is stored in the monitor, an analysis of its records will disclose the efficiency of system operation. The findings may serve to adjust the system for optimum efficiency.

There are many details which might have to be provided to adjust the general form of the system to specific applications. One such requirement might be classification, by an editor, of documents with regard to security, proprietary interests and proper utilization of information.

A plurality of systems may be organized in hierarchical fashion, in which a first system would serve a number of more specialized systems. In this case the specialized systems would each assume the role of an action point in the mother system.

It also appears quite feasible to share the system equipment among a number of organizations.

Prospects for establishing a Business Intelligence System
The system described here employs rather advanced design techniques and the question arises as to how far away such systems may be from realization. It may therefore be of interest to review the state of system and machine development.

The availability of documents in machine-readable form is a basic requirement of the system. Typewriters with paper-tape punching attachments are already used extensively in information processing and communication operations. Their use as standard equipment in the future would provide machine-readable records of new information. The transcription of old records would pose a problem, since in most cases it would be uneconomical to perform this job by hand. The mechanization of this operation will therefore have to wait until print-reading devices have been perfected.

The type of equipment required for processing information in accordance with the system is presently available as far as the functions are concerned. It is safe to assume that special equipment will eventually be required to optimize the operation.

The auto-abstracting and auto-encoding systems are in their early stage of development and a great deal of research has yet to be done to perfect them. Perhaps the techniques which ultimately find greatest use will bear little resemblance to those now visualized, but some form of automation will ultimately provide an effective answer to business intelligence problems.

References
1. Webster’s New Collegiate Dictionary, G. & C. Merriam Co., Springfield, Mass.
2. H. P. Luhn, “The Automatic Creation of Literature Abstracts,” IBM Journal of Research and Development, 2, No. 2, 159 (April 1958).
3. H. P. Luhn, “A Statistical Approach to Mechanized Encoding and Searching of Literary Information,” IBM Journal of Research and Development, 1, No. 4, 309 (October 1957).

Received July 1, 1958
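As a closing illustration, the auto-abstracting procedure described under "Document input" can be sketched in Python. The paper says only that sentence significance reflects the number of significant words in a sentence and their proximity to one another. The specific measure used here (the square of the number of significant words in a cluster, divided by the cluster's length in words, with clusters broken by more than four intervening words) is our reading of reference 2 and should be treated as an assumption, as should the small common-word list, the `portion` default and all function names.

```python
import re
from collections import Counter

# Assumed list of common words excluded from the frequency count.
COMMON_WORDS = {"the", "of", "a", "an", "and", "to", "in", "is", "are",
                "by", "for", "on", "with", "that", "this", "it", "as", "be"}


def significant_words(text: str, portion: float = 0.2) -> set:
    """Give the most frequent `portion` of distinct words significant status."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in COMMON_WORDS]
    counts = Counter(words)
    keep = max(1, round(len(counts) * portion))
    return {word for word, _ in counts.most_common(keep)}


def sentence_score(sentence: str, significant: set, max_gap: int = 4) -> float:
    """Best cluster value: (significant words in cluster)^2 / cluster length."""
    words = re.findall(r"[a-z]+", sentence.lower())
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    n_sig = 1
    for pos in positions[1:]:
        if pos - prev - 1 > max_gap:
            # Too many words in between: close the current cluster.
            best = max(best, n_sig ** 2 / (prev - start + 1))
            start, n_sig = pos, 0
        n_sig += 1
        prev = pos
    return max(best, n_sig ** 2 / (prev - start + 1))


def auto_abstract(text: str, n_sentences: int = 2, portion: float = 0.2) -> list:
    """Extract the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    significant = significant_words(text, portion)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sentence_score(sentences[i], significant),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]
```

The sentence splitter is deliberately naive (it breaks on ".", "!" and "?" followed by whitespace), so abbreviations and initials would need separate handling in any real use.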
