
The Emerging Framework for Scholarly Communication
Steve Hitchcock
The Open Citation Project (OpCit), Southampton University

These slides were prepared for The Future of Journal Publishing
at Nottingham University, 22 March 2002

OpCit is a joint JISC-NSF International Digital Libraries Project, 1999-2002
Emerging framework: the hypothesis …

Scholarly electronic information will be ‘seamless’ and ‘integrated’
The provable truth, using Google*
• “seamless integration of information”
500 results, mostly companies offering network and inter-application software
• “seamless access to information”
almost 1000 sites, portals and gateways to the fore
• “seamless linking”
450 sites, leading with journal publishers and databases

* Results based on Google searches November 2001


What is “seamless integration”?
From any given document the user might expect to be able to
retrieve any related document within one mouse click.

Typically what is related is defined, and linked, by the author or
publisher or other service provider, and is constrained by the tools
and information services at their disposal.

Longer term the relation may be anything the user might consider to
be related.
Achieving seamless integration – Web services
Emerging Web services standards are motivated by the need to connect
business processes, especially databases, across the Web. The basic
platform for Web services is XML plus HTTP, maintaining the ubiquity
and simplicity of the Web. Web services are based on three mechanisms:

• to describe a service (e.g. Web Services Description Language, WSDL)
• to find a service (e.g. a registry such as Universal Description,
Discovery, and Integration, UDDI)
• to communicate (e.g. Simple Object Access Protocol, SOAP)
http://www.w3.org/2002/ws/
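
As a rough illustration of the "communicate" mechanism, the sketch below wraps a single operation call in a SOAP 1.1 envelope, i.e. XML that would be sent in the body of an HTTP POST. The service namespace, the operation name FindCitations and its parameters are hypothetical, invented for this example; only the envelope namespace comes from SOAP 1.1.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/citations"  # hypothetical service namespace


def build_soap_request(operation, params):
    """Wrap one operation call in a SOAP envelope, ready to POST over HTTP."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element and its children are assumptions for illustration.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


request_xml = build_soap_request("FindCitations", {"author": "Harnad", "year": 2001})
print(request_xml.decode("utf-8"))
```

In a real deployment the operation and parameter names would be taken from the service's WSDL description rather than hard-coded.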

Digital library architectures are evolving to include Web services-like
components, and may ultimately migrate to these emerging standards.
Is seamless integration possible for the refereed scholarly literature?
For scholarly research papers - those destined for peer reviewed journal
publication, by authors who have no intention of receiving direct payment
for publication for the work they produce - this prospect raises two
subsidiary questions about the ‘seamlessly integrated’ literature:

• Will it be complete (from the viewpoint of every user)?
• Will it be free (or appear to be free)? A work may appear to be free to the
user when it is accessed via a library, for example.

The refereed scholarly literature will need to be complete, everywhere,
if seamless integration, even on a modest scale, is to be achieved.
Progress in libraries
• Site licenses for electronic journals, and more aggregated content
from database services
• Alternative journals, e.g. support for the Scholarly Publishing &
Academic Resources Coalition (SPARC), to increase competition in
the journal market by facilitating partnerships with publishers and
other journal producers
• Open Archives Initiative, interoperability standards to facilitate
the efficient dissemination of content
• Fast-track standardization of OpenURL, to link users to these
subscription and document services, recognising this vast new array
of electronic content would need to be accessible and navigable by
users within the library’s information environment
Site licences
By licensing access to ‘bundled’ collections of e-journals, libraries can
claim to have satisfied their objective of better value for money in
terms of cost per page delivered to users.

The ‘site’ from which users access content could be an institution, a state-
wide group of institutions (e.g. OhioLINK), a national collective, such as
in Canada, or even all the people of a nation, as in Iceland. The UK has
the National Electronic Site Licence Initiative (NESLI), which brokers
deals between publishers and participating institutions.

The OhioLINK strategy: “Enablers rather than gatekeepers”
OhioLINK claims to have overcome “the library-imposed, self-limiting,
collection development mentality of information rationing that pervades
our community.” Thomas Sanville, Executive Director, OhioLINK
Making appropriate connections
Site licenses give libraries access to more journal titles. Another outcome
of the serials crisis is that fewer, non-core journals are subscribed to and
libraries have resorted to just-in-time document delivery and collections
from licensed full-text aggregators.

Library users may thus have authority to access a paper free of charge via
one library subscription or another. Directing each user to the copy they
are entitled to use has become known as the ‘appropriate copy’ problem.

OpenURL is a generalized framework for communicating and resolving
links and supports software solutions to the appropriate copy problem.
OpenURL is described as an ‘interoperability specification’.
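
A minimal sketch of the appropriate copy problem, assuming a resolver that knows which institutions hold a licence for each copy of a paper. The providers, URLs and institution names are hypothetical, and a real resolver would consult the library's own holdings data rather than a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    provider: str           # who hosts this copy
    url: str                # where the full text lives
    licensed_to: frozenset  # institutions entitled to open this copy


def appropriate_copies(copies, institution):
    """Return the copies this institution's users may open free of charge."""
    return [c for c in copies if institution in c.licensed_to]


# Hypothetical holdings: the same paper is available from three sources.
COPIES = [
    Copy("publisher", "http://publisher.example.com/paper42",
         frozenset({"University A"})),
    Copy("aggregator", "http://aggregator.example.com/paper42",
         frozenset({"University A", "University B"})),
    Copy("document-delivery", "http://delivery.example.com/paper42",
         frozenset()),
]

for copy in appropriate_copies(COPIES, "University B"):
    print(copy.provider, copy.url)
```

The same reference therefore resolves differently for users at different institutions, which is why the link has to carry information about the user as well as the document.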
Syntax of OpenURL
 http://(who you are, where you are, your institution)/(where you want to go)
 A = http://   B = (who you are, where you are, your institution)   C = (where you want to go)

(A) An OpenURL is mediated by the HTTP protocol
(B) BASEURL, data about the user, typically inserted during transport
between servers. One interim mechanism is to store the BASEURL as a
cookie in the user’s browser. The cookie identifies the resolver that
provides context-sensitive services for the user.
(C) QUERY, points to the referenced object, which might be an identifier,
e.g.
– Digital Object Identifier (DOI)
– Metadata derived from an authored reference
– Partial metadata - a secondary service identifies the required
document

OpenURL has been proposed as a National Information Standards
Organization (NISO) standard http://library.caltech.edu/openurl/
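
The BASEURL plus QUERY structure can be sketched as follows. The resolver address, ISSN and DOI are hypothetical, and the key names (genre, issn, volume, issue, spage, date, id) are assumed to follow the draft OpenURL syntax; they should be checked against the specification before use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def make_openurl(base_url, **query):
    """Join a resolver BASEURL and a QUERY describing the referenced object."""
    return f"{base_url}?{urlencode(query)}"


# Hypothetical BASEURL: the resolver run by the user's own institution.
RESOLVER = "http://resolver.example.ac.uk/menu"

# QUERY built from metadata derived from an authored reference (assumed keys).
by_metadata = make_openurl(RESOLVER, genre="article", issn="1234-5678",
                           volume="12", issue="3", spage="45", date="2001")

# QUERY built from an identifier, here a made-up DOI.
by_doi = make_openurl(RESOLVER, id="doi:10.1000/example")

print(by_metadata)
print(by_doi)

# The resolver recovers the description of the object from the query string.
print(parse_qs(urlsplit(by_doi).query))
```

Swapping RESOLVER for another institution's resolver changes where the link is resolved without changing the description of the document, which is the property the appropriate copy problem relies on.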
Example OpenURL architecture

OpenURLs might be based on CrossRef–DOI services
(from Beit-Arie et al., 2001, D-Lib Magazine, September)
http://www.dlib.org/dlib/september01/caplan/09caplan.html
The Open Archives Initiative (OAI)
The OAI (http://www.openarchives.org/) defines
• A Metadata Harvesting Protocol (MHP), an application-
independent interoperability framework that can be used by a variety
of communities engaged in publishing content on the Web
• Two classes of participants
– Data providers expose metadata about content
– Service providers issue protocol requests to data providers
OAI is a very simple, low-barrier-to-entry interface, shifting
implementation complexity and operational processing load away
from the data repositories to the developers of federated search
services, repository redistribution services, etc.
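
The division of labour between data providers and service providers can be sketched from the service provider's side: build a harvesting request, then read the metadata out of the response. The archive address and the sample record are hypothetical, and the element names and namespaces are assumed to follow the OAI-PMH 2.0 schema, which may differ from the protocol version current when these slides were written.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build the request a service provider sends to a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since this date
    return f"{base_url}?{urlencode(params)}"


def parse_records(response_xml):
    """Extract (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    records = []
    for record in root.iter(f"{OAI}record"):
        identifier = record.findtext(f"{OAI}header/{OAI}identifier")
        title = record.findtext(f".//{DC}title")
        records.append((identifier, title))
    return records


# Hand-written sample response standing in for a real data provider.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical eprint</dc:title>
          <dc:creator>Author, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://archive.example.org/oai", from_date="2002-01-01"))
print(parse_records(SAMPLE_RESPONSE))
```

The data provider only has to answer simple requests of this kind; indexing, searching and linking across archives are left to the service provider.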
OAI service providers: an example

The Open Citation project: interposing an OAI service
provider between document (eprints) source and user interface
Creating information interfaces: portals
We have to manage the underlying complexity in the form of
interfaces. Portals have become important interfaces in the scholarly
environment. Portal strategies
• by publishers (e.g. Elsevier’s ScienceDirect)
• by associated networked information services (e.g. Ingenta)
• by library resource discovery networks (e.g. JISC’s RDN)
have yet to establish a pre-eminent model. This is because all have
concentrated on content, mostly owned content. The best next-generation
portals will build services on top of content, and for
researchers will become the starting point for all lines of enquiry.
Information interfaces: RDN example
JISC RDN is a good example of building on content to provide new
services and adaptable interfaces. The individual subject networks, in
medicine, engineering, humanities and others, can be searched as though
they were one unified repository, and an interface presenting users with
this search facility can be embedded in any library Web page.

Guiding the implementation of these services is the JISC Information
Environment (from Powell and Lyon 2001)
http://www.ukoln.ac.uk/distributed-systems/dner/arch/dner-arch.html
Multiple cooperating services in the communication chain
[Diagram: FROM documents on a server delivered over http to a user
interface on a client, TO content (site licences, eprint archives, etc.)
reaching the user interface through mediating services (OpenURL, OAI, JISC IE)]
Access and interfaces: implications for journals
Digital information, rich in media and resources, formal and
informal, mediated by multiple services, presents the user with an
array of choices that might answer his or her queries most efficiently.

Those queries might be expressed as input to a search engine, or by
selecting a link. Where might these links come from? Personal emails,
discussion lists, open access services such as OAI, eprint archives,
newsletters, library services, Z-gateways and academic subject portals, as
well as formal research papers and commercial indexing services. There
will be many more.

The journal package has traditionally been bound in issues and volumes.
With the advent of multiple networked sources mediated by services
such as OpenURL, the binding has been unstitched.
What are digital journals for?
Journals will be scaled back to the single essential function of
quality control, in the form of managed peer review.

Access to journal contents will be mediated by multiple interfaces -
open access services, portals and information interfaces, other than just
the journal.

Journals cannot remain the exclusive provider of peer-reviewed papers.


A post-Google information environment
Electronic journals exist in a post-Gutenberg and a post-Google
information environment.

By March 2001 the Internet Archive had stored 10 billion Web pages
(100 terabytes of data).

The ability to locate a specified item of information precisely and
instantly among the mass of information available on the Web has
profound implications. In the electronic environment the search
engine has become the de facto interface to information, rather than
the fragmented packages that have migrated from the print world.
Building eprint archives
EPrints.org software for building institutional eprint archives
for author self-archiving
• Version 2.0 February 2002
• OAI-compliant
• Free open source software

Developed at the Electronics and Computer Science Department,
University of Southampton
http://www.eprints.org/
A maximising strategy for authors
Authors who self-archive their papers in OAI-compliant
institutional or discipline-based eprint archives will

• Maximise interfaces to their work
• Maximise access to their work
• Maximise impact of their work
Maximising access: arXiv example

Decreasing citation latencies: The latency of the citation peak has been reducing
over the period of the archive, i.e. each year papers are cited sooner and more often
Mining the Social Life of an Eprint Archive http://opcit.eprints.org/tdb198/opcit/
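
One possible reading of "latency of the citation peak" is the most common delay, in months, between the deposit of a paper and the deposit of the papers that cite it. The sketch below computes that figure for a handful of invented dates; neither the definition nor the data is taken from the OpCit analysis.

```python
from collections import Counter
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from one date to another."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def peak_citation_latency(pairs):
    """Return the most frequent latency, in months, across citation pairs.

    pairs holds (deposit date of cited paper, deposit date of citing paper).
    Ties are broken in favour of the shorter latency.
    """
    latencies = Counter(months_between(cited, citing) for cited, citing in pairs)
    peak_month, _ = max(latencies.items(), key=lambda kv: (kv[1], -kv[0]))
    return peak_month


# Hypothetical citation pairs, for illustration only.
PAIRS = [
    (date(1999, 1, 10), date(1999, 4, 2)),
    (date(1999, 1, 10), date(1999, 4, 20)),
    (date(1999, 1, 10), date(2000, 1, 5)),
    (date(1999, 6, 1), date(1999, 9, 15)),
]

print(peak_citation_latency(PAIRS))
```

Computing this value separately for each year of deposit would show whether the peak is moving earlier over the life of the archive.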
Maximising impact: arXiv example

More highly cited papers show higher and more sustained download frequencies
Mining the Social Life of an Eprint Archive http://opcit.eprints.org/tdb198/opcit/
Maximising interfaces
Measuring arXiv access and impact data: the Open Citation project has mined:
• Usage data from selected arXiv mirror server logs
• Reference lists from 155,000+ arXiv papers to build CiteBase, an open
citation database

• CiteBase, a new interface to the refereed literature http://citebase.eprints.org
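
In outline, turning reference lists into a citation database means inverting "paper cites references" into "paper is cited by". The sketch below shows that inversion on invented identifiers; it is not CiteBase's implementation, and it skips the hard part of parsing and matching free-text references.

```python
from collections import defaultdict


def build_citation_index(reference_lists):
    """Invert {citing paper: [cited papers]} into {cited paper: {citing papers}}."""
    cited_by = defaultdict(set)
    for citing, references in reference_lists.items():
        for cited in references:
            cited_by[cited].add(citing)
    return cited_by


def rank_by_citations(cited_by):
    """Order papers by number of citing papers, most cited first."""
    return sorted(cited_by, key=lambda paper: (-len(cited_by[paper]), paper))


# Hypothetical reference lists keyed by made-up eprint identifiers.
REFERENCE_LISTS = {
    "eprint-3": ["eprint-1", "eprint-2"],
    "eprint-4": ["eprint-1"],
    "eprint-5": ["eprint-1", "eprint-4"],
}

index = build_citation_index(REFERENCE_LISTS)
print(rank_by_citations(index))
print(sorted(index["eprint-1"]))
```

Once the index exists, citation counts can be set beside usage data for the same papers, which is the kind of comparison the arXiv examples above report.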


Initiatives promoting open access to scholarly research papers
• Budapest Open Access Initiative (BOAI), funded by George Soros'
Open Society Institute. Open access “gives readers extraordinary power
to find and make use of relevant literature, and gives authors and their
works vast and measurable new visibility, readership, and impact.”
Launched February 2002; almost 1800 signatories to date
http://www.soros.org/openaccess/read.shtml
• Public Library of Science, scientists urge publishers to allow the
research reports that have appeared in their journals to be distributed
freely by independent, online public libraries of science. Open letter
March 2001, received almost 30 000 signatories
http://www.publiclibraryofscience.org/
“A dynamic digital archive”
Scientists and researchers, Nobel Laureates among them, have
produced the clearest declaration of their requirement for access to
published research papers – a comprehensive collection that can be
efficiently indexed, searched, and linked:

“Unimpeded access to these archives and open distribution of
their contents will enable researchers to take on the challenge of
integrating and interconnecting the fantastically rich, but
extremely fragmented and chaotic, scientific literature.”

Roberts et al. (2001) Science, 23 March 2001
http://www.sciencemag.org/cgi/content/full/291/5512/2318a
Credits
The Open Citation project is a collaboration between Southampton
University, Cornell University and arXiv

• The project is led by Stevan Harnad and Carl Lagoze
• Technical development at Southampton is directed by Les Carr
• EPrints.org software is being developed by Chris Gutteridge
• CiteBase is produced and managed by Tim Brody

A copy of these slides can be found on the OpCit Web site
http://opcit.eprints.org/. Look for Papers and Presentations

Contact Steve Hitchcock: sh94r@ecs.soton.ac.uk
