
Open Data Publication and Consumption

An Overview of Relevant Data Access Approaches and DaaS Solutions
@ESWC Summer School, 2014
Dumitru Roman, SINTEF, Norway
dumitru.roman@sintef.no
Outline
The context: Open Data
Data access: Web APIs, OData, SPARQL/LDP
DaaS solutions landscape and open DaaS architecture
The context: Open Data
Open Data Movement: make data available (primarily government
data)
Businesses and citizens can develop new ideas, services and
applications
Can support (government) transparency and accountability
Source: McKinsey
http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information
Gartner:
By 2016, the use of "open data" will continue to
increase but slowly, and predominantly limited to
Type A enterprises.
By 2017, over 60% of government open data
programs that do not effectively use open data
internally, will be scaled back or discontinued.
By 2020, enterprises and governments will fail to
protect 75% of sensitive data and will declassify and
grant broad/public access to it.
Source: Gartner
http://training.gsn.gov.tw/uploads/news/6.Gartner+ExP+Briefing_Open+Data_JUN+2014_v2.pdf
Lots of open datasets on the Web
A large number of datasets have been published as open data in recent years
Many kinds of data: cultural, science, finance, statistics, transport, environment, ...
Popular formats: tabular (e.g. CSV, XLS), HTML, XML, JSON, ...
... but few applications
Applications using open and distributed datasets have been rather few relative to the number of datasets published (see the portal figures below)
Challenges include:
Lack of resources: unreliable data access
Lack of expertise: not easily available to organisations
Technical/organizational
Open Data Portal   Datasets    Applications
data.gov           ~110 000    ~350
publicdata.eu      ~50 000     ~80
data.gov.uk        ~20 000     ~350
data.norge.no      ~300        ~40
Open data publication and access
Data publishers: complicated data publishing and maintenance
process
Data consumers/developers: complicated programmatic data
access
A decision that lifts a publication burden from the data publisher shifts that burden to the data consumer at access time
[Figure: trade-off between publication and access: easy data publication leads to complicated data access; easy data access requires complicated data publication]
Simplify data access! Simplify data publication!
(Programmatic/Web-based) Data access
Traditional approaches for programmatically consuming data: ODBC,
JDBC, RMI, CORBA, ...
Modern Web applications and data services rely extensively on
lightweight Web service based approaches exchanging data via
standard protocols (HTTP) and formats (e.g. XML, JSON, RDF, ...)
Relevant approaches for programmatic access to open data
Web APIs
OData
SPARQL and Linked Data Platform (LDP)
Web APIs
Programmatic interfaces accessible through HTTP calls (e.g. GET,
POST)
Data (requests/responses) typically in JSON or XML
Very popular among application developers
Source: http://www.programmableweb.com/
[Figure: on the data consumer/developer side, an app uses a client library; on the data provider side, a Web service is exposed through a Web API; protocol: HTTP; payload: JSON/XML/...]
Web APIs - example
Request:
GET http://api.yr.no/weatherapi/locationforecast/1.9/?lat=60.10;lon=9.58
Response payload: [not reproduced]
Documentation: http://api.yr.no/weatherapi/locationforecast/1.9/documentation
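The request above can also be assembled programmatically. The following is a minimal sketch in Python using only the standard library. The helper names `build_forecast_url` and `fetch_forecast` are hypothetical, the `;` separator is copied from the slide's example, and version 1.9 of this endpoint may no longer be served, so the network call is defined but not executed here.

```python
from urllib.request import urlopen

# Base URL taken from the slide; this API version may since have been retired.
BASE_URL = "http://api.yr.no/weatherapi/locationforecast/1.9/"


def build_forecast_url(lat: float, lon: float) -> str:
    """Return the GET URL for a location forecast, formatted as on the slide."""
    # The slide separates the two parameters with ';' rather than '&'.
    return f"{BASE_URL}?lat={lat:.2f};lon={lon:.2f}"


def fetch_forecast(lat: float, lon: float, timeout: float = 10.0) -> bytes:
    """Perform the HTTP GET and return the raw response payload (not called here)."""
    with urlopen(build_forecast_url(lat, lon), timeout=timeout) as response:
        return response.read()


url = build_forecast_url(60.10, 9.58)
print(url)
```

Separating URL construction from the network call keeps the first part easy to check without a live service.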
Open Data Protocol (OData)
ODBC for the Web
A protocol for creating and consuming data APIs
Builds on HTTP and REST
OASIS Standard (2014), promoted by Microsoft, IBM, and SAP
http://www.odata.org/
OData
Principles: Metadata, Data, Querying, Editing, Operations,
Vocabularies
The OData Data Model is based on the Entity Data Model (EDM)
The OData protocol: CRUD + query language
XML and JSON serialization
Source: Microsoft
http://msdn.microsoft.com/en-us/data/hh237663.aspx
OData - requesting data examples
Source: http://www.odata.org/getting-started/basic-tutorial/
Request (entity by ID):
GET serviceRoot/People('russellwhyte')
Response payload: [not reproduced]
Request (collections):
GET serviceRoot/People
Request (individual property):
GET serviceRoot/Airports('KSFO')/Name
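The three request shapes above (collection, entity by key, individual property) follow a simple path pattern. The sketch below illustrates that pattern in Python; the function names are hypothetical, `serviceRoot` is the slide's placeholder rather than a real URL, and the quote-doubling rule for string keys is stated here as an assumption about OData string literals.

```python
# Placeholder used on the slide; a real service would use its own root URL.
SERVICE_ROOT = "serviceRoot"


def collection_url(entity_set: str) -> str:
    """Path addressing a whole entity set, e.g. serviceRoot/People."""
    return f"{SERVICE_ROOT}/{entity_set}"


def entity_url(entity_set: str, key: str) -> str:
    """Path addressing one entity by its string key."""
    # Assumption: a single quote inside a string key is escaped by doubling it.
    escaped = key.replace("'", "''")
    return f"{collection_url(entity_set)}('{escaped}')"


def property_url(entity_set: str, key: str, prop: str) -> str:
    """Path addressing a single property of one entity."""
    return f"{entity_url(entity_set, key)}/{prop}"


print(collection_url("People"))
print(entity_url("People", "russellwhyte"))
print(property_url("Airports", "KSFO", "Name"))
```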
OData - querying data examples
Source: http://www.odata.org/getting-started/basic-tutorial/
Request (filter):
GET serviceRoot/People?$filter=FirstName eq 'Scott'
Response payload: [not reproduced]
Filter on complex type:
GET serviceRoot/Airports?$filter=contains(Location/Address, 'San Francisco')
orderby:
GET serviceRoot/People('scottketchum')/Trips?$orderby=EndsAt desc
top:
GET serviceRoot/People?$top=2
count:
GET serviceRoot/People/$count
expand:
GET serviceRoot/People('keithpinckney')?$expand=Friends
select:
GET serviceRoot/Airports?$select=Name, IcaoCode
search:
GET serviceRoot/People?$search=Boise
Lambda operators (any / all):
GET serviceRoot/People?$filter=Emails/any(s:endswith(s, 'contoso.com'))
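The query examples above all append system query options such as `$filter`, `$top` or `$select` to a resource path. As a rough sketch of how a client might build such URLs, the Python helper below percent-encodes the option values (the slide shows them unencoded for readability). The name `query_url` and the choice of characters left unencoded are assumptions made for illustration, not part of the OData specification.

```python
from urllib.parse import quote

# Characters left readable in option values; an illustrative choice only.
SAFE_CHARS = "(),/:'"


def query_url(resource: str, **options: str) -> str:
    """Append OData system query options ($filter, $top, ...) to a resource path."""
    # The keyword 'filter' becomes '$filter', 'top' becomes '$top', and so on.
    parts = [f"${name}={quote(value, safe=SAFE_CHARS)}" for name, value in options.items()]
    if not parts:
        return resource
    return resource + "?" + "&".join(parts)


print(query_url("serviceRoot/People", filter="FirstName eq 'Scott'"))
print(query_url("serviceRoot/People", top="2"))
print(query_url("serviceRoot/Airports", select="Name,IcaoCode"))
```

Options are emitted in the order they are passed, so several can be combined in one call, for example `query_url("serviceRoot/People", filter="FirstName eq 'Scott'", top="2")`.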
OData - data modification example
Source: http://www.odata.org/getting-started/basic-tutorial/
Request (Create an Entity):
POST serviceRoot/People
OData-Version: 4.0
Content-Type: application/json;odata.metadata=minimal
Accept: application/json

{
  "@odata.type": "Microsoft.OData.SampleService.Models.TripPin.Person",
  "UserName": "teresa",
  "FirstName": "Teresa",
  "LastName": "Gilbert",
  "Gender": "Female",
  "Emails": ["teresa@example.com", "teresa@contoso.com"],
  "AddressInfo": [
    {
      "Address": "1 Suffolk Ln.",
      "City": {
        "CountryRegion": "United States",
        "Name": "Boise",
        "Region": "ID"
      }
    }
  ]
}
Response payload: [not reproduced]
Remove an Entity:
DELETE serviceRoot/People('vincentcalabrese')
Update an Entity (uses PATCH or PUT)
Relationship Operations (Link to Related Entities):
POST serviceRoot/People('scottketchum')/Friends/$ref

{
"@odata.id": "serviceRoot/People('vincentcalabrese')"
}
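The create request above can be prepared in code without sending it. The sketch below builds the POST with Python's standard library; the service root `https://example.org/odata` is a hypothetical stand-in for the slide's `serviceRoot` placeholder, and `build_create_request` is an illustrative helper name.

```python
import json
from urllib.request import Request

# Hypothetical service root; the slide only uses the placeholder "serviceRoot".
SERVICE_ROOT = "https://example.org/odata"

# Entity body as shown on the slide.
person = {
    "@odata.type": "Microsoft.OData.SampleService.Models.TripPin.Person",
    "UserName": "teresa",
    "FirstName": "Teresa",
    "LastName": "Gilbert",
    "Gender": "Female",
    "Emails": ["teresa@example.com", "teresa@contoso.com"],
    "AddressInfo": [
        {
            "Address": "1 Suffolk Ln.",
            "City": {"CountryRegion": "United States", "Name": "Boise", "Region": "ID"},
        }
    ],
}


def build_create_request(entity_set: str, entity: dict) -> Request:
    """Prepare (but do not send) a POST that creates an entity in an entity set."""
    headers = {
        "OData-Version": "4.0",
        "Content-Type": "application/json;odata.metadata=minimal",
        "Accept": "application/json",
    }
    body = json.dumps(entity).encode("utf-8")
    return Request(f"{SERVICE_ROOT}/{entity_set}", data=body, headers=headers, method="POST")


request = build_create_request("People", person)
print(request.get_method(), request.full_url)
```

Passing the prepared object to `urllib.request.urlopen` would send it; that step is left out because the target service here is hypothetical.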
SPARQL
A set of specifications that provide languages and protocols to query
and manipulate RDF graph content on the Web or in an RDF store
Service Description
Request:
GET /sparql/
Host: www.example.org
Response: An RDF description, using the Service Description vocabulary

Protocol for RDF
Request:
GET /sparql/?query=[SPARQL Query]
Host: www.example.org
Response: A SPARQL Results Document or RDF graph

Query Language
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (COUNT(?friend) AS ?count)
WHERE {
  ?person foaf:name ?name .
  ?person foaf:knows ?friend .
} GROUP BY ?person ?name
Result (serialized in XML, JSON, CSV, TSV): [not reproduced]

Update Language
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
INSERT DATA { <http://www.example.org/alice#me> foaf:knows [ foaf:name "Dorothy" ] . } ;
DELETE { ?person foaf:name ?name }
WHERE { <http://www.example.org/alice#me> foaf:knows ?person .
        ?person foaf:name ?name FILTER ( lang(?name) = "EN" ) . }

Graph Store HTTP Protocol
POST /rdf-graphs/service?graph=http%3A%2F%2Fwww.example.org%2Falice
Host: example.org
Content-Type: text/turtle

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://www.example.org/alice#me> foaf:knows [ foaf:name "Dorothy" ] .

Examples taken from http://www.w3.org/TR/sparql11-overview/
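Both the protocol request and the Graph Store request above carry a value (a query string or a graph IRI) that has to be percent-encoded into the URL. The Python sketch below shows one way to do that encoding with the standard library; the helper names are hypothetical, the endpoint addresses are the example ones from the slide, and nothing is sent over the network.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# The query from the "Query Language" example.
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (COUNT(?friend) AS ?count)
WHERE {
  ?person foaf:name ?name .
  ?person foaf:knows ?friend .
} GROUP BY ?person ?name"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def graph_store_url(service: str, graph_iri: str) -> str:
    """Build a Graph Store URL that names the target graph indirectly via 'graph'."""
    return service + "?" + urlencode({"graph": graph_iri})


sparql_url = sparql_get_url("http://www.example.org/sparql/", QUERY)
store_url = graph_store_url("/rdf-graphs/service", "http://www.example.org/alice")
print(sparql_url)
print(store_url)
```

Decoding the `query` parameter of `sparql_url` with `parse_qs` gives back the original query text, which is a quick way to confirm the encoding is lossless.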
Linked Data Platform
Describes the use of HTTP for accessing, updating, creating and
deleting resources from servers that expose data as Linked Data
Centered around LDP Resources (LDPRs), LDP Containers (LDPCs), membership, and containment
Under development at W3C; working draft
http://www.w3.org/TR/ldp/
LDP-BC (Basic Container)
Request: GET /c1/
Response payload: [not reproduced]
Resource
Request: GET /netWorth/nw1
Response payload: [not reproduced]
LDP-DC (Direct Container)
Request: GET /netWorth/nw1/liabilities/
Response payload: [not reproduced]
Examples taken from http://www.w3.org/TR/ldp/
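The LDP requests above are plain HTTP GETs on container and resource paths. A minimal Python sketch of preparing such requests follows; the host `http://example.org` and the helper `build_ldp_get` are hypothetical, and asking for Turtle via the `Accept` header is an illustrative choice. The requests are only constructed, not sent.

```python
from urllib.request import Request

# Hypothetical server; the slide shows only the request paths.
BASE = "http://example.org"


def build_ldp_get(path: str) -> Request:
    """Prepare a GET for an LDP resource or container, asking for Turtle."""
    return Request(BASE + path, headers={"Accept": "text/turtle"}, method="GET")


# The three paths from the slide: a basic container, a resource, a direct container.
for path in ("/c1/", "/netWorth/nw1", "/netWorth/nw1/liabilities/"):
    req = build_ldp_get(path)
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```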
Data Access Summary
Web APIs
Very flexible, popular with Web developers, no specific commitment to data
models
OData
ER-based data model, abstract interface to datastores (focus on CRUD),
perceived as vendor-pushed (strong tool support)
SPARQL and LDP
Graph data model, community-pushed, some interesting features (querying,
federation, linking, ...)
Although the approaches overlap, they all aim to simplify access to
distributed data sources for application developers
Which approach to choose depends on many factors, e.g. type of data, size,
relationships, infrastructure, skills to support, frequency of updates, end-use
scenarios, ...
Data publication
Data access mechanisms simplify data consumption for application
developers
But data needs to be provisioned to applications according to the
chosen data access mechanism
And applications always depend on the hosting of the data they use
Data publishers and application developers need to rely on generic
Cloud platforms and build, deploy and maintain a complex Open
Data software and data stack from scratch
Complicated data provisioning and maintenance process
Data-as-a-Service (DaaS) solutions are emerging to address this issue
Like all members of the "as a Service" (XaaS) family, DaaS is based on the concept that the product,
data in this case, can be provided on demand to the user regardless of geographic or
organizational separation of provider and consumer.
Source: Wikipedia; https://en.wikipedia.org/wiki/DaaS
Relevant DaaS solutions
Windows Azure Marketplace
Socrata
DataMarket
Factual
Junar
PublishMyData
DaPaaS
Windows Azure Marketplace
A marketplace for applications and data (~170 datasets; ~700 applications)
Charging data consumers
Tools and APIs for data publishing, analytics, metadata management, account management and pricing, monitoring and billing, as well as a data portal for dataset exploration
Supports OData
https://datamarket.azure.com/
Source: Microsoft
http://go.microsoft.com/fwlink/?LinkID=201129&clcid=0x409
Socrata
Specific focus on Open Data
Open Data Portal: data publishing & clean-up, metadata generation, data-driven portals for data exploration and portal management
API Foundry for creating and deploying RESTful APIs on top of the data
Hosted data is accessible through the Socrata Open Data API (SODA), a RESTful interface for searching and reading data in XML, JSON or RDF
http://www.socrata.com/
Source: Socrata
DataMarket
Provides statistical data from almost 100 data providers
~71 000 datasets
Supports embeddable visualisations of data, data export, live feeds for data updates, ability for data publishers to monetize data via the marketplace, custom data-driven portals for publishers, data portal, Web API
http://datamarket.com/
Factual
Data for ~65 million local businesses and points of interest in 50 countries; a product database of over 650,000 products
Previously offered hosting for thousands of 3rd-party datasets (Community Data), but this activity has been discontinued
Data is populated by means of Web crawls, data extraction and 3rd-party data services; the data model is tabular, based on a taxonomy of around 400 categories
Pricing is based on a pay-per-use model
Data access is provided through a RESTful API
Provides a set of tools for data management
http://www.factual.com/
Junar
Cloud-based Open Data platform to collect, enrich, publish and analyse open data
Data can be consumed either directly via the Junar API, or via various visual widgets
http://www.junar.com/
PublishMyData
Hosted, as-a-service solution for Open and Linked Data publishing
Uses DCAT and provides data access via Web APIs, a SPARQL endpoint and raw data dumps
http://www.swirrl.com/publishmydata
Other relevant solutions
Comprehensive Knowledge Archive Network (CKAN) (http://ckan.org/): web-based open source data management system for the storage and distribution of open data; datahub (http://datahub.io/)
LOD2 (http://lod2.eu/): research project aimed at providing an open source, integrated software stack for managing the lifecycle of Linked Data, from data extraction, enrichment and interlinking to maintenance; not meant to be an as-a-service solution
Project Open Data (http://project-open-data.github.io/): a set of open source tools, methodologies and use cases for publishing and utilising Open Data
COMSODE (http://www.comsode.eu/): research project aiming to create a publication platform for Open Data called Open Data Node
DaPaaS: towards an Open Data- and Platform-as-a-Service for Open Data
DaPaaS: a research project for simplifying data publication and consumption via a Data- and Platform-as-a-Service approach
http://dapaas.eu
[Figure: the DaPaaS Platform and its three user roles]
Data Publisher: publishes open data
Application Developer: develops and deploys applications on top of published data
End-User Data Consumer: consumes data resulting from the available applications
DaPaaS Requirements for Data Publisher
DP-01: Dataset import
DP-02: Data storage and querying
DP-03: Dataset search & exploration
DP-04: Data interlinking
DP-05: Data cleaning & transformation
DP-06: Dataset bookmarking & notifications
DP-07: Dataset metadata management, statistics & access policies
DP-08: Data scalability
DP-09: Data availability
DP-10: User registration & profile management
DP-11: Secure access to platform
DP-12: UI for data publisher
DP-13: Data publishing methodology support
DaPaaS Requirements for Application Developer
AD-01: Access to Data Publisher services (DP-01 to DP-13)
AD-02: Data export
AD-03: Develop applications in state-of-the-art programming languages
AD-04: Configure application deployment
AD-05: Deploy and monitor application
AD-06: Application metadata management, statistics & access policies
AD-07: UI for application developer
AD-08: Application development methodology support
DaPaaS Requirements for End-User Data Consumer
EU-01: User registration & profile management
EU-02: Search & explore datasets and applications
EU-03: Datasets and applications bookmarking and notifications
EU-04: Mobile and desktop GUI access
EU-05: Data export and download
EU-07: High availability of data and applications
DaPaaS Platform
Abstract High-Level Architecture
[Figure: layered architecture diagram]
Layers: Data Layer, Platform Layer, UX Layer
Components: Open Data Warehouse, Datasets, Application Hosting Environment, Data-Driven Applications
Services: DaaS Services, PaaS Services, UX Services
Cross-cutting elements: Usage Monitoring; Security & Access Control; Tool-supported Methodology for Data Publishing/Consumption
Summary
Lots of open datasets, but few applications using them
Simplifying data publication/consumption can enable an
increase in the number (and quality) of applications
using open data
Various approaches emerging
For data access: Web APIs, OData, SPARQL/LDP
For data publication/provisioning: DaaS solutions
Thank you!
Contact: dumitru.roman@sintef.no
