Roadmap
Scott Thorne
Mary Weisse
1/26/11

Agenda
• Current state and background
• Issues and problems
• Approach
• Next steps
  – issues to be addressed
  – issues deferred

MIT Data Warehouse - Overview
• Most administrative information is available
• Widely used and relied on
• Recognized by peer institutions as one of the best in higher education
• Improvements can make it more useful
  Examples:
  – adding more data
  – cleaning up data
  – real-time data
  – true 24x7 availability

Why the Warehouse was Created
• Primary reporting platform
  – provide a common access mechanism for all Institute-wide administrative data
  – easier access provided by a simpler view of data
• Support departmental reporting needs
  – ability to combine central and local data
  – reduce rekeying of data
• Improve data quality through usage
• Self service
  – minimize support & operational costs
  – DLCs don't need to rely on a central reporting group

What Data is Available
• Academic Chairs
• Alumni Biographic, Occupation & Degree
• Association of Student Activities Group, Requests and Space
• Benefits Cyborg & SAP
• Budgets (Nimbus)
• CIP (Classification of Instructional Programs)
• Concur Travel
• Coop Textbook
• Course Catalog
• Cybersource MIT Storefront transactions
• Department Master Hierarchy (and other hierarchies)
• EHS Contact, Room set, Sarah, Training Data
• Facilities/Plant Maintenance Orders
• Fidelity – Benefits
• Financial
• Financial Aid
• Fund data (Resource Development)
• Graduate Admissions
• Graduate Awards
• HR (Employees, Positions, & Appointments)
• IAP Credit & Non-Credit Activities
• Instructor
• Institutional Research - Honors & Awards
• Kerberos
• Keys
• Libraries (Financials, Orders, Circulations, Collection Management, Serials Control Processing)
• Metadata (Table, view, & field definitions, data models; report templates)
• MIT Card – Pictures
• MIT ID
• Moira List data

What Data is Available (cont)
• OSP Proposals and Award Cost Sharing
• Overhead Rates
• Payroll (Deductions, Distributions & Certifications)
• Pension
• Person
• Pre-registration & Registration
• Procurement (Purchasing, Requisition & Credit Card)
• Request Tracker
• Roles/Authorization
• Space
• Student Biographic
• Student Degree
• Subject Enrollment
• Student Grades
• Training & Events
• TLO
• Telephone Detail (ice 9)
• Student Term Enrollment
• Undergrad Admissions
• Vacation/Absences/Leaves
• Web Statistical Data

Context Diagram
[Diagram; labels: Files, Extract Transform & Load, Metadata, DBMS, SQL, Brio, Web Reporting Tool, User Pull, Files, Extract]

Issues and Problem Collection
• Who
  – Data Warehouse Community Survey
  – Task Force Reports
  – Reporting Tool Review Committee
  – DW Staff
  – Interview data users
    • Institutional, School, Department
    • Central, Research, Academic
• What
  – Document
    • specific issues
    • brief analysis
  – Identify next steps

Categories of Data Issues
• Warehouse infrastructure & operations
• Reporting environment
• Missing data
• Data access policy
• Strategic reporting (hard to answer questions)

Warehouse Infrastructure & Operations
Issues identified by the Warehouse Staff
• Job scheduling
  – start time manually set
  – extra time used to pad
  – hard to maintain
• 7x24 availability
• Congested nightly loads
  – scheduling
  – full/incremental
  – a few intensive jobs

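The job scheduling issue above (manually set start times, padded with extra time) is often addressed by declaring which jobs depend on which and deriving the run order from that. The sketch below is only an illustration of that idea, not the Warehouse's scheduler. The job names and the `JOB_DEPENDENCIES` mapping are hypothetical, and it assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical nightly jobs: each job maps to the set of jobs it depends on.
JOB_DEPENDENCIES = {
    "extract_hr": set(),
    "extract_finance": set(),
    "convert_hr": {"extract_hr"},
    "convert_finance": {"extract_finance"},
    "load_hr": {"convert_hr"},
    "load_finance": {"convert_finance"},
    "build_views": {"load_hr", "load_finance"},
}


def run_order(dependencies):
    """Return one valid run order: every job appears after all of its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(JOB_DEPENDENCIES)
print(order)
```

With an ordering like this, a job can start as soon as its prerequisites finish, so no padding between fixed start times is needed.
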
Warehouse Infrastructure & Operations
• Performance of large detailed tables as history grows
• Managing the daily operation
  – reduce the number of non-standard processes
• Incomplete documentation
• Lack of real-time data feeds
• Lack a process to build and deliver higher-level information

Reporting Environment Issues
• Data definitions need improvement
• Need intuitive table and field names
• Report descriptions aren't sufficient
  – hard to find the one you need
  – report names should be in the footer, so once you have paper you can find the report used to produce it
  – organize reports better
  – easy to identify reports that need to be run by DLC monthly
• Easier way to get the data from the Warehouse and format in Excel
• Easy way to join data from multiple domains

Data Missing from Warehouse
• OSP
  – Pre-Proposal
  – Award Budgets
• Property
• TLO – Basic
• Alumni
• ISO information
• Travel charges that haven't been submitted on a trip report
• Travel Guest Card charges not available for DLCs
• Historical Request Tracker Information
• Profit Center Group History
• Building Data – Date Built and Date Renovated
• Space Survey Data
• Building Depreciation

Data Access Policy Issues
• Room usage type
• Appointment supplements
• HR information for people affiliated with labs & centers
  – paid but not appointed
  – faculty not charged or appointed

Strategic Reporting (Hard to answer questions)
• What does it cost MIT to deliver its programs?
  – Undergraduate Education, Graduate Education and Research
  – How do costs vary by school and program?
  – How do education costs relate to tuition charged (sticker price) and tuition collected (net tuition revenue)?
  – How do research costs relate to research revenues collected (direct and indirect)?
• What does it cost MIT to own and maintain its space?
  – capital (interest, depreciation)
  – operational (utilities, repair and maintenance costs)
  – How do costs vary by building and relate to market rates?

Strategic Reporting (Hard to answer questions)
• How much money has MIT received from all sources (gifts, research revenues, other) in support of major areas such as Cancer Research or Energy compared to total resources used and needed?
• What are MIT's gross expenses by function before internal charges?

State of Data Practices
• Current software project development does not always include
  – reporting requirements
  – conceptual data model
  – data definitions
• Projects often focus on a transactional orientation
  – not on the ways the information might be used in the future

Reporting Committee Findings
• IBM/COGNOS Tool Suite meets user requirements for functionality and look & feel.
  – Handling of local data
  – Web-based solution
• Data issues will not be resolved by a tool
• Rollout of the tool will be a multiyear effort
• Data issues should be addressed as the tool is rolled out
• Requires the involvement of the Business Owners and Community

Approach
• Demonstrate to the community that we are working on all aspects of the problem
• Create a repeatable process that can be applied per subject
• Gain experience with the new tool
• Begin engaging Business Owners
• Partner with Development Projects
• Make demonstrable incremental progress

Areas of Focus
• Improving Reporting Infrastructure
• Improving Development Practices
• Improving Data Practices

Categories of Work
• Day to day operations
  – upgrading the environment
  – monitoring the loads
  – supporting minor changes
• Housekeeping
  – Metadata maintenance
  – Warehouse documentation
• New reporting tool administration
• Building reports
• Strategic questions
• Modeling
• New information in Warehouse

Resources
• Team
  – 2 Data Administrators
  – 2 Developers
  – 2 Reporting Consultants
• Project Teams
• Business Owners

Next Steps 2011
• Improving Reporting Infrastructure
  – migration to VM
  – develop plan for 24x7
  – reporting tool proof of concept
• Improving Development Practices
  – initial data modeling support
• Improving Data Practices
  – document problems & issues
  – improve metadata documentation

Issues to be Addressed
• Tested in reporting tool proof of concept
  – better Excel integration
  – easier report delivery
    • automatic notification and delivery
  – better report descriptions
  – better report organization
  – simpler views of information
  – simpler joining of information
• Short term solution for Singapore access

Work Deferred (Post FY2011)
• Improving Reporting Infrastructure
  – real-time data
  – 24x7 implementation
  – job scheduling
• Improving Development Practices
  – to be determined
• Improving Data Practices
  – strategic (Hard to answer questions)
  – define and publish guidelines
  – define and publish roles & responsibilities
  – document data access policies
• New data in Warehouse
  – should be done with new tool

Improving Understanding of Data
• Simpler, more focused views of information

Improving IS&T Practices
• Focus with the business on up front design
  – conceptual data model
  – business process models
  – reporting requirements
• Metadata documentation
• Document reconciliation processes

Improving Data Practices
• Publish principles & guidelines for data management
• Establish processes for data issue resolution and prioritization
• Gain consensus and document roles & responsibilities concerning data
• Develop a method to monitor progress

TIMELINE
[Timeline chart; labels: July 2011; Improving Warehouse Environment: Job Scheduling, 7x24, Migrate to Linux/VM, Partitioning]

Data Warehouse Guiding Principles
• Information in a single place
  – access any information in the same manner
  – integrated information
    • join information from several sources
• Structures should support the needs of all types of users
  – Central, DLC, Institutional
• Open access via SQL
  – not proprietary-only access
  – uncommon in education and industry
  – can be accessed programmatically
• Warehouse is the data distribution hub
  – easy to verify when feeds and reports are from a single source

Data Warehouse Guiding Principles (CONT)
• Data should be stable
  – consistency between reports generated on the same day
  – ability to generate the same report at any point in time
• Data has to be accurate
  – stale data is better than inaccurate data
• Structures should make it easy to report
• Data is not corrected or modified in the Warehouse
• Access rules are shared by the Warehouse and transactional systems
  – access managed by Business Owners

Data Warehouse Guiding Principles (CONT)
• Data always comes from a system of record
• Users should be able to generate and share reports
• Metadata is available for all information
• Access control done at the database level
  – enables direct access via any means
• Metadata driven transformations and loads
  – to know how data was arrived at
  – same code called thousands of times every night

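One way to picture "metadata driven transformations" is a small set of reusable functions plus metadata rows that say which function to apply to which field, so the same code runs for every field. The sketch below is a minimal illustration under that assumption. The function names, field names, and the `FIELD_METADATA` layout are hypothetical and not taken from the Warehouse.

```python
# Registry of reusable transformation functions (hypothetical names).
FUNCTIONS = {
    "trim": lambda value: value.strip(),
    "upper": lambda value: value.strip().upper(),
    "to_int": lambda value: int(value),
}

# Metadata rows: (target field, source field, function name). Hypothetical example.
FIELD_METADATA = [
    ("PERSON_NAME", "name", "trim"),
    ("DEPARTMENT_CODE", "dept", "upper"),
    ("FISCAL_YEAR", "fy", "to_int"),
]


def transform(record, field_metadata=FIELD_METADATA, functions=FUNCTIONS):
    """Build one output row by applying the function named in each metadata row."""
    return {
        target: functions[func_name](record[source])
        for target, source, func_name in field_metadata
    }


row = transform({"name": "  Pat Smith ", "dept": "is&t ", "fy": "2011"})
print(row)
```

Because the mapping lives in data rather than in code, the metadata rows also document how each output field was arrived at.
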
Data Warehouse Process Stages
Metadata Driven
• Extract
  – minimal logic
  – full or incremental files generated
• Transfer
  – usually pushed automatically
  – encrypted/decrypted
  – simple integrity checking done
• Convert
  – data into reporting format and structures
    • star schemas
• Load

Metadata
Integrity Checking in all Processing
• Correct files on hand before job runs
• Record & byte counts
• Comparisons of control file to data file
• Conversion type checking, number and types of fields
• DBMS constraints
• Error checking and validation routines
• Mail sent to Warehouse & Business Owners

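The record and byte count checks above can be sketched as a comparison between the totals a control file declares and what is actually found in the data file. This is an illustration only. It assumes newline-terminated records and that the expected totals have already been read from the control file; the function name and file layout are hypothetical.

```python
import os
import tempfile


def check_against_control(data_path, expected_records, expected_bytes):
    """Return a list of problems; an empty list means the file matches its control totals."""
    problems = []
    actual_bytes = os.path.getsize(data_path)
    with open(data_path, "rb") as handle:
        actual_records = sum(1 for _ in handle)  # one record per line
    if actual_records != expected_records:
        problems.append(f"record count {actual_records} != {expected_records}")
    if actual_bytes != expected_bytes:
        problems.append(f"byte count {actual_bytes} != {expected_bytes}")
    return problems


# Demo with a temporary data file of three newline-terminated records.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "feed.dat")
    payload = b"1001|A\n1002|B\n1003|C\n"
    with open(path, "wb") as handle:
        handle.write(payload)
    ok = check_against_control(path, expected_records=3, expected_bytes=len(payload))
    bad = check_against_control(path, expected_records=4, expected_bytes=len(payload))
```

A non-empty result would be the point at which a job stops and mail goes to the Warehouse and Business Owners.
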
System Availability
• Approaches 24 x 7
  – except Saturdays from 3:00 pm – 3:10 pm
    • snapshot
  – some tables are temporarily unavailable in the early morning as new day's data is loaded

Statistics
Jobs per night: ~300
Input files per night: 875
Field transformations: 12,847
Distinct functions: 78
Tables: 1,522
Views: 1,150
Records: 1,904,948,496
Total database size: 575 GB

Statistics
Fiscal Year 2011 Quarter 2 (October 1, 2010 – December 31, 2010)

Statistics
DW User Community: Sessions by Department (400+ Sessions), FY 2011 Q2
[Chart: session counts per department. Labeled departments: Environment, Health & Safety Office; HQ, Information Services & Technology; Others (research affiliates and scientists, postdoctoral fellows, visiting faculty/students/scholars, non-current employees, etc.); Center for International Studies; Human Resources; Sloan School of Management; Information Services & Technology; Budget, Finance and Treasury. Values shown on the chart, in extraction order: 492, 451, 547, 730, 483, 441, 615, 808, 753, 840, 965, 902, 1151, 1299, 2092, 2201, 10490, 5413.]

Space Long Term Plan
• Work with Facilities and others to agree on guidelines for integrating location information
  – building and room identifiers
  – geocoding
    • guidelines for tracking geocode for things like equipment
• Work to gain consensus and document access policies concerning space information
• Ensure that space survey data is in the Warehouse
• Encourage reporting of space information from the Warehouse where appropriate
  – possible pilot web reporting
• Bring building depreciation information into the Warehouse
  – create an integrated view of all space costs in the Warehouse

HR Data Issues
• Faculty promotion and tenure dates have to be cleaned up by hand
  – SAP & business policy does not support correction of some information
• CIP (Academic Discipline) attached to all Academic Staff
• Race & Ethnicity not consistent over time
  – self-identified ethnicity needs to be managed and synchronized with Institutionally identified
• Historical data is inaccurate because old records are changed without the modified date and effective dates properly maintained
• Many places where more than one affiliation occurs but always need one marked "Primary" to support non-duplicated counts
• HR data not accurate.
  – How many researchers have a PhD?
• Many things could be calculated in a standard way to make it easier and reduce discrepancies.
  – For example: years at MIT could be provided in addition to the appointment dates.
• There seem to be different tallies for the Annual Head Count Report (October 31st), which cause confusion.
• Inaccuracies in Faculty appointment dates
• Inaccuracies in Faculty Leave information
• Ethnicity & Race not maintained accurately
• How many FTE by FY for a DLC for the last 10 years

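The "Primary" affiliation point above can be illustrated with a small sketch: each person is counted once, under the department of the row flagged primary, and anyone without exactly one primary row is reported for cleanup. The row layout, names, and sample data are hypothetical.

```python
from collections import Counter

# Hypothetical affiliation rows: (person_id, department, is_primary).
AFFILIATIONS = [
    ("p1", "Physics", True),
    ("p1", "Media Lab", False),
    ("p2", "Sloan", True),
    ("p3", "Physics", True),
    ("p3", "Sloan", False),
]


def people_without_single_primary(rows):
    """Return person ids that do not have exactly one row flagged primary."""
    primaries = Counter()
    people = set()
    for person_id, _department, is_primary in rows:
        people.add(person_id)
        if is_primary:
            primaries[person_id] += 1
    return sorted(p for p in people if primaries[p] != 1)


def headcount_by_department(rows):
    """Count each person once, under the department of the primary affiliation."""
    return Counter(dept for _person, dept, is_primary in rows if is_primary)


invalid = people_without_single_primary(AFFILIATIONS)
counts = headcount_by_department(AFFILIATIONS)
print(invalid, dict(counts))
```

The headcount is only non-duplicated when the first check comes back empty, which is why the two are shown together.
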
Finance Data Issues
• Hard to maintain CEMIT Groups
  – can't tell if they're used
    • so hesitant to delete them
  – need to verify that they are all correct each month
    • maybe an exclusion list would help (to see if the right gl_accounts are excluded and only those)
    • mistakes lead to reporting errors
• Difficult to separate out internal charging and transfers
  – some specific GL_Accounts for internal that can be excluded for this view
  – could use SI transactions instead of SA, but no way to ensure people do that correctly
  – no way to enforce integrity rule in SAP
• Many Document Types not used, but not filtered out of list
• Sponsor Billing done at Parent WBS level, so children look in deficit
  – no common definition of what "deficit" accounts are
  – billing lag means DLCs and Central might have different views and definitions
• GL_Account_Report is confusing.
• Travel charges that haven't been submitted on a trip report can't be reported on
• Guest Card charges not available for DLCs
• Hard to see if changes in a Person's status from ON/OFF Campus have been made before the postings
• Need Profit Center Group History so that we can recreate old reports and understand the differences in numbers

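The exclusion list idea above ("the right gl_accounts are excluded and only those") amounts to comparing two sets: the accounts that should be excluded and the accounts a group actually excludes. A minimal sketch follows. The account numbers and function name are made up for illustration.

```python
def compare_exclusions(expected, actual):
    """Return (missing, unexpected): accounts that should be excluded but are not,
    and accounts that are excluded but should not be."""
    expected_set, actual_set = set(expected), set(actual)
    missing = sorted(expected_set - actual_set)
    unexpected = sorted(actual_set - expected_set)
    return missing, unexpected


# Hypothetical GL account numbers.
EXPECTED_EXCLUDED = ["800100", "800200", "800300"]
ACTUALLY_EXCLUDED = ["800100", "800300", "420050"]

missing, unexpected = compare_exclusions(EXPECTED_EXCLUDED, ACTUALLY_EXCLUDED)
print("missing:", missing, "unexpected:", unexpected)
```

Both lists being empty would mean the group matches the exclusion list exactly, which is the monthly check the slide asks for.
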
Student System Issues
• What is the Cost of Education?
  – By Department and Program
    • per Student
  – How much Tuition is Charged?
    • What is waived?
  – How much of Tuition is paid?
    • by the Student
    • by a Sponsor
• Don't have accurate info on Grads.
  – Where did they get their undergrad degree?
  – When did a student enter in which degree program?
  – They have Grad entry term, but not cohort.
  – Are they fully supported (external sources hard)?