Draft Proposal

Ph.D. in Analytics and Data Science


Last Updated: September 4, 2013
DRAFT PROPOSAL Ph.D. in Analytics and Data Science Page 2

Institution: Kennesaw State University

Institutional Contact: Daniel Papp, President

Date: XXXX

School/Division: School of Science and Mathematics

Department: Department of Mathematics and Statistics

Departmental Contact: Jennifer Lewis Priestley

Name of Proposed Program: Ph.D. in Analytics and Data Science

Degree: Ph.D.

CIP Code: XXXX
Executive Summary
The McKinsey Global Institute has identified that the demand for deep analytical talent
will outpace the supply in the United States by almost 200,000 people within three
years. In response, the White House has launched a Big Data Research and
Development Initiative, to expand the workforce needed to develop and use Big Data
technologies. This theme is echoed by Thomas Davenport's recent article in Harvard
Business Review titled "Data Scientist: The Sexiest Job of the 21st Century."
These studies and many others point to the need for universities to educate and train
Data Scientists to address this demand. However, no university in the country
currently has a degree program in Data Science defined as the intersection of
Statistics, Mathematics and Computer Science.
Kennesaw State University is proposing a Ph.D. in Analytics and Data Science, one of the
first of its kind in the country.
The degree will train individuals to translate large, unstructured, complex data into
information to improve decision making. This curriculum will include programming,
data mining, statistical modeling, and the mathematical foundations to support these
concepts. Importantly, it will also emphasize communication skills, both oral and
written, as well as application: tying results to business and research problems.
Because this degree is a Ph.D. (rather than a Doctorate in Data Science), it creates
flexibility for the student. Graduates can pursue a position in the private or
public sector as a practicing Data Scientist, where demand is expected to greatly
outpace supply, or a position within academia, where they would be
uniquely qualified to teach these skills to the next generation.
Kennesaw State University is well positioned to launch this degree. This is evidenced
by the unparalleled success of the MS in Applied Statistics, whose graduates are in
great demand and continue to have 100% placement, and by the Minor in Applied
Statistics, which draws undergraduates from every college across the university and
has similarly strong placement. The Minor attracts approximately 200 undergraduates
every semester, making it the most successful and sought-after Minor in the history of
KSU.
The Ph.D. in Analytics and Data Science will not only help to close the talent gap in the
area of Data Science, but will also continue KSU's trajectory of regional and national
recognition in the area of applied analytics.
"There are three ways you can get to the top of a tree: 1) sit on an acorn, 2) make friends with a
bird, 3) climb it." - Anonymous
SECTION 1. PROGRAM DESCRIPTION AND OBJECTIVES:
a. Objectives of the program
The Ph.D. in Analytics and Data Science at Kennesaw State University is an advanced
degree, which will prepare individuals to work as Data Scientists in a private or public
sector capacity or, secondarily, to work in academia within a department focused on the
application of data analysis.
This Ph.D. will utilize a multidisciplinary approach, with emphasis on Statistics,
Mathematics, Computer Science and a content discipline such as Biology, Chemistry,
Finance, Physics, Political Science, etc.

b. Needs the program will meet
"I skate to where the puck is going to be, not where it has been." - Wayne Gretzky
The United States Federal Government recently issued a press release addressing what
it sees as a growing critical shortage of data analysts and, on March 29, 2012, issued the
Big Data Research and Development Initiative. One of the main purposes of the
initiative is to expand the workforce needed to develop and use Big Data technologies.
The term "Big Data" is increasingly included within descriptions of required skill sets
across a wide variety of disciplines and sectors of the economy. While the accepted
definition of Big Data is continuing to evolve, there is no question about the expansion
and prevalence of related concepts and their expanded role in the future.
According to The Economist magazine, unmanned American military aircraft (i.e.,
drone aircraft) flying over Iraq and Afghanistan in a single year (2009) produced
approximately 24 years' worth of video surveillance footage. Every year, Google
acquires an amount of data equivalent to the entire Library of Congress.


These astonishing facts highlight at least four major points about how data is collected,
analyzed, and used:
1. Extraordinary, previously unimaginable amounts of data are being collected and
stored for subsequent analysis, which contain potentially significant and
meaningful information in the private and public sectors and to society at large.
2. It is not feasible to manually review and/or analyze such massive data in a timely
manner using traditional methods. Computer-assisted semi- or fully-automated
processes using new computational and data mining methods are needed in
order to extract useful information from massive data sources in a timely
manner.
3. In addition to massive amounts of traditional structured data (i.e., tabular data),
extraordinary amounts of unstructured, non-traditional data such as video
footage, audio recordings, and unstructured text are being collected and stored.
Increasingly, these two very different types of data must be merged together in
systematic ways in order to obtain useful information.
4. Unlike in the past, data collection and analysis is no longer a purely academic
endeavor. Data gathering and analysis, most often in support of decision
making, is now used in almost every field and sector
imaginable, including the sciences, public health, the healthcare
industry, all aspects of business and finance (including retail, insurance,
marketing, the service industry, the credit industry, fraud detection, the
communications industry, etc.), psychology, education, public policy agencies,
government elections, and, critically, national security and defense.
From these four points it follows that:
The next generation of statisticians will face very different challenges and issues than
previous generations of statisticians. As a result, the next generation of statisticians
requires a new set of knowledge and skills in order to effectively serve the data analysis
needs of the 21st century. These skills will incorporate more emphasis on applied
mathematics and on computer programming than has historically been the case, even
for applied statisticians.



A recent article in Significance magazine made this point clear:
"Traditional data is numbers. (Emerging data) is not. It is digital, but it is generated by all kinds
of hardware and software. It is text, it is videos, it is tweets and Facebook pages; it is about
transactions and interconnections by the billions.
Data is the raw material of statistics, but traditional statistical disciplines will not cope; this new
data needs new ways of handling it, of analyzing it, of thinking about it and using it."
A large number of new and well-trained 21st century statisticians are needed in every
sector of almost every society in the world, not just America, and not just developed
countries.
The U.S. business community is also aware of this need: Hal Varian, Ph.D., the Chief
Economist at Google, Inc., states simply, "Data are available; what is scarce is the ability to
extract wisdom from them."
Further to the recognition of the talent shortage evidenced through the White House
Big Data Research and Development Initiative, the Big Data Report from the
McKinsey Global Institute (MGI) estimates that the demand for data analysts could
exceed the current supply by 140,000 to 190,000 positions by the year 2018 (see Figure
1). Figure 1 illustrates that there are 440,000 to 490,000 total data analyst job positions
projected for 2018 with only 300,000 trained analysts to fill those positions. In other
words, the demand for big data analysts could be 50 to 60% greater than its projected
supply by 2018.
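
The arithmetic behind these figures can be checked with a short sketch. The numbers are the ones quoted above from the MGI report; the function name `talent_gap` and the constant names are hypothetical and used only for illustration. Under these inputs the gap works out to 140,000 to 190,000 positions, or roughly 47% to 63% of the projected supply, which the text rounds to 50 to 60%.

```python
def talent_gap(demand, supply):
    """Return (absolute gap, gap as a percentage of supply).

    Hypothetical helper for illustration; not part of the MGI report.
    """
    gap = demand - supply
    return gap, 100.0 * gap / supply


# Figures quoted in the text above (projected for 2018).
SUPPLY_2018 = 300_000
DEMAND_LOW_2018 = 440_000
DEMAND_HIGH_2018 = 490_000

gap_low, pct_low = talent_gap(DEMAND_LOW_2018, SUPPLY_2018)
gap_high, pct_high = talent_gap(DEMAND_HIGH_2018, SUPPLY_2018)

print(f"Gap: {gap_low:,} to {gap_high:,} positions")
print(f"Shortfall relative to supply: {pct_low:.0f}% to {pct_high:.0f}%")
```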








FIGURE 1: The Talent Gap for Big Data Analysts


The Big Data MGI report also predicts differential gains as a result of the impact of big
data and its use across different sectors. According to MGI, finance and government
(Cluster B in Figure 2) are expected to benefit strongly from big data use in the future,
while the computer and electronic products and information sectors (Cluster A in Figure 2)
have already experienced, and will continue to experience, substantial benefits from the
use of big data.






FIGURE 2: Differential Potential Gains of Big Data by Sector

A brief survey of the diverse disciplines which have recognized the role of Big Data and
the changing role of analytics includes:
Business
Customer relations management (CRM) is one of the most innovative and profitable
ways in which businesses use big data. CRM is essentially the business practice of
analyzing customer-centric big data to discover trends and use that information to
customize or personalize offers and communications with customers to optimize
business. CRM was once used only by Fortune 500 companies; now, however, with the
proliferation of big data and reduced costs in collecting and storing it, all types of
companies are using it to optimize their business. In one example of a typical CRM
application, a U.S. bank used big data analytics to predict which product offer was most
likely to be accepted by a particular customer and thereby customize the next on-line
product offered to that customer in an effort to cross-sell to existing customers (Berry &
Linoff, 2000). This CRM initiative resulted in substantial gains in cross-selling and
therefore profits to the bank well above the cost of implementation. This is just one of
many examples of big data analytics in business. Others include fraud identification,
service rate estimation, predicting product failure, and optimizing direct mailing
campaigns. By all accounts, the main hindrance in CRM is a lack of
qualified data analysts (The Economist, 2010; Significance, 2012).

Healthcare & Public Health
The proper use of digitized medical records has the potential of revolutionizing the
healthcare industry. Proper analysis of these records may be used to detect unwanted
drug interactions and/or side-effects, identify best practices in care (e.g., identify the
most effective drug therapies), and even predict the onset of certain diseases before
patients themselves are aware of symptoms (The Economist, 2012). In one example,
medical doctors and data analysts in Alabama developed automated infection
surveillance software that assists hospitals in identifying changes in nosocomial
infection (i.e., hospital-acquired infection) rates using massive data from Blue
Cross/Blue Shield of Alabama and statistical and data mining methods (Putman, 2003).
It has been estimated that nosocomial infections add as much as nine days to a patient's
hospital stay, leading to more than $4 million per year in additional expense. This
infection surveillance software provides early warning to hospitals and allows them to
intervene in a timely manner. This is only one of many possible examples where non-
traditional statistical work involving big data has made a substantial improvement in
healthcare quality and substantial savings to society.

Government
According to the National Science Foundation (NSF, 2012) in the document entitled
"Core Techniques and Technologies for Advancing Big Data Science & Engineering"
(NSF 12-499), the impact of big data is causing a literal paradigm shift in scientific and
biomedical investigation that is transforming the missions of a number of U.S. Federal
Government agencies:
"Today, US government agencies recognize that the scientific, biomedical and
engineering research communities are undergoing a profound transformation with the
use of large-scale, diverse, and high-resolution data sets that allow for data-intensive
decision-making, including clinical decision making, at a level never before imagined.
New statistical and mathematical algorithms, prediction techniques, and modeling
methods, as well as multidisciplinary approaches to data collection, data analysis and
new technologies for sharing data and information are enabling a paradigm shift in
scientific and biomedical investigation. Advances in machine learning, data mining,
and visualization are enabling new ways of extracting useful information in a timely
fashion from massive data sets, which complement and extend existing methods of
hypothesis testing and statistical inference. As a result, a number of agencies are
developing big data strategies to align with their missions."

These examples and countless others highlight three common emerging themes:
1. Big Data is ubiquitous. All disciplines. All sectors of the economy.
2. Data is no longer considered a necessary cost to be managed down, but rather as an
asset to be mined and leveraged.
3. All sectors are increasingly finding a dearth of analytical talent to support their
nascent, but explosive analytical needs, particularly as it is related to Big Data.
In response to this, Kennesaw State University is proposing the development of a Ph.D.
in Analytics and Data Science. It is our position that the Data Scientist will be uniquely
positioned to fill the talent shortage as outlined above.
It is critical to note that this is a proposal for a Ph.D. program in Analytics and Data
Science rather than in Statistics.
A great deal of attention is emerging in the field of analytics towards the role of the
"Data Scientist":
From IBM - A data scientist represents an evolution from the business or data analyst role. The
formal training is similar, with a solid foundation typically in computer science and
applications, modeling, statistics, analytics and math. What sets the data scientist apart is strong
application acumen, coupled with the ability to communicate findings ... in a way that can
influence how an organization approaches a business challenge.

From Thomas Davenport, Senior Managing Partner at Accenture and author of
Competing on Analytics - (Data Scientists) are not typical scientists ... but rather hybrids of
science and computation. Somewhere along their career journeys they became interested in, and
good at, the manipulation of data. In fact, many of them really have "computational" in front of
their scientific specialties: computational biology, computational ecology, etc. If you want some
evidence of this hybrid specialization, look at your favorite data scientist's profile on LinkedIn --
the home, by the way, of some of the best data scientists around -- and check out the skills they
say they have. You'll see analytics (quantitative analysis, statistical modeling, predictive
analytics, social network analysis, data mining, etc.) listed, of course. But you are also likely to
see SQL, Java, C, Python, R, distributed databases, and so forth. All of these skills actually are
found in one individual, and he seems typical of the breed ... to my knowledge, no universities
have programs yet in big data analytics (though some are talking about them -- universities
typically don't move too hastily).
From Daniel Tunkelang, Chief Data Scientist at LinkedIn - Strong analytical skills are a
given: above all, a data scientist needs to be able to derive robust conclusions from data. But a
data scientist also needs to possess creativity and strong communication skills. Creativity drives
the process of hypothesis generation, i.e., picking the right problems to solve, the will to create
value for users and the drive to improve business decisions. Communication is essential, because
data scientists work in horizontal roles and partner with groups across the entire organization.
At LinkedIn, data scientists collaborate with every other product group, as well as with sales and
finance. Strong communication skills are a must-have.
From Steve Hillion, VP of Analytics at GreenPlum, as quoted in Forbes - I'm sure in 30
years' time, there will be lots and lots of degrees in data science and that's where [data scientists
will] come from, but right now it's coming from all these different buckets (math, computer
science, economics) ... And, just as the early days of computing were born in the garages of
Silicon Valley do-it-yourself-ers, data science is likely to develop first in an ad-hoc, hands-on
way.
It is our position that the intersection of skills outlined in multiple ways above is
brought together under the description of "Data Scientist" in a way that does not occur
in a traditional Statistics curriculum. This term has emerged as the moniker of an
individual with strong computational and programming skills, but also possessing
business/content acumen, enabling clear and meaningful communication. As can be
seen below, the term "data scientist" is emerging as a dominant search term in job
search engines.



FIGURE 3: Job Trends in Data Science

From Michael Rappa, Director of the Institute for Advanced Analytics at NC State -
The future of data science in the enterprise will be extremely bright, if a few key things happen:
First, the right kinds of partnerships must be formed between data-rich companies and forward-
thinking academic institutions. Second, institutions and employers need to encourage and
reward the right set of data-science skills.
Are statisticians going away? No. There will always be a need for traditional Statistics.
Disciplines such as psychology, nursing, marketing research, medical research, etc., will
always have a need for the traditional skills associated with hypothesis testing and
model development.
Data Scientists are different. They embody skills which traditional statisticians don't
have. While data scientists must have strong skills in statistical testing and modeling,
they are also strong in computational mathematics, data architecture, the process of ETL
(extract, transform, load), programming (e.g., SAS, Java, C++, Hadoop), and typically
have some content knowledge (e.g., Chemistry, Biology, Finance).
The proposed Ph.D. in Analytics and Data Science at Kennesaw State University
directly meets the national talent shortage in this space, as evidenced by movements
such as the Big Data Research and Development Initiative of 2012, by effectively and
thoroughly training and thereby expanding the workforce available to develop and use
Big Data technologies.
Furthermore, this degree program would transform teaching and learning in the field of
Big Data technology, another major objective of the White House Initiative.
Consequently, we believe the degree will directly and/or indirectly accelerate the pace
of discovery in science and engineering used to further understanding and knowledge,
strengthen U.S. national security, and increase the quality of life for the average
American citizen.
With respect to the shortage of big data analysts and their training, the Big Data MGI
report states, "we believe that the constraint on this type of talent will be global, with the
caveat that some regions may be able to produce the supply that can fill talent gaps in other
regions."
It is our strongly held position that we can make Georgia one of these key regions
which produces Big Data analytical leadership for the world with this proposed Ph.D.
degree. A Ph.D. program in Analytics and Data Science at Kennesaw State University
has the potential of defining Kennesaw State University, the University System of
Georgia, and the State of Georgia, as cutting-edge, state-of-the-art innovators in the
methods and technologies that will shape and see us through the 21st century.
So why do we think KSU can deliver a Ph.D. in Analytics and Data Science?
The main reason is that we are already moving in that direction.
This is evidenced through the successes of both the Minor in Applied Statistics and
Data Analysis as well as in the Master of Science in Applied Statistics.
The Minor in Applied Statistics, more than any other Minor field of study in the history
of KSU, is a flagship of interdisciplinary success. Students are required to complete 15
hours (five courses) in Statistics at the 3000 level or above to qualify for a Minor in
Applied Statistics and Data Analysis. In any given semester, the Minor serves the needs
of over 200 students from almost every college across the university.
Statistics represents the most diverse cross section of majors in 3000 or 4000 level
courses, of any course of study. Where most upper division courses are populated by
students from a single major, in the statistics courses (all STAT courses are above 3000),
the classes are consistently populated with students from Biology and Chemistry,
Finance and Economics, Psychology, Mathematics, Sociology, and even Theater (see
Figure 4).



Why do undergraduates at KSU seek out a series of five upper division electives in
Statistics? We believe that there are three primary reasons that have created this
demand:
1. Statistics at KSU includes an inherently interdisciplinary faculty, the same
faculty which will power the Ph.D. in Analytics and Data Science. Most of the
Statistics faculty have had experience in the private sector, including Ford Motor
Company, The Children's Hospital of Cincinnati, The Cancer Center at MD
Anderson in Houston, TX, MasterCard International, VISA EU (London),
AT&T/BellSouth (Brazil), Thomson Reuters, The Southern Company and
ChoicePoint. Most students can find a faculty member whose experience applying Statistics
outside the classroom is aligned with their career aspirations. These experiences
are brought into the classroom and students respond.

FIGURE 4: Distribution of Minors in Applied Statistics and Data Analysis
by Declared Major (Fall 2012)
[Bar chart. Axis scale: 0 to 90. Majors shown: Exercise & Health Science, Computer Science,
Criminal Justice, Environmental Studies, Interdisciplinary Studies, Math Education,
Political Science, Theatre, Business, Finance, Geographic Info Sciences, Information Systems,
Marketing, Sociology, Accounting, Communications, Nursing, International Affairs,
Biochemistry, Chemistry, Mathematics, Biology, Psychology.]
2. Statistics is the process through which data is converted into meaningful
information to support decision making. But, as outlined above, while data is
increasingly ubiquitous and cheap and easy to capture and store, it is difficult to
translate. Students recognize that whether they are studying Finance or
Psychology, Biology or Political Science, they will have to understand how to
translate data into information. This knowledge is no longer a differentiator but
rather an ante to play. Since all disciplines work with data, in some form, all
disciplines of study need to have some integration of Statistics for their graduates
to be marketable.

3. Jobs. Jobs. Jobs. Students are increasingly turning to Statistics as a great way to
position themselves in the marketplace. Undergraduates with Minors in
Statistics are having great success with job placement after graduation. Statistics
students from KSU are recruited for positions across a wide variety of companies
and agencies, including The Home Depot, Coca-Cola, The Southern Company, Link Analytics,
Aspen Marketing Services, The Internal Revenue Service, Ultimate Software,
IBM, Assurant, CompuCredit, The CDC, and Equifax.

The Master of Science in Applied Statistics (MSAS) has a similar story. Since the launch of the
degree in 2006, very few of the applicants have had undergraduate degrees in Statistics.
MSAS applicants come from Engineering, Business, Medicine, and Education. A
defining characteristic of the MSAS program is its fluid alignment with the needs of the
market: the faculty works with its Advisory Board to consistently pivot the
curriculum to be aligned with the market. As a result, the MSAS is proud of its effective
0% unemployment rate amongst students without work restrictions.
Statistics emerged as a unique discipline at KSU in Fall of 2006; all of this success has
occurred in less than 6 years. In an effort to ensure limited duplication with other
successful initiatives in Statistics within the University System, such as the programs at
the University of Georgia and at Georgia Tech, KSU pursued a strongly applied
orientation, meaning that course materials were focused on leveraging faculty
experience outside the classroom, and applying Statistics the way practitioners apply
Statistics. From the beginning KSU elected to have less emphasis on theoretical
Statistics. This last point meant that KSU would have to have strong integration of
statistical software into the curriculum. In response, we looked to the dominant
software/language in the marketplace. This was, without question, SAS, which is used
by 95% of the Fortune 500 including all of the top companies in our regional footprint.
As a result, all of our students, both at the undergraduate and graduate levels, learn
strong SAS programming skills as a complement to their statistics skills.
It is this dimension of programming skills, combined with a strong mathematical
foundation, and deep and broad instruction in statistical modeling which has already
well positioned the program to offer a Ph.D. in Analytics and Data Science.

c. Brief explanation of how the program is to be delivered
"Opportunities are usually disguised as hard work, so most people don't recognize them." - Ann
Landers
While additional resources will be required to make the Ph.D. in Analytics and Data
Science successful (see Section 4 below), much of the basic delivery infrastructure is in
place.
The general structure of the program will include three stages: Coursework, Application, and Research.



Stage 1: Course Work
"If you only have a hammer, you tend to see every problem as a nail." - Abraham Harold
Maslow
The Ph.D. in Analytics and Data Science will begin with 48 hours of core course
work/instruction, spread over an expected four years of study, plus six hours of electives
and a minimum of 24 hours of dissertation and internship (78 total hours). In response to
the market needs and skill gaps as outlined above, the Ph.D. in Analytics and Data
Science will have a strong interdisciplinary and application orientation. Generally,
coursework will be structured as provided in Figure 5.
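
The credit-hour totals above can be sanity-checked with a small sketch. The hour counts and the Figure 5 shares come from this proposal; applying the Figure 5 percentages to the 48 core hours is only an illustrative assumption, since the proposal does not state which hours the percentages cover, and the variable names are hypothetical.

```python
# Credit-hour components stated in the proposal.
CORE_HOURS = 48
ELECTIVE_HOURS = 6
DISSERTATION_INTERNSHIP_HOURS = 24  # stated as a minimum

total_hours = CORE_HOURS + ELECTIVE_HOURS + DISSERTATION_INTERNSHIP_HOURS

# Approximate shares shown in Figure 5 (percent of credit hours).
FIGURE_5_SHARES = {
    "Statistics": 35,
    "Content Application": 30,
    "Computer Science": 20,
    "Mathematics": 15,
}

# ASSUMPTION (not stated in the proposal): the shares apply to the core hours.
approx_core_hours = {
    area: CORE_HOURS * share / 100 for area, share in FIGURE_5_SHARES.items()
}

print(f"Total program hours: {total_hours}")
for area, hours in approx_core_hours.items():
    print(f"{area}: about {hours:.1f} core hours")
```

If the percentages are instead meant to cover a different base (for example, core plus elective hours), only the `CORE_HOURS` factor in the dictionary comprehension would need to change.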


A full listing of the proposed courses, and a sample program of study can be found in
SECTION 5 below.
The logic supporting this interdisciplinary approach is that the curriculum would be
aligned with the needs of the marketplace as evidenced in Section 1b above.
Students will be required to complete a comprehensive examination of their course
materials before they are considered to have completed this stage. The comprehensive
examination will cover materials from all of the three areas of study listed above.

Stage 2: Application
The Ph.D. in Analytics and Data Science is, at its core, an applied program. Ph.D.
students would be required to engage in one year of application, for a total of 15 credit hours.
This application will take one of two forms.
The first form of application is private or public sector work experience. The Statistical
Advisory Board has agreed, in principle, to hire Ph.D. students on a contract basis for
a minimum of one year, after they have completed their coursework but prior to
completing their dissertation.


FIGURE 5: Proposed Approximate Distribution of Credit Hours for the Ph.D. in Analytics and Data
Science
[Pie chart: Statistics 35%, "Content Application" 30%, Computer Science 20%, Mathematics 15%]
This would accomplish three objectives:
1. The hiring firm would cover one year (minimum) of doctoral student stipend
($25K to $30K).
2. The Ph.D. student would apply concepts and skills learned in the classroom in a
real environment.
3. The experience has the potential to become a source of dissertation research.
This integration with the companies represented by the Advisory Board also represents
an important endorsement of the proposed program, as well as an extension of the
engagement with the business community which has been the trademark of the
Statistics programs to date.
Letters of Intent and Support from the Statistical Advisory Board can be found in
Section A.1.
The second form would be to engage in applied research/scholarship projects with KSU
faculty. Examples would include analytical work in the areas of Finance, Chemistry,
Economics, Biology, Marketing, etc. It would be the intention that this second form of
application would take place within the context of a grant, which would provide the
funding for the student, as well as provide much needed analytical support for faculty
engaged in grant-funded scholarship and research.
It would be expected that either form of these 15 hours of application would segue into
the third stage of the program: dissertation research.

Stage 3: Dissertation Research
A Ph.D. in Analytics and Data Science would require a formal Dissertation process,
involving an interdisciplinary committee, comprised of faculty from Statistics,
Computer Science, and Mathematics. Depending upon the application path pursued
above, a faculty member from a content discipline (e.g., Marketing, Finance,
Chemistry, Biology, Economics) would be included and possibly an external committee
member as appropriate.


d. Prioritization within the institution's strategic plan
The Vision statement for the university:
Kennesaw State University will be a nationally prominent university recognized for excellence
in education, engagement, and innovation.
The Ph.D. program in Analytics and Data Science supports all three of these sources of
recognition.
As evidenced above, there is an unquestioned recognition of a current and emerging
talent shortage in the areas of Analytics and Data Science. In addition, no university in
the country has an advanced degree/curriculum which formally addresses the specific
skill sets defining the Data Scientist. KSU has the opportunity to establish the model
for such a degree. In addition, the Program is truly engaged: engaged across
academic disciplines, as well as engaged with public and private sector partners who
support this initiative.
SECTION 2. Description of the program's fit with the institutional mission and
nationally accepted trends in the discipline. Noted above.
SECTION 3. Description of how the program demonstrates demand and a
justification of need in the discipline and geographic area (region, state, and nation)
and is not unnecessary program duplication.
The market demand for the skills which are aligned with a Ph.D. in Analytics and Data
Science is evidenced above in Section 1b. Not only would the proposed Program not
duplicate any program currently in existence in the State of Georgia, it
would be the first of its kind in the country. A brief outline of the most closely related
Ph.D. programs in the State of Georgia is provided in Table 1 below.
TABLE 1: Comparison of related Ph.D. programs in the University System of Georgia

Institution: Georgia Institute of Technology
Name of Program: Ph.D. in Industrial Engineering with a Specialization in Statistics
Stated Objectives: The Ph.D. in (Industrial and Systems Engineering) is a research
degree...students have the opportunity to pursue work at virtually any of the points
across the applied/theoretical spectrum...
Notes on Curriculum: Courses incorporate strong mathematics, with methods courses
aligned with manufacturing and engineering. Requirements include five core courses,
two theory courses, three methods courses, one elective course (11 courses total). No
requirement for internships or co-op.
Program Housed: College of Engineering, H. Milton Stewart School of Industrial and
Systems Engineering.

Institution: Georgia Institute of Technology
Name of Program: Ph.D. in Industrial Engineering with a Specialization in
Computational Science and Engineering
Stated Objectives: Georgia Tech's CSE Ph.D. degree will prepare students for a variety
of positions in industry, government and academia that emphasize research and
development. Students will be well prepared for positions in industry and in
government. Graduates may pursue work in software and systems for modeling and
simulation, systems integration, data mining and visualization, high performance
computing, and computational modeling. Academic career possibilities include research
and education in departments concerned with advancing the state-of-the-art in the
development and application of computational models in engineering, the sciences and
computing.
Notes on Curriculum: The program emphasizes the integration and application of
principles from mathematics, science, engineering and computing to create
computational models. Courses include 6 core courses in computational mathematics and
in high performance computing, three elective courses which must go beyond using
computers to deepen understanding of computational methods, preferably in the context
of some application domain, and three elective courses in an application domain (12
courses total). No requirement for internships or co-op.
Program Housed: College of Engineering, H. Milton Stewart School of Industrial and
Systems Engineering.

Institution: University of Georgia
Name of Program: Ph.D. in Statistics
Stated Objectives: Heavy theoretical emphasis; placement is exclusively in academic
positions.
Notes on Curriculum: Program includes a minimum of 10 courses including four
statistical theory courses (core), two subcore electives, one of which is a
statistical computing course, and four unspecified STAT electives.
Program Housed: College of Arts and Science, Statistics Department.

Institution: Georgia State University
Name of Program: Ph.D. in Mathematics and Statistics
Stated Objectives: The Ph.D. degree program in Mathematics and Statistics includes
concentrations in bioinformatics, biostatistics, and mathematics. These concentrations
address the critical need for mathematics faculty and the need for highly trained
specialists in the areas of bioinformatics and biostatistics... (the program) will
graduate individuals with a broad background in applied areas for direct placement in
business, industry, governmental institutions and research universities.
Notes on Curriculum: Heavy emphasis on mathematics. The four core courses include Real
Analysis, Matrix Analysis, Theory of Probability and Linear Statistical Analysis.
Remaining courses vary based upon selected concentration. The Concentration in
Bioinformatics incorporates three computer science courses. Eighteen courses required.
Program Housed: College of Arts and Sciences, Department of Mathematics and
Statistics.

These are all excellent programs which have achieved recognition across a variety of
contexts. However, none of these programs are aligned with the skills defining the
Data Scientist.

SECTION 4. Brief description of institutional resources that will be used specifically
for the program (e.g., personnel, library, equipment, laboratories, supplies &
expenses, capital expenditures at program start-up and when the program undergoes
its first comprehensive program review).
The Department of Mathematics and Statistics does many things well: it has
developed the strongest, most popular Minor in the history of the University (the Minor
in Applied Statistics and Data Analysis). The MS in Applied Statistics has an effective
0% unemployment rate, with an ever-increasing number of applications each year.
While the philosophy, the will, the drive and the orientation to make a Ph.D. in
Analytics and Data Science an unqualified success at the level of the other programs
are present without question, several gaps in the faculty skill set and in the physical
infrastructure have been identified, which must be mitigated in advance of any formal
establishment of a Ph.D. in Analytics and Data Science.
Faculty/Personnel requirements
a. Director, Ph.D. in Analytics and Data Science.
The Statistics faculty at KSU is composed of talented, relatively young individuals who,
while having worked in capacities outside of academia, have teaching experience only
at KSU. No individual within this faculty has worked within an academic department
which grants Ph.D.s (outside of work completed during their own Ph.D. programs).
Therefore, an experienced individual is required to develop and lead this flagship
doctoral program in Analytics and Data Science. This individual will have had
experience either leading or cultivating Ph.D. programs in Statistics, Applied
Mathematics, Engineering, or other similar applied quantitative disciplines.
A brief survey of the salaries of individuals who lead the programs listed above within
the University System of Georgia is outlined in Table 2.
TABLE 2: Relevant Salary Comparisons for Individuals leading Major analytical
programs in the USG.
Position | Institution | Salary not including travel (2011)
Associate Chair for Graduate Studies and Professor | Georgia Institute of Technology, H. Milton Stewart School of Industrial and Systems Engineering | $140,167
Chair, Statistics Department | University of Georgia, Department of Statistics | $177,290
Director, Graduate Programs in Mathematics | Georgia State University, Department of Mathematics and Statistics | $91,525

This individual would have the experience and authority of a department chair or
associate dean. In addition, this individual would need to possess the intersection of
skills which define a Data Scientist: Statistics, Computer Science and Mathematics. As
referenced above, these individuals are in strong demand in the marketplace. While the
salary points for this skill set vary depending upon the discipline and the economic
sector, the average for 5-7 years of experience outside of academia (which may also be a
source of recruitment) ranges from $120,000 to $200,000+.
In the short term, this individual would be responsible for launching the Ph.D.
program, ensuring that the program is appropriately accredited, attracting first-class
students into the program, coordinating and facilitating external partnerships, and
communicating/marketing the program to external rating agencies (e.g., US News and
World Report, Fortune).
Projected salary (without travel) is $150,000.
b. Associate Professor in Statistics.
The demand for statistics courses at the undergraduate level and the MS level continues
to grow as can be seen in Figure 6.

[FIGURE 6: Number of Courses taught with STAT prefix since 2006. Series: Undergraduate
and Graduate. Horizontal axis: 2006 to 2012. Vertical axis: 0 to 28 courses.]

While the faculty base has doubled over a six-year period, from five in 2006 to 11 in
2012, the number of courses (not sections, but different courses/preps) has more than
doubled, from seventeen courses in 2006 (5 undergraduate and 12 graduate) to
thirty-seven different courses in 2012 (12 undergraduate and 25 graduate). Many of
these courses now offer multiple sections, taking the teaching load for 11 faculty (and
two lecturers) to over 50 course sections.
Even without a proposed Ph.D. in Analytics and Data Science, the Statistics faculty is
already heaving under the weight of its own popularity. In addition to teaching their
own courses, they are frequently sought out by other departments around campus to
teach graduate courses in Statistics. Examples include NURS9100 and NURS9200, INCM9102
and the COLES DBA Program.
Therefore, there is a strong need for two additional faculty members who can help share the
teaching load, particularly as the number of different courses offered through the
program continues to expand. Courses in advanced and emerging topics such as
Neural Network modeling, Affinity Analysis, Social Network Analysis, Text Mining, as
well as formal certification preparation courses for the Actuarial Exams, the SAS
Programming Exams and the Six Sigma Exams have been considered, but not pursued
because of the heavy teaching loads currently assumed by even senior faculty. The
ideal individual would also have some experience teaching doctoral students.
While the average ten month salary of an Associate Professor in Statistics at KSU is
about $80,000, this is well below the academic industry median, and about half of the
private sector average. To ensure a pool of competitive candidates for this position, this
position is proposed at a salary of $90,000 for ten months (not including travel).
A second position for an Assistant Professor of Statistics at KSU is proposed at a salary
of $80,000 for ten months (not including travel).

c. Associate Professor in Mathematics.
The Ph.D. Program will require, at minimum, three additional courses in theoretical
mathematics: Advanced Discrete Mathematics/Combinatorics, Graph Theory and
Theory of Linear Models.
While the average ten month salary of an Associate Professor in Mathematics at KSU is
about $60,000, this is below the industry median. To ensure a pool of competitive
candidates for this position, this position is proposed at a salary of $70,000 for ten
months (not including travel).
d. Associate Professor in Computer Science.
The Computer Science department at KSU is also experiencing increased demand for
courses at all levels (consistent with the demands and trends in the market as outlined
above). With this organic growth, combined with new demands from the proposed
Ph.D., the CS department will require at least two additional faculty. The first
incremental CS faculty member, hired specifically to help support the needs of the
proposed program, would be expected to have some previous experience teaching Ph.D.
students, combined with teaching experience (or industry experience) in practical
application. Consistent with the current Associate Professor position in Computer
Science, the expected ten-month salary is approximately $90,000.

d. Assistant Professor in Computer Science.
The same considerations apply as for the Associate Professor position above. The
Assistant Professor candidate may be a freshly minted Ph.D. graduate with strong skills
in Big Data software/languages.
Consistent with current Assistant Professors in the Computer Science department, the
expected ten month salary for this position is approximately $80,000.

e. Office Manager.
The Ph.D. in Analytics and Data Science would need a dedicated Office Manager to
support the Director, as well as the doctoral students.
Consistent with Office Managers supporting other doctoral programs at KSU, the
expected twelve month salary for this position is approximately $50,000.



f. Dedicated IT Professional.
Given the centrality of High Performance Computing to the success of the degree,
combined with the unique and complex requirements of a Big Data environment, the
Ph.D. in Analytics and Data Science would require a dedicated IT professional to
support the affiliated faculty and the doctoral students.
Consistent with IT Professionals currently engaged in system support for the College of
Science and Mathematics, the expected twelve-month salary is $55,000.
g. Stipend for Doctoral Students.
Funding for doctoral students at other universities varies widely by program. Because
the Program is full time, students will have little opportunity to work outside of
their Program-aligned internship. Therefore, the stipend will need to be substantial
as well as competitive in order to attract high-quality candidates.
It is our initial position that the stipend for students should range from $20,000 to
$30,000 annually, with the discretion of the Director to award higher stipends for more
promising students. Doctoral students would also receive a tuition waiver and would
be required to teach or assist with research every semester.
TABLE 7: Summary of incremental Faculty/Personnel for the Ph.D. in Analytics and
Data Science
Position | Home Department | Expected Annual Salary | Notes
Director, Ph.D. in Analytics and Data Science | Department of Mathematics and Statistics | $150,000 | Twelve Month position starting YR Before Program (YR -1).
Associate Professor of Applied Statistics | Department of Mathematics and Statistics | $90,000 | Ten Month position starting in YR1.
Assistant Professor of Applied Statistics | Department of Mathematics and Statistics | $80,000 | Ten Month position starting in YR2.
Associate Professor of Mathematics | Department of Mathematics and Statistics | $70,000 | Ten Month position starting in YR2.
Associate Professor of Computer Science | Department of Computer Science with appointment to the Department of Mathematics and Statistics | $90,000 | Ten Month position starting in YR1.
Assistant Professor of Computer Science | Department of Computer Science with appointment to the Department of Mathematics and Statistics | $80,000 | Ten Month position starting in YR Before Program (YR -1).
Office Manager | Department of Mathematics and Statistics | $50,000 | Twelve Month position starting in YR1.
IT Support | College of Science and Mathematics | $55,000 | Twelve Month position starting in YR1.
Doctoral Student Stipend | Department of Mathematics and Statistics | $150,000 | $30,000 Per Student estimate assumes five students in YR1.
TOTAL | | $815,000 |
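
The total in Table 7 can be checked with a short calculation. The sketch below is illustrative only: the figures are transcribed from the table, while the variable names are assumptions introduced here and are not part of the proposal.

```python
# Expected annual salaries (US dollars), transcribed from Table 7.
# Variable names are illustrative assumptions, not part of the proposal.
personnel_costs = {
    "Director, Ph.D. in Analytics and Data Science": 150_000,
    "Associate Professor of Applied Statistics": 90_000,
    "Assistant Professor of Applied Statistics": 80_000,
    "Associate Professor of Mathematics": 70_000,
    "Associate Professor of Computer Science": 90_000,
    "Assistant Professor of Computer Science": 80_000,
    "Office Manager": 50_000,
    "IT Support": 55_000,
}

# The stipend line assumes $30,000 per student and five students in YR1.
stipend_per_student = 30_000
students_in_year_one = 5
personnel_costs["Doctoral Student Stipend"] = stipend_per_student * students_in_year_one

total = sum(personnel_costs.values())
print(f"Doctoral student stipend line: ${personnel_costs['Doctoral Student Stipend']:,}")
print(f"Total incremental personnel cost: ${total:,}")
```

Because the stipend line scales with cohort size, changing `students_in_year_one` shows how the total would move under a different enrollment assumption.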




Software Requirements
A defining skill of the Data Scientist is the ability to work with Big Data. While the
operational definitions of Big Data continue to evolve, there are three components of
this new and increasingly ubiquitous type of data that most people agree are present:
1. Size. In this context "big" is defined as data not contained by traditional
software packages. For example, data files with a billion records are increasingly
common. Traditional analysis packages like Microsoft Excel or even SPSS do
not scale to accommodate this kind of data.
2. Type. Big Data often comes in forms other than traditional rows and columns.
Text, tweets, even video, are not well accommodated by traditional analytical
software.
3. Velocity. Because data has become cheap and easy to capture, organizations
capture data. Every day. Every hour. Every minute. Consider the Southern
Company. This organization captures usage on the power grid continuously and
feeds this data back into their systems. This velocity of data, the constant
updating of files, is again not accommodated by traditional analytical software.
Big Data, with issues related to size, type and velocity, requires a different type of
software solution for analysis.
The SAS Institute is one of the few organizations to have developed a software solution
for this problem: SAS High Performance Analytics (HPA). SAS HPA uses in-memory and
distributed processing to analyze billions of records in seconds. By way of comparison,
a current student course project from STAT8330 requires students to model 17 million
records. With 20 students using a dedicated server (no other users permitted), the
process takes about 4 days.
In part because of the long-standing relationship between the SAS Institute and
Kennesaw State University, and the consistent integration of Base SAS software across
almost all KSU STAT courses, as well as the recognition of this Ph.D. program
becoming the first of its kind in the country, the SAS Institute has agreed to make a
multi-million dollar gift to KSU in the form of a copy of this software.
This gift will accomplish three very important objectives as the new Program unfolds:

1. Because KSU will be one of the first universities in the country to have this
software, it will not only differentiate KSU, but provide important validation and
enhanced credibility for the Program.
2. It will continue to make KSU the center for analytics in the Metropolitan Atlanta
area where small, medium and large companies can come to the KSU Data
Science Lab to test Big Data Analytics on a unique piece of software.
3. As organizations around the country integrate this software into their own
systems, KSU students will be uniquely trained and qualified to work within an
HPA environment.
Data Science Lab requirements
The multi-million dollar software grant from SAS is unprecedented in KSU history.
This powerful software will help to propel the program into immediate national
recognition as a leader in Big Data analytics.
However, the software requires a level of hardware infrastructure which has never
existed at KSU.
Working with Erik Bowe, Chief Data Officer, and Don Hayes, founder of DLL
Consulting, a SAS Alliance Partner Affiliate, and member of the Statistics Advisory
Board, a team of faculty is evaluating multiple options to optimize the final hardware
configuration to support a high performance analytics infrastructure. At a minimum, a
high bandwidth storage area network (SAN) with at least 6 TB of storage with 7,500
RPM drives will be needed, with options for scalability. As part of this option, the
program would begin with 20+ Dell/Teradata blades with 8-core chips along with 256
GB of RAM at 1600 MHz.
More detailed hardware infrastructure conversations will be required, but research to
date indicates that the hardware solution options will range from $700,000 to $1,000,000.
For the purposes of planning, an estimate of $800,000 will be used for the one-time
hardware requirements needed to accommodate the SAS high performance analytics
software.
It should be noted that this hardware infrastructure would also serve the needs of
Computer Science, which has also launched a Certificate program aligned with Big
Data.

TABLE 8: Summary of incremental Hardware/Physical Infrastructure for the Ph.D. in
Analytics and Data Science

Item | Home Department | Expected Costs (one time unless noted otherwise) | Notes
New Servers | | $800,000 | One time cost YR1
Annual Support from SAS | | $50,000 | Annual cost
TOTAL | | $850,000 |

A detailed summary of the estimated budget for the program can be found in Section 14
below.

SECTION 5. Curriculum: List the entire course of study required and recommended
to complete the degree program. Provide a sample program of study that would be
followed by a representative student.
The proposed program of study for the Ph.D. in Analytics and Data Science presented
in the pages below assumes that an entry-level student has completed the following
courses or their equivalents at the master's degree level (see Table 10). If any of these
courses (or their professional equivalents) have not been completed by a candidate
prior to admission to the Ph.D. program, then the student will be required to complete
the appropriate course(s) in addition to the requirements that follow for the Ph.D.
Table 10 below displays the course acronym, title, credit hours, status (either existing
or new, i.e., one that must be developed for purposes of the Ph.D. program), and
associated prerequisites for eight foundational master's degree-level program
prerequisites. In addition, Section A.2 in the Appendix provides the course descriptions
for these classes.

TABLE 10: Previous Master's Degree-Level Required Coursework for the Ph.D. in
Analytics and Data Science
Prefix | Course Name | Credit Hours | Status | Pre-requisites | Notes
STAT 7010 | Mathematical Statistics I | 3-0-3 | Existing | | Requires Calc 2, and preferably Calc 3.
STAT 7100 | Statistical Methods | 3-0-3 | Existing | STAT 7010 | Research Methods and Testing. This would be standard content in almost any MS program in Social Sciences, Hard Sciences or Statistics.
STAT 7020 | Statistical Computing and Simulation | 3-0-3 | Existing | STAT 7100 | Must include SAS and basic programming.
STAT 8210 | Applied Regression Analysis | 3-0-3 | Existing | STAT 7100 and STAT 7020 | Standard for most Statistics Curricula.
STAT 8310 | Applied Categorical Data Analysis | 3-0-3 | Existing | STAT 8210 |
STAT 8320 | Applied Multivariate Data Analysis | 3-0-3 | Existing | STAT 8120 and STAT 8210 | Standard for most Statistics Curricula.
ACS 7010 | C++ and Data Structures | 3-0-3 | Existing | | Standard for most CS Curricula.
ACS 7030 | Relational Database Systems | 3-0-3 | Existing | |

Because this is an applied degree program, for applicants who do not have these
specific (or similar) courses on their academic transcript, practical experience would be
a viable substitute. To ensure that applicants have the necessary depth of skills, a
passing score on a basic skills exam would be required in lieu of a completed course.

Students pursuing a Ph.D. in Analytics and Data Science would be required to take 48
course hours plus 6 hours of electives spread over four years, plus dissertation research
(12 hour minimum) and internship (12 hour minimum). In total, this degree is expected
to require a minimum of 78 credit hours of courses, internship and dissertation.
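
As a quick check of this arithmetic, the sketch below tallies the components stated above, using the 24/9/15 core split by discipline given in Table 11. The variable names are illustrative assumptions, not part of the proposal.

```python
# Core hours by discipline, as listed in Table 11.
core_hours_by_area = {"STAT": 24, "MATH": 9, "CS": 15}
core_hours = sum(core_hours_by_area.values())

# Remaining components stated in the text (minimum hours).
free_elective_hours = 6
dissertation_min_hours = 12
internship_min_hours = 12

minimum_total = (
    core_hours
    + free_elective_hours
    + dissertation_min_hours
    + internship_min_hours
)

print(f"Core course hours: {core_hours}")
print(f"Minimum program hours: {minimum_total}")
```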
TABLE 11: Core Required Courses for the Ph.D. in Analytics and Data Science
Prefix | Course Name | Credit Hours | Status | Pre-requisites* | Notes
STAT 8240 | Data Mining I | 3-0-3 | Existing | STAT 8210 |
STAT 8020 | Advanced Programming in SAS | 3-0-3 | Existing | STAT 7100 and 7020 | After the completion of this course, students will sit for the Advanced Programming Certification Exam, sponsored by SAS.
STAT 8330 | Applied Binary Classification | 3-0-3 | Existing | STAT 8120 |
STAT 8250 | Data Mining II | 3-0-3 | New | STAT 8240 | This course may need to be taught by or at least in conjunction with the CS faculty. We anticipate heavy emphasis on Hadoop.
STAT 8260 | Segmentation Models | 3-0-3 | New | STAT 8240 | After the completion of this course (and 8240 and 8250), students will sit for the Data Mining Certification Exam, sponsored by SAS.
STAT 8270 | Production-Level Modeling | 3-0-3 | New | STAT 8240 and STAT 8020 |
STAT XXXX | Statistics Elective | 3-0-3 | | |
STAT XXXX | Statistics Elective | 3-0-3 | | |
STAT CORE: 24 HOURS

MATH 8020 | Graph Theory | 3-0-3 | New | |
MATH 8030 | Discrete Mathematics | 3-0-3 | New | | Includes study of Combinatorics
MATH 8010 | Theory of Linear Models | 3-0-3 | New | |
MATH CORE: 9 HOURS

ACS 7410 | Parallel and Distributed Computing | 3-0-3 | New | ACS 7010 and ACS 7030 |
ACS 7510 | HPC Infrastructure | 3-0-3 | New | ACS 7010 |
ACS 7420 | Algorithm Design for Big Data | 3-0-3 | New | ACS 7410 |
ACS 8310 | Data Warehousing | 3-0-3 | New | ACS 7030 |
ACS XXXX | ACS Elective | 3-0-3 | | |
CS CORE**: 15 HOURS

TOTAL CORE REQUIREMENTS: 48 HOURS

DS 9900 | Dissertation Research (min 2 semesters) | 6-0-6 | New | Completion of Coursework | Can be completed simultaneously with Internship
DS 9700 | Internship/Application (min 2 semesters) | 6-0-6 | New | Completion of Coursework | Can be completed simultaneously with Dissertation Research
XXXX | Free Elective (Statistics, Mathematics, or Computer Science or Content area) | 3-0-3 | New | |
XXXX | Free Elective (Statistics, Mathematics, or Computer Science or Content area) | 3-0-3 | New | |
TOTAL PROGRAM REQUIREMENTS: 78 HOURS

* From transcript or from professional experience.
** All CS courses listed above will be cross listed with the MS in Applied Computer Science (ACS). These courses will be taught by
a combination of Dr. Ying Xie and two incremental faculty to be hired via a search committee composed of CS, Math and Stat
faculty.

Students are welcome to take additional hours of elective courses (or substitutes with
approval) from any department on campus. Because few departments currently offer
Ph.D. or M.S. level courses, electives outside of the three disciplines outlined above
would most likely take the form of directed studies. These directed studies could
segue into the application phase of the program. It would be expected that, where
appropriate, any professor who engages in a substantive directed study with a Ph.D.
student from this program would also serve on the dissertation committee.
Consequently, only tenured Associate and Full Professors are invited to engage in
directed studies with these Ph.D. students.
Of the elective courses listed below from the Statistics curriculum, several have been
considered for years as electives for inclusion in the MS in Applied Statistics. With
additional faculty resources, these courses can now be developed and cross listed
between the MS in Applied Statistics and the Ph.D. in Analytics and Data Science.
TABLE 12: Elective Courses in Statistics for the Ph.D. in Analytics and Data Science
Prefix | Course Name | Credit Hours | Status | Pre-requisites | Notes
STAT 8110 | Quality Control and Process Improvement | 3-0-3 | Existing | STAT 7100 & STAT 7020 | Black Belt Preparation
STAT 8140 | Six Sigma Problem Solving | 3-0-3 | Existing | STAT 8110 & STAT 8120 | Black Belt Preparation
STAT 8350 | Social Network Analysis | 3-0-3 | New | STAT 8240 |
STAT 8360 | Applied Sampling Methods | 3-0-3 | New | STAT 8240 & STAT 8020 |
STAT 8370 | Affinity Analysis | 3-0-3 | New | STAT 8240 |
STAT 8380 | Churn Modeling | 3-0-3 | New | STAT 8240 |
STAT 7900 | Programming in R | 3-0-3 | Existing | STAT 7020 or CS 8530 |
STAT 8399 | Design and Analysis of Massive Survey Data | 3-0-3 | New | STAT 8240 & STAT 8020 |
ACS 8430 | Text and Web Mining | 3-0-3 | New | ACS 8310 and ACS 7420 |
ACS 8510 | Cloud Computing | 3-0-3 | New | ACS 7410 and ACS 7510 |
DS XXXX | CSAS Consulting | 3-0-3 | New | Center Director Approval | Option to work in the Consulting Center

While an ultimate program of study could take many different forms, an example of a
typical program of study for a student in this program can be found in Table 13 below.







TABLE 13: Sample Program of Study for the Ph.D. in Analytics and Data Science
Course | Course Name | Pre Req | Hours | Notes

FALL 2014
STAT 8020 | Advanced Programming in SAS | STAT 7100 & 7020 (or BASE SAS Programming Certification) | 3 |
STAT 8240 | Data Mining I | STAT 8210 | 3 |
MATH 8010 | Theory of Linear Models | | 3 |
ACS 7410 | Parallel and Distributed Computing | ACS 7010 and ACS 7030 | 3 |
SEMESTER = 12

SPRING 2015
STAT 8330 | Applied Binary Classification | STAT 8210 | 3 |
MATH 8030 | Discrete Mathematics | | 3 |
ACS 7510 | HPC Infrastructure | ACS 7010 | 3 |
SEMESTER = 9

SUMMER 2015
MATH 8020 | Graph Theory | | 3 |
STAT 8360 | Applied Sampling Methods | STAT 8240 and STAT 8020 | 3 | STAT Elective
SEMESTER = 6

FALL 2015
STAT 8250 | Data Mining II | STAT 8240 | 3 |
STAT 8260 | Segmentation Models | STAT 8240 | 3 |
ACS 8310 | Data Warehousing | ACS 7030 | 3 |
SEMESTER = 9

SPRING 2016
STAT 8270 | Production-Level Modeling | STAT 8240 and STAT 8020 | 3 |
STAT 8350 | Social Network Analysis | STAT 8240 | 3 | STAT Elective
ACS 7420 | Algorithm Design for Big Data | ACS 7410 | 3 |
SEMESTER = 9

SUMMER 2016
ACS 8430 | Web and Text Mining | ACS 8310 and ACS 7420 | 3 | ACS Elective
DS 9700 | Internship/Application | Completion of Coursework | 3 |
SEMESTER = 6

FALL 2016
DS 9900 | Dissertation Research | Completion of Coursework | 6 |
DS 9700 | Internship/Application | Completion of Coursework | 6 |
SEMESTER = 12

SPRING 2017
DS 9900 | Dissertation Research | Completion of Coursework | 6 |
DS 9700 | Internship/Application | Completion of Coursework | 6 |
SEMESTER = 12

SUMMER 2017
DS 9900 | Dissertation Research | Completion of Coursework | 6 |

PROGRAM = 81
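
The semester and program totals in Table 13 can be reproduced with the short sketch below. The course codes and hours are transcribed from the table; the data structure and variable names are assumptions made for illustration, and the 78-hour figure is the program minimum stated in Section 5.

```python
# Hours per course in each semester, transcribed from Table 13.
sample_program = {
    "Fall 2014": {"STAT 8020": 3, "STAT 8240": 3, "MATH 8010": 3, "ACS 7410": 3},
    "Spring 2015": {"STAT 8330": 3, "MATH 8030": 3, "ACS 7510": 3},
    "Summer 2015": {"MATH 8020": 3, "STAT 8360": 3},
    "Fall 2015": {"STAT 8250": 3, "STAT 8260": 3, "ACS 8310": 3},
    "Spring 2016": {"STAT 8270": 3, "STAT 8350": 3, "ACS 7420": 3},
    "Summer 2016": {"ACS 8430": 3, "DS 9700": 3},
    "Fall 2016": {"DS 9900": 6, "DS 9700": 6},
    "Spring 2017": {"DS 9900": 6, "DS 9700": 6},
    "Summer 2017": {"DS 9900": 6},
}

# Total hours per semester, then for the whole sample program.
semester_totals = {term: sum(courses.values()) for term, courses in sample_program.items()}
program_total = sum(semester_totals.values())

# Program minimum stated in Section 5.
MINIMUM_PROGRAM_HOURS = 78

for term, hours in semester_totals.items():
    print(f"{term}: {hours} hours")
print(f"Program total: {program_total} hours")
print(f"Hours above the stated minimum: {program_total - MINIMUM_PROGRAM_HOURS}")
```

The sample program totals more than the 78-hour minimum because it carries more internship and dissertation hours than the 12-hour minimum for each.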









e. Append materials available from national accrediting agencies or
professional organizations as they relate to curriculum standards for the
proposed program.
Data Science represents a nascent, but trending confluence of disciplines. The degree
incorporates Mathematics, Statistics and Computer Science. No national accrediting
agency exists to oversee degrees in Data Science. Therefore, we will look to external
entities for feedback and endorsement of our degree program. These external entities
include:
1. The SAS Institute, one of the primary clearinghouses for developments in the
area of applied analytics (it should be noted that the SAS Institute is an engaged
member of our advisory board and has been instrumental in guiding our
thinking throughout the development of this proposal).
2. The KSU Statistics Advisory Board. As referenced above, the Advisory Board is
composed of managers from organizations in the Metropolitan Atlanta area who
are currently engaged in issues related to Big Data Analytics, including The
Home Depot, The Southern Company, The Centers for Disease Control,
CompuCredit and several consulting firms and large insurance firms. Again,
they have guided the principles incorporated in this proposal.
3. Other Universities. Several universities have MS and/or undergraduate
programs aligned with applied statistics. Several professors from these
institutions have taken the time to review our proposed curriculum and have
provided their comments. Summaries of the comments can be found in Section
A.3.

f. Indicate ways in which the proposed program is consistent with national
standards.
As indicated above, there are no national standards in this space. However, as
outlined in Section 5.e, several knowledgeable experts have been solicited for their
input and endorsement of the proposed curriculum.
It is worth noting that Thomas Davenport, a recognized national expert and prolific
author in the area of applied analytics, prescribed the following skills for a Data
Scientist in his article "Data Scientist: The Sexiest Job of the 21st Century" (see
footnote 1):
- The most basic universal skill is the ability to code
- Communicate in a language that their stakeholders can understand
- Posit and test hypotheses
- Bring structure to large quantities of formless data
g. If internships or field experiences are required as part of the program,
provide information documenting internship availability as well as how students
will be assigned and supervised.
As provided in Section A.1, the Department of Mathematics and Statistics has received
letters from our Statistical Advisory Board, stating that they would hire our Ph.D.
students for a minimum of one year, after they have completed their course
requirements. These students would be supervised at the hiring firm, but would also
require a faculty mentor from the Department of Mathematics and Statistics.
h. Indicate the adequacy of foundation course offerings to support the new
program.
Students would be admitted in a cohort. Courses which are unique to the Ph.D.
program (not part of a MS curriculum) would be offered once a year.


1 http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/pr

SECTION 6: Admissions criteria. Please include required minimum scores on
appropriate standardized tests, grade point averages, and master's-level graduate
degree attainment.
Admission into the Ph.D. in Analytics and Data Science Program would require the
following:
1. Minimum GRE Score: Top 20th percentile
2. Minimum GPA: 3.0 in MS program
3. Letters of Recommendation (academic and professional)
Applicants would need to provide three letters of recommendation, including two from
academic references and one from an applied reference (i.e., a manager or colleague).
4. At least one year of previous work experience outside of academia (preferred)
5. Interview
All applicants would be required to have either an on-campus interview (preferred) or a
video interview with the Program faculty.

SECTION 7: Availability of assistantships.
As evidenced by the Letters of Intent (see Section A.1) from the Statistical Advisory
Board, external assistantships/internships should be readily available for students who
complete their courses and pass their comprehensive exam.
In addition, as the emphasis on research and scholarship continues to increase at KSU,
these doctoral students, probably more than any other students on campus, would be
expected to be in demand for grant/research support. Currently, the Center for Statistics
and Analytical Services employs two MS-level GRAs every semester. These GRAs are
engaged in multiple analytical projects across campus. Given the existing level of
demand for MSAS students, it is reasonable to anticipate sufficient demand for Data
Science doctoral students to accommodate at least two GRAs from this Program every
semester.


SECTION 8. Student learning outcomes and other outcomes of the proposed
program.
The success of the Ph.D. in Analytics and Data Science will be measured through four
instruments:
1. Successful completion of comprehensive examinations. At the completion of
their required coursework, students will take a comprehensive examination,
covering each of the three major areas of study. Students who do not pass their
comprehensive examinations will be mentored into alternative options. These
alternatives are outlined at the end of this section.
2. External Certifications. The SAS Institute has a strong certification program
which serves as a de facto accreditation process for the applied analytics
industry². These certifications range from base programming and data management
to predictive modeling and data mining. At present, most MS students in Applied
Statistics, and an increasing number of undergraduate minors in Applied Statistics, are
earning these certifications. It would be our expectation that the Ph.D. students
in Analytics and Data Science would continue this trend. With additional
training in Computer Science and in Mathematics, these students should earn, at
the least, SAS Certifications in Base Programming, Advanced Programming, and
Statistical Business Analyst: Regression and Modeling. These certifications
have clear and universally recognized value in the marketplace.
3. Job Placement. As referenced above, the MS in Applied Statistics has a 100%
placement rate, with many students having multiple job offers in advance of
graduation. An important measure of success of the Ph.D. in Analytics and
Data Science will be the demand for its graduates. Not only do we expect these
graduates to experience a placement rate as strong as that of the MS students, but
we also expect that there will be competition in the market for these students.
4. Pre/Interim/Post competency survey. In conjunction with our Advisory Board,
we will develop a required skills inventory including the skills which are
required for job postings within their own firms when advertising for Data
Scientists. Students will be evaluated against this inventory three times. The
pre score will result from their assessment after acceptance into the program.
The interim score will result from their assessment after completion of the
program. Their post score will come after one year in their post-graduation
position.
Students who do not pass the comprehensive exams will NOT simply earn an MS in
Applied Statistics or an MS in Applied Computer Science. If failing Ph.D. students
have a particular aptitude in one discipline and not in another, they may be guided
to apply to one of these two programs, but acceptance is not guaranteed. A scenario
exists where unsuccessful students may walk away empty-handed, although we
expect this scenario to be rare.

² http://support.sas.com/certify/

SECTION 9. Administration of the program:
The Ph.D. in Analytics and Data Science will be housed in the College of Science and
Mathematics, with primary administrative responsibility in the Department of
Mathematics and Statistics.

SECTION 10. Waiver to Degree-Credit Hour (if applicable): If the program exceeds
the total credit hours normally associated with similar programs offered both within
and outside of the system, provide the institution's rationale for increased credit hour
requirements.
Not Applicable.

SECTION 11. Accreditation: Describe disciplinary accreditation requirements
associated with the program (if applicable).
As described in Section 5.e above, while no accreditation agency exists for Data Science,
the curriculum has been informed by several knowledgeable individuals and
organizations.

SECTION 12. Projected enrollment for the program (especially during the first three
years of implementation). Please indicate whether enrollments will be cohort-based.
The Ph.D. in Analytics and Data Science will enroll five students a year for the first four
years on a cohort basis. After four years, the program is expected to ramp up
towards a steady state of 10 new students a year.
                             YR 1    YR 2    YR 3    YR 4    YR 5    YR 6
Enrollment Projections          5      10      15      20      21      22

Course Sections Satisfying Program Requirements
  Previously Existing           7      16      18      20      20      20
  New                           9       2       2       0       0       0
  Total Sections               16      18      20      20      20      20

Credit Hours Generated by Those Courses
  Existing                     21      48      54      66      78      78
  New                          27       6      12*     12*      0       0
  Total Credit Hours           48      54      66      78      78      78

Degrees Awarded                 0       0       0       0       4       4
* Dissertation hours and Internship hours will be duplicated

SECTION 13: Faculty
Provide an inventory of faculty directly involved with the administration of the
program. On the list below, indicate which persons are existing faculty and which are
new hires.

Faculty Name        Rank                 Highest Degree  Degrees Earned       Academic Discipline  Current Workload
Bradley Barney      Assistant Professor  Ph.D.           B.A., M.S., Ph.D.    Statistics           3/3
Marla Bell          Professor            Ph.D.           B.S., M.S., Ph.D.    Statistics           1/1
Nicole Ferguson     Assistant Professor  Ph.D.           B.S., M.S., Ph.D.    Statistics           3/3
Joe DeMaio          Professor            Ph.D.           B.S., M.S., Ph.D.    Mathematics          2/2
Victor Kane         Professor            Ph.D.           B.S., M.S., Ph.D.    Statistics           3/3
Philippe Lavalle    Associate Professor  Ph.D.           B.S., M.S., Ph.D.    Mathematics          3/3
Louise Lawson       Professor            Ph.D.           B.S., M.P.H., Ph.D.  Statistics           3/3
Sherri Ni           Associate Professor  Ph.D.           B.S., M.S., Ph.D.    Statistics           3/3
Jennifer Priestley  Associate Professor  Ph.D.           B.S., MBA, Ph.D.     Statistics           2/2
Herman (Gene) Ray   Assistant Professor  Ph.D.           B.S., M.S., Ph.D.    Statistics           3/3
Lewis VanBrackle    Professor            Ph.D.           B.S., M.S., Ph.D.    Statistics           2/2
Ying Xie            Associate Professor  Ph.D.           B.S., M.S., Ph.D.    Computer Science     2/3
Daniel Yanosky      Associate Professor  Ph.D.           B.S., M.S., Ph.D.    Statistics           3/3
New Hire TBD        Associate Professor  Ph.D.                                Statistics           NA
New Hire TBD        Assistant Professor  Ph.D.                                Statistics           NA
New Hire TBD        Associate Professor  Ph.D.                                Mathematics          NA
New Hire TBD        Assistant Professor  Ph.D.                                Computer Science     NA
New Hire TBD        Associate Professor  Ph.D.                                Computer Science     NA
New Hire TBD        Professor            Ph.D.                                TBD                  NA


SECTION 15: Fiscal, Facilities, Enrollment Impact and Budget
                                     YR 1        YR 2        YR 3        YR 4        YR 5        YR 6
ENROLLMENT PROJECTIONS
  Doctoral Students                     5          10          15          20          21          22
  Sections                             16          18          20          20          20          20
  Credit Hours                         48          54          66          78          78          78
  Degrees                               0           0           0           0           4           4

Personnel (New)
  Faculty                         260,000*    410,000+    410,000     410,000     410,000     410,000
  Graduate Assistants             150,000     300,000     450,000     450,000     330,000     210,000
  Administrators                  150,000     150,000     150,000     150,000     150,000     150,000
  Support Staff                   105,000     105,000     105,000     105,000     105,000     105,000
  Fringe Benefits (@ 30%)         154,500     199,500     199,500     199,500     199,500     199,500
  Total New Personnel             819,500   1,164,500   1,314,500   1,314,500   1,194,500   1,074,500

Start Up Costs
  Equipment                       800,000           0           0           0           0           0
  Physical Facilities             150,000           0           0           0           0           0

Operating Costs
  Supplies                         20,000      10,000       5,000       5,000       5,000       5,000
  Travel                           30,000      30,000      30,000      30,000      30,000      30,000
  Equipment                        20,000      20,000      20,000      20,000      20,000      20,000
  Other

GRAND TOTAL COSTS               1,839,500   1,224,500   1,369,500   1,369,500   1,249,500   1,129,500
* Includes the Assistant Professor in Computer Science from YR -1, the Associate Professor in Statistics in YR1 and the Associate Professor in Computer Science in YR1.
+ Includes the faculty from YR1 and the Assistant Professor in Statistics in YR2 and the Associate Professor in Mathematics in YR2.

                                     YR 1        YR 2        YR 3        YR 4        YR 5        YR 6
REVENUE SOURCES
Source of Funds
  Reallocation of existing funds
  New Tuition
  Federal Funds
  Other Grants                          -     385,000     420,000     500,000     500,000     500,000
  Student Fees
  Other (Consulting Center)             -      25,000      50,000     100,000     100,000     100,000
  Other (Corporate Internships)         -           -           -     150,000     300,000     450,000
  New state allocation requested for budget hearing
Nature of Funds
  Base Budget
  One Time funds

GRAND TOTAL REVENUES                    0     410,000     470,000     750,000     900,000   1,050,000

                                     YR 1        YR 2        YR 3        YR 4        YR 5        YR 6
Net Total Program Cost          1,839,500     814,500     899,500     619,500     349,500      79,500
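
The totals in the cost and revenue tables above can be cross-checked from their line items. The following is a minimal sketch in Python and is not part of the proposal's budget process; all variable and function names are illustrative. It rests on two assumptions inferred from the figures themselves: the 30% fringe rate is applied to faculty, administrator and support staff salaries but not to graduate assistant stipends, and the grant, consulting center and corporate internship revenues fall in the years implied by the GRAND TOTAL REVENUES row.

```python
# Line items from the Section 15 tables, in dollars, for YR1 through YR6.
FACULTY = [260_000, 410_000, 410_000, 410_000, 410_000, 410_000]
GRADUATE_ASSISTANTS = [150_000, 300_000, 450_000, 450_000, 330_000, 210_000]
ADMINISTRATORS = [150_000] * 6
SUPPORT_STAFF = [105_000] * 6

# Start-up costs (equipment plus physical facilities) occur in YR1 only.
STARTUP = [800_000 + 150_000, 0, 0, 0, 0, 0]

SUPPLIES = [20_000, 10_000, 5_000, 5_000, 5_000, 5_000]
TRAVEL = [30_000] * 6
EQUIPMENT = [20_000] * 6

# Revenue lines; the year placement is an assumption inferred from the totals row.
GRANTS = [0, 385_000, 420_000, 500_000, 500_000, 500_000]
CONSULTING = [0, 25_000, 50_000, 100_000, 100_000, 100_000]
INTERNSHIPS = [0, 0, 0, 150_000, 300_000, 450_000]

# Assumption: fringe applies to faculty, administrators and support staff only.
FRINGE_PERCENT = 30


def yearly_budget():
    """Return one dict per year with fringe, personnel, costs, revenue and net cost."""
    rows = []
    for year in range(6):
        salaried = FACULTY[year] + ADMINISTRATORS[year] + SUPPORT_STAFF[year]
        fringe = salaried * FRINGE_PERCENT // 100  # integer math avoids float rounding
        personnel = salaried + GRADUATE_ASSISTANTS[year] + fringe
        operating = SUPPLIES[year] + TRAVEL[year] + EQUIPMENT[year]
        costs = personnel + STARTUP[year] + operating
        revenue = GRANTS[year] + CONSULTING[year] + INTERNSHIPS[year]
        rows.append({
            "year": year + 1,
            "fringe": fringe,
            "personnel": personnel,
            "costs": costs,
            "revenue": revenue,
            "net": costs - revenue,
        })
    return rows


if __name__ == "__main__":
    for row in yearly_budget():
        print(f"YR{row['year']}: costs={row['costs']:,} "
              f"revenue={row['revenue']:,} net={row['net']:,}")
```

Under these assumptions, the computed values agree with the Total New Personnel, GRAND TOTAL COSTS, GRAND TOTAL REVENUES and Net Total Program Cost rows for all six years.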


SECTION A.1: Letters of Support from Statistical Advisory Board


SECTION A.2: COURSE DESCRIPTIONS FROM SECTION 5
From Table 10: Previous Master's Degree-Level Required Coursework for the Ph.D.
in Analytics and Data Science. Note that these courses are currently available in the
MS in Applied Statistics and/or the MS in Applied Computer Science curricula.
STAT 7010: Mathematical Statistics I
Fundamental concepts of probability, random variables and their distributions; review of
sampling distributions; theory and methods of point estimation and hypothesis testing, interval
estimation, nonparametric tests, introduction to linear models.
STAT 7020: Statistical Computing and Simulation
Topics covered in STAT 7020 will include stochastic modeling, random number generators
based on probability distributions, discrete-event simulation approaches, simulated data analysis,
nonparametric analysis and sampling techniques. Given the importance of SAS software to
these types of applications, students will necessarily refine and improve their SAS
programming skills. The class will utilize real-world datasets from a variety of disciplines,
including finance, manufacturing and medicine.
STAT 7100: Statistical Methods
STAT 7100 is designed to give students the foundation in statistical methods necessary for further
study in the Master of Science in Applied Statistics program. The course begins with a study of
statistical distributions (binomial, Poisson, uniform, exponential, gamma, chi-square and
normal), descriptive statistics, the Central Limit Theorem, t-tests (one-sample, two-sample and
paired) and confidence intervals. The course then moves on to more advanced techniques
including categorical data analysis (chi-square tests), correlation, simple linear regression
analysis and one-way analysis of variance.
STAT 8120: Applied Experimental Design
Methods for constructing and analyzing designed experiments are considered. The concepts of
experimental unit, randomization, blocking, replication, error reduction and treatment structure
are introduced. The design and analysis of completely randomized, randomized complete block,
incomplete block, Latin square, split-plot, repeated measures, factorial and fractional factorial
designs will be covered.
STAT 8210: Applied Regression Analysis
Topics include simple linear regression, inferences, diagnostics and remedies, matrix
representations, multiple regression models, generalized linear model, multicollinearity,
polynomial models, qualitative predictor variables, model selection and validation, identifying
outliers and influential observations, diagnostics for multicollinearity, and logistic regression.
STAT 8310: Applied Categorical Data Analysis
This course will cover methods of contingency table analysis, including data categorization,
dose-response and trend analysis, and calculation of measures of effect and association. The
students will learn to use generalized linear regression models including logistic, polychotomous
logistic, Poisson and repeated measures (marginal and mixed models), and apply these
appropriately to real-world data. Statistical software packages such as JMP,
MINITAB, and/or SAS will be used.
STAT 8320: Applied Multivariate Data Analysis
Survey course in statistical analysis techniques. Through a combination of textbook and
real-world data sets, students will gain hands-on experience in understanding when and how to
utilize the primary multivariate methods: data reduction techniques (including Principal
Components Analysis and Common Factor Analysis), ANOVA/MANOVA/MANCOVA,
Cluster Analysis, Survival Analysis and Decision Trees.
ACS 7010: C++ & Data Structures
A study of the C++ programming language and its capabilities, together with a study of
computing data structures.

ACS 7030: Relational Database Systems
A study of database systems for data science and analytics, including SQL and working with
very large data sets.



From Table 11: Required Courses in Mathematics and Statistics for the Ph.D. in
Analytics and Data Science
STAT 8240: Data Mining I
Data Mining is an information extraction activity whose goal is to discover hidden facts
contained in databases and perform prediction and forecasting through interaction with the data.
The process includes data selection, cleaning and coding, using statistical pattern recognition
and machine learning techniques, and reporting and visualizing the generated structures. The
course will cover all these issues and will illustrate the whole process by examples of practical
applications.

STAT 8020: Advanced Programming in SAS
This course will cover advanced programming techniques using the SAS system for data
management and statistical analysis. The topics covered include macro programming, using
SQL with SAS and optimizing SAS programs. Upon completion of this course students will be
prepared to take and pass the certification test and obtain the Advanced Programmer for SAS 9
certification.

STAT 8330: Applied Binary Classification
Binary classification is a heavily used concept in statistical modeling. Common applications include
creditworthiness and the associated development of a FICO-esque credit score, fraud detection,
and the identification of manufacturing units which fail inspection. Students will learn how to use
Logistic Regression, Odds, ROC curves, and maximization functions to apply binary classification
concepts to real-world datasets. This course will heavily use SAS software, and students are
expected to have a strong working knowledge of SAS.

STAT 8250: Data Mining II
This is the second course in a two-course sequence on data mining. It emphasizes advanced
concepts and techniques for data mining and their application to massive data sets. Building on
the knowledge and skills introduced in Data Mining I, this course covers mining patterns from
temporal, sequence, and graph data, as well as semi-supervised learning, active learning,
boosting and distributed data mining. In addition, support vector machines (SVM), multivariate
adaptive regression splines (MARS), and recursive partitioning and its extensions (e.g., bagging,
boosting, and random forests) are covered.

STAT 8260: Segmentation Models
This class begins by reviewing classical clustering methods introduced in the data mining
sequence. These methods are studied in greater depth and their application in massive data
classification and market segmentation endeavors is explored. The second half of this course
introduces the use of probabilistic models for segmentation, including mixture and latent class
models, among others, and explores their utility and strengths. Segmentation using both
continuous and categorical inputs with these methods is stressed. Further emphasis is placed on
practical application of these methods when applied to massive data sources and appropriate and
accurate reporting of results.
STAT 8270: Production-Level Modeling
This course focuses on the practical use of statistical and data mining models at production level
in massive data applications. It emphasizes the circular, continuous nature of the
model life cycle by studying the planning, development, implementation, assessment,
monitoring, and retirement/replacement phases of production-level modeling.
MATH 8010: The Theory of Linear Models
This course provides a solid foundation of the theory behind linear statistical models for
continuous responses. Students will learn to conceptualize linear statistical models using matrix
algebra. The course begins with a review of the calculus sequence, linear algebra, probability
theory, the multivariate normal distribution, and quadratic forms. Some of the topics covered
include: simple and multiple regression, parameter estimation and interpretation, hypothesis
testing, prediction, model diagnostics, model comparison, and variable selection.
MATH 8030: Applied Discrete & Combinatorial Mathematics for Data Analysts
This course covers applied discrete mathematics and combinatorial tools for data analysts. Topics
covered include principles of counting, fundamentals of logic, set theory, mathematical
induction, functions, and graph theory. Examples using applied data analysis and associated
computing are used throughout.
MATH 8020: Graph Theory
This course introduces standard graph theoretic terminology, theorems and algorithms necessary to the
study of large data networks. Topics include graphs, trees, paths, cycles, isomorphisms, routing
problems, independence, domination, centrality, and coloring problems. Data structures for
representing large graphs and corresponding algorithms for searching and optimization purposes
accompany these topics.
ACS 7410: Parallel & Distributed Computing
This course covers different forms of parallelism, including shared memory parallelism (OpenMP),
distributed memory parallelism (MPI), and MapReduce (the major distributed framework used
in the cloud), for solving a variety of complex problems.

ACS 7420: Algorithm Design for Big Data
This course covers advanced topics in algorithm design. The focus is on the design of
advanced algorithms that scale to big data in a distributed computing environment.

ACS 7510: HPC Infrastructure
A study of high performance computing technologies, including supercomputing, grid
computing, cloud computing and other technologies. Also includes discussion of issues around
the design and management of a large data center, data integrity, and data reliability.

ACS 8310: Data Warehousing
Data warehousing and mining are indispensable components of Business Intelligence, which
aims to enhance business competitiveness by providing business people with information and
tools that are necessary to make critical business decisions. This course will cover major
techniques of data warehousing and mining including: dimensional modeling, extraction-
transformation-load (ETL), online analytical processing (OLAP), association mining, clustering,
classification, and other business intelligence applications.



SECTION A.3: RESPONSES TO PROPOSAL FROM OTHER UNIVERSITIES
1. Dr. Goutam Chakraborty, Professor (Marketing)
Founding Director of Graduate Certificate in Business Data Mining
Spears School of Business, Oklahoma State University

I read the document and I like it! You have done an outstanding job to make the case for a formal
program in Data Science - which is much needed to meet the demands of the market.
I have two main points of feedback:
a) The director's salary needs to be much higher, if you are going to get the right type of person
with the skill sets that you identified.
b) I suggest that you consider the degree a DDS (Doctorate in Data Science) rather than PhD. I
say this because as soon as you say PhD, everyone will start asking about a more theoretical
research oriented degree.

2. Shesh N. Rai, Ph.D., Director, Biostatistics Shared Facility, JG Brown Cancer Center
Professor and Wendell Cherry Chair in Clinical Trial Research
Dept. of Bioinformatics & Biostatistics, School of Public Health & Information Sciences
University of Louisville
I think overall the proposal is very good. It will fill an emerging need which is currently addressed by
individuals with mathematics, computer science, and statistics backgrounds augmented with interest
and experience. The data-driven approach to solving practical problems that you have proposed is
novel, in my view.

I am concerned about the employment opportunities for the individuals that complete the program. It
seems they will be at a disadvantage to well-trained statisticians, mathematicians, and computer
scientists who have the skills the program will build plus the additional skills expected from the
specific discipline. The program requires a bit more advanced statistical training such as an
Inference Course and a Theory of Generalized Linear Models (or other appropriate foundational
statistical course).
3. Satish Nargundkar, Ph.D., Assistant Professor of Managerial Sciences, Robinson College of
Business. Georgia State University

I read the document, and it is quite a thorough job. There is certainly a need for this kind of
program in the market. The key question, though, is about the end product - the document
mentions the job market that has been favorable to MS students with skills in this area. However,
for a PhD, I see the academic world as representing the job market as much as industry. The
program should perhaps be a doctorate, like Executive Doctorates in some universities, or an MS in
data sciences. If it is to be a true Ph.D., there needs to be a little more emphasis on research methods.
Overall, definitely an idea worth pursuing.