Python Development Cycle:

Python's development cycle is dramatically shorter than that of traditional tools. In Python, there are no
compile or link steps -- Python programs simply import modules at runtime and use the objects they
contain. Because of this, Python programs run immediately after changes are made. And in cases where
dynamic module reloading can be used, it's even possible to change and reload parts of a running program
without stopping it at all. Figure shows Python's impact on the development cycle.
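
Python's standard importlib module makes this reload behavior easy to try. The sketch below assumes a small module named handlers.py with a process() function; both names are hypothetical and stand in for whatever part of the program is being edited while it runs:

import importlib
import handlers               # initial import; executes handlers.py once

handlers.process()            # run the current version of the code

# ... edit handlers.py in another window while this program keeps running ...

importlib.reload(handlers)    # re-executes handlers.py inside the same module object
handlers.process()            # the edited behavior takes effect immediately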

Because Python is interpreted, there's a rapid turnaround after program changes. And because Python's
parser is embedded in Python-based systems, it's easy to modify programs at runtime. For example, we
saw how GUI programs developed with Python allow developers to change the code that handles a button
press while the GUI remains active; the effect of the code change may be observed immediately when the
button is pressed again. There's no need to stop and rebuild.
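
As a rough illustration of that pattern, the sketch below builds a one-button tkinter window and reloads a hypothetical button_actions module on every press, so edits to its on_press() function show up the next time the button is clicked:

import importlib
import tkinter as tk
import button_actions                  # hypothetical module defining on_press()

def handle_press():
    importlib.reload(button_actions)   # pick up any edits made since the last press
    button_actions.on_press()          # run the freshly loaded handler code

root = tk.Tk()
tk.Button(root, text="Press me", command=handle_press).pack()
root.mainloop()                        # the GUI stays active while the handler source changes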

More generally, the entire development process in Python is an exercise in rapid prototyping. Python
lends itself to experimental, interactive program development, and encourages developing systems
incrementally by testing components in isolation and putting them together later. In fact, we've seen that
we can switch from testing components (unit tests) to testing whole systems (integration tests) arbitrarily.
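
For example, a component can be exercised on its own with the standard unittest module before it is wired into a larger system; the function below is a made-up example used only to show the shape of such a test:

import unittest

def normalize(text):
    # Small component under test: collapse whitespace and lowercase the text.
    return " ".join(text.split()).lower()

class NormalizeTest(unittest.TestCase):
    # Unit test: the component is exercised in isolation.
    def test_collapses_whitespace(self):
        self.assertEqual(normalize("  Hello   World "), "hello world")

if __name__ == "__main__":
    unittest.main()   # integration tests would later combine such components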

OOP:

Python makes OOP a flexible tool by delivering it in a dynamic language. More importantly, its class
mechanism is a simplified subset of C++'s, and it's this simplification that makes OOP useful in the context
of a rapid-development tool. For instance, when we looked at data structure classes in this book, we saw
that Python's dynamic typing let us apply a single class to a variety of object types; we didn't need to write
variants for each supported type.
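
A short sketch of that idea, using a simple stack class (the class itself is illustrative rather than taken from the book's examples):

class Stack:
    # One class works for any object type; no per-type variants are needed.
    def __init__(self):
        self._items = []
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()

s = Stack()
s.push(42)              # numbers...
s.push("spam")          # ...strings...
s.push([1, 2, 3])       # ...and lists all pass through the same code
print(s.pop(), s.pop(), s.pop())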

In fact, Python's OOP is so easy to use that there's really no reason not to apply it in most parts of an
application. Python's class model has features powerful enough for complex programs, yet because
they're provided in simple ways, they don't interfere with the problem we're trying to solve.

Data Science Project Life-Cycle:

Data acquisition - this step may involve acquiring data from both internal and external sources, including social media or web scraping. In a steady state, data extraction and transfer routines would be in place, and new sources, once identified, would be acquired following the established processes.
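
As a minimal sketch of acquisition by web scraping, using the third-party requests and beautifulsoup4 packages against a placeholder URL (the URL and the choice of h2 tags are illustrative assumptions):

import requests
from bs4 import BeautifulSoup

url = "https://example.com/articles"            # placeholder source, not a real feed
html = requests.get(url, timeout=10).text       # pull the raw page

soup = BeautifulSoup(html, "html.parser")
titles = [h.get_text(strip=True) for h in soup.find_all("h2")]   # extract raw fields
print(titles[:5])      # these raw records would then flow into data preparation
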
Data preparation - usually referred to as "data wrangling", this step involves cleaning the data and reshaping it into a readily usable form for performing data science. In certain aspects this is similar to the traditional ETL steps in data warehousing, but it involves more exploratory analysis and is primarily aimed at extracting features in usable formats.
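
A small wrangling sketch with pandas; the columns and values are invented purely to show typical cleaning operations (type fixes, text normalization, dropping and imputing missing values):

import pandas as pd

raw = pd.DataFrame({                       # invented raw extract
    "age":    ["34", "41", None, "29"],
    "city":   ["  Paris", "paris ", "Lyon", None],
    "income": [32000, None, 41000, 28000],
})

clean = raw.assign(
    age=pd.to_numeric(raw["age"], errors="coerce"),     # fix types
    city=raw["city"].str.strip().str.title(),           # normalize text
).dropna(subset=["age"])                                 # drop rows with no usable age
clean["income"] = clean["income"].fillna(clean["income"].median())   # impute missing values
print(clean)
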
Hypothesis and modeling - these are the traditional data mining steps; however, in a data science project they are not limited to statistical samples. Indeed, the idea is to apply machine learning techniques to all of the data. A key sub-step performed here is model selection: a training set is separated out for training the candidate machine-learning models, while validation and test sets are held back for comparing model performance, selecting the best performing model, gauging model accuracy and preventing over-fitting.
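
A sketch of that split and comparison with scikit-learn; the dataset and the candidate models are arbitrary stand-ins:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set, then split the remainder into training and validation sets.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=5000),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

best_name, best_score = None, 0.0
for name, model in candidates.items():
    model.fit(X_train, y_train)            # train each candidate
    score = model.score(X_val, y_val)      # compare on the validation set
    if score > best_score:
        best_name, best_score = name, score

print("selected:", best_name, "validation accuracy:", round(best_score, 3))
print("held-out test accuracy:", round(candidates[best_name].score(X_test, y_test), 3))
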
Steps 2 through 4 are repeated as many times as needed; as the understanding of the data and the business becomes clearer and the results from initial models and hypotheses are evaluated, further tweaks are performed. These iterations may sometimes include Step 5 (deployment) and be performed in a pre-production or "limited"/"pilot" environment before the actual full-scale production deployment, or they may include fast tweaks after deployment, based on the continuous deployment model.

Once the model has been deployed in production, it is time for regular maintenance and operations. This operations phase could also follow a target DevOps model, which gels well with the continuous deployment model, given the rapid time-to-market requirements in big data projects. Ideally, the deployment includes performance tests to measure model performance, and it can trigger alerts when model performance degrades beyond a certain acceptable threshold.
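
A minimal sketch of such a check, with a made-up threshold and a pluggable alert callback standing in for whatever monitoring system is actually used:

from sklearn.metrics import accuracy_score

ACCEPTABLE_ACCURACY = 0.85          # hypothetical threshold agreed with the business

def check_model_performance(y_true, y_pred, alert):
    # Score a recent batch of predictions and alert if accuracy drops too far.
    accuracy = accuracy_score(y_true, y_pred)
    if accuracy < ACCEPTABLE_ACCURACY:
        alert("Model accuracy degraded to {:.1%}".format(accuracy))
    return accuracy

# Example: a trivial alert callback that just prints the message.
check_model_performance([1, 0, 1, 1], [1, 1, 0, 1], alert=print)
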
The optimization phase is the final step in the data science project life-cycle. It could be triggered by failing performance, by the need to add new data sources and retrain the model, or by the need to deploy improved versions of the model based on better algorithms.
Agile development processes, especially continuous delivery, lend themselves well to the data science project life-cycle. As mentioned before, with increasing maturity and well-defined project goals, pre-defined performance criteria can help evaluate the feasibility of the data science project early enough in the life-cycle. This early comparison helps the data science team to change approaches, refine hypotheses and even discard the project if the business case is nonviable or the benefits from the predictive models are not worth the effort required to build them.
