
Python For Data Science Cheat Sheet
Pandas Basics
Learn Python for Data Science Interactively at www.DataCamp.com

Pandas
The Pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.

Use the following import convention:
>>> import pandas as pd

Pandas Data Structures

Series
A one-dimensional labeled array capable of holding any data type

Index
a   3
b  -5
c   7
d   4

>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

DataFrame
A two-dimensional labeled data structure with columns of potentially different types

         Columns
         Country  Capital    Population
Index 0  Belgium  Brussels   11190846
      1  India    New Delhi  1303171035
      2  Brazil   Brasília   207847528

>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
...         'Capital': ['Brussels', 'New Delhi', 'Brasília'],
...         'Population': [11190846, 1303171035, 207847528]}
>>> df = pd.DataFrame(data,
...                   columns=['Country', 'Capital', 'Population'])

Asking For Help
>>> help(pd.Series.loc)

Selection                                   Also see NumPy Arrays

Getting
>>> s['b']                                  Get one element
-5
>>> df[1:]                                  Get subset of a DataFrame
  Country    Capital  Population
1   India  New Delhi  1303171035
2  Brazil   Brasília   207847528

Selecting, Boolean Indexing & Setting

By Position
>>> df.iloc[0, 0]                           Select single value by row & column
'Belgium'
>>> df.iat[0, 0]
'Belgium'

By Label
>>> df.loc[0, 'Country']                    Select single value by row & column labels
'Belgium'
>>> df.at[0, 'Country']
'Belgium'

By Label/Position
(.ix was removed in pandas 1.0; .loc and .iloc cover its uses)
>>> df.loc[2]                               Select single row of subset of rows
Country          Brazil
Capital        Brasília
Population    207847528
>>> df.loc[:, 'Capital']                    Select a single column of subset of columns
0     Brussels
1    New Delhi
2     Brasília
>>> df.loc[1, 'Capital']                    Select rows and columns
'New Delhi'

Boolean Indexing
>>> s[~(s > 1)]                             Series s where value is not >1
>>> s[(s < -1) | (s > 2)]                   s where value is <-1 or >2
>>> df[df['Population'] > 1200000000]       Keep only rows where Population > 1200000000

Setting
>>> s['a'] = 6                              Set index a of Series s to 6

Dropping
>>> s.drop(['a', 'c'])                      Drop values from rows (axis=0)
>>> df.drop('Country', axis=1)              Drop values from columns (axis=1)

Sort & Rank
>>> df.sort_index()                         Sort by labels along an axis
>>> df.sort_values(by='Country')            Sort by the values along an axis
>>> df.rank()                               Assign ranks to entries

Retrieving Series/DataFrame Information

Basic Information
>>> df.shape                                (rows, columns)
>>> df.index                                Describe index
>>> df.columns                              Describe DataFrame columns
>>> df.info()                               Info on DataFrame
>>> df.count()                              Number of non-NA values

Summary
>>> df.sum()                                Sum of values
>>> df.cumsum()                             Cumulative sum of values
>>> df.min()                                Minimum values
>>> df.max()                                Maximum values
>>> df.idxmin()                             Index label of the minimum value
>>> df.idxmax()                             Index label of the maximum value
>>> df.describe()                           Summary statistics
>>> df.mean(numeric_only=True)              Mean of values (numeric columns)
>>> df.median(numeric_only=True)            Median of values (numeric columns)

Applying Functions
>>> f = lambda x: x*2
>>> df.apply(f)                             Apply function to each column
>>> df.applymap(f)                          Apply function element-wise (df.map(f) in pandas 2.1+)

Data Alignment

Internal Data Alignment
NA values are introduced in the indices that don't overlap:
>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
>>> s + s3
a    10.0
b     NaN
c     5.0
d     7.0
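
The alignment rule can be made concrete with a short sketch. It uses the same s and s3 values as the sheet and shows that adding two Series behaves like reindexing both to the union of their labels first. The names aligned_sum, manual_sum, all_labels and missing_labels are illustrative only and do not appear in the original sheet.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

# Built-in alignment: values are matched by label, not by position.
aligned_sum = s + s3

# The same result spelled out by hand: reindex both operands
# to the union of their labels, then add element by element.
all_labels = s.index.union(s3.index)
manual_sum = s.reindex(all_labels) + s3.reindex(all_labels)

# Labels present in only one operand come out as NaN.
missing_labels = aligned_sum[aligned_sum.isna()].index.tolist()

print(aligned_sum)
print(missing_labels)
```

Label 'b' exists only in s, so it is the one entry that has no partner and becomes NaN in the sum.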

Arithmetic Operations with Fill Methods
You can also do the internal data alignment yourself with the help of the fill methods:
>>> s.add(s3, fill_value=0)
a    10.0
b    -5.0
c     5.0
d     7.0
>>> s.sub(s3, fill_value=2)
>>> s.div(s3, fill_value=4)
>>> s.mul(s3, fill_value=3)

I/O

Read and Write to CSV
>>> pd.read_csv('file.csv', header=None, nrows=5)
>>> df.to_csv('myDataFrame.csv')

Read and Write to Excel
>>> pd.read_excel('file.xlsx')
>>> df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')
Read multiple sheets from the same file
>>> xlsx = pd.ExcelFile('file.xls')
>>> df = pd.read_excel(xlsx, 'Sheet1')

Read and Write to SQL Query or Database Table
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql("SELECT * FROM my_table;", engine)
>>> pd.read_sql_table('my_table', engine)
>>> pd.read_sql_query("SELECT * FROM my_table;", engine)
read_sql() is a convenience wrapper around read_sql_table() and read_sql_query()
>>> df.to_sql('myDf', engine)
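
The read and write calls can be tried without touching the file system. The sketch below is one possible round trip, under two assumptions that are not in the sheet: an io.StringIO buffer stands in for 'file.csv', and a plain sqlite3 connection stands in for the SQLAlchemy engine. The names buffer, conn, df_from_csv and df_from_sql are illustrative only.

```python
import io
import sqlite3

import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília'],
                   'Population': [11190846, 1303171035, 207847528]})

# CSV round trip through an in-memory text buffer (assumption: no file on disk).
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # to_csv is a DataFrame method
buffer.seek(0)                   # rewind before reading back
df_from_csv = pd.read_csv(buffer)

# SQL round trip through an in-memory SQLite database
# (assumption: sqlite3 connection used in place of a SQLAlchemy engine).
conn = sqlite3.connect(':memory:')
df.to_sql('my_table', conn, index=False)   # to_sql is a DataFrame method
df_from_sql = pd.read_sql_query('SELECT * FROM my_table;', conn)
conn.close()

print(df_from_csv)
print(df_from_sql)
```

Passing index=False keeps the row index out of the written data, so both frames come back with the same three columns as df.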