Using the DataFrame Type in Python Pandas

Rel Guzman
In these notes I experiment with the DataFrame data structure.
Initializing a "DataFrame"

A "DataFrame" object is like a SQL table or a spreadsheet. Below are some ways to initialize one.
In[1]:
# standard pandas imports
import pandas as pd
import numpy as np
from IPython.display import display
data = [10, 20, 30]
df = pd.DataFrame(data)
df
Out[1]:
0
0 10
1 20
2 30
From a dictionary
In[2]:
data = {'col1': [1., 2., 3., 4.],
        'col2': [4., 3., 2., 1.]}
df = pd.DataFrame(data)
df
Out[2]:
col1 col2
0 1 4
1 2 3
2 3 2
3 4 1
Or from a dictionary of "Series"
In[3]:
data = {'col1': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
        'col2': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data)
df
Out[3]:
col1 col2
a 1 1
b 2 2
c 3 3
d NaN 4
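The NaN in row "d" comes from index alignment: the DataFrame takes the union of the labels of both Series and leaves a missing value wherever a Series has no entry. A minimal sketch of that behavior, where the names `s1`, `s2` and `aligned` are only illustrative:

```python
import pandas as pd

# Two Series whose labels only partially overlap.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])

# The resulting index is the union of both Series indices.
aligned = pd.DataFrame({"col1": s1, "col2": s2})

# Label "d" exists only in s2, so col1 has a missing value there.
print(list(aligned.index))
print(aligned["col1"].isnull().sum())
```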
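From a JSON string
In[4]:
from io import StringIO

json_str = '[{"a":10,"d":1,"c":2,"b":3},\
{"a":20,"d":1,"c":2,"b":3}]'
# recent pandas versions expect a file-like object rather than a raw JSON string
df = pd.read_json(StringIO(json_str))
df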
Out[4]:
a b c d
0 10 3 2 1
1 20 3 2 1
From a JSON file
In[5]:
df = pd.read_json("data/dumb_data.json")
df
Out[5]:
a b c d
0 10 3 2 1
1 20 3 2 1
Taking only some indices and columns
In[6]:
# data is still the dictionary of "Series" defined in In[3]
df = pd.DataFrame(data, index=['d', 'b', 'a'], columns=['col2'])
df
Out[6]:
col2
d 4
b 2
a 1
From a file
In[7]:
df = pd.read_csv("data/dumb_data.csv")
df
Out[7]:
col1 col2
0 1 4
1 2 3
2 3 2
From a StringIO
In[8]:
from io import StringIO

# the \t escapes are the tab separators between fields
tsv = StringIO("""
Age\tHappiness\t# of pets\tReligion
10\tNot happy\t0\tNot religious
20\tVery happy\t2\tIslamic
2\tPretty happy\t4\tHindu
""")
df = pd.read_csv(tsv, sep='\t')
df
Out[8]:
Age Happiness # of pets Religion
0 10 Not happy 0 Not religious
1 20 Very happy 2 Islamic
2 2 Pretty happy 4 Hindu
A "DataFrame" contains columns of type "Series"
In[9]:
type(df["Age"])
Out[9]:
pandas.core.series.Series
Data, indices and columns
In[10]:
df.values
Out[10]:
array([[10, 'Not happy', 0, 'Not religious'],
       [20, 'Very happy', 2, 'Islamic'],
       [2, 'Pretty happy', 4, 'Hindu']], dtype=object)
In[11]:
df.index
Out[11]:
Int64Index([0, 1, 2], dtype='int64')
In[12]:
df.columns
Out[12]:
Index(['Age', 'Happiness', '# of pets', 'Religion'], dtype='object')
Selecting Data
Selecting columns
In[13]:
df["Age"]
Out[13]:
0 10
1 20
2 2
Name: Age, dtype: int64
In[14]:
columnas = ["Age", "Happiness"]
df[columnas]
Out[14]:
Age Happiness
0 10 Not happy
1 20 Very happy
2 2 Pretty happy
Selecting rows
In[15]:
df["Age"][0]
Out[15]:
10
In[16]:
filas = [0, 18, 19]
# reindex returns NaN for labels that are not in the index
df["Age"].reindex(filas)
Out[16]:
0 10
18 NaN
19 NaN
Name: Age, dtype: float64
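Row selection can be made more explicit by separating label-based access (.loc) from position-based access (.iloc). A minimal sketch, assuming a small Series named `ages` (a hypothetical stand-in for the "Age" column above):

```python
import pandas as pd

# Stand-in for the "Age" column, with the default integer labels.
ages = pd.Series([10, 20, 2], index=[0, 1, 2], name="Age")

first_by_label = ages.loc[0]       # lookup by index label
last_by_position = ages.iloc[-1]   # lookup by position, like a list

# reindex keeps the requested labels and fills the absent ones with NaN,
# which turns the integer column into float64.
subset = ages.reindex([0, 18, 19])
print(subset)
```

With the default integer index, labels and positions coincide, so the difference only shows up once rows are dropped or the index is replaced.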
Selecting with conditions
In[17]:
df[(df['Age'] <= 30) &
   (df['Age'] >= 20)]
Out[17]:
In[18]:
df
Out[18]:
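A condition such as the one above produces a boolean Series (a mask), and indexing with it keeps only the rows where the mask is True. A short sketch of this, assuming a hypothetical frame `people` with the same "Age" values; `Series.between` is shown as an equivalent spelling of the two-sided comparison:

```python
import pandas as pd

# Hypothetical data mirroring the example frame.
people = pd.DataFrame({
    "Age": [10, 20, 2],
    "Happiness": ["Not happy", "Very happy", "Pretty happy"],
})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (people["Age"] >= 20) & (people["Age"] <= 30)
selected = people[mask]

# between() includes both endpoints by default.
same = people[people["Age"].between(20, 30)]
print(selected)
```

The parentheses around each comparison are required because & binds more tightly than <= and >=.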
Basic Operations
In[19]:
(df["# of pets"] / df["Age"]) + 100
Out[19]:
0 100.0
1 100.1
2 102.0
dtype: float64
Adding columns
In[20]:
df_tmp = df.copy()
df_tmp["tmp"] = [1, 2, 3]
df_tmp
Out[20]:
Age Happiness # of pets Religion tmp
0 10 Not happy 0 Not religious 1
1 20 Very happy 2 Islamic 2
2 2 Pretty happy 4 Hindu 3
In[21]:
import math  # factorial from the standard library
df_tmp["tmp_factorial"] = df_tmp["tmp"].apply(math.factorial)
df_tmp
Out[21]:
Age Happiness # of pets Religion tmp tmp_factorial
0 10 Not happy 0 Not religious 1 1
1 20 Very happy 2 Islamic 2 2
2 2 Pretty happy 4 Hindu 3 6
In[22]:
json_map = {"Not happy": 0, "Pretty happy": 1, "Very happy": 2}
df_tmp["tmp_Happiness"] = df_tmp["Happiness"].map(json_map)
df_tmp
Out[22]:
Age Happiness # of pets Religion tmp tmp_factorial tmp_Happiness
0 10 Not happy 0 Not religious 1 1 0
1 20 Very happy 2 Islamic 2 2 2
2 2 Pretty happy 4 Hindu 3 6 1
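One detail worth knowing about `map` with a dictionary: values that have no key in the dictionary become NaN rather than raising an error. A small sketch, where the extra "Unknown" answer is an invented value used only to show this:

```python
import pandas as pd

# "Unknown" is a made-up answer with no entry in the mapping.
happiness = pd.Series(["Not happy", "Very happy", "Pretty happy", "Unknown"])
codes = {"Not happy": 0, "Pretty happy": 1, "Very happy": 2}

# Known answers get their code; unmapped ones come back as NaN.
mapped = happiness.map(codes)
print(mapped)
```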
Dropping columns
In[23]:
# axis is 0 for rows, 1 for columns
df_tmp.drop(labels="tmp_factorial", axis=1)
Out[23]:
Age Happiness # of pets Religion tmp tmp_Happiness
0 10 Not happy 0 Not religious 1 0
1 20 Very happy 2 Islamic 2 2
2 2 Pretty happy 4 Hindu 3 1
In[24]:
df_tmp.drop(labels=["tmp_factorial", "tmp"], axis=1)
Out[24]:
Age Happiness # of pets Religion tmp_Happiness
0 10 Not happy 0 Not religious 0
1 20 Very happy 2 Islamic 2
2 2 Pretty happy 4 Hindu 1
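Note that `drop` returns a new DataFrame and leaves the original untouched, which is why both calls above can still see "tmp_factorial". A minimal sketch with a hypothetical frame `frame`; the `columns=` keyword is an alternative to `labels=` with `axis=1`:

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2], "tmp": [3, 4]})

# drop returns a copy without the column; frame itself is unchanged.
without_tmp = frame.drop(columns="tmp")

print(list(frame.columns))
print(list(without_tmp.columns))
```

To keep the result, assign it back to a variable.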
Dropping rows
In[25]:
# suppose there is a row with missing data
df_tmp.loc[1, "Age"] = np.nan
df_tmp
Out[25]:
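Age Happiness # of pets Religion tmp tmp_factorial tmp_Happiness
0 10 Not happy 0 Not religious 1 1 0
1 NaN Very happy 2 Islamic 2 2 2
2 2 Pretty happy 4 Hindu 3 6 1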
Here we assume there is missing data and drop the row. There are other ways to do this, which are covered later.
In[26]:
df_tmp = df_tmp.drop(labels=[1], axis=0)
df_tmp
Out[26]:
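Age Happiness # of pets Religion tmp tmp_factorial tmp_Happiness
0 10 Not happy 0 Not religious 1 1 0
2 2 Pretty happy 4 Hindu 3 6 1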
And to reset the index
In[27]:
df_tmp.reset_index()
Out[27]:
index Age Happiness # of pets Religion tmp tmp_factorial tmp_Happiness
0 0 10 Not happy 0 Not religious 1 1 0
1 2 2 Pretty happy 4 Hindu 3 6 1
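By default `reset_index` keeps the old labels in a new "index" column, as the output above shows. If the old labels are not needed, `drop=True` discards them. A short sketch with a hypothetical two-row frame:

```python
import pandas as pd

# Labels 0 and 2 remain after a row was dropped.
frame = pd.DataFrame({"Age": [10, 2]}, index=[0, 2])

with_old = frame.reset_index()        # old labels kept in an "index" column
clean = frame.reset_index(drop=True)  # old labels discarded

print(list(with_old.columns))
print(list(clean.index))
```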
Applying functions
In[28]:
df
Out[28]:
Something useful you can do is create new columns from existing ones.
In[29]:
def f(row):
    # pets per year of age for a single row
    return row["# of pets"] * 1.0 / row["Age"]

df["age/#pets"] = df.apply(f, axis=1)  # axis=1 passes each row to f
df
Out[29]:
Age Happiness # of pets Religion age/#pets
0 10 Not happy 0 Not religious 0.0
1 20 Very happy 2 Islamic 0.1
2 2 Pretty happy 4 Hindu 2.0
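For simple arithmetic like this, the same column can also be computed without `apply`, by operating on whole columns at once. A sketch comparing both approaches on a hypothetical frame `pets` with the same values:

```python
import pandas as pd

pets = pd.DataFrame({"Age": [10, 20, 2], "# of pets": [0, 2, 4]})

# Row-by-row: the function is called once per row.
by_apply = pets.apply(lambda row: row["# of pets"] / row["Age"], axis=1)

# Column-wise: one division over entire columns.
vectorized = pets["# of pets"] / pets["Age"]

print(vectorized)
```

The column-wise form is usually shorter and faster; `apply` with `axis=1` remains useful when the per-row logic cannot be written as column operations.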
Other functions
Descriptive statistics
In[30]:
df.describe(percentiles=[0.25, 0.5, 0.75])
Out[30]:
Age # of pets age/#pets
count 3.000000 3 3.000000
mean 10.666667 2 0.700000
std 9.018500 2 1.126943
min 2.000000 0 0.000000
25% 6.000000 1 0.050000
50% 10.000000 2 0.100000
75% 15.000000 3 1.050000
max 20.000000 4 2.000000
First elements
In[31]:
df.head()
Out[31]:
Age Happiness # of pets Religion age/#pets
0 10 Not happy 0 Not religious 0.0
1 20 Very happy 2 Islamic 0.1
2 2 Pretty happy 4 Hindu 2.0
Working with Missing Data
The following are examples of initializing data that has missing values.
In[32]:
# the \t escapes are the tab separators between fields
tsv = StringIO("""
Age\tHappiness\t# of pets\tReligion
10\tNot happy\t0\tNaN
NaN\tVery happy\t2\tIslamic
2\tPretty happy\t4\tHindu
""")
df = pd.read_csv(tsv, sep='\t')
df
Out[32]:
Age Happiness # of pets Religion
0 10 Not happy 0 NaN
1 NaN Very happy 2 Islamic
2 2 Pretty happy 4 Hindu
Checking whether we have missing data
In[33]:
df.notnull()
Out[33]:
Age Happiness # of pets Religion
0 True True True False
1 False True True True
2 True True True True
Columns with complete data and columns with missing data
In[34]:
df.notnull().all()
Out[34]:
Age False
Happiness True
# of pets True
Religion False
dtype: bool
Is the data complete?
In[35]:
df.notnull().all().all()
Out[35]:
False
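Besides a yes/no answer, it is often useful to know how many values are missing in each column. A minimal sketch using `isnull().sum()`, assuming a hypothetical frame `survey` that reproduces the data above:

```python
import numpy as np
import pandas as pd

survey = pd.DataFrame({
    "Age": [10, np.nan, 2],
    "Happiness": ["Not happy", "Very happy", "Pretty happy"],
    "# of pets": [0, 2, 4],
    "Religion": [np.nan, "Islamic", "Hindu"],
})

# isnull() marks missing cells as True; summing counts them per column.
missing_per_column = survey.isnull().sum()

# Names of the columns that have at least one missing value.
incomplete_columns = list(missing_per_column[missing_per_column > 0].index)

print(missing_per_column)
print(incomplete_columns)
```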
So the data does have missing values. One thing we can do is fill them in with means (this can be done per column).
In[36]:
# numeric_only=True skips the text columns when computing the means
df.fillna(df.mean(numeric_only=True))
Out[36]:
The missing value in "Age" was filled in. The one in "Religion" is perhaps best left as it is, since "NaN" represents a missing value regardless of its meaning.
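To fill only selected columns, `fillna` also accepts a dictionary that maps column names to fill values; columns not listed are left alone. A sketch under the assumption of a hypothetical frame `survey` with the same two incomplete columns:

```python
import numpy as np
import pandas as pd

survey = pd.DataFrame({
    "Age": [10, np.nan, 2],
    "Religion": [np.nan, "Islamic", "Hindu"],
})

# Only "Age" is filled, using its own mean; "Religion" keeps its NaN.
filled = survey.fillna({"Age": survey["Age"].mean()})

print(filled)
```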
And to drop the rows that have missing data in a subset of columns
In[37]:
df.dropna(subset=["Age"])
Out[37]:

Or else across the whole DataFrame
In[38]:
df.dropna()
Out[38]:
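By default `dropna` removes a row if any of its values is missing. The `how` parameter changes that rule. A small sketch on a hypothetical frame `survey` in which one row is entirely empty:

```python
import numpy as np
import pandas as pd

survey = pd.DataFrame({
    "Age": [10, np.nan, np.nan],
    "Religion": [np.nan, "Islamic", np.nan],
})

# how="any" (the default): drop rows with at least one missing value.
any_missing = survey.dropna()

# how="all": drop only rows where every value is missing.
all_missing = survey.dropna(how="all")

print(len(any_missing))
print(list(all_missing.index))
```

Here every row has at least one NaN, so the default call leaves nothing, while how="all" removes only the last row.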