
Python Programming for Data Science

Lab objectives
Learn to prepare datasets with the Python language
Learn to use the main Python libraries for dataset manipulation

Python: DataPrep
DataPrep is an open-source Python library that lets you prepare your data with a single library and only a few
lines of code. In this lab, I will show how to analyze and prepare data in a few lines.

In [6]:
import pandas as pd

# creating a data frame
housingdf = pd.read_csv("housing.csv")
print(housingdf.head())

   longitude  latitude  housing_median_age  total_rooms  total_bedrooms  \
0    -122.23     37.88                41.0        880.0           129.0
1    -122.22     37.86                21.0       7099.0          1106.0
2    -122.24     37.85                52.0       1467.0           190.0
3    -122.25     37.85                52.0       1274.0           235.0
4    -122.25     37.85                52.0       1627.0           280.0

   population  households  median_income  median_house_value ocean_proximity
0       322.0       126.0         8.3252            452600.0        NEAR BAY
1      2401.0      1138.0         8.3014            358500.0        NEAR BAY
2       496.0       177.0         7.2574            352100.0        NEAR BAY
3       558.0       219.0         5.6431            341300.0        NEAR BAY
4       565.0       259.0         3.8462            342200.0        NEAR BAY

In [9]:
housingdf = housingdf.copy()

In [10]:
print(housingdf.head())

   longitude  latitude  housing_median_age  total_rooms  total_bedrooms  \
0    -122.23     37.88                41.0        880.0           129.0
1    -122.22     37.86                21.0       7099.0          1106.0
2    -122.24     37.85                52.0       1467.0           190.0
3    -122.25     37.85                52.0       1274.0           235.0
4    -122.25     37.85                52.0       1627.0           280.0

   population  households  median_income  median_house_value ocean_proximity
0       322.0       126.0         8.3252            452600.0        NEAR BAY
1      2401.0      1138.0         8.3014            358500.0        NEAR BAY
2       496.0       177.0         7.2574            352100.0        NEAR BAY
3       558.0       219.0         5.6431            341300.0        NEAR BAY
4       565.0       259.0         3.8462            342200.0        NEAR BAY

In [11]:
housingdf.describe()

Out[11]:
          longitude      latitude  housing_median_age   total_rooms  total_bedrooms    population    households  median_income  median_ho
count  20640.000000  20640.000000        20640.000000  20640.000000    20433.000000  20640.000000  20640.000000   20640.000000        206
mean    -119.569704     35.631861           28.639486   2635.763081      537.870553   1425.476744    499.539680       3.870671       2068
std        2.003532      2.135952           12.585558   2181.615252      421.385070   1132.462122    382.329753       1.899822       1153
min     -124.350000     32.540000            1.000000      2.000000        1.000000      3.000000      1.000000       0.499900        149
25%     -121.800000     33.930000           18.000000   1447.750000      296.000000    787.000000    280.000000       2.563400       1196
50%     -118.490000     34.260000           29.000000   2127.000000      435.000000   1166.000000    409.000000       3.534800       1797
75%     -118.010000     37.710000           37.000000   3148.000000      647.000000   1725.000000    605.000000       4.743250       2647
max     -114.310000     41.950000           52.000000  39320.000000     6445.000000  35682.000000   6082.000000      15.000100       5000
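
The `count` row deserves a second look: `total_bedrooms` reports 20433 values against 20640 for the other columns shown, so 207 entries are missing in that column. A minimal sketch of how this gap could be read off programmatically, using a small made-up frame (the helper name `missing_from_describe` and the toy values are assumptions for illustration, not part of the lab's data):

```python
import numpy as np
import pandas as pd

def missing_from_describe(df: pd.DataFrame) -> pd.Series:
    """Return, per numeric column, how many rows are missing.

    describe() counts only non-null values, so the difference between the
    number of rows and the 'count' row is the number of missing entries.
    """
    counts = df.describe().loc["count"]
    return (len(df) - counts).astype(int)

# Toy data standing in for housing.csv (hypothetical values).
toy = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, np.nan],
})

missing = missing_from_describe(toy)
print(missing)
```

On the real data the same information is available directly through `housingdf.isnull().sum()`; the point here is only that `describe()` already hints at it.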

In [14]:
housingdf.plot()

Out[14]: <AxesSubplot:>

In [19]:
import random
import numpy as np

def add_random_bad_values(df, number_of_bad_values=1000, seed=42):
    """Return a copy of df with randomly placed missing values."""
    dataframe = df.copy()  # work on the argument, not on the global housingdf
    bad_values = [np.nan, None]  # np.NaN was removed in NumPy 2.0
    columns = dataframe.columns.tolist()
    dataframe_size = len(dataframe)
    random.seed(seed)
    for _ in range(number_of_bad_values):
        random_column = random.choice(columns)
        # randint is inclusive on both ends, so stop at the last valid row
        random_row = random.randint(0, dataframe_size - 1)
        dataframe.loc[random_row, random_column] = random.choice(bad_values)
    return dataframe

housingdf = add_random_bad_values(housingdf)
housingdf.isnull().sum()

Out[19]: longitude 76
latitude 123
housing_median_age 101
total_rooms 97
total_bedrooms 318
population 103
households 88
median_income 98
median_house_value 103
ocean_proximity 97
dtype: int64
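
With missing values now present in every column, a typical next preparation step is to fill them in. The sketch below shows one possible approach, not a method prescribed by this lab: numeric columns receive their median and text columns their most frequent value. The helper name `fill_missing` and the toy frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with NaN/None replaced column by column."""
    cleaned = df.copy()
    for column in cleaned.columns:
        if pd.api.types.is_numeric_dtype(cleaned[column]):
            # The median is less sensitive to outliers than the mean.
            fill_value = cleaned[column].median()
        else:
            # mode() can return several values; keep the first one.
            # (Assumes the column has at least one non-null value.)
            fill_value = cleaned[column].mode().iloc[0]
        cleaned[column] = cleaned[column].fillna(fill_value)
    return cleaned

# Toy data standing in for the damaged housing frame.
toy = pd.DataFrame({
    "median_income": [8.3252, np.nan, 7.2574, 5.6431],
    "ocean_proximity": ["NEAR BAY", None, "NEAR BAY", "INLAND"],
})

cleaned = fill_missing(toy)
print(cleaned.isnull().sum())
```

Dropping incomplete rows with `dropna()` is the simpler alternative when only a small share of rows is affected.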

In [21]:
housingdf.plot()

Out[21]: <AxesSubplot:>

In [22]:
import seaborn as sns
import matplotlib.pyplot as plot

In [23]:
import pandas as pd

In [24]:
housingdf

Out[24]:
       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  o
0        -122.23     37.88                41.0        880.0           129.0       322.0       126.0         8.3252            452600.0
1        -122.22     37.86                21.0       7099.0          1106.0      2401.0      1138.0         8.3014            358500.0
2        -122.24     37.85                52.0       1467.0           190.0       496.0       177.0         7.2574            352100.0
3        -122.25     37.85                52.0       1274.0           235.0       558.0       219.0         5.6431            341300.0
4        -122.25     37.85                52.0       1627.0           280.0       565.0       259.0         3.8462            342200.0
...          ...       ...                 ...          ...             ...         ...         ...            ...                 ...
20635    -121.09     39.48                25.0       1665.0           374.0       845.0       330.0         1.5603             78100.0
20636    -121.21     39.49                18.0        697.0           150.0       356.0       114.0         2.5568             77100.0
20637    -121.22     39.43                17.0       2254.0           485.0      1007.0       433.0         1.7000             92300.0
20638    -121.32     39.43                18.0       1860.0           409.0       741.0       349.0         1.8672             84700.0
20639    -121.24     39.37                16.0       2785.0           616.0      1387.0       530.0         2.3886             89400.0

20640 rows × 10 columns

In [25]:
plot.plot(housingdf.total_rooms, housingdf.population)
plot.show()  # call the function; a bare plot.show only returns the function object
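
`plot.plot(x, y)` joins the points with line segments in row order, which for unsorted data such as room and population counts tends to produce a tangle. A scatter plot is often easier to read for this kind of relationship. A minimal sketch, assuming a non-interactive backend and using only the five rows printed by `head()` earlier:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plot
import pandas as pd

# The first five rows of the dataset, as printed by head().
sample = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0, 1627.0],
    "population": [322.0, 2401.0, 496.0, 558.0, 565.0],
})

fig, ax = plot.subplots()
points = ax.scatter(sample["total_rooms"], sample["population"], s=10, alpha=0.6)
ax.set_xlabel("total_rooms")
ax.set_ylabel("population")

# Render to an in-memory PNG; in a notebook, plot.show() would be used instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plot.close(fig)
```

Inside a notebook the `matplotlib.use("Agg")` line can be dropped, and `housingdf` can be passed in place of `sample`.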

In [28]:
housingdf.plot()

Out[28]: <AxesSubplot:>

In [30]:
sns.pairplot(housingdf[['housing_median_age','total_rooms', 'population','median_income']])

Out[30]: <seaborn.axisgrid.PairGrid at 0xce323a7940>

In [38]:
fig = plot.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(housingdf.median_income, housingdf.population, housingdf.housing_median_age)
plot.show()

In [50]:
import pandas_profiling as pp

In [54]:
pp.ProfileReport(housingdf)

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-54-f0df3894349c> in <module>
----> 1 pp.ProfileReport(housingdf)

C:\ProgramData\Anaconda3\lib\site-packages\pandas_profiling\__init__.py in __init__(self, df, **kwargs)
     64 sample = kwargs.get('sample', df.head())
     65
---> 66 description_set = describe(df, **kwargs)
     67
     68 self.html = to_html(sample,

C:\ProgramData\Anaconda3\lib\site-packages\pandas_profiling\describe.py in describe(df, bins, check_correlation, correlation_threshold, correlation_overrides, check_recoded, pool_size, **kwargs)
    390 if name not in names:
    391 names.append(name)
--> 392 variable_stats = pd.concat(ldesc, join_axes=pd.Index([names]), axis=1)
    393 variable_stats.columns.names = df.columns.names
    394

TypeError: concat() got an unexpected keyword argument 'join_axes'
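
This traceback points at a version mismatch rather than at a problem with the data: the `join_axes` argument of `pd.concat` was deprecated in pandas 0.25 and removed in pandas 1.0, while the installed pandas-profiling release still passes it. For reference, a sketch of how the same row alignment is expressed in current pandas, with `reindex` taking over the role of `join_axes` (the series and names below are made up for illustration):

```python
import pandas as pd

# Per-column statistics, as a profiling tool might collect them (toy values).
ldesc = {
    "longitude": pd.Series({"count": 4, "mean": -122.2}),
    "latitude": pd.Series({"count": 4, "mean": 37.9}),
}
names = ["mean", "count"]  # desired row order

# Old API (pandas < 1.0): pd.concat(ldesc, join_axes=[pd.Index(names)], axis=1)
# Current API: concatenate first, then reindex the rows explicitly.
variable_stats = pd.concat(ldesc, axis=1).reindex(names)
print(variable_stats)
```

In practice the fix is not to patch the library by hand but to upgrade pandas-profiling to a release that no longer uses the removed argument.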

In [53]:
housingdf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 longitude 20564 non-null float64
1 latitude 20517 non-null float64
2 housing_median_age 20539 non-null float64
3 total_rooms 20543 non-null float64
4 total_bedrooms 20322 non-null float64
5 population 20537 non-null float64
6 households 20552 non-null float64
7 median_income 20542 non-null float64
8 median_house_value 20537 non-null float64
9 ocean_proximity 20543 non-null object
dtypes: float64(9), object(1)
memory usage: 1.6+ MB

In [65]:
from pandas_profiling import ProfileReport

In [66]:
report= pp.ProfileReport(housesingdf,title='report')

---------------------------------------------------------------------------
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas_profiling\describe.py", line 282, in multiprocess_func
    return x[0], describe_1d(x[1], **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas_profiling\describe.py", line 270, in describe_1d
    result = result.append(describe_numeric_1d(data, **kwargs))
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas_profiling\describe.py", line 54, in describe_numeric_1d
    stats['histogram'] = histogram(series, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas_profiling\plot.py", line 73, in histogram
    plot = _plot_histogram(series, **kwargs)
TypeError: _plot_histogram() got an unexpected keyword argument 'title'
"""

The above exception was the direct cause of the following exception:

TypeError Traceback (most recent call last)
<ipython-input-66-fd636f67fe20> in <module>
----> 1 report= pp.ProfileReport(housesingdf,title='report')

C:\ProgramData\Anaconda3\lib\site-packages\pandas_profiling\__init__.py in __init__(self, df, **kwargs)
     64 sample = kwargs.get('sample', df.head())
     65
---> 66 description_set = describe(df, **kwargs)
     67
     68 self.html = to_html(sample,

C:\ProgramData\Anaconda3\lib\site-packages\pandas_profiling\describe.py in describe(df, bins, check_correlation, correlation_threshold, correlation_overrides, check_recoded, pool_size, **kwargs)
    349 pool = multiprocessing.Pool(pool_size)
    350 local_multiprocess_func = partial(multiprocess_func, **kwargs)
--> 351 ldesc = {col: s for col, s in pool.map(local_multiprocess_func, df.iteritems())}
    352 pool.close()
    353

C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py in map(self, func, iterable, chunksize)
    362 in a list that is returned.
    363 '''
--> 364 return self._map_async(func, iterable, mapstar, chunksize).get()
    365
    366 def starmap(self, func, iterable, chunksize=None):

C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py in get(self, timeout)
    769 return self._value
    770 else:
--> 771 raise self._value
    772
    773 def _set(self, i, obj):

TypeError: _plot_histogram() got an unexpected keyword argument 'title'

In [67]:
profile=ProfileReport(housingdf)

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-67-fd105493fcca> in <module>
----> 1 profile=ProfileReport(housingdf)

C:\ProgramData\Anaconda3\lib\site-packages\pandas_profiling\__init__.py in __init__(self, df, **kwargs)
     64 sample = kwargs.get('sample', df.head())
     65
---> 66 description_set = describe(df, **kwargs)
     67
     68 self.html = to_html(sample,

C:\ProgramData\Anaconda3\lib\site-packages\pandas_profiling\describe.py in describe(df, bins, check_correlation, correlation_threshold, correlation_overrides, check_recoded, pool_size, **kwargs)
    390 if name not in names:
    391 names.append(name)
--> 392 variable_stats = pd.concat(ldesc, join_axes=pd.Index([names]), axis=1)
    393 variable_stats.columns.names = df.columns.names
    394

TypeError: concat() got an unexpected keyword argument 'join_axes'

In [69]:
pip install -U https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

Collecting https://github.com/pandas-profiling/pandas-profiling/archive/master.zip
Note: you may need to restart the kernel to use updated packages.
  Using cached https://github.com/pandas-profiling/pandas-profiling/archive/master.zip
Collecting joblib~=1.1.0
  Using cached joblib-1.1.0-py2.py3-none-any.whl (306 kB)
Requirement already satisfied: scipy>=1.4.1 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling==3.1.1) (1.6.2)
Requirement already satisfied: pandas!=1.0.0,!=1.0.1,!=1.0.2,!=1.1.0,>=0.25.3 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling==3.1.1) (1.2.4)
Requirement already satisfied: matplotlib>=3.2.0 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling==3.1.1) (3.3.4)
Requirement already satisfied: pydantic>=1.8.1 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling==3.1.1) (1.8.2)
Requirement already satisfied: PyYAML>=5.0.0 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling==3.1.1) (5.4.1)
Requirement already satisfied: jinja2>=2.11.1 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling==3.1.1) (2.11.3)
Requirement already satisfied: markupsafe~=2.0.1 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling==3.1.1) (2.0.1)
Requirement already satisfied: visions[type_image_path]==0.7.4 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling==3.1.1) (0.7.4)
Requirement already satisfied: numpy>=1.16.0 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling==3.1.1) (1.20.1)
Collecting htmlmin>=0.1.12
  Using cached htmlmin-0.1.12-py3-none-any.whl
Collecting missingno>=0.4.2
  Using cached missingno-0.5.0-py3-none-any.whl (8.8 kB)
Collecting phik>=0.11.1
  Using cached phik-0.12.0-cp38-cp38-win_amd64.whl (659 kB)
Requirement already satisfied: tangled-up-in-unicode==0.2.0 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling==3.1.1) (0.2.0)
Requirement already satisfied: requests>=2.24.0 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling==3.1.1) (2.25.1)
Requirement already satisfied: tqdm>=4.48.2 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling==3.1.1) (4.59.0)
Requirement already satisfied: seaborn>=0.10.1 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling==3.1.1) (0.11.1)
Requirement already satisfied: multimethod>=1.4 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling==3.1.1) (1.6)
Requirement already satisfied: networkx>=2.4 in c:\programdata\anaconda3\lib\site-packages (from visions[type_image_path]==0.7.4->pandas-profiling==3.1.1) (2.5)
Requirement already satisfied: attrs>=19.3.0 in c:\programdata\anaconda3\lib\site-packages (from visions[type_image_path]==0.7.4->pandas-profiling==3.1.1) (20.3.0)
Requirement already satisfied: Pillow in c:\programdata\anaconda3\lib\site-packages (from visions[type_image_path]==0.7.4->pandas-profiling==3.1.1) (8.2.0)
Collecting imagehash
  Using cached ImageHash-4.2.1-py2.py3-none-any.whl
Requirement already satisfied: kiwisolver>=1.0.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=3.2.0->pandas-profiling==3.1.1) (1.3.1)
Requirement already satisfied: cycler>=0.10 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=3.2.0->pandas-profiling==3.1.1) (0.10.0)
Requirement already satisfied: python-dateutil>=2.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=3.2.0->pandas-profiling==3.1.1) (2.8.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=3.2.0->pandas-profiling==3.1.1) (2.4.7)
Requirement already satisfied: six in c:\programdata\anaconda3\lib\site-packages (from cycler>=0.10->matplotlib>=3.2.0->pandas-profiling==3.1.1) (1.15.0)
Requirement already satisfied: decorator>=4.3.0 in c:\programdata\anaconda3\lib\site-packages (from networkx>=2.4->visions[type_image_path]==0.7.4->pandas-profiling==3.1.1) (5.0.6)
Requirement already satisfied: pytz>=2017.3 in c:\programdata\anaconda3\lib\site-packages (from pandas!=1.0.0,!=1.0.1,!=1.0.2,!=1.1.0,>=0.25.3->pandas-profiling==3.1.1) (2021.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in c:\programdata\anaconda3\lib\site-packages (from pydantic>=1.8.1->pandas-profiling==3.1.1) (3.7.4.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\programdata\anaconda3\lib\site-packages (from requests>=2.24.0->pandas-profiling==3.1.1) (1.26.4)
Requirement already satisfied: chardet<5,>=3.0.2 in c:\programdata\anaconda3\lib\site-packages (from requests>=2.24.0->pandas-profiling==3.1.1) (4.0.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\programdata\anaconda3\lib\site-packages (from requests>=2.24.0->pandas-profiling==3.1.1) (2020.12.5)
Requirement already satisfied: idna<3,>=2.5 in c:\programdata\anaconda3\lib\site-packages (from requests>=2.24.0->pandas-profiling==3.1.1) (2.10)
Requirement already satisfied: PyWavelets in c:\programdata\anaconda3\lib\site-packages (from imagehash->visions[type_image_path]==0.7.4->pandas-profiling==3.1.1) (1.1.1)
Building wheels for collected packages: pandas-profiling
  Building wheel for pandas-profiling (setup.py): started
  Building wheel for pandas-profiling (setup.py): finished with status 'done'
  Created wheel for pandas-profiling: filename=pandas_profiling-3.1.1-py2.py3-none-any.whl size=261270 sha256=2782de5acb2e43d10759329301df5577e5818844cb15c9dfd812e87a2fe0eb7b
  Stored in directory: C:\Users\DELL\AppData\Local\Temp\pip-ephem-wheel-cache-c_l78j_o\wheels\64\b6\85\dfc808b23666a5910371784e349d28818006ff63ed9cfeca59
Successfully built pandas-profiling
Installing collected packages: joblib, imagehash, phik, missingno, htmlmin, pandas-profiling
  Attempting uninstall: joblib
    Found existing installation: joblib 1.0.1
    Uninstalling joblib-1.0.1:
      Successfully uninstalled joblib-1.0.1
  Attempting uninstall: pandas-profiling
    Found existing installation: pandas-profiling 1.4.1
    Uninstalling pandas-profiling-1.4.1:
      Successfully uninstalled pandas-profiling-1.4.1
Successfully installed htmlmin-0.1.12 imagehash-4.2.1 joblib-1.1.0 missingno-0.5.0 pandas-profiling-3.1.1 phik-0.12.0
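
The log shows pandas-profiling moving from 1.4.1 to 3.1.1, but a running kernel keeps the old module loaded until it is restarted. One way to confirm which versions are installed on disk is the standard-library `importlib.metadata` module (Python 3.8+). The helper name `installed_version` is an assumption for this sketch:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Report the two packages involved in the errors above.
for name in ("pandas", "pandas-profiling"):
    print(name, installed_version(name))
```

If the printed version is the new one but the old traceback still appears, restarting the kernel and re-running the imports is the usual remedy.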

In [70]:
pip install -U pandas-profiling

Requirement already satisfied: pandas-profiling in c:\programdata\anaconda3\lib\site-packages (3.1.1)
Requirement already satisfied: markupsafe~=2.0.1 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (2.0.1)
Requirement already satisfied: numpy>=1.16.0 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (1.20.1)
Requirement already satisfied: PyYAML>=5.0.0 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (5.4.1)
Requirement already satisfied: requests>=2.24.0 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (2.25.1)
Requirement already satisfied: joblib~=1.1.0 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (1.1.0)
Requirement already satisfied: matplotlib>=3.2.0 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (3.3.4)
Requirement already satisfied: visions[type_image_path]==0.7.4 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (0.7.4)
Requirement already satisfied: htmlmin>=0.1.12 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (0.1.12)
Requirement already satisfied: seaborn>=0.10.1 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (0.11.1)
Requirement already satisfied: pydantic>=1.8.1 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (1.8.2)
Requirement already satisfied: pandas!=1.0.0,!=1.0.1,!=1.0.2,!=1.1.0,>=0.25.3 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (1.2.4)
Requirement already satisfied: missingno>=0.4.2 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (0.5.0)
Requirement already satisfied: tangled-up-in-unicode==0.2.0 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (0.2.0)
Requirement already satisfied: phik>=0.11.1 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (0.12.0)
Requirement already satisfied: tqdm>=4.48.2 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (4.59.0)
Requirement already satisfied: multimethod>=1.4 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (1.6)
Requirement already satisfied: scipy>=1.4.1 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (1.6.2)
Requirement already satisfied: jinja2>=2.11.1 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (2.11.3)
Requirement already satisfied: networkx>=2.4 in c:\programdata\anaconda3\lib\site-packages (from visions[type_image_path]==0.7.4->pandas-profiling) (2.5)
Requirement already satisfied: attrs>=19.3.0 in c:\programdata\anaconda3\lib\site-packages (from visions[type_image_path]==0.7.4->pandas-profiling) (20.3.0)
Requirement already satisfied: Pillow in c:\programdata\anaconda3\lib\site-packages (from visions[type_image_path]==0.7.4->pandas-profiling) (8.2.0)
Requirement already satisfied: imagehash in c:\programdata\anaconda3\lib\site-packages (from visions[type_image_path]==0.7.4->pandas-profiling) (4.2.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=3.2.0->pandas-profiling) (2.4.7)
Requirement already satisfied: python-dateutil>=2.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=3.2.0->pandas-profiling) (2.8.1)
Requirement already satisfied: cycler>=0.10 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=3.2.0->pandas-profiling) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=3.2.0->pandas-profiling) (1.3.1)
Requirement already satisfied: six in c:\programdata\anaconda3\lib\site-packages (from cycler>=0.10->matplotlib>=3.2.0->pandas-profiling) (1.15.0)
Requirement already satisfied: decorator>=4.3.0 in c:\programdata\anaconda3\lib\site-packages (from networkx>=2.4->visions[type_image_path]==0.7.4->pandas-profiling) (5.0.6)
Requirement already satisfied: pytz>=2017.3 in c:\programdata\anaconda3\lib\site-packages (from pandas!=1.0.0,!=1.0.1,!=1.0.2,!=1.1.0,>=0.25.3->pandas-profiling) (2021.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in c:\programdata\anaconda3\lib\site-packages (from pydantic>=1.8.1->pandas-profiling) (3.7.4.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\programdata\anaconda3\lib\site-packages (from requests>=2.24.0->pandas-profiling) (1.26.4)
Requirement already satisfied: certifi>=2017.4.17 in c:\programdata\anaconda3\lib\site-packages (from requests>=2.24.0->pandas-profiling) (2020.12.5)
Requirement already satisfied: idna<3,>=2.5 in c:\programdata\anaconda3\lib\site-packages (from requests>=2.24.0->pandas-profiling) (2.10)
Requirement already satisfied: chardet<5,>=3.0.2 in c:\programdata\anaconda3\lib\site-packages (from requests>=2.24.0->pandas-profiling) (4.0.0)
Requirement already satisfied: PyWavelets in c:\programdata\anaconda3\lib\site-packages (from imagehash->visions[type_image_path]==0.7.4->pandas-profiling) (1.1.1)
Note: you may need to restart the kernel to use updated packages.

In [2]:
from pandas_profiling import ProfileReport

In [4]:
import pandas_profiling as pp
from pandas_profiling import ProfileReport
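
A note for readers running this lab on a recent environment: the project was later renamed ydata-profiling (import name `ydata_profiling`), so the import above may fail on a fresh install. The sketch below tries both names; the helper name `load_profile_report` is an assumption, and it returns `None` when neither package can be imported.

```python
import importlib

def load_profile_report():
    """Return the ProfileReport class from whichever package imports, else None."""
    for module_name in ("ydata_profiling", "pandas_profiling"):
        try:
            module = importlib.import_module(module_name)
        except Exception:
            # Not installed, or an old release that no longer imports cleanly
            # against the installed dependencies: try the next name.
            continue
        return getattr(module, "ProfileReport", None)
    return None

ProfileReport = load_profile_report()
print("ProfileReport available:", ProfileReport is not None)
```

When a class is returned it is used exactly as in the cells that follow, for example `ProfileReport(housingdf, title='report')`.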

In [9]:
report= pp.ProfileReport(housingdf,title='report')

In [10]:
report.to_file(output_file="report.html")

In [11]:
#showing the profile with:
report

Out[11]: (rendered HTML report titled "report"; sections: Overview, Variables, Interactions, Correlations, Missing values, Sample)

Overview (tabs: Overview, Alerts 32, Reproduction)

Dataset statistics
Number of variables              10
Number of observations           20640
Missing cells                    207
Missing cells (%)                0.1%
Duplicate rows                   0
Duplicate rows (%)               0.0%
Total size in memory             1.6 MiB
Average record size in memory    80.0 B

Variable types
Numeric        9
Categorical    1

Variables
longitude: Real number (ℝ)
Distinct        844
Distinct (%)    4.1%
Minimum         -124.35
Maximum         -114.31
