11/07/24, 10:50 AM    Trực quan hóa dữ liệu_matplotlib.ipynb - Colaboratory

Data Visualization

Data visualization conveys data and information as vivid visual graphs and charts. It is usually applied after analysis has produced results, to present the information extracted from the data to its users. It is also used before the data enters the analysis stage. At that point it helps you understand the variables and the relationships between them, which leads to better analysis decisions.

Practice: data visualization

Import the libraries:
+ pandas: works with tabular data structures and reads/writes data files in many formats, such as csv, xls, sql and html
+ matplotlib: the plotting library for Python
+ NumPy: provides objects and methods for working with multi-dimensional arrays and for linear algebra operations

    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker
    import numpy as np

Read the data file "Work_data.csv" into the pandas DataFrame df. The file is a salary survey of employees with 0 to 10 years of experience in several professions. A DataFrame is organized like a two-dimensional array of rows and columns.

    df = pd.read_csv("Work_data.csv")

    df.shape  # (number of samples, number of observed attributes)
    # (79, 3)

    df.head(10)  # view the first 10 rows

(Output: the first rows of df, with the columns SoNamKinhNghiem, Luong and NganhNghe; all of them belong to KeToan.)

    print(df)  # view the whole dataset

(Output: the full table, [79 rows x 3 columns], starting with the KeToan rows and ending with the Sale rows.)

View summary statistics for the quantitative columns:
+ count: the number of (non-missing) records
+ min, max: the smallest and largest values
+ mean: the average value
+ std: the standard deviation. A small std means the data are concentrated around the center (here, the mean); a large std means they are spread far from it.

describe() also reports the 25%, 50% and 75% percentiles.

    df.describe()

           SoNamKinhNghiem      Luong
    count        79.000000  79.000000
    mean          5.367089  17.355696
    std           3.105732   6.313391
    ...
    max          10.000000  27.600000

Give the columns you need short names for easier access later:

    nam = df['SoNamKinhNghiem']   # years of experience
    luong = df['Luong']           # salary
    nghe = df['NganhNghe']        # profession (this variable name is assumed; it is illegible in the original)

Drawing charts

Vertical bar chart (Bar chart)
+ Relationship between years of experience and salary

    plt.bar(nam, luong)
    plt.show()

Several employees share the same number of years. Their bars are therefore drawn on top of one another at the same x position, and each visible bar shows the highest salary for that number of years.

Add labels to the chart:

    plt.xlabel('Số năm kinh nghiệm')                   # "years of experience"
    plt.ylabel('Lương')                                # "salary"
    plt.suptitle('Mức lương theo số năm kinh nghiệm')  # "salary by years of experience"
    # one tick per distinct number of years, labels rotated 45°
    plt.xticks(sorted(nam.unique()), rotation=45)
    plt.bar(nam, luong)
    plt.show()

(Figure: bar chart "Mức lương theo số năm kinh nghiệm"; x-axis: Số năm kinh nghiệm, y-axis: Lương.)

Horizontal bar chart

With barh() the bars run horizontally. The years go on the y-axis and the salary goes on the x-axis, so the axis labels are swapped compared with the vertical chart.

    plt.xlabel('Lương')
    plt.ylabel('Số năm kinh nghiệm')
    plt.suptitle('Mức lương theo số năm kinh nghiệm')
    plt.barh(nam, luong, color='blue')
    plt.show()

(Figure: horizontal bar chart "Mức lương theo số năm kinh nghiệm".)

Drawing several charts in one figure
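Before the salary example, here is a minimal sketch of the pattern this section relies on. One call to plt.subplots() creates a single Figure that holds a row of Axes, and each Axes receives its own chart. The helper make_panels and the toy values are hypothetical illustrations, not part of the original notebook.

```python
import matplotlib.pyplot as plt


def make_panels(groups):
    """Draw one bar chart per group, side by side, in a single Figure.

    groups: dict mapping a panel title to a (years, salaries) pair.
    Returns the Figure and a 1-D array holding one Axes per group.
    """
    n = len(groups)
    # squeeze=False always returns a 2-D array of Axes, even when n == 1
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 4), sharey=True, squeeze=False)
    axes = axes[0]  # the single row of Axes
    for ax, (title, (years, salaries)) in zip(axes, groups.items()):
        ax.bar(years, salaries)
        ax.set_title(title)
        ax.set_xlabel("Số năm kinh nghiệm")
    axes[0].set_ylabel("Lương")  # the y-axis is shared, so one label is enough
    fig.tight_layout()
    return fig, axes


# Hypothetical toy values, used only to exercise the function.
demo = {
    "A": ([1, 2, 3], [7.0, 9.0, 11.0]),
    "B": ([1, 2, 3], [6.5, 8.0, 12.0]),
}
fig, axes = make_panels(demo)
```

In the notebook you could pass the per-profession sub-tables built in Step 1, for example {'Kế toán': (df_ketoan['SoNamKinhNghiem'], df_ketoan['Luong']), ...}, and then call plt.show(). With squeeze=False, the function returns the same type whatever the number of panels.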
+ Example: three charts of the relationship between salary and years of experience, one for each profession.

Step 1: Split the original table into sub-tables according to a criterion.
+ Example: split the data into three sub-tables by profession.

    df_ketoan = df[df['NganhNghe'] == 'KeToan']
    df_hcns = df[df['NganhNghe'] == 'HCNS']
    df_sale = df[df['NganhNghe'] == 'Sale']
    # number of samples in each profession
    print("Số lượng mẫu nhân viên Kế toán: " + str(df_ketoan.shape[0]))
    print("Số lượng mẫu nhân viên HCNS: " + str(df_hcns.shape[0]))
    print("Số lượng mẫu nhân viên SALE: " + str(df_sale.shape[0]))
    print("Dữ liệu bảng kế toán")  # "accounting table data"
    print(df_ketoan)

Output:

    Số lượng mẫu nhân viên Kế toán: 19
    Số lượng mẫu nhân viên HCNS: 20
    Số lượng mẫu nhân viên SALE: 40
    Dữ liệu bảng kế toán
        SoNamKinhNghiem  Luong NganhNghe
    0                 7   26.0    KeToan
    1                 4   13.8    KeToan
    2                 8   21.5    KeToan
    3                 9   24.0    KeToan
    4                 1    7.8    KeToan
    5                 2   10.0    KeToan
    6                 4   13.5    KeToan
    7                 5   15.8    KeToan
    8                 6   17.5    KeToan
    9                10   26.1    KeToan
    10                1    8.3    KeToan
    11                9   23.9    KeToan
    12                0    5.8    KeToan
    13                7   19.9    KeToan
    14                4   13.8    KeToan
    15                3   11.6    KeToan
    16                3   11.7    KeToan
    17               10   25.8    KeToan
    18                2    9.6    KeToan

The three groups (19 + 20 + 40) together account for all 79 rows.

Step 2: Draw the 3 charts.

In Matplotlib, plt.subplots() returns a Figure object (the outer image) together with the Axes objects inside it. A Figure can contain several Axes, and each Axes is one sub-plot.

    # one row of three sub-plots that share the y-axis
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(10, 4), sharey=True, dpi=80)
    ax1.bar(df_ketoan['SoNamKinhNghiem'], df_ketoan['Luong'])
    ax2.bar(df_hcns['SoNamKinhNghiem'], df_hcns['Luong'])
    ax3.bar(df_sale['SoNamKinhNghiem'], df_sale['Luong'])

    # titles and axis labels of the sub-plots
    ax1.set_title('Kế toán')
    ax2.set_title('Hành chính nhân sự')
    ax3.set_title('Sale')
    ax1.set_xlabel('Số năm kinh nghiệm')
    ax2.set_xlabel('Số năm kinh nghiệm')
    ax3.set_xlabel('Số năm kinh nghiệm')
    ax1.set_ylabel('Lương')
    ax2.set_ylabel('Lương')
    ax3.set_ylabel('Lương')
    plt.tight_layout()
    plt.show()

(Figure: three bar charts side by side that share the y-axis, titled Kế toán, Hành chính nhân sự and Sale.)

Stacked bar chart
+ Draw a stacked bar chart of the mean salary and the mean years of experience for each profession.

    tbc_luong_nghe = df.groupby('NganhNghe')[['Luong']].mean()          # TBC = trung bình cộng (mean)
    tbc_nam_nghe = df.groupby('NganhNghe')[['SoNamKinhNghiem']].mean()
    print("TBC lương theo nghề", "\n", tbc_luong_nghe, "\n")
    print("TBC năm kinh nghiệm theo nghề", "\n", tbc_nam_nghe)
    plt.xlabel('Các ngành nghề', color='green', fontweight='bold', fontsize=15)
    plt.bar(tbc_luong_nghe.index, tbc_luong_nghe['Luong'], label='TBC lương')
    # bottom= starts these bars where the salary bars end, which stacks the two series
    plt.bar(tbc_nam_nghe.index, tbc_nam_nghe['SoNamKinhNghiem'],
            bottom=tbc_luong_nghe['Luong'], label='TBC năm kinh nghiệm')
    plt.legend()  # show the series labels
    plt.show()

Without bottom=, the second plt.bar() call would draw its bars from 0, over the first series, instead of stacking them.

Output:

    TBC lương theo nghề
                   Luong
    NganhNghe
    HCNS       16.790000
    KeToan     16.126316
    Sale       18.222500

    TBC năm kinh nghiệm theo nghề
               SoNamKinhNghiem
    NganhNghe
    HCNS                 5.200
    KeToan               5.000
    Sale                 5.625

(Figure: stacked bar chart for HCNS, KeToan and Sale with the legend entries TBC lương and TBC năm kinh nghiệm; x-axis: Các ngành nghề.)

Grouped bar chart

Places the bars of each group next to one another.
+ Draw a chart of the maximum, minimum and mean salary for each profession.

    # maximum, minimum and mean salary for each profession
    max_luong_nghe = df.groupby('NganhNghe')[['Luong']].max()
    min_luong_nghe = df.groupby('NganhNghe')[['Luong']].min()
    tbc_luong_nghe = df.groupby('NganhNghe')[['Luong']].mean()

    x = np.arange(len(max_luong_nghe.index))  # x position of each group: array([0, 1, 2])
    width = 0.2  # width of each bar

    fig, ax = plt.subplots()
    ax.bar(x, max_luong_nghe['Luong'], width, label='Lương max')
    # the last two labels are cut off in the original; 'Lương min' and 'Lương TB' are assumed
    ax.bar(x + width, min_luong_nghe['Luong'], width, label='Lương min')
    ax.bar(x + 2 * width, tbc_luong_nghe['Luong'], width, label='Lương TB')

    # title and custom x-axis tick labels
    ax.set_title('Giá trị lương theo ngành nghề')
    ax.set_xticks(x + width)                   # centre each label under its group of three bars
    ax.set_xticklabels(max_luong_nghe.index)   # the profession names
    ax.legend()
    fig.tight_layout()
    plt.show()

(Figure: grouped bar chart "Giá trị lương theo ngành nghề" with one group of three bars for each of HCNS, KeToan and Sale.)

Box plot

A box plot, also called a box-and-whisker plot, shows five positions in the distribution of the data:
+ the smallest value that is not an outlier (the end of the lower whisker)
+ the first quartile (Q1)
+ the median
+ the third quartile (Q3)
+ the largest value that is not an outlier (the end of the upper whisker)

A box plot shows how the data are distributed and identifies outliers:
+ If the median line splits the box into two equal halves, the data set is symmetric. If the lower half is smaller than the upper half, the data are right-skewed. Conversely, if the lower half is larger, the data are left-skewed.
+ Outliers (if any) appear below the lower whisker and above the upper whisker.

    # box plot of per-capita CO2 emissions of the 8 most populous countries in the world
    nuoc = np.array(['china', 'india', 'us', 'indonesia', 'brazil', 'pakistan', 'russia', 'bangladesh'])
    co2 = np.array([4.9, 1.4, 18.9, 1.8, 1.9, 0.9, 10.8, 0.3])
    plt.boxplot(co2)
    plt.show()

This gives the following results:
1. Min = 0.3
2. Q1 = 1.275
3. Median = 1.85
4. Q3 = 6.375
5. Max = 18.9
6. IQR = Q3 − Q1 = 5.1
7. Lower fence L = Q1 − 1.5 × IQR = −6.375
8. Upper fence U = Q3 + 1.5 × IQR = 14.025
9. From (7) and (8), us = 18.9 is an outlier. The upper whisker therefore stops at the largest remaining value, 10.8.

(Figure: box plot of co2, with one outlier point above the upper whisker.)

    # box plot of the salaries of Accounting (Kế toán) employees
    plt.boxplot(df_ketoan['Luong'])
    plt.show()

(Figure: box plot of Luong for Kế toán employees.)

Line plot

Used to show the trend of the data, that is, whether it increases or decreases.
+ Draw a line chart of Luong across the 79 rows of data.
+ Some format strings:
  o 'go-': green (g) points joined by straight lines (the colour and line style can be changed, e.g. 'yo-.')
  o 'go': only the points are shown
  o 'r*-.': red star-shaped points joined by a dash-dot line (-.)
  o 'bD-': blue diamond-shaped points joined by a line
  o 'g^' followed by a line style, e.g. 'g^:': green upward-pointing triangles joined by a line in that style

    plt.ylabel('Lương')
    plt.grid()
    plt.plot(luong, 'go-')
    plt.show()

(Figure: green line chart of Lương against the row index.)

+ Draw a line chart of the relationship between years of experience and the mean salary.

    tbc_luong_nam = df.groupby('SoNamKinhNghiem')[['Luong']].mean()
    plt.xlabel('Số năm kinh nghiệm')
    plt.ylabel('Lương trung bình')  # "mean salary"
    plt.grid()
    plt.plot(tbc_luong_nam, 'ro-')
    plt.show()

(Figure: red line chart of Lương trung bình against Số năm kinh nghiệm.)

Scatter plot

A scatter plot shows how the data are distributed in two-dimensional space.
+ Draw a chart of the distribution of salary. Only the salary variable is under consideration, so df.index (the id of each sample in df) is added as the second variable, on the x-axis.

    plt.scatter(df.index, luong)
    plt.show()

(Figure: scatter plot of Lương against the sample index.)

Differences between plot() and scatter():
+ plot() cannot vary the colour and size of individual points within a set, whereas scatter() can.
+ plot() can draw lines connecting consecutive points; scatter() cannot.

The example below plots height and weight data. Every point gets a random colour and a random size.

    height = np.array([167, 170, 149, 165, 155, 180, 166, 146, 159, 185, 145, 168, 172, 181, 169])
    weight = np.array([86, 74, 66, 78, 68, 79, 90, 73, 70, 88, 66, 84, 67, 84, 77])
    colors = np.random.rand(15)           # 15 random colour values
    area = (30 * np.random.rand(15))**2   # 15 random marker sizes
    plt.xlim(140, 200)  # x-axis limited to 140-200 cm
    plt.ylim(60, 100)   # y-axis limited to 60-100 kg
    plt.scatter(height, weight, s=area, c=colors)
    plt.title("Chiều cao và cân nặng")   # "height and weight"
    plt.xlabel("Chiều cao - cm")
    plt.ylabel("Cân nặng - kg")
    plt.show()

(Figure: scatter plot "Chiều cao và cân nặng"; x-axis: Chiều cao - cm, y-axis: Cân nặng - kg.)

Homework:
1. Draw a bar chart in which each bar has its own colour.
2. Draw a pie chart of the mean salary by profession.
3. Draw box plots that compare the salaries of Kế toán, HCNS and Sale employees side by side.
4. Explore other chart types available in Python.
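The following minimal sketch serves two purposes: it is a starting point for exercise 3, and it double-checks the box-plot quantities worked out by hand in the box plot section. It assumes the same rule that plt.boxplot() applies by default: quartiles from np.percentile's default linear interpolation, fences at Q1 − 1.5·IQR and Q3 + 1.5·IQR, and whiskers ending at the most extreme values inside the fences. The helper name box_stats is hypothetical.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Quartiles, fences, whisker ends and outliers of a 1-D sample."""
    data = np.asarray(values, dtype=float)
    # np.percentile's default method interpolates linearly between sorted values
    q1, med, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    inside = data[(data >= low_fence) & (data <= high_fence)]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "low_fence": low_fence, "high_fence": high_fence,
        "whisker_low": inside.min(),   # end of the lower whisker
        "whisker_high": inside.max(),  # end of the upper whisker
        "outliers": np.sort(data[(data < low_fence) | (data > high_fence)]),
    }


nuoc = np.array(['china', 'india', 'us', 'indonesia', 'brazil', 'pakistan', 'russia', 'bangladesh'])
co2 = np.array([4.9, 1.4, 18.9, 1.8, 1.9, 0.9, 10.8, 0.3])
stats = box_stats(co2)
for key in ("q1", "median", "q3", "iqr", "low_fence", "high_fence"):
    print(key, round(stats[key], 3))
print("outliers:", nuoc[co2 > stats["high_fence"]])
```

For co2, this reproduces the hand calculation: Q1 = 1.275, median = 1.85, Q3 = 6.375, IQR = 5.1, and fences at −6.375 and 14.025. It flags us (18.9) as the only outlier, and the upper whisker stops at 10.8. For exercise 3, plt.boxplot() also accepts a list of samples, for example plt.boxplot([df_ketoan['Luong'], df_hcns['Luong'], df_sale['Luong']]), and draws one box per sample.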
