LAB: ADVANCED PROGRAMMING LANGUAGES
BUSINESS INTELLIGENCE
By: Kamel BENRAIS
Top programming languages
for data science
1. Python
2. R
3. SQL
4. Java
5. Julia
6. Scala
7. C et C++
8. JavaScript
9. Swift
10. Go
Python
 Python has a rich ecosystem of libraries, so it can
handle most data science tasks: data preprocessing,
visualization, and statistical analysis, as well as
deploying machine learning and deep learning
models.
The Top 4 ETL Python Frameworks
 Bonobo
 Bubbles
 Pygrametl
 Mara
Getting started with Bonobo
 Requirements
You need a working Python 3.5+ environment.
Linux and macOS are first-class platforms, while
Windows is supported on a best-effort basis.
 Install
C:\> pip install bonobo
Create your first virtual
environment
 mkdir project
 cd project
 python -m venv env
 cd env\Scripts
 activate.bat
Basic statements
Data structures
x = [20, 30, 60]                  # list
y = {'Name': 'Osama', 'Age': 40}  # dictionary
z = {20, 30, 50}                  # set
c = (20, 30, 60)                  # tuple
Iteration
for i in range(len(x)):
    print(i)                      # prints the indexes 0, 1, 2
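The index-based loop above prints positions rather than values. As a small sketch reusing the same x and y, the values can be read directly, with enumerate() when the position is also needed, and with items() for a dictionary:

```python
x = [20, 30, 60]
y = {'Name': 'Osama', 'Age': 40}

# Iterate over the values directly.
for value in x:
    print(value)

# enumerate() yields (index, value) pairs.
pairs = list(enumerate(x))
for index, value in pairs:
    print(index, value)

# items() yields the (key, value) pairs of a dictionary.
fields = list(y.items())
for key, value in fields:
    print(key, value)
```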
Control flow
x = 5
if x < 10:
    print('Ok')
else:
    print('Not')
Function
def test():
    x = "Hello"
    return x
print(test())
Modules
import math as x
data = x.pow(3, 2)
print(data)
**************
import random
data = random.randint(0, 10)
print(data)
*************
import random
data = random.uniform(0, 2)
print(data)
******************
import requests
x = requests.get('https://w3schools.com/python/demopage.htm')
print(x.text)
******************
import subprocess as x
data = x.call('calc.exe')    # Windows only: opens the calculator
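The calc.exe call only works on Windows. As a hedged sketch of the same idea on any platform, subprocess.run can launch a child Python process through sys.executable and capture what it prints; the command passed to the child is purely illustrative:

```python
import subprocess
import sys

# Run a child Python process; sys.executable is the current interpreter,
# so this does not depend on a Windows-only program such as calc.exe.
result = subprocess.run(
    [sys.executable, "-c", "print(3 ** 2)"],
    capture_output=True,   # collect stdout/stderr instead of printing them
    text=True,             # decode the output as str
    check=True,            # raise CalledProcessError on a non-zero exit code
)
output = result.stdout.strip()
print(output)
```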
import pandas as pd
data = pd.read_excel('05.xlsx')
print(data)
for i in data:      # iterating over a DataFrame yields its column names
    print(i)

print(data['Sal'])

import pandas as pd
df = pd.read_excel('orders.xlsx')
print(df)
import pandas as x
data = x.read_csv('050.csv')
print(data)

import PyPDF2 as x
file = open('test.pdf', 'rb')
reader = x.PdfFileReader(file)    # PyPDF2 3.x: x.PdfReader(file)
page1 = reader.getPage(0)         # PyPDF2 3.x: reader.pages[0]
# print(page1.extractText())
data = page1.extractText()        # PyPDF2 3.x: page1.extract_text()
print(type(data))
print(data.upper())

linelist = []
with open('osama.txt', 'r') as f:
    for i in f:
        print(i.strip())

dir(str)
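The text-file loop above prints each line but leaves linelist empty. A minimal sketch of filling the list instead, assuming blank lines should be skipped; the temporary file name and its contents are illustrative and not part of the lab:

```python
import os
import tempfile

# Create a small sample file so the sketch is self-contained
# (the file name and contents are illustrative, not from the lab).
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('first line\n  second line  \n\nthird line\n')

linelist = []
with open(path, 'r', encoding='utf-8') as f:
    for line in f:
        text = line.strip()      # remove the newline and surrounding spaces
        if text:                 # skip blank lines
            linelist.append(text)

print(linelist)
```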
HTML file
import pandas as pd
data = pd.read_html('https://en.wikipedia.org/wiki/Science_Fiction:_The_100_Best_Novels', header=0)
data = data[0]    # read_html returns a list of DataFrames, one per table
print(data)
import pandas as pd
data = pd.read_html('https://en.wikipedia.org/wiki/Timeline_of_programming_languages', header=0)
data = data[8]
data.tail()
SQL Server connection (pymssql)
import pymssql as x
with open('connection_string.txt', 'r') as file:
    server = file.readline().strip()
    user = file.readline().strip()
    password = file.readline().strip()
    database = file.readline().strip()
con = x.connect(server, user, password, database)
sql = 'select id, name, price from product'
cur = con.cursor()
cur.execute(sql)
for i in cur:               # each i is one row (a tuple)
    print('============')
    for j in i:             # each j is one column value
        print(j)
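The same connect, cursor, execute, iterate pattern can be tried without a database server. The sketch below swaps in the standard-library sqlite3 module with an in-memory database, which is an assumption made for illustration and not the driver used in the lab; the two rows are placeholders. It also shows a parameterized query. Placeholder syntax differs between drivers, so the "?" used here should not be assumed to carry over to pymssql.

```python
import sqlite3

# In-memory database standing in for the real server (assumption).
con = sqlite3.connect(':memory:')
cur = con.cursor()
cur.execute('create table product (id integer, name text, price real)')
cur.executemany(
    'insert into product values (?, ?, ?)',
    [(1, 'sample-a', 10.0), (2, 'sample-b', 25.5)],  # placeholder rows
)
con.commit()

# Parameterized query: the driver substitutes the value for "?".
cur.execute('select id, name, price from product where price > ?', (20,))
rows = cur.fetchall()
for row in rows:
    print('============')
    for column in row:
        print(column)

con.close()
```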
Connecting to the database
server
pip install cx_Oracle

import cx_Oracle
dsn_tns = cx_Oracle.makedsn('Host Name', 'Port Number', service_name='Service Name')
conn = cx_Oracle.connect(user=r'User Name', password='Personal Password', dsn=dsn_tns)
c = conn.cursor()
c.execute('select * from database.table')
for row in c:
    print(row[0], '-', row[1])
conn.close()
DATA
VISUALIZATION
Visualization
Line chart
 grade = [70,90,80,65,70]
 subject = ['Math', 'Marketing','Production', 'Programming','Accounting']
 import matplotlib.pyplot as plt
 plt.plot(subject,grade)
 plt.title('Osama Hassan - Student Result')
 plt.show()
Bar chart
 plt.bar(subject,grade)
 plt.title('Osama Hassan - Student Result')
 plt.show()
 ##########
 plt.barh(subject,grade)
 plt.title('Osama Hassan - Student Result')
 plt.show()
 import pandas as pd
 data = pd.read_csv('Countries.csv')
 data
First project
 C:\> pip install markupsafe==2.0.1
 bonobo init my-etl-job.py
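An ETL job chains three kinds of steps: extract, transform, and load. The sketch below shows that shape with plain Python generators. It deliberately does not use Bonobo, so it is not Bonobo's API; the function names and sample values are illustrative.

```python
def extract():
    """Yield raw records one at a time (illustrative values)."""
    yield 'hello'
    yield 'world'


def transform(records):
    """Yield a cleaned-up version of each record."""
    for record in records:
        yield record.title()


def load(records):
    """Collect the records; a real job would write to a file or database."""
    loaded = []
    for record in records:
        print(record)
        loaded.append(record)
    return loaded


# Each step consumes the output of the previous one.
result = load(transform(extract()))
```

Because each step is a generator, records flow through the chain one at a time instead of being held in memory all at once.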
DATA EXTRACTION
germany = []
egypt = []
years = []
for i in range(len(data)):
    if data['country'][i] == 'Germany':
        years.append(data['year'][i])
        x = data['population'][i] / 1000000    # population in millions
        germany.append(round(x, 0))
    elif data['country'][i] == 'Egypt':
        x = data['population'][i] / 1000000
        egypt.append(round(x, 0))
print(germany)
print(egypt)
print(years)
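The loop above needs one list and one branch per country. As a sketch of a variant that scales to more countries, the populations can be grouped in a dictionary keyed by country. The rows below only mimic the country, year, and population columns of Countries.csv; the numbers are placeholders, not real population figures.

```python
from collections import defaultdict

# Placeholder rows with the same columns as Countries.csv (country, year,
# population); the numbers are made up for illustration only.
rows = [
    {'country': 'Germany', 'year': 2000, 'population': 2_000_000},
    {'country': 'Egypt', 'year': 2000, 'population': 1_000_000},
    {'country': 'Germany', 'year': 2010, 'population': 3_000_000},
    {'country': 'Egypt', 'year': 2010, 'population': 4_000_000},
]

wanted = {'Germany', 'Egypt'}
population = defaultdict(list)   # country -> populations in millions
years = []

for row in rows:
    if row['country'] in wanted:
        population[row['country']].append(round(row['population'] / 1_000_000))
        if row['year'] not in years:   # record each year only once
            years.append(row['year'])

print(years)
print(dict(population))
```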
Visualization
%matplotlib inline
plt.plot(years, germany)
plt.title('German Population since 1950')
plt.show()

%matplotlib inline
plt.plot(years, germany)
plt.plot(years, egypt)
plt.title('Egyptian Population Compared with Germany')
plt.legend(['Germany', 'Egypt'])
plt.show()

%matplotlib inline
plt.scatter(years, germany)
plt.scatter(years, egypt)
plt.title('Egyptian Population Compared with Germany')
plt.legend(['Germany', 'Egypt'])
plt.show()

%matplotlib inline
plt.bar(years, germany)
plt.bar(years, egypt)
plt.title('Egyptian Population Compared with Germany')
plt.legend(['Germany', 'Egypt'])
plt.show()

%matplotlib inline
plt.barh(years, germany)
plt.barh(years, egypt)
plt.title('Egyptian Population Compared with Germany')
plt.legend(['Germany', 'Egypt'])
plt.show()
 import matplotlib.pyplot as plt
 from matplotlib_venn import venn2
 # First way to call the 2 group Venn diagram:
 venn2(subsets = (10, 5, 2), set_labels = ('Group A', 'Group B'))
 plt.show()
 # Second way
 venn2([set(['A', 'B', 'C', 'D']), set(['D', 'E', 'F'])])
 plt.show()
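The two calls above are related: the tuple passed as subsets holds the sizes of "A only", "B only", and "A and B", in that order. A short sketch using Python set operations computes those three sizes for the sets used in the second call:

```python
group_a = {'A', 'B', 'C', 'D'}
group_b = {'D', 'E', 'F'}

only_a = group_a - group_b     # elements in A but not in B
only_b = group_b - group_a     # elements in B but not in A
both = group_a & group_b       # elements in both sets

# Same order as the subsets argument of venn2: (A only, B only, both).
subsets = (len(only_a), len(only_b), len(both))
print(subsets)
```

Passing this tuple as subsets to venn2 should draw the same region sizes as passing the two sets directly.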