ADVANCED PROGRAMMING
BUSINESS INTELLIGENCE
By: Kamel BENRAIS
Top programming languages for data science
1. Python
2. R
3. SQL
4. Java
5. Julia
6. Scala
7. C et C++
8. JavaScript
9. Swift
10. Go
Python
Python has a rich ecosystem of libraries, which lets it
cover nearly every data science task: data preprocessing,
visualization, and statistical analysis, as well as the
deployment of machine learning and deep learning models.
The Top 4 ETL Python Frameworks
Bonobo
Bubbles
Pygrametl
Mara
Getting started with Bonobo
Requirements
You need a working Python 3.5+ environment. Linux and OSX
have first-class support, while Windows environments
are supported on a best-effort basis.
Installation
C:\> pip install bonobo
Create your first virtual environment
mkdir project
cd project
python -m venv env
cd env\Scripts
activate.bat
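Once the environment is activated, one quick way to confirm that Python really runs inside it is to compare sys.prefix with sys.base_prefix. This is a minimal sketch; the helper name in_virtualenv is an illustrative assumption and does not come from the slides.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter prefix:", sys.prefix)
```

Run from the activated prompt, the first line should report True; run from a plain prompt, it should report False.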
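Basic statements
Data structures
x = [20, 30, 60]                  # list: ordered, mutable
y = {'Name': 'Osama', 'Age': 40}  # dict: key/value pairs
z = {20, 30, 50}                  # set: unordered, unique values
c = (20, 30, 60)                  # tuple: ordered, immutable
Iteration
for i in range(len(x)):
    print(i)                      # prints the indexes 0, 1, 2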
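The four structures above behave differently when iterated or modified. The sketch below reuses the same variables to show iteration by value, by index and value, and over dict pairs, plus the difference between a mutable list, an immutable tuple, and a set that stores each value once. The names tuple_is_immutable and set_size are added for illustration only.

```python
x = [20, 30, 60]                  # list
y = {'Name': 'Osama', 'Age': 40}  # dict
z = {20, 30, 50}                  # set
c = (20, 30, 60)                  # tuple

# Iterate over a list by value instead of by index.
for value in x:
    print(value)

# enumerate() yields the index and the value together.
for index, value in enumerate(x):
    print(index, value)

# A dict is iterated as key/value pairs with items().
for key, value in y.items():
    print(key, value)

# Lists are mutable, tuples are not.
x.append(70)
tuple_is_immutable = False
try:
    c[0] = 99
except TypeError:
    tuple_is_immutable = True

# A set stores each value once; adding a duplicate changes nothing.
z.add(20)
set_size = len(z)
print(x, tuple_is_immutable, set_size)
```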
Control flow
x = 5
if x < 10:
    print('Ok')
else:
    print('Not')
Function
def test():
    x = "Hello"
    return x

print(test())
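The two ideas above combine naturally: a function can take the value to test as a parameter and return the result instead of printing it. A minimal sketch, where the function name check and the limit parameter are illustrative assumptions:

```python
def check(value, limit=10):
    """Return 'Ok' when value is below limit, otherwise 'Not'."""
    if value < limit:
        return 'Ok'
    else:
        return 'Not'


print(check(5))
print(check(12))
print(check(12, limit=20))
```

Returning the result rather than printing it keeps the function reusable, since the caller decides what to do with the answer.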
Modules
import math as x
data = x.pow(3, 2)             # 3 raised to the power 2
print(data)
**************
import random
data = random.randint(0, 10)   # random integer, both bounds included
print(data)
**************
import random
data = random.uniform(0, 2)    # random float between 0 and 2
print(data)
**************
import requests
x = requests.get('https://w3schools.com/python/demopage.htm')
print(x.text)
**************
import subprocess as x
data = x.call('calc.exe')      # launches the Windows calculator
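The calc.exe call above only works on Windows. As a portable variant, the sketch below launches the running Python interpreter itself and captures what it prints. It assumes Python 3.7 or later, because subprocess.run only accepts capture_output from that version.

```python
import subprocess
import sys

# sys.executable is the path of the running Python interpreter,
# so this command is available wherever the script itself runs.
result = subprocess.run(
    [sys.executable, "-c", "print(3 ** 2)"],
    capture_output=True,  # collect stdout/stderr instead of printing them
    text=True,            # decode the output as str
    check=True,           # raise CalledProcessError on a non-zero exit code
)
print(result.returncode)
print(result.stdout.strip())
```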
import pandas as pd
data = pd.read_excel('05.xlsx')
print(data)
for i in data:       # iterating over a DataFrame yields its column names
    print(i)
print(data['Sal'])
import pandas as pd
df = pd.read_excel('orders.xlsx')
print(df)
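import pandas as x
data = x.read_csv('050.csv')
print(data)
**************
import PyPDF2 as x
# PdfFileReader, getPage and extractText are the PyPDF2 1.x/2.x names;
# PyPDF2 3.x renamed them to PdfReader, reader.pages[0] and extract_text().
file = open('test.pdf', 'rb')
reader = x.PdfFileReader(file)
page1 = reader.getPage(0)
# print(page1.extractText())
data = page1.extractText()
print(type(data))
print(data.upper())
**************
linelist = []
with open('osama.txt', 'r') as f:
    for i in f:
        print(i.strip())
dir(str)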
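The text-file example declares linelist but only prints each line. One possible completion, sketched below, collects the stripped lines into the list and returns it. The function name read_lines and the throwaway sample file are assumptions made so that the sketch runs anywhere.

```python
import os
import tempfile


def read_lines(path):
    """Return the non-empty lines of a text file, stripped of surrounding whitespace."""
    linelist = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:              # skip blank lines
                linelist.append(line)
    return linelist


# Demonstration on a temporary file created just for this sketch.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, 'sample.txt')
    with open(sample_path, 'w', encoding='utf-8') as f:
        f.write('first line\n\nsecond line\n')
    lines = read_lines(sample_path)

print(lines)
```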
HTML file
import pandas as pd
data = pd.read_html('https://en.wikipedia.org/wiki/Science_Fiction:_The_100_Best_Novels', header=0)
data = data[0]   # read_html returns a list of DataFrames, one per table found
print(data)
import pandas as pd
data = pd.read_html('https://en.wikipedia.org/wiki/Timeline_of_programming_languages', header=0)
data = data[8]
data.tail()
SQL Server connection (pymssql)
import pymssql as x
# connection_string.txt holds server, user, password and database, one per line
with open('connection_string.txt', 'r') as file:
    server = file.readline().strip()
    user = file.readline().strip()
    password = file.readline().strip()
    database = file.readline().strip()
con = x.connect(server, user, password, database)
sql = 'select id, name, price from product'
cur = con.cursor()
cur.execute(sql)
for i in cur:                # one tuple per row
    print('============')
    for j in i:              # one value per column
        print(j)
con.close()
Connecting to the database server (Oracle)
pip install cx_Oracle
import cx_Oracle
dsn_tns = cx_Oracle.makedsn('Host Name', 'Port Number', service_name='Service Name')
conn = cx_Oracle.connect(user=r'User Name', password='Personal Password', dsn=dsn_tns)
c = conn.cursor()
c.execute('select * from database.table')
for row in c:
    print(row[0], '-', row[1])
conn.close()
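Both connection examples follow the same Python DB-API pattern: connect, open a cursor, execute, iterate, close. The sketch below shows that pattern with a query parameter. It swaps in the standard-library sqlite3 module so that it runs without a database server; the table contents are placeholder values invented for the illustration. Placeholder syntax depends on the driver (sqlite3 uses ?, pymssql uses %s, cx_Oracle uses named binds such as :price), so treat the ? below as specific to sqlite3.

```python
import sqlite3

# In-memory database: nothing is written to disk.
con = sqlite3.connect(':memory:')
cur = con.cursor()

# Placeholder table and rows, made up for this sketch.
cur.execute('create table product (id integer, name text, price real)')
cur.executemany(
    'insert into product (id, name, price) values (?, ?, ?)',
    [(1, 'pen', 1.5), (2, 'notebook', 4.0), (3, 'bag', 25.0)],
)
con.commit()

# The ? placeholder lets the driver pass the value safely,
# instead of building the SQL text by string concatenation.
max_price = 5.0
cur.execute(
    'select id, name, price from product where price <= ? order by id',
    (max_price,),
)
rows = cur.fetchall()
for row in rows:
    print(row[0], '-', row[1], '-', row[2])

con.close()
```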
DATA VISUALIZATION
Visualization
Line chart
grade = [70,90,80,65,70]
subject = ['Math', 'Marketing','Production', 'Programming','Accounting']
import matplotlib.pyplot as plt
plt.plot(subject,grade)
plt.title('Osama Hassan - Student Result')
plt.show()
Bar chart
plt.bar(subject,grade)
plt.title('Osama Hassan - Student Result')
plt.show()
##########
plt.barh(subject,grade)
plt.title('Osama Hassan - Student Result')
plt.show()
import pandas as pd
data = pd.read_csv('Countries.csv')
data
First project
C:\> pip install markupsafe==2.0.1
C:\> bonobo init my-etl-job.py
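An ETL job chains three steps: extract, transform, load. The sketch below illustrates that chain with plain Python generators. It deliberately does not use the Bonobo API, and the function names and sample records are assumptions made up for the illustration.

```python
def extract():
    """Yield raw records (placeholder values, made up for this sketch)."""
    yield from ['  alice ', 'BOB', ' carol']


def transform(rows):
    """Normalize each record: trim whitespace and capitalize."""
    for row in rows:
        yield row.strip().title()


def load(rows):
    """Collect the transformed records; a real job would write them somewhere."""
    loaded = []
    for row in rows:
        loaded.append(row)
    return loaded


# Chain the three steps: each one consumes the output of the previous one.
result = load(transform(extract()))
print(result)
```

Because extract and transform are generators, records flow through the chain one at a time instead of being held in memory all at once.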
DATA EXTRACTION
germany = []
egypt = []
years = []
for i in range(len(data)):
    if data['country'][i] == 'Germany':
        years.append(data['year'][i])           # years come from the Germany rows only
        x = data['population'][i] / 1000000     # population in millions
        germany.append(round(x, 0))
    elif data['country'][i] == 'Egypt':
        x = data['population'][i] / 1000000
        egypt.append(round(x, 0))
print(germany)
print(egypt)
print(years)
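The loop above takes the years from the Germany rows only, so it relies on both countries covering the same years in the same order. As a hedged alternative, the sketch below groups the rows by country first and then builds the lists from a sorted list of years. It uses the standard-library csv module on inline sample text; the column names come from the code above, while the figures are placeholders and not real population data.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows, made up for this sketch (not real population figures).
SAMPLE_CSV = """country,year,population
Germany,2000,1000000
Egypt,2000,2000000
Germany,2010,3000000
Egypt,2010,4000000
"""

# country name -> {year: population in millions}
populations = defaultdict(dict)
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    millions = round(int(row['population']) / 1000000, 0)
    populations[row['country']][int(row['year'])] = millions

# Build the plotting lists from one sorted list of years.
# Assumes every year listed for Germany also exists for Egypt.
years = sorted(populations['Germany'])
germany = [populations['Germany'][y] for y in years]
egypt = [populations['Egypt'][y] for y in years]

print(years)
print(germany)
print(egypt)
```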
Visualization
%matplotlib inline
plt.plot(years, germany)
plt.title('German Population since 1950')
plt.show()
%matplotlib inline
plt.plot(years, germany)
plt.plot(years,egypt)
plt.title('Egyptian Population Compared with Germany')
plt.legend(['Germany', 'Egypt'])
plt.show()
%matplotlib inline
plt.scatter(years, germany)
plt.scatter(years,egypt)
plt.title('Egyptian Population Compared with Germany')
plt.legend(['Germany', 'Egypt'])
plt.show()
%matplotlib inline
plt.bar(years, germany)
plt.bar(years,egypt)
plt.title('Egyptian Population Compared with Germany')
plt.legend(['Germany', 'Egypt'])
plt.show()
%matplotlib inline
plt.barh(years, germany)
plt.barh(years,egypt)
plt.title('Egyptian Population Compared with Germany')
plt.legend(['Germany', 'Egypt'])
plt.show()
import matplotlib.pyplot as plt
from matplotlib_venn import venn2
# First way to call the 2 group Venn diagram:
venn2(subsets = (10, 5, 2), set_labels = ('Group A', 'Group B'))
plt.show()
# Second way
venn2([set(['A', 'B', 'C', 'D']), set(['D', 'E', 'F'])])
plt.show()
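The two venn2 calls are related: the first takes the region sizes directly, while the second takes the sets and derives the sizes. The sketch below computes those sizes by hand for the two sets of the second call, using plain set operations. The variable names are illustrative.

```python
group_a = {'A', 'B', 'C', 'D'}
group_b = {'D', 'E', 'F'}

only_a = group_a - group_b   # elements in A but not in B
only_b = group_b - group_a   # elements in B but not in A
both = group_a & group_b     # elements shared by A and B

# Same order as the subsets tuple in the first venn2 call: (only A, only B, both).
subsets = (len(only_a), len(only_b), len(both))
print(subsets)
```

For these two sets the tuple is (3, 2, 1), so passing subsets = (3, 2, 1) to venn2 should draw the same region sizes as passing the sets themselves.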