
Value Added Course

Programming in Python and Machine Learning
UNIT-2
Python Modules
• A module is a file containing Python definitions and statements. A module
can define functions, classes and variables. A module can also include
runnable code.
• Grouping related code into a module makes the code easier to
understand and use.
• To create a module, save the code you want in a file with the file
extension .py
Ex:- mymodule.py
def greeting(name):
    print("Hello, " + name)

def add(a, b):
    print(a + b)
• We can now use the module we just created with the import
statement:
import mymodule
mymodule.greeting("GRIET Students")
mymodule.add(2, 3)
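The create-then-import workflow can be tried in a single script. The sketch below is only an illustration: it writes a hypothetical module named demo_module.py into a temporary directory, and its functions return values instead of printing them so the results can be inspected.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source for a hypothetical module; the names are illustrative only.
MODULE_SOURCE = '''
def greeting(name):
    return "Hello, " + name

def add(a, b):
    return a + b
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    # Save the code in a file with the .py extension.
    Path(tmp_dir, "demo_module.py").write_text(MODULE_SOURCE)

    # Make the directory importable, then import the module by name.
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    try:
        demo_module = importlib.import_module("demo_module")
        message = demo_module.greeting("GRIET Students")
        total = demo_module.add(2, 3)
    finally:
        sys.path.remove(tmp_dir)

print(message)
print(total)
```

In everyday use the module file simply sits next to the script that imports it, so none of the path handling is needed.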
• A module can contain functions as well as variables of any type
(lists, dictionaries, objects, etc.):
• Ex:- mymodule1.py
person = {
    "name": "John",
    "age": 36,
    "country": "Norway"
}
Import the module, or import it under another name:
import mymodule1
a = mymodule1.person["age"]
print(a)
(OR)
import mymodule1 as m2
a = m2.person["age"]
print(a)
• The from ... import statement
• Python's from statement lets you import specific attributes from a
module instead of the whole module. It has the following syntax:
from module_name import name1, name2
• Ex:-
from math import sqrt, factorial
print(sqrt(16))
print(factorial(6))
• Ex:-
from mymodule1 import person
print(person["age"])
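An imported name can also be given an alias with as, in the same way a whole module can. A minimal sketch using the standard math module (the alias square_root is an arbitrary choice):

```python
# Import one name unchanged and one under an alias.
from math import factorial, sqrt as square_root

root = square_root(16)
product = factorial(6)
print(root, product)
```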
• There are several built-in modules in Python, which you
can import whenever you like.
• Ex:- platform module
import platform
x = platform.system()
print(x)
• The built-in dir() function lists all the function names
(and variable names) in a module.
Ex:-
import platform
x = dir(platform)
print(x)
Ex:-
import math
x = dir(math)
print(x)
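The list returned by dir() also contains internal names such as __doc__. One possible way to keep only the public names is sketched here; the leading-underscore filter is a common convention, not something dir() does itself.

```python
import math

# Keep only the public names (those without a leading underscore).
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names[:5])
```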
# importing the built-in module random
import random

# print a random integer between 0 and 5 (both inclusive)
print(random.randint(0, 5))

# print a random floating point number in the range [0, 1)
print(random.random())

# print a random floating point number in the range [0, 100)
print(random.random() * 100)

items = [1, 4, True, 800, "python", 27, "hello"]

# use the choice function of the random module to pick a random
# element from a sequence such as a list
print(random.choice(items))
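Because these numbers are pseudo-random, seeding the generator makes a run repeatable. A small sketch, assuming an arbitrary seed value of 42:

```python
import random

# Seeding makes the pseudo-random sequence repeatable.
random.seed(42)
first_run = [random.randint(0, 5) for _ in range(3)]

random.seed(42)
second_run = [random.randint(0, 5) for _ in range(3)]

# Both runs produce the same three integers.
print(first_run == second_run)
```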
Python- Numpy
NUMPY-Introduction
• NumPy is a Python package. The name stands for 'Numerical
Python'. It is a library consisting of multidimensional
array objects and a collection of routines for processing
those arrays.
Operations using NumPy:
– Mathematical and logical operations on arrays.
– Fourier transforms and routines for shape manipulation.
– Operations related to linear algebra. NumPy has built-in
functions for linear algebra and random number
generation.
• NumPy is often used along with packages like SciPy
(Scientific Python) and Matplotlib (plotting library).
NUMPY − NDARRAY OBJECT
• The most important object defined in NumPy is an N-dimensional array
type called ndarray. It describes a collection of items of the same type.
Items in the collection can be accessed using a zero-based index.
• The basic ndarray is created using the array function in NumPy as follows:
numpy.array(object, dtype=None, copy=True, order='K', subok=False,
ndmin=0)
Ex:-
import numpy as np
a = np.array([1, 2, 3])
print(a)
Output:-
[1 2 3]
Ex:-
import numpy as np
a = np.array([1, 2, 3, 4, 5], ndmin=2)
print(a)
Output:-
[[1 2 3 4 5]]
Ex:-
import numpy as np
a = np.array([[1, 2], [3, 4]])
print(a)
Output:-
[[1 2]
 [3 4]]
Ex:-
import numpy as np
a = np.array([1, 2, 3], dtype=complex)
print(a)
Output:-
[1.+0.j 2.+0.j 3.+0.j]
Data Type Objects (dtype)
• A data type object describes how the fixed block of
memory corresponding to an array is interpreted, depending on the
following aspects:
• Type of data (integer, float or Python object)
• Size of data
• Byte order (little-endian or big-endian), set by prefixing '<' or '>' to
the data type. '<' means little-endian and '>' means big-endian.
Syntax:
numpy.dtype(object, align, copy)
object: The object to be converted to a data type object
align: If True, adds padding to the fields to make the layout similar to
a C struct
copy: If True, makes a new copy of the dtype object. If False, the result
is a reference to the built-in data type object
Ex:
import numpy as np
dt = np.dtype(np.int32)
print(dt)
Output:
int32
Ex:
import numpy as np
dt = np.dtype([('age', np.int8)])
a = np.array([(10,), (20,), (30,)], dtype=dt)
print(a)
print(a['age'])
Output:
[(10,) (20,) (30,)]
[10 20 30]
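The byte-order prefixes and multi-field dtypes can be sketched as follows. The type strings '<i4' and '>i4' denote 4-byte integers; the field names name and age are made up for illustration.

```python
import numpy as np

# '>i4' is a big-endian 4-byte integer; '<i4' is the little-endian form.
big = np.dtype(">i4")
little = np.dtype("<i4")
print(big.itemsize, little.itemsize)

# A structured dtype with two named fields (field names are illustrative).
student = np.dtype([("name", "U10"), ("age", np.int8)])
records = np.array([("abc", 21), ("xyz", 18)], dtype=student)

# Each field can be read as an ordinary array.
ages = records["age"]
print(ages)
```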
NUMPY − ARRAY ATTRIBUTES
shape: returns a tuple consisting of the array dimensions. Assigning a
new tuple to shape resizes the array in place.
reshape: returns an array with the same data arranged in a new shape.
Ex:
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)
Output:
(2, 3)

Ex:
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
a.shape = (3, 2)
print(a)
Output:
[[1 2]
 [3 4]
 [5 6]]
Ex:
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
b = a.reshape(3, 2)
print(b)
Output:
[[1 2]
 [3 4]
 [5 6]]
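A new shape must hold exactly the same number of elements as the old one. The sketch below illustrates this and also uses -1, which asks NumPy to work out one dimension by itself; the variable names are illustrative.

```python
import numpy as np

a = np.arange(6)  # six elements: 0 to 5

# -1 asks NumPy to infer that dimension from the array size.
b = a.reshape(-1, 2)
print(b.shape)

# The element count must match; otherwise reshape raises ValueError.
reshape_error = None
try:
    a.reshape(4, 2)
except ValueError as error:
    reshape_error = str(error)
    print("reshape failed:", reshape_error)
```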
ndim: returns the number of array dimensions

Ex:
import numpy as np
a = np.array([2, 3, 4, 5, 5, 6, 7, 8, 1, 12, 23, 34, 56,
              67, 45, 33, 55, 66, 77, 15, 17, 19, 20, 0])
print(a)
print(a.ndim)
b = a.reshape(2, 4, 3)
print(b)
print(b.ndim)
Output:
[ 2  3  4  5  5  6  7  8  1 12 23 34 56 67 45 33 55 66 77 15 17 19 20  0]
1
[[[ 2  3  4]
  [ 5  5  6]
  [ 7  8  1]
  [12 23 34]]

 [[56 67 45]
  [33 55 66]
  [77 15 17]
  [19 20  0]]]
3
NUMPY − ARRAY CREATION
• empty: Creates an uninitialized array of the specified shape and
dtype
– Syntax: numpy.empty(shape, dtype=float, order='C')
• zeros: Returns a new array of the specified shape, filled with zeros.
– Syntax: numpy.zeros(shape, dtype=float, order='C')
• ones: Returns a new array of the specified shape and type, filled
with ones.
– Syntax: numpy.ones(shape, dtype=None, order='C')
• full: Returns an array filled with a given constant
• eye: Returns an identity matrix
• random.random (or) random.rand: Generates random numbers in [0, 1)
• arange: Returns evenly spaced values within a given interval
– Syntax: numpy.arange(start, stop, step, dtype) (by default
step is 1)
Ex:
import numpy as np
a = np.empty([3, 2], dtype=int)
print(a)
b = np.zeros((5,), dtype=int)
print(b)
x = np.ones([2, 2], dtype=int)
print(x)
c = np.full((2, 2), 7)
print(c)
d = np.eye(2)
print(d)
e = np.random.random((2, 2))
print(e)
f = np.random.rand(2, 4, 5)
print(f)
y = np.arange(2, 10, 2)
print(y)
Output (the values of a, e and f differ from run to run):
[[ 2  3]
 [ 4 19]
 [20  0]]
[0 0 0 0 0]
[[1 1]
 [1 1]]
[[7 7]
 [7 7]]
[[1. 0.]
 [0. 1.]]
[[0.20380731 0.09872237]
 [0.97113746 0.93355909]]
[[[0.90727033 0.35343595 0.41580929 0.76775171 0.04303769]
  [0.56244407 0.2992826  0.1484745  0.42957338 0.30501852]
  [0.92683058 0.65308611 0.50461455 0.74004263 0.27147302]
  [0.24944067 0.23368387 0.31131521 0.1730478  0.02660226]]

 [[0.75579737 0.90515676 0.45732618 0.95709755 0.79289932]
  [0.75538358 0.61081331 0.95818045 0.87706321 0.01676706]
  [0.03442638 0.71310015 0.9943779  0.85361029 0.12742604]
  [0.20256802 0.42287993 0.56465586 0.01915705 0.59932725]]]
[2 4 6 8]
Arithmetic Operations
• Input arrays for arithmetic operations such as add(),
subtract(), multiply() and divide() must either be of the same
shape or conform to the array broadcasting rules.
Ex:
import numpy as np
a = np.arange(9, dtype=np.float64).reshape(3, 3)
print('First array:')
print(a)
print('Second array:')
b = np.array([10, 10, 10])
print(b)
print('Add the two arrays:')
print(np.add(a, b))
print('Subtract the two arrays:')
print(np.subtract(a, b))
print('Multiply the two arrays:')
print(np.multiply(a, b))
print('Divide the two arrays:')
print(np.divide(a, b))
Output:
First array:
[[0. 1. 2.]
 [3. 4. 5.]
 [6. 7. 8.]]
Second array:
[10 10 10]
Add the two arrays:
[[10. 11. 12.]
 [13. 14. 15.]
 [16. 17. 18.]]
Subtract the two arrays:
[[-10.  -9.  -8.]
 [ -7.  -6.  -5.]
 [ -4.  -3.  -2.]]
Multiply the two arrays:
[[ 0. 10. 20.]
 [30. 40. 50.]
 [60. 70. 80.]]
Divide the two arrays:
[[0.  0.1 0.2]
 [0.3 0.4 0.5]
 [0.6 0.7 0.8]]
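The same element-wise operations are usually written with the ordinary arithmetic operators. This short sketch checks that each operator matches the corresponding function on the arrays used above.

```python
import numpy as np

a = np.arange(9, dtype=np.float64).reshape(3, 3)
b = np.array([10, 10, 10])

# The arithmetic operators are shorthand for the corresponding functions.
same_add = np.array_equal(a + b, np.add(a, b))
same_sub = np.array_equal(a - b, np.subtract(a, b))
same_mul = np.array_equal(a * b, np.multiply(a, b))
same_div = np.array_equal(a / b, np.divide(a, b))
print(same_add, same_sub, same_mul, same_div)
```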
Indexing
• The contents of an ndarray object can be accessed and modified by indexing
or slicing, just like Python's built-in container objects.
• Three types of indexing methods are available: field access, basic
slicing and advanced indexing.
• Basic slicing is an extension of Python's basic concept of slicing to n
dimensions.
• A Python slice object is constructed by giving start, stop
and step parameters to the built-in slice function. This slice object is
passed to the array to extract a part of the array.
Ex:
import numpy as np
a = np.arange(10)
s = slice(2, 7, 2)
print(a[s])
b = a[2:7:2]
print(b)
print(a[2:])
print(a[2:5])
Output:
[2 4 6]
[2 4 6]
[2 3 4 5 6 7 8 9]
[2 3 4]
Ex:
import numpy as np
a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])
print(a)
print('slice the array from the index a[1:]')
print(a[1:])
print('The items in the second column are:')
print(a[..., 1])
print('The items in the second row are:')
print(a[1, ...])
print('The items column 1 onwards are:')
print(a[..., 1:])
Output:
[[1 2 3]
 [3 4 5]
 [4 5 6]]
slice the array from the index a[1:]
[[3 4 5]
 [4 5 6]]
The items in the second column are:
[2 4 5]
The items in the second row are:
[3 4 5]
The items column 1 onwards are:
[[2 3]
 [4 5]
 [5 6]]
Advanced indexing always returns a copy of the data, whereas slicing
only presents a view.
There are two types of advanced indexing: integer and Boolean.
Integer Indexing:
Ex:
import numpy as np
x = np.array([[1, 2], [3, 4], [5, 6]])
# elements at (0,0), (1,1) and (2,0) from the array
y = x[[0, 1, 2], [0, 1, 0]]
print(y)
Output:
[1 4 5]
Ex:-
import numpy as np
x = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
print('Our array is:')
print(x)
z = x[1:4, 1:3]  # slicing
print('After slicing, our array becomes:')
print(z)
y = x[1:4, [1, 2]]  # using advanced index for column
print('Slicing using advanced index for column:')
print(y)
Output:
Our array is:
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
After slicing, our array becomes:
[[ 4  5]
 [ 7  8]
 [10 11]]
Slicing using advanced index for column:
[[ 4  5]
 [ 7  8]
 [10 11]]
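Although the two results above print identically, one is a view and the other a copy. A minimal sketch of the difference, using np.shares_memory to check whether two arrays use the same underlying data (the variable names are illustrative):

```python
import numpy as np

x = np.arange(12).reshape(4, 3)

view = x[1:4, 1:3]      # basic slicing: a view onto x
copy = x[1:4, [1, 2]]   # advanced indexing: an independent copy

view_shares = np.shares_memory(x, view)
copy_shares = np.shares_memory(x, copy)
print(view_shares, copy_shares)

# Writing through the view changes the original array, not the copy.
view[0, 0] = 100
print(x[1, 1], copy[0, 0])
```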
Boolean Array Indexing: This type of advanced indexing
is used when the resultant object is meant to be the result
of Boolean operations, such as comparison operators.

Ex:
import numpy as np
x = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
print('Our array is:')
print(x)
# print the items greater than 5
print('The items greater than 5 are:')
print(x[x > 5])
Output:
Our array is:
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
The items greater than 5 are:
[ 6  7  8  9 10 11]
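Boolean conditions can be combined, and a mask can also be used to assign values. The following is a small sketch on the same array; the condition and the replacement value 5 are arbitrary choices.

```python
import numpy as np

x = np.arange(12).reshape(4, 3)

# Combine conditions with & (and) or | (or); the parentheses are required.
mask = (x > 5) & (x % 2 == 0)
selected = x[mask]
print(selected)

# A Boolean mask can also be used to assign values in place.
clipped = x.copy()
clipped[clipped > 5] = 5
print(clipped.max())
```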
Broadcasting
• The term broadcasting refers to the ability of NumPy to operate on arrays of
different shapes during arithmetic operations.
• Arithmetic operations on arrays are usually done on corresponding
elements.
• If two arrays are of exactly the same shape, then these operations are
smoothly performed.
• Frequently we have a smaller array and a larger array, and we want to use
the smaller array multiple times to perform some operation on the larger
array.
Ex:
import numpy as np
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v # Add v to each row of x using broadcasting
print(y)
Output:
[[ 2 2 4]
[ 5 5 7]
[ 8 8 10]
[11 11 13]]
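Broadcasting only works when the shapes are compatible. The sketch below shows one incompatible case and one way to fix it by turning the smaller array into a column; the array w is an assumed example, not part of the slide.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])  # shape (4, 3)
w = np.array([1, 0, 1, 0])                                      # shape (4,)

# Shapes (4, 3) and (4,) are incompatible, so this raises ValueError.
try:
    x + w
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Reshaping w to a (4, 1) column lets it broadcast across the columns.
z = x + w[:, np.newaxis]
print(broadcast_failed)
print(z)
```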
Python-Pandas
Pandas-Introduction
Pandas is an open-source Python library providing high-performance
data manipulation and analysis tools built on its powerful
data structures. The name Pandas is derived from the term Panel
Data.
Key Features of Pandas
• Fast and efficient DataFrame object with default and customized
indexing.
• Tools for loading data into in-memory data objects from different
file formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, indexing and subsetting of large data sets.
• Columns from a data structure can be deleted or inserted.
Pandas deals with the following three data structures:
• Series
• DataFrame
• Panel
Pandas-Series
A Series is a one-dimensional array-like structure with
homogeneous data.
• A pandas Series can be created using the following constructor
Syntax: pandas.Series(data, index, dtype, copy)
The parameters of the constructor are as follows:
Sr.No  Parameter & Description
1      data: data takes various forms like ndarray, list, constants
2      index: Index values must be hashable and the same length as
       data. Default np.arange(n) if no index is passed.
3      dtype: dtype is for data type. If None, the data type
       will be inferred
4      copy: Copy data. Default False
Create a Series from ndarray
If data is an ndarray, then the index passed must be of the same length.
If no index is passed, then by default the index will be range(n), where n
is the array length, i.e., [0, 1, 2, ..., len(array) - 1].

Example-1
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)
print(s)
Output:-
0    a
1    b
2    c
3    d
dtype: object
Example-2
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data, index=[100, 101, 102, 103])
print(s)
Output:-
100    a
101    b
102    c
103    d
dtype: object
Create a Series from dict
Example-1
import pandas as pd
import numpy as np
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data)
print(s)
Output:-
a    0.0
b    1.0
c    2.0
dtype: float64
Example-2
import pandas as pd
import numpy as np
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data, index=['b', 'c', 'd', 'a'])
print(s)
Output:-
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
(NaN stands for Not a Number; the label 'd' has no value in the dict.)
Accessing Data from Series with Position
Example-1
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
# retrieve the first three elements
print(s[:3])
Output:-
a    1
b    2
c    3
dtype: int64
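Individual values can be read by position or by label. The sketch below uses the same Series; iloc is the explicit position-based accessor, and the chosen labels are arbitrary.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

first = s.iloc[0]             # access by integer position
by_label = s["c"]             # access by index label
several = s[["a", "c", "e"]]  # access several labels at once
print(first, by_label)
print(several)
```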
Pandas - DataFrame
• A DataFrame is a two-dimensional data structure, i.e., data is aligned in
a tabular fashion in rows and columns.
Features of DataFrame
• Columns can be of different types
• Size is mutable
• Labeled axes (rows and columns)
• Arithmetic operations can be performed on rows and columns
A pandas DataFrame can be created using the following constructor:
Syntax: pandas.DataFrame(data, index, columns, dtype, copy)
Sr.No  Parameter & Description
1      data: data takes various forms like ndarray, Series, map, lists, dict,
       constants and also another DataFrame.
2      index: The row labels to use for the resulting frame. Optional;
       defaults to np.arange(n) if no index is passed.
3      columns: The column labels to use for the resulting frame. Optional;
       defaults to np.arange(n) if no column labels are passed.
4      dtype: Data type of each column.
5      copy: Whether to copy the input data. Default False.
DataFrame-Creation
A pandas DataFrame can be created using various inputs like:
1) Lists 2) dict 3) Series 4) NumPy ndarrays
Create a DataFrame from Lists
import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)
Output:-
     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13
Create a DataFrame from Dict
import pandas as pd
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)
Output:-
        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42
• Create a DataFrame from Series
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
Output:-
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
• Create a DataFrame from NumPy ndarray
import numpy as np
import pandas as pd
a = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(a)
print(df)
Output:-
   0  1  2
0  1  2  3
1  4  5  6
Row Selection, Addition, and Deletion
Row Selection:
Selection by Label: Rows can be selected by passing the
row label to the loc indexer.
Example:-
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 3, 2, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])
Output:-
one    2.0
two    3.0
Name: b, dtype: float64
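loc also accepts a list of labels, a label slice, or a row label together with a column label. A brief sketch on the same DataFrame (the labels picked are arbitrary):

```python
import pandas as pd

d = {"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
     "two": pd.Series([1, 3, 2, 4], index=["a", "b", "c", "d"])}
df = pd.DataFrame(d)

picked = df.loc[["a", "d"]]   # a list of labels returns a DataFrame
ranged = df.loc["b":"c"]      # a label slice includes both endpoints
cell = df.loc["b", "two"]     # row label and column label together
print(picked)
print(ranged)
print(cell)
```

Unlike positional slicing, a label slice keeps the end label, which is why 'c' appears in the second result.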
Selection by integer location: Rows can be selected by
passing the integer location to the iloc indexer.
Example:-
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.iloc[2])
Output:-
one    3.0
two    3.0
Name: c, dtype: float64
Slice Rows: Multiple rows can be selected using the ':' operator.
Example:-
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df[2:4])
Output:-
   one  two
c  3.0    3
d  NaN    4
Addition of Rows: Add new rows to a DataFrame using the
pd.concat function (the DataFrame.append method used in older code
was removed in pandas 2.0). The new rows are appended at the end.
Example:-
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])
print(df)
Output:-
   a  b
0  1  2
1  3  4
0  5  6
1  7  8
Deletion of Rows: Use the index label to delete or drop rows from a
DataFrame. If the label is duplicated, then multiple rows will be
dropped.
Example:-
import pandas as pd
df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df = df1.drop(0)
print(df)
Output:-
   a  b
1  3  4
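The duplicated-label behaviour can be seen by combining the two operations. This sketch uses pd.concat to add the rows; the ignore_index option shown at the end is one way to avoid duplicate labels in the first place.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

# Without ignore_index the row labels 0 and 1 each appear twice.
stacked = pd.concat([df1, df2])
# Dropping label 0 therefore removes one row from each original frame.
remaining = stacked.drop(0)

# ignore_index=True renumbers the rows 0..n-1 instead.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(remaining)
print(renumbered)
```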
Python Pandas - Panel
A panel is a 3D container of data. The term Panel data is
derived from econometrics and is partially responsible for the
name pandas − pan(el)-da(ta)-s.
The names for the 3 axes are intended to give some semantic
meaning to describing operations involving panel data. They
are −
• items − axis 0, each item corresponds to a DataFrame
contained inside.
• major_axis − axis 1, it is the index (rows) of each of the
DataFrames.
• minor_axis − axis 2, it is the columns of each of the
DataFrames.
A Panel can be created using the following constructor (available only
in pandas versions before 0.25, in which Panel was removed):
Syntax:
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)
The parameters of the constructor are as follows:
Parameter    Description
data         Data takes various forms like ndarray, Series, map, lists,
             dict, constants and also another DataFrame
items        axis=0
major_axis   axis=1
minor_axis   axis=2
dtype        Data type of each column
copy         Copy data. Default False

Create Panel
Example:-
import pandas as pd
import numpy as np
data = np.random.rand(2, 4, 5)
p = pd.Panel(data)
print(p)
Output:-
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 5 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 4
Selecting the Data from Panel
Select the data from the panel using: 1) items 2) major_axis
3) minor_axis
Using Items:-
Example:-
import pandas as pd
import numpy as np
data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
        'Item2': pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p['Item1'])
Output:-
          0         1         2
0  0.488224 -0.128637  0.930817
1  0.417497  0.896681  0.576657
2 -2.775266  0.571668  0.290082
3 -0.400538 -0.144234  1.110535
Using major_axis
Example:-
import pandas as pd
import numpy as np
data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
        'Item2': pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p.major_xs(1))
Output:-
      Item1     Item2
0  0.417497  0.748412
1  0.896681 -0.557322
2  0.576657       NaN
Using minor_axis
Example:-
import pandas as pd
import numpy as np
data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
        'Item2': pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p.minor_xs(1))
Output:-
      Item1     Item2
0 -1.194259 -0.171606
1  0.949656  0.585843
2  1.074569  1.115871
3  0.821483  1.831133
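On pandas 0.25 and later, where pd.Panel is no longer available, similar selections can be made on a DataFrame with a two-level row index. The following is only a sketch of one possible substitute, not the Panel API itself; the seed and variable names are arbitrary, and the orientation of the row selection differs from major_xs (items appear as rows).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded generator; the seed is arbitrary
frames = {"Item1": pd.DataFrame(rng.standard_normal((4, 3))),
          "Item2": pd.DataFrame(rng.standard_normal((4, 2)))}

# Stack the frames; the dict keys become the outer level of the row index.
stacked = pd.concat(frames)

item1 = stacked.loc["Item1"]        # comparable to p['Item1']
row1 = stacked.xs(1, level=1)       # row 1 of every item (items as rows)
col1 = stacked[1].unstack(level=0)  # column 1 of every item (items as columns)
print(item1.shape, row1.shape, col1.shape)
```

As in the Panel output, the column that Item2 lacks is filled with NaN.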
Data Analysis using Pandas
• We might have our data in .csv files, SQL tables, Excel files
or .tsv files.
• We want to analyze that data using pandas.
• The first step is to read it into a data structure that's
compatible with pandas.
How to create a csv file:-
import pandas as pd
my_dict = {'name': ["a", "b", "c", "d", "e", "f", "g"],
           'age': [20, 27, 35, 55, 18, 21, 35],
           'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]}
df = pd.DataFrame(my_dict)
df.to_csv('csv_example', index=False)
Now we have a CSV file which contains the data present
in the DataFrame above.
To load the data into a DataFrame using read_csv, the syntax is:-
pandas.read_csv(filepath_or_buffer, sep=',', names=None, index_col=None,
skipinitialspace=False)
• filepath_or_buffer: Path or URL with the data
• sep=',': The delimiter to use
• names=None: Names for the columns. If the dataset has ten columns, you need
to pass ten names
• index_col=None: Column(s) to use as the row index. With None, a default
integer index is created
• skipinitialspace=False: Whether to skip spaces after the delimiter

Example:-
# Load the CSV file and create a new DataFrame
df_csv = pd.read_csv('csv_example')
print(df_csv)
Output:-
  name  age designation
0    a   20          VP
1    b   27         CEO
2    c   35         CFO
3    d   55          VP
4    e   18          VP
5    f   21         CEO
6    g   35          MD
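The write-then-read round trip can be tried without touching the disk. This sketch swaps the file for an in-memory io.StringIO buffer (an assumption made for illustration) and uses index_col to turn the name column into the row index.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"],
                   "age": [20, 27, 35],
                   "designation": ["VP", "CEO", "CFO"]})

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind and read the CSV text back; use the name column as the row index.
buffer.seek(0)
by_name = pd.read_csv(buffer, index_col="name")
print(by_name)
```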
• To print the top 5 lines:-
print(df_csv.head())
Output:-
  name  age designation
0    a   20          VP
1    b   27         CEO
2    c   35         CFO
3    d   55          VP
4    e   18          VP
• To print the bottom 5 lines:-
print(df_csv.tail())
Output:-
  name  age designation
2    c   35         CFO
3    d   55          VP
4    e   18          VP
5    f   21         CEO
6    g   35          MD
• To print a few random lines:-
print(df_csv.sample(3))
Output (the rows chosen vary from run to run):-
  name  age designation
3    d   55          VP
0    a   20          VP
1    b   27         CEO
• To print specific columns:-
print(df_csv[['name', 'age']])
Output:-
  name  age
0    a   20
1    b   27
2    c   35
3    d   55
4    e   18
5    f   21
6    g   35
