AI Cheat Sheet and Machine Learning with Python Cheat Sheet

AI Cheat Sheet
AI Basics

ARTIFICIAL INTELLIGENCE (AI)
“The theory and development of computer systems able to perform tasks normally requiring human intelligence.”
Oxford English Dictionary

BUSINESS USE CASES

1. AI and CUSTOMER SERVICE ENHANCEMENT:
Business value is generated through optimization of the “front-office” operations.

2. AI and PROCESSES OPTIMIZATION:
The value is generated through optimization of the “back-office” operations in order to reduce costs and improve compliance.

3. AI and INSIGHTS GENERATION:
New business value is created from the existing data by enabling better, more consistent, and faster decision making.

MACHINE LEARNING (ML)
ML is one of the AI approaches, which uses statistical techniques to give computer systems the ability to “learn” (i.e., progressively improve performance on a specific task) from some data, without being explicitly programmed.

DEEP LEARNING (DL)
DL is a machine learning method. It uses neural networks and allows us to train an algorithm to predict outputs, given a set of inputs. A neural network consists of an input layer, one or more hidden layers, and an output layer. The “deep” in deep learning refers to having more than one hidden layer of neurons in a neural network. Both supervised and unsupervised learning can be used for training.

TYPES OF ML ALGORITHMS

Supervised learning algorithms make predictions based on a set of examples. They are trained using labeled data sets that have inputs and expected outputs.

Semi-supervised learning algorithms use unlabeled examples together with a small amount of labeled data to improve the learning accuracy.

Unsupervised learning algorithms work with totally unlabeled data. They are designed to discover the intrinsic patterns that underlie the data, such as a clustering structure, a low-dimensional manifold, or a sparse tree and graph.

Reinforcement learning algorithms analyze and optimize the behavior of an agent based on the feedback from the environment. Machines try different scenarios to discover which actions yield the greatest reward, rather than being told which actions to take.

Supervised learning

ALGORITHMS

Regression [Linear, Polynomial, Nonparametric]: the algorithm is asked to predict a numerical value given some input. “How much money would a bank gain (lose) by lending to a certain client?”

Classification [Naive Bayes, k-NN, SVM, Random Forest, Neural Networks]: the algorithm is asked to specify which of k categories some input belongs to. “Will a client be able to pay his loan back?”

Learning to rank [HITS, SALSA, PageRank]: the algorithm is asked to rank (i.e., to produce a permutation of items in new, unseen lists) in a way that is similar to the rankings in the training data. “What are the world’s top 10 safest banks?”

Forecasting [Trending, Time-Series Modeling, Neural Networks]: the algorithm is asked to generate predictions based on available data.

USE CASES

Ranking is used in bioinformatics, drug discovery, information retrieval, sentiment analysis, machine translation, and online advertising.

Classification is applied to e-mail spam filtering, predicting bank customers' willingness to pay back loans, identification of cancer tumour cells, sentiment analysis, drug classification, facial keypoint detection, and pedestrian detection in automotive driving.

Regression is employed for pricing optimization, modeling historical sales in order to determine a pricing strategy and predict demand for products that have not been sold before.

Supervised ML methods are leveraged to minimize the number of returns for online purchases.

DL methods are used for predicting the next purchases of customers in advance.

Forecasting models are the bread and butter for business intelligence.

OPEN-SOURCE FRAMEWORKS

R Data Science libraries: Caret, randomForest, nnet, dplyr

Python Data Science libraries: Scikit-learn, SciPy, NumPy, NLTK, Matplotlib, CatBoost, XGBoost, PyTorch, Caffe2, Theano, Keras, TensorFlow, OpenCV

Semi-supervised learning

ALGORITHMS

Pseudo labeling is an algorithm used for expanding training data sets. It requires some data to be labeled first, but then it uses this data in conjunction with a large amount of unlabeled data to learn a model for a domain. It is compatible with almost all neural network models and training methods.

Generative models [VAE and GANs] are neural network models that can replicate the data distribution given as an input. This makes it possible to generate “fake-but-realistic” data points from real data points.

USE CASES

Pseudo labeling is applicable to malware/fraud detection, document structure analysis, stock predictions, real-time diagnostics, NLP/speech recognition, and any other type of problem where the small size of the labeled data set is a constraint.

Generative models are used for real-time visual processing, text-to-image generation, image-to-image translation, increasing image resolution, or predicting the next video frame.

OPEN-SOURCE FRAMEWORKS

TensorFlow, NumPy, Scikit-learn

Reinforcement learning

ALGORITHMS

Q-learning: the algorithm is based on a mathematical optimization method known as dynamic programming. Given the current states of a system, the algorithm finds an optimal policy (i.e., a set of actions) that maximizes the Q-value function.

State-Action-Reward-State-Action (SARSA): the algorithm closely resembles Q-learning, but learns the Q-value based on the action performed by the current policy instead of the greedy policy.

Deep Q Network: the algorithm leverages a neural network to estimate the Q-value function. It resolves some limitations of the Q-learning algorithm.

Deep Deterministic Policy Gradient: the algorithm is designed for such problems as physical control tasks, where the action space is continuous.

USE CASES

Manufacturing. Robots use deep reinforcement learning to pick a device from one box and put it in a container. They learn from successful and failed attempts.

Inventory management. RL algorithms reduce transit time for stocking and retrieving products in the warehouse to optimize space utilization and warehouse operations.

Delivery management. Reinforcement learning is used to solve the problems of operations research and logistics (e.g., the split delivery vehicle routing problem).

Finance sector. The Q-learning algorithm is able to learn an optimal stock market trading strategy with a single instruction: maximize the value of a portfolio.

OPEN-SOURCE FRAMEWORKS

RL-Glue, OpenAI Gym, RLPy, BURLAP

Unsupervised learning

ALGORITHMS

Clustering [k-means, Mean-Shift, Hierarchical, Fuzzy c-means]: the algorithm is asked to group similar data items so that all the items in the same group (called a cluster) are more similar to each other than to the items in the other groups (clusters).

Anomaly detection [Density-Based, SVM-Based, Clustering-Based]: the computer program sifts through a set of events or objects and flags some of them as unusual or atypical.

Dimensionality reduction [PCA, Singular Value Decomposition, LDA]: the algorithm is asked to reduce the number of random variables under consideration by obtaining a set of principal variables. Dimensionality reduction can be divided into feature selection and feature extraction.

Missing data imputation [mean imputation, k-NN]: the algorithm is given examples with some missing entries and is asked to provide the values of the missing entries.

Association rules learning [AIS, SETM, APRIORI, FP-GROWTH]: a rule-based method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.

USE CASES

Unsupervised learning methods are used in healthcare and pharma for such tasks as human genetic clustering and genome sequence analysis. They are also widely used across all industries for customer segmentation, recommender systems, chatbots, topic modeling, anomaly detection, grouping of shopping items, grouping of search results, etc.

OPEN-SOURCE FRAMEWORKS

R Data Science libraries: Caret, Rattle, e1071, nnet, dplyr

Python Data Science libraries: BigARTM, Tesseract, Scrapy, Scikit-learn, PyTorch, Caffe2, Theano, Keras, TensorFlow

Information sources:

Ian Goodfellow, Yoshua Bengio and Aaron Courville (2016) “Deep Learning”, MIT Press
Andrew Burgess (2017) “The Executive Guide to Artificial Intelligence”, Springer
Hui Li (2017) “Which machine learning algorithm should I use?”, SAS Blog
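
The Q-learning entry in the Reinforcement learning section can be made concrete with a small tabular sketch. Everything below is an illustrative assumption rather than part of the cheat sheet: the five-state corridor environment, the function names (`step`, `choose_action`, `train`), and the hyperparameter values are all hypothetical. The sketch only shows the core update, in which the learning target uses the greedy (maximum) Q-value of the next state.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic environment: reward 1.0 only on reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, rng, epsilon):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Return a Q-table learned with the tabular Q-learning update."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q[state], rng, epsilon)
            next_state, reward, done = step(state, action)
            # The target uses the greedy (max) value of the next state,
            # regardless of which action the agent will actually take there.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
print(greedy_policy)  # [1, 1, 1, 1]: always move right toward the reward
```

A SARSA variant of this sketch would replace `max(q[next_state])` in the target with the Q-value of the action actually chosen in the next state, which is the difference the cheat sheet describes.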
Get the latest version at: http://altoros.com/visuals.html

Machine Learning with Python Cheat Sheet
General-purpose machine learning

The Auto_ml framework is developed for automating a machine learning process and making it easier to get real-time predictions in production. It automates analytics, feature engineering, feature selection, model selection, data formatting, hyperparameter optimization, etc.

The machine-learning framework provides a web interface and an API for classification and regression. The support vector machines and support vector regression algorithms are available via the framework out of the box.

XGBoost implements machine learning algorithms under the Gradient Boosting technique. XGBoost provides parallel tree boosting (also known as GBDT or GBM), which solves many data science problems in a fast and accurate manner.

scikit-learn is a Python module for machine learning built on top of the SciPy framework. The module encapsulates methods for data preprocessing, classification, regression, clustering, model selection, etc.

SimpleAI is a library for solving search and statistical classification problems. The search module includes traditional and local search algorithms, constraint satisfaction problem algorithms, and interactive execution of search algorithms. The classification module of SimpleAI supports decision tree, Naive Bayes, and k-nearest neighbours classifiers.

MLlib in Apache Spark is a distributed machine learning library in Spark. Its goal is to make practical machine learning scalable and easy. It provides a set of common machine learning algorithms, as well as utilities for linear algebra, statistics, data handling, featurization, etc.

Theano is a numerical computation library for Python. It allows you to efficiently define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays.

TensorFlow is an open-source software library for numerical computation using data flow graphs. Originally developed by the Google Brain team, TensorFlow makes it easy to deploy computations across a variety of platforms (CPUs, GPUs, or TPUs), as well as on clusters of servers, mobile and edge devices, etc. It is widely used together with neural networks.

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow or Theano. It was developed with a focus on enabling fast experimentation.

Caffe is a deep learning framework that supports many different types of architectures geared towards image classification and image segmentation.

Caffe2 is a lightweight, modular, and scalable deep learning framework. Based on the original Caffe, Caffe2 aims to provide an easy and straightforward way to experiment with deep learning and leverage community contributions of new models and algorithms.

PyTorch is a Python package that provides two high-level features: tensor computation (like NumPy) with strong GPU acceleration and deep neural networks.

CatBoost is a general-purpose library for gradient boosting on decision trees with categorical features support out of the box. It is an easy-to-install and well-documented package. It supports CPU and GPU (even multi-GPU) computation.

Computer vision

scikit-image is a collection of algorithms for image processing in Python. It includes algorithms for segmentation, geometric transformations, color space manipulation, analysis, filtering, morphology, feature detection, etc.

OpenCV is a computer vision framework designed for computational efficiency with a strong focus on real-time applications. Usage ranges from interactive art to mine inspection and advanced robotics.

SimpleCV is a framework that gives access to several high-powered computer vision libraries, such as OpenCV. To use the framework, you don’t need to first learn bit depths, file formats, color spaces, buffer management, eigenvalues, and matrix versus bitmap storage.

OpenFace is a Python and Torch implementation of face recognition with deep neural networks.

The face_recognition framework allows for recognizing and manipulating faces from Python or from the command line.

Dockerface is a Docker-based solution for face detection using Faster R-CNN.

Detectron is a software system by Facebook AI Research that implements state-of-the-art object detection algorithms, including Mask R-CNN. It is written in Python and powered by the Caffe2 framework.

Natural language processing

NLTK (the Natural Language Toolkit) is a suite of open-source Python modules, data sets, and tutorials supporting research and development in natural language processing.

TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing tasks, such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, etc.

PyNLPl is a library for natural language processing that contains various modules useful for a variety of natural language processing tasks, such as extraction of n-grams and frequency lists or building simple language models.

Polyglot is a multilingual text processing toolkit. It supports language detection (196 languages), tokenization (165 languages), named entity recognition (40 languages), part-of-speech tagging (16 languages), sentiment analysis (136 languages), and other features.

FuzzyWuzzy is a fuzzy string matching implementation in Python. The algorithm uses Levenshtein Distance to calculate the differences between sequences.

jellyfish is a Python library for approximate and phonetic matching of strings.

Topic modeling

BigARTM is a powerful tool for topic modeling. Additive regularization of topic models is the innovative approach lying at the core of the BigARTM library. The solution helps to build multi-objective models by adding the weighted sums of regularizers to the optimization criterion. BigARTM supports different features, including sparsing, smoothing, topic decorrelation, etc.

Gensim is a Python library for topic modelling, document indexing, and similarity retrieval with large corpora.

topik is a topic modeling toolbox, which provides a full-suite and high-level interface for anyone interested in applying topic modeling. It includes a bunch of utilities beyond statistical modeling algorithms.

Chatbots

End-to-end-negotiator is a PyTorch implementation of the research paper “Deal or No Deal? End-to-End Learning for Negotiation Dialogues” by Facebook AI Research. The code trains neural networks to hold negotiations in natural language and enables reinforcement learning self-play and rollout-based planning.

DeepPavlov is an open-source library for building end-to-end dialog systems and training chatbots built on TensorFlow and Keras.

awesome-bots is a GitHub repository with a collection of materials dedicated to chatbots.

Reinforcement learning

DeepMind Lab is a first-person 3D game platform designed for research and development of general artificial intelligence and machine learning systems. DeepMind Lab can be used to study how autonomous artificial agents may learn complex tasks in large, partially observed, and visually diverse worlds.

OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms.

OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.

RLPy is a framework for conducting sequential decision-making experiments. The current focus of this project is on value-function-based reinforcement learning.

Universe is a software platform for measuring and training an AI’s general intelligence across the world’s supply of games, websites, and other applications.

Data analysis and data visualization

Apache Spark is a fast and general cluster computing system for big data. It provides high-level APIs in Python and an optimized engine that supports general computation graphs for data analysis.

NumPy is a fundamental package needed for scientific computing with Python.

SciPy is open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, etc.

Pandas is a library providing high-performance, easy-to-use data structures and data analysis tools for the Python language.

PyMC is a Python module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo methods. Its flexibility and extensibility make it applicable to a large variety of problems. Along with core sampling functionality, PyMC includes methods for summarizing output, plotting, goodness-of-fit, and convergence diagnostics.

statsmodels is a package for statistical modeling and econometrics in Python. It provides a complement to SciPy for statistical computations, including descriptive statistics and estimation, as well as inference for statistical models.

Matplotlib is a Python 2D plotting library, which produces publication-quality figures in a variety of hard copy formats and interactive environments across platforms.

ggplot is a plotting system for Python built for making professional-looking plots quickly and with a minimum of code.

scikit-plot is a visualization library for quick and easy generation of common plots in data analysis and machine learning.

Other projects

The deepdream repository contains an IPython Notebook with sample code, complementing the Google Research blog post about neural network art.

NeuralTalk2 is efficient image captioning code based on recurrent neural networks.

Kaggle-cifar contains code for the CIFAR-10 Kaggle competition on image recognition. It uses a cuda-convnet architecture.

The Lime project is about explaining what machine learning classifiers (or models) are doing. At the moment, it supports explaining individual predictions for text classifiers or classifiers that act on tables (e.g., NumPy arrays of numerical or categorical data) or images. The project aims at helping users to understand and interact meaningfully with machine learning.

DeepJ is a deep learning model for style-specific music generation.

deep-neuroevolution is a GitHub repository containing an implementation of the neuroevolution approach, where neural networks are optimized through evolutionary algorithms. It is an effective method to train deep neural networks for reinforcement learning tasks.

Free online books

Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David (2014)
Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper (2009)
Deep Learning by Yoshua Bengio, Ian Goodfellow, and Aaron Courville (2015)
Neural Networks and Deep Learning by Michael Nielsen (2014)
Deep Learning by Microsoft Research (2013)
Deep Learning in Neural Networks: An Overview by Jürgen Schmidhuber (2014)
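
The fuzzy string matching entry in the Natural language processing section mentions Levenshtein distance. As a rough illustration of that metric only, here is a pure-Python sketch of the classic dynamic-programming computation. The function names `levenshtein` and `similarity` are hypothetical, and the 0 to 100 score is an assumed normalization for illustration; it is not claimed to match the scoring used by any of the libraries listed in this cheat sheet.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b; start with the empty prefix of a.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> int:
    """Hypothetical 0-100 score derived from the distance (100 = identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - levenshtein(a, b) / longest))


print(levenshtein("kitten", "sitting"))  # 3
print(similarity("TensorFlow", "Tensorflow"))
```

Keeping only two rows of the dynamic-programming table, as above, uses memory proportional to the length of the second string instead of the product of both lengths.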