Artificial
Intelligence
Recap
Basic data structures
Matrix
- Create matrix
- Dimension of matrix
- Name rows and columns
- Access elements of matrix
- Modify matrix
- Add row or column
- Matrix arithmetics
List
- Create list
- Name vector elements
- Type and class of list
- Access elements of list
- Modify list
- Data manipulation
Data frame
- Create data frame
- Dimension of data frame
- Name rows and columns
- Access elements of data frame
- Add column or instance
- Modify data frame
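As a quick refresher on the recap topics, here is a minimal sketch in base R. All object names (m, lst, df) and values are made up for illustration.

```r
# Matrix: create, name, inspect, access, extend, do arithmetic
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
dim(m)                    # number of rows and columns
m["r1", "b"]              # access one element by row and column name
m <- cbind(m, d = 7:8)    # add a column
m2 <- m * 2               # element-wise arithmetic

# List: create with named elements, check class, access, modify
lst <- list(name = "Ann", scores = c(90, 85))
class(lst)                # "list"
lst$scores[2] <- 88       # modify an element inside the list

# Data frame: create, add a column, add an instance (row)
df <- data.frame(id = 1:3, grade = c("A", "B", "A"))
df$passed <- TRUE
df <- rbind(df, data.frame(id = 4, grade = "C", passed = FALSE))
dim(df)
```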
Data sources
Loading data into R can be quite frustrating. Almost every single type of file that
you want to get into R seems to require its own function, and even then you
might get lost in the functions’ arguments. In short, it is fairly easy to mix
things up from time to time, whether you are a beginner or a more advanced R
user.
● Flat Files
● Excel Files
● Web
● Databases
Checklist for a correct import
● If you work with spreadsheets, the first row is usually reserved for the header, while the first column is
● Avoid names, values or fields with blank spaces, otherwise each word will be interpreted as a separate
variable, resulting in errors that are related to the number of elements per line in your data set;
● Try to avoid using names that contain symbols such as ?, $, %, ^, &, *, (, ), -, #, ,, <, >, /, |, \, [, ], {, and };
● Delete any comments that you have made in your Excel file, so that no extra columns or NA values are added on import;
● Make sure that any missing values in your data set are indicated with NA.
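Parts of this checklist can also be verified from within R. The sketch below uses base R only; the column names and values are invented for illustration. make.names() shows what R would turn problematic names into, and is.na() counts missing values.

```r
# Hypothetical header names that break the checklist rules
raw_names <- c("first name", "cost$", "height(cm)", "id")

# make.names() replaces characters that are not valid in R names with "."
clean_names <- make.names(raw_names, unique = TRUE)
print(clean_names)

# Count missing values per column after an import
df <- data.frame(id = 1:3, score = c(10, NA, 7))
na_per_column <- colSums(is.na(df))
print(na_per_column)
```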
Flat Files
Path setting: file.path("FOLDER", "FileName.xyz") builds the path to a file. The first argument is the directory and the second argument is the file name.
read.table(): use for any type of tabular data. Specify the separator (delimiter) with the sep argument. Other commonly used arguments (argument names are case-sensitive):
● header
● fileEncoding
● stringsAsFactors
● col.names
● row.names
● fill
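The following is a minimal sketch of how these pieces fit together. To stay self-contained it first writes a small comma-separated file into a temporary directory; the file name scores.csv and its contents are invented for illustration.

```r
# Build a path and create a small example file (hypothetical data)
path <- file.path(tempdir(), "scores.csv")
writeLines(c("name,score", "Ann,90", "Bob,NA", "Cy"), path)

# Read it back with read.table()
scores <- read.table(
  path,
  header = TRUE,             # first line holds the column names
  sep = ",",                 # field separator
  stringsAsFactors = FALSE,  # keep text columns as character
  fill = TRUE,               # pad short lines (here "Cy") with NA
  fileEncoding = "UTF-8"
)
print(scores)
```

With fill = TRUE the incomplete last line is read with a missing score instead of stopping the import with an error.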
Excel Data
Excel files
read_excel() (from the readxl package): the only required argument is the file path. To read a specific sheet, pass the sheet argument with the sheet's name or position.
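A short sketch of both cases, assuming the readxl package is installed. It uses datasets.xlsx, an example workbook that ships with readxl, so no file of your own is needed.

```r
library(readxl)

# Path to an example workbook bundled with readxl
path <- readxl_example("datasets.xlsx")

# List the sheets in the workbook
sheets <- excel_sheets(path)
print(sheets)

# Only the path: reads the first sheet
first_sheet <- read_excel(path)

# A specific sheet, selected by name (a position such as sheet = 2 also works)
cars <- read_excel(path, sheet = "mtcars")
print(head(cars))
```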
Web
Data scraping, also known as web scraping, is the process of importing information from a website into a spreadsheet or local
file saved on your computer. It's one of the most efficient ways to get data from the web.
Scraping data from the web requires some knowledge of HTML and CSS. The alternative is an auto-web-scrape tool such as import.io.
Web scraping is almost always unique from use case to use case, because every website is different,
updates occur, and things can change.
In R:
library(rvest)
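A minimal rvest sketch, assuming rvest 1.0 or later. Instead of a live website it parses a small inline HTML snippet (invented for illustration), so it runs without a network connection. For a real page, read_html("<url>") would replace minimal_html().

```r
library(rvest)

# A tiny, made-up page standing in for a real website
page <- minimal_html('
  <h1>Products</h1>
  <table id="prices">
    <tr><th>item</th><th>price</th></tr>
    <tr><td>apple</td><td>1.20</td></tr>
    <tr><td>pear</td><td>0.80</td></tr>
  </table>')

# Select elements with CSS selectors, then extract text or a table
heading <- html_text2(html_element(page, "h1"))
prices <- html_table(html_element(page, "#prices"))

print(heading)
print(prices)
```

The CSS selectors ("h1", "#prices") are the part that has to be adapted to each website, which is why HTML and CSS knowledge is needed.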
Databases
(If your data fits in memory, there is no advantage to putting it in a database; it will only be slower and more frustrating.)
If you are using R to do data analysis inside a company, most of the data you need probably already lives in a database.
a specific backend for the database that you want to connect to.
There are packages that either connect via ODBC but do not provide support for DBI, or offer DBI support but connect via
JDBC. The odbc package, in combination with a driver, satisfies both requirements.
Another package that provides both ODBC connectivity and DBI support is ROracle. The current version of dbplyr on CRAN
does not yet fully support a connection coming from ROracle, but we are working on it.
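To illustrate the DBI workflow, here is a minimal sketch. As an assumption made only to keep the example runnable, it uses an in-memory SQLite database through the RSQLite backend as a stand-in. With the odbc package and an installed driver, only the dbConnect() call would change, as indicated in the comment (the DSN name there is hypothetical).

```r
library(DBI)

# Stand-in backend: in-memory SQLite (assumes the RSQLite package is installed).
# With odbc and a configured data source, the call would instead look like:
#   con <- dbConnect(odbc::odbc(), dsn = "MyDSN")   # "MyDSN" is hypothetical
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a built-in data set into the database so there is something to query
dbWriteTable(con, "mtcars", mtcars)

# Send SQL and get the result back as a data frame
res <- dbGetQuery(
  con,
  "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl ORDER BY cyl"
)
print(res)

# Always close the connection when done
dbDisconnect(con)
```

Because every DBI backend exposes the same functions (dbConnect, dbGetQuery, dbDisconnect), the querying code stays the same when the backend changes.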
Connection Settings
library(RODBC)
In Windows, you create the DSN (data source name) using the ODBC Source Administrator. This tool can be found in the Control
Panel.
In Windows 10, it’s under System and Security -> Administrative Tools -> ODBC Data Sources.
To set one up, click Add, and you’ll get this box:
Select the appropriate driver (Oracle in OraDB12Home1) and click the Finish button.
A Driver Configuration box opens:
For “Data Source Name,” you can put in almost anything you want. This is the name you will use in R when you connect to the
database.
The “Description” field is optional.
“TNS Service Name” is the name that you (or your company's database administrator) assigned when configuring the Oracle
database. “User ID” is the ID that you use with the database.
After you fill in these fields, click the “Test Connection” button. Another box pops up, with the TNS Service Name and User ID
already populated, and an empty field for your password. Enter your password and click “OK.” You should see a “Connection
Successful” message. If not, check the Service Name, User ID, and Password.
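Once the DSN test succeeds, the same DSN can be used from R. The sketch below wraps the RODBC calls in a small helper. The helper name query_dsn, the DSN "MyOracleDSN", the credentials, and the table name in the usage comment are all hypothetical and must be replaced with your own values.

```r
# Hypothetical helper: connect through a DSN, run one query, close the channel.
# Requires the RODBC package and a configured DSN when it is actually called.
query_dsn <- function(dsn, uid, pwd, sql) {
  channel <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
  on.exit(RODBC::odbcClose(channel))  # close the channel even if the query fails
  RODBC::sqlQuery(channel, sql)       # returns a data frame on success
}

# Usage (not run here, because it needs a real DSN and credentials):
# employees <- query_dsn("MyOracleDSN", "my_user", "my_password",
#                        "SELECT * FROM employees")
```

The first argument is the “Data Source Name” entered in the ODBC configuration box, not the TNS Service Name.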
Lab 5: Import Data
Lab 4:
Flat files
- Tabular data
- CSV
- TSV
Excel files
- Sheets in Excel
- Data chunks
Web
- Auto-web-scrape
- rvest
Databases
- Connecting
- Querying