statesToKeep <- sapply(c("CA", "OR", "WA"), grep, stateAbb)
statesToKeep
Results
CA OR WA
 5 38 48
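To see why the result is a named vector of positions, here is a minimal base R sketch. The short stateAbb vector in it is a made-up stand-in for the tutorial's full vector, so the positions it prints differ from the results above.

```r
# Hypothetical, shortened stand-in for the tutorial's stateAbb vector.
stateAbb <- c("AK", "CA", "NY", "OR", "WA")

# sapply() names each result after its input string; grep() returns the
# position of every element of stateAbb that matches the pattern.
statesToKeep <- sapply(c("CA", "OR", "WA"), grep, stateAbb)
print(statesToKeep)

# grep() does pattern matching; match() is an exact-match alternative
# that yields the same positions for two-letter codes like these.
exactMatch <- setNames(match(c("CA", "OR", "WA"), stateAbb),
                       c("CA", "OR", "WA"))
print(exactMatch)
```

The names of the vector hold the state abbreviations and the values hold their positions, which is why the same variable can later supply both the numeric codes and their labels.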
2. Now you'll define the data you want to bring over from SQL Server, using a Transact-SQL query. Later you'll pass this
variable as the sqlQuery argument of RxSqlServerData, and use the resulting data source as the inData argument for rxImport.
R
importQuery <- paste("SELECT gender, cardholder, balance, state FROM", sqlFraudTable,
                     "WHERE (state = 5 OR state = 38 OR state = 48)")
Make sure there are no hidden characters such as line feeds or tabs in the query.
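As an optional variation, the WHERE clause can be generated from the state codes rather than typed by hand, and the finished string can be checked for hidden characters. This is a base R sketch under stated assumptions: the table name "ccFraudSmall" is a placeholder for whatever sqlFraudTable holds in your session, and hasHiddenChars is a name introduced here for illustration.

```r
# State codes used in the query above.
statesToKeep <- c(CA = 5L, OR = 38L, WA = 48L)
# Hypothetical placeholder; use the table name defined earlier in the tutorial.
sqlFraudTable <- "ccFraudSmall"

# Build "state = 5 OR state = 38 OR state = 48" from the vector.
whereClause <- paste(sprintf("state = %d", statesToKeep), collapse = " OR ")
importQuery <- paste("SELECT gender, cardholder, balance, state FROM",
                     sqlFraudTable,
                     paste0("WHERE (", whereClause, ")"))

# TRUE if the query contains a carriage return, line feed, or tab.
hasHiddenChars <- grepl("[\r\n\t]", importQuery)
print(importQuery)
print(hasHiddenChars)
```

Because paste() joins its arguments with single spaces, splitting the call across several source lines does not put line feeds into the query string itself.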
3. Next, you'll define the columns to use when working with the data in R.
For example, in the smaller data set, you need only three factor levels, because the query will return data for only three
states. You can reuse the statesToKeep variable to identify the correct levels to include.
R
importColInfo <- list(
    gender = list(type = "factor", levels = c("1", "2"),
                  newLevels = c("Male", "Female")),
    cardholder = list(type = "factor", levels = c("1", "2"),
                      newLevels = c("Principal", "Secondary")),
    state = list(type = "factor", levels = as.character(statesToKeep),
                 newLevels = names(statesToKeep))
)
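The levels and newLevels pairing maps the stored codes to readable labels. The base R sketch below shows the same idea with factor(); it is only an analogy and makes no claim about how RevoScaleR applies colInfo internally. The rawState values are invented for illustration.

```r
# State codes used in the query above, named by abbreviation.
statesToKeep <- c(CA = 5L, OR = 38L, WA = 48L)

# Invented sample of state codes as they might arrive from the database.
rawState <- c(48L, 5L, 38L, 5L)

# levels lists the codes present in the data; labels plays the role of
# newLevels and supplies the readable names, in the same order.
stateFactor <- factor(as.character(rawState),
                      levels = as.character(statesToKeep),
                      labels = names(statesToKeep))
print(stateFactor)
print(levels(stateFactor))
```

Only three levels are declared, so any code outside 5, 38, and 48 would become NA in this sketch; that is acceptable here because the query returns rows for those three states only.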
4. Set the compute context to local, because you want all the data available on your local computer.
R
rxSetComputeContext("local")
5. Create the data source object by passing all the variables that you just defined as arguments to RxSqlServerData.
R
sqlServerImportDS <- RxSqlServerData(
    connectionString = sqlConnString,
    sqlQuery = importQuery,
    colInfo = importColInfo)
6. Then, call rxImport to save the data in the current working directory, in a file named ccFraudSub.xdf.
R
localDS <- rxImport(inData = sqlServerImportDS,
                    outFile = "ccFraudSub.xdf",
                    overwrite = TRUE)
The localDS object returned by the rxImport function is a lightweight RxXdfData data source object that represents the
ccFraudSub.xdf data file stored locally on disk.
7. Call rxGetVarInfo on the XDF file to verify that the data schema is the same.
R
rxGetVarInfo(data=localDS)
Results
rxGetVarInfo(data = localDS)
Var 1: gender, Type: factor, no factor levels available
Var 2: cardholder, Type: factor, no factor levels available
Var 3: balance, Type: integer, Low/High: (0, 22463)
Var 4: state, Type: factor, no factor levels available
8. You can now call various R functions to analyze the localDS object, just as you would with the source data on SQL Server.
For example:
R
rxSummary(~ gender + cardholder + balance + state, data = localDS)
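For a rough sense of what such a summary covers (category counts for the factor columns, descriptive statistics for the numeric one), here is a base R analogue on a tiny data frame. Every row in it is invented for illustration, and the output format of rxSummary itself differs from what this sketch prints.

```r
# Invented rows that mimic the shape of the imported columns.
toy <- data.frame(
  gender = factor(c("Male", "Female", "Male", "Female")),
  cardholder = factor(c("Principal", "Principal", "Secondary", "Principal")),
  balance = c(0L, 1500L, 320L, 4180L),
  state = factor(c("CA", "OR", "WA", "CA"))
)

# Counts per category for a factor column.
genderCounts <- table(toy$gender)
print(genderCounts)

# Basic descriptive statistics for the numeric column.
balanceStats <- c(mean = mean(toy$balance),
                  min = min(toy$balance),
                  max = max(toy$balance))
print(balanceStats)
```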
Now that you've mastered the use of compute contexts and working with various data sources, it's time to try something fun.
In the next and final lesson, you'll create a simple simulation using a custom R function and run it on the remote server.
Next Step
Lesson 5: Create a Simple Simulation Data Science Deep Dive
Previous Step
Lesson 4: Analyze Data in Local Compute Context Data Science Deep Dive
See Also
Data Science Deep Dive: Using the RevoScaleR Packages
2016 Microsoft