
Objective: Gearing up for a SAS interview?

1. The following SAS program is submitted:

    data test;
        set sasuser.employees;
        if 2 le years_service le 10 then
            amount = 1000;
        else if years_service gt 10 then
            amount = 2000;
        else
            amount = 0;
        amount_per_year = years_service / amount;
    run;

Which one of the following values does the variable AMOUNT_PER_YEAR contain if an employee has been with the company for one year?
A. 0
B. 1000
C. 2000
D. . (missing numeric value)

2. The contents of the raw data file AMOUNT are listed below:

    10-20-30
    $1,234

The following SAS program is submitted:

    data test;
        infile amount;
        input @1 salary 6.;
        if _error_ then description = 'Problems';
        else description = 'No Problems';
    run;

Which one of the following is the value of the DESCRIPTION variable?
A. Problems
B. No Problems
C. (missing character value)
D. The value can not be determined as the program fails to execute due to errors.

3. The contents of the raw data file NAMENUM are listed below:

    10-20-30
    Joe xx

The following SAS program is submitted:

    data test;
        infile namenum;
        input name $ number;
    run;

Which one of the following is the value of the NUMBER variable?
A. xx
B. Joe
C. . (missing numeric value)
D. The value can not be determined as the program fails to execute due to errors.

4. The contents of the raw data file AMOUNT are listed below:

    10-20-30
    $1,234

The following SAS program is submitted:

    data test;
        infile amount;
        input @1 salary 6.;
    run;

Which one of the following is the value of the SALARY variable?
A. 1234
B. 1,234
C. $1,234
D. . (missing numeric value)

5. Which one of the following statements is true regarding the SAS automatic _ERROR_ variable?
A. The _ERROR_ variable contains the values ON or OFF.
B. The _ERROR_ variable contains the values TRUE or FALSE.
C. The _ERROR_ variable is automatically stored in the resulting SAS data set.
D. The _ERROR_ variable can be used in expressions or calculations in the DATA step.

6. Which one of the following is true when SAS encounters a data error in a DATA step?
A. The DATA step stops executing at the point of the error, and no SAS data set is created.
B. A note is written to the SAS log explaining the error, and the DATA step continues to execute.
C. A note appears in the SAS log that the incorrect data record was saved to a separate SAS file for further examination.
D. The DATA step stops executing at the point of the error, and the resulting DATA set contains observations up to that point.

7. The following SAS program is submitted:

    data work.totalsales (keep = monthsales{12} );
        set work.monthlysales (keep = year product sales);
        array monthsales {12} ;
        do i=1 to 12;
            monthsales{i} = sales;
        end;
    run;

The data set named WORK.MONTHLYSALES has one observation per month for each of five years for a total of 60 observations.
Which one of the following is the result of the above program?
A. The program fails execution due to data errors.
B. The program fails execution due to syntax errors.
C. The program executes with warnings and creates the WORK.TOTALSALES data set.
D. The program executes without errors or warnings and creates the WORK.TOTALSALES data set.

8. The following SAS program is submitted:

    data work.totalsales;
        set work.monthlysales(keep = year product sales);
        retain monthsales {12} ;
        array monthsales {12} ;
        do i = 1 to 12;
            monthsales{i} = sales;
        end;
        cnt + 1;
        monthsales{cnt} = sales;
    run;

The data set named WORK.MONTHLYSALES has one observation per month for each of five years for a total of 60 observations.
Which one of the following is the result of the above program?
A. The program fails execution due to data errors.
B. The program fails execution due to syntax errors.
C. The program runs with warnings and creates the WORK.TOTALSALES data set with 60 observations.
D. The program runs without errors or warnings and creates the WORK.TOTALSALES data set with 60 observations.

9. The following SAS program is submitted:

    data work.january;
        set work.allmonths (keep = product month num_sold cost);
        if month = 'Jan' then output work.january;
        sales = cost * num_sold;
        keep = product sales;
    run;

Which variables does the WORK.JANUARY data set contain?
A. PRODUCT and SALES only
B. PRODUCT, MONTH, NUM_SOLD and COST only
C. PRODUCT, SALES, MONTH, NUM_SOLD and COST only
D. An incomplete output data set is created due to syntax errors.

10. The contents of the raw data file CALENDAR are listed below:

    10-20-30
    01012000

The following SAS program is submitted:

    data test;
        infile calendar;
        input @1 date mmddyy10.;
        if date = '01012000'd then event = 'January 1st';
    run;

Which one of the following is the value of the EVENT variable?
A. 01012000
B. January 1st
C. . (missing numeric value)
D. The value can not be determined as the program fails to execute due to errors.

11. A SAS program is submitted and the following SAS log is produced:
    2 data gt100;
    3 set ia.airplanes
    4 if mpg gt 100 then output;
      22 202
    ERROR: File WORK.IF.DATA does not exist.
    ERROR: File WORK.MPG.DATA does not exist.
    ERROR: File WORK.GT.DATA does not exist.
    ERROR: File WORK.THEN.DATA does not exist.
    ERROR: File WORK.OUTPUT.DATA does not exist.
    ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, (, ;, END, KEY, KEYS, NOBS, OPEN, POINT, _DATA_, _LAST_, _NULL_.
    ERROR 202-322: The option or parameter is not recognized and will be ignored.
    5 run;

The IA libref was previously assigned in this SAS session.
Which one of the following corrects the errors in the LOG?
A. Delete the word THEN on the IF statement.
B. Add a semicolon at the end of the SET statement.
C. Place quotes around the value on the IF statement.
D. Add an END statement to conclude the IF statement.

12. The contents of the raw data file SIZE are listed below:

    10-20-30
    72 95

The following SAS program is submitted:

    data test;
        infile size;
        input @1 height 2. @4 weight 2;
    run;

Which one of the following is the value of the variable WEIGHT in the output data set?
A. 2
B. 72
C. 95
D. . (missing numeric value)

13. A SAS PRINT procedure output of the WORK.LEVELS data set is listed below:

    Obs  name   level
    1    Frank  1
    2    Joan   2
    3    Sui    2
    4    Jose   3
    5    Burt   4
    6    Kelly  .
    7    Juan   1

The following SAS program is submitted:

    data work.expertise;
        set work.levels;
        if level = . then
            expertise = 'Unknown';
        else if level = 1 then
            expertise = 'Low';
        else if level = 2 or 3 then
            expertise = 'Medium';
        else
            expertise = 'High';
    run;

Which of the following values does the variable EXPERTISE contain?
A. Low, Medium, and High only
B. Low, Medium, and Unknown only
C. Low, Medium, High, and Unknown only
D. Low, Medium, High, Unknown, and (missing character value)

14. The contents of the raw data file EMPLOYEE are listed below:

    10-20-30
    Ruth 39 11
    Jose 32 22
    Sue 30 33
    John 40 44

The following SAS program is submitted:

    data test;
        infile employee;
        input employee_name $ 1-4;
        if employee_name = 'Ruth' then input idnum 10-11;
        else input age 7-8;
    run;

Which one of the following values does the variable IDNUM contain when the name of
the employee is Ruth?
A. 11
B. 22
C. 32
D. . (missing numeric value)

15. The contents of the raw data file EMPLOYEE are listed below:

    10-20-30
    Ruth 39 11
    Jose 32 22
    Sue 30 33
    John 40 44

The following SAS program is submitted:

    data test;
        infile employee;
        input employee_name $ 1-4;
        if employee_name = 'Sue' then input age 7-8;
        else input idnum 10-11;
    run;

Which one of the following values does the variable AGE contain when the name of the employee is Sue?
A. 30
B. 33
C. 40
D. . (missing numeric value)

16. The following SAS program is submitted:

    libname sasdata 'SAS-data-library';
    data test;
        set sasdata.chemists;
        if jobcode = 'Chem2'
            then description = 'Senior Chemist';
        else description = 'Unknown';
    run;

A value for the variable JOBCODE is listed below:

    JOBCODE
    chem2

Which one of the following values does the variable DESCRIPTION contain?
A. Chem2
B. Unknown
C. Senior Chemist
D. (missing character value)

17. The following SAS program is submitted:

    libname sasdata 'SAS-data-library';
    data test;
        set sasdata.chemists;
        if jobcode = 'chem3'
            then description = 'Senior Chemist';
        else description = 'Unknown';
    run;

A value for the variable JOBCODE is listed below:

    JOBCODE
    CHEM3

Which one of the following values does the variable DESCRIPTION contain?
A. chem3
B. Unknown
C. Senior Chemist
D. (missing character value)

18. Which one of the following ODS statement options terminates output being written to an HTML file?
A. END
B. QUIT
C. STOP
D. CLOSE

19. The following SAS program is submitted:

    proc means data = sasuser.shoes;
        where product in ('Sandal' , 'Slipper' , 'Boot');
    run;

Which one of the following ODS statements completes the program and sends the report to an HTML file?
A. ods html = 'sales.html';
B. ods file = 'sales.html';
C. ods file html = 'sales.html';
D. ods html file = 'sales.html';

20. The following SAS program is submitted:

    proc format;
        value score 1 - 50 = 'Fail'
                    51 - 100 = 'Pass';
    run;
    proc report data = work.courses nowd;
        column exam;
        define exam / display format = score.;
    run;

The variable EXAM has a value of 50.5.
How will the EXAM variable value be displayed in the REPORT procedure output?
A. Fail
B. Pass
C. 50.5
D. . (missing numeric value)

21. The following SAS program is submitted:

    options pageno = 1;
    proc print data = sasuser.houses;
    run;
    proc means data = sasuser.shoes;
    run;

The report created by the PRINT procedure step generates 5 pages of output.
What is the page number on the first page of the report generated by the MEANS procedure step?
A. 1
B. 2
C. 5
D. 6

22. Which one of the following SAS system options displays the time on a report?
A. TIME
B. DATE
C. TODAY
D. DATETIME

23. Which one of the following SAS system options prevents the page number from appearing on a report?
A. NONUM
B. NOPAGE
C. NONUMBER
D. NOPAGENUM

24. The following SAS program is submitted:

    footnote1 'Sales Report for Last Month';
    footnote2 'Selected Products Only';
    footnote3 'All Regions';
    footnote4 'All Figures in Thousands of Dollars';
    proc print data = sasuser.shoes;
        footnote2 'All Products';
    run;

Which one of the following contains the footnote text that is displayed in the report?
A. All Products
B. Sales Report for Last Month
   All Products
C. All Products
   All Regions
   All Figures in Thousands of Dollars
D. Sales Report for Last Month
   All Products
   All Regions
   All Figures in Thousands of Dollars

25. The following SAS program is submitted:

    proc means data = sasuser.houses std mean max;
        var sqfeet;
    run;

Which one of the following is needed to display the standard deviation with only two decimal places?
A. Add the option MAXDEC = 2 to the MEANS procedure statement.
B. Add the statement MAXDEC = 7.2; in the MEANS procedure step.
C. Add the statement FORMAT STD 7.2; in the MEANS procedure step.
D. Add the option FORMAT = 7.2 option to the MEANS procedure statement.

26. Unless specified, which variables and data values are used to calculate statistics in the MEANS procedure?
A. non-missing numeric variable values only
B. missing numeric variable values and non-missing numeric variable values only
C. non-missing character variables and non-missing numeric variable values only
D. missing character variables, non-missing character variables, missing numeric variable values, and non-missing numeric variable values

27. The following SAS program is submitted:

    proc sort data = sasuser.houses out = houses;
        by style;
    run;
    proc print data = houses;
    run;

The report produced is shown below:

    style     bedrooms  baths  price
    CONDO     2         1.5    80050
              3         2.5    79350
              4         2.5    127150
              2         2.0    110700
    RANCH     2         1.0    64000
              3         3.0    86650
              3         1.0    89100
              1         1.0    34550
    SPLIT     1         1.0    65850
              4         3.0    94450
              3         1.5    73650
    TWOSTORY  4         3.0    107250
              2         1.0    55850
              2         1.0    69250
              4         2.5    102950

Which of the following SAS statement(s) create(s) the report?
A. id style;
B. id style;
   var style bedrooms baths price;
C. id style;
   by style;
   var bedrooms baths price;
D. id style;
   by style;
   var style bedrooms baths price;

28. A realtor has two customers. One customer wants to view a list of homes selling for less than $60,000. The other customer wants to view a list of homes selling for greater than $100,000.
Assuming the PRICE variable is numeric, which one of the following PRINT procedure steps will select all desired observations?
A. proc print data = sasuser.houses;
       where price lt 60000;
       where price gt 100000;
   run;
B. proc print data = sasuser.houses;
       where price lt 60000 or price gt 100000;
   run;
C. proc print data = sasuser.houses;
       where price lt 60000 and price gt 100000;
   run;
D. proc print data = sasuser.houses;
       where price lt 60000 or where price gt 100000;
   run;

29. The value 110700 is stored in a numeric variable.
Which one of the following SAS formats is used to display the value as $110,700.00 in a report?
A. comma8.2
B. comma11.2
C. dollar8.2
D. dollar11.2

30. The SAS data set SASUSER.HOUSES contains a variable PRICE which has been assigned a permanent label of 'Asking Price'.
Which one of the following SAS programs temporarily replaces the label 'Asking Price' with the label 'Sale Price' in the output?
A. proc print data = sasuser.houses;
       label price = 'Sale Price';
   run;
B. proc print data = sasuser.houses label;
       label price 'Sale Price';
   run;
C. proc print data = sasuser.houses label;
       label price = 'Sale Price';
   run;
D. proc print data = sasuser.houses label = 'Sale Price';
   run;

31. The SAS data set BANKS is listed below:

    BANKS
    name           rate
    FirstCapital   0.0718
    DirectBank     0.0721
    VirtualDirect  0.0728

The following SAS program is submitted:

    data newbank;
        do year = 1 to 3;
            set banks;
            capital + 5000;
        end;
    run;

Which one of the following represents how many observations and variables will exist in the SAS data set NEWBANK?
A. 0 observations and 0 variables
B. 1 observations and 4 variables
C. 3 observations and 3 variables
D. 9 observations and 2 variables

32. The following SAS program is submitted:

    data work.clients;
        calls = 6;
        do while (calls le 6);
            calls + 1;
        end;
    run;

Which one of the following is the value of the variable CALLS in the output data set?
A. 4
B. 5
C. 6
D. 7

33. The following SAS program is submitted:

    data work.pieces;
        do while (n lt 6);
            n + 1;
        end;
    run;

Which one of the following is the value of the variable N in the output data set?
A. 4
B. 5
C. 6
D. 7

34. The following SAS program is submitted:

    data work.sales;
        do year = 1 to 5;
            do month = 1 to 12;
                x + 1;
            end;
        end;
    run;

Which one of the following represents how many observations are written to the WORK.SALES data set?
A. 0
B. 1
C. 5
D. 60

35. A raw data record is listed below:

    10-20-30
    1999/10/25

The following SAS program is submitted:

    data projectduration;
        infile file-specification;
        input date $ 1 - 10;
    run;

Which one of the following statements completes the program above and computes the duration of the project in days as of today's date?
A. duration = today( ) - put(date,ddmmyy10.);
B. duration = today( ) - put(date,yymmdd10.);
C. duration = today( ) - input(date,ddmmyy10.);
D. duration = today( ) - input(date,yymmdd10.);

36. A raw data record is listed below:

    10-20-30
    Printing 750

The following SAS program is submitted:

    data bonus;
        infile file-specification;
        input dept $ 1 - 11 number 13 - 15;
    run;

Which one of the following SAS statements completes the program and results in a value of 'Printing750' for the DEPARTMENT variable?
A. department = trim(dept) number;
B. department = dept input(number,3.);
C. department = trim(dept) || put(number,3.);
D. department = input(dept,11.) || input(number,3.);

37. The following SAS program is submitted:

    data work.month;
        date = put('13mar2000'd,ddmmyy10.);
    run;

Which one of the following represents the type and length of the variable DATE in the output data set?
A. numeric, 8 bytes
B. numeric, 10 bytes
C. character, 8 bytes
D. character, 10 bytes

38. The following SAS program is submitted:

    data work.products;
        Product_Number = 5461;
        Item = '1001';
        Item_Reference = Item'/'Product_Number;
    run;

Which one of the following is the value of the variable ITEM_REFERENCE in the output data set?
A. 1001/5461
B. 1001/ 5461
C. . (missing numeric value)
D. The value can not be determined as the program fails to execute due to errors.

39. The following SAS program is submitted:

    data work.retail;
        cost = '20000';
        total = .10 * cost;
    run;

Which one of the following is the value of the variable TOTAL in the output data set?
A. 2000
B. '2000'
C. . (missing numeric value)
D. (missing character value)

40. Which one of the following SAS statements correctly computes the average of four numerical values?
A. average = mean(num1 - num4);
B. average = mean(of num1 - num4);
C. average = mean(of num1 to num4);
D. average = mean(num1 num2 num3 num4);

41. The following SAS program is submitted:

    data work.test;
        Author = 'Agatha Christie';
        First = substr(scan(author,1,' ,'),1,1);
    run;

Which one of the following is the length of the variable FIRST in the output data set?
A. 1
B. 6
C. 15
D. 200

42. The following SAS program is submitted:

    data work.test;
        Author = 'Christie, Agatha';
        First = substr(scan(author,2,' ,'),1,1);
    run;

Which one of the following is the value of the variable FIRST in the output data set?
A. A
B. C
C. Agatha
D. (missing character value)

43. The following SAS program is submitted:

    data work.test;
        Title = 'A Tale of Two Cities, Charles J. Dickens';
        Word = scan(title,3,' ,');
    run;

Which one of the following is the value of the variable WORD in the output data set?
A. T
B. of
C. Dickens
D. (missing character value)

44. The following SAS program is submitted:

    data work.test;
        First = 'Ipswich, England';
        City_Country = substr(First,1,7)!!', '!!'England';
    run;

Which one of the following is the length of the variable CITY_COUNTRY in the output data set?
A. 6
B. 7
C. 17
D. 25

45. The following SAS program is submitted:

    data work.test;
        First = 'Ipswich, England';
        City = substr(First,1,7);
        City_Country = City!!', '!!'England';
    run;

Which one of the following is the value of the variable CITY_COUNTRY in the output data set?
A. Ipswich!!
B. Ipswich, England
C. Ipswich, England
D. Ipswich , England

46. Which one of the following is true of the RETAIN statement in a SAS DATA step program?
A. It can be used to assign an initial value to _N_ .
B. It is only valid in conjunction with a SUM function.
C. It has no effect on variables read with the SET, MERGE and UPDATE statements.
D. It adds the value of an expression to an accumulator variable and ignores missing values.

47. A raw data file is listed below:

    10-20-30
    1901 2
    1905 1
    1910 6
    1925 .
    1941 1

The following SAS program is submitted and references the raw data file above:

    data coins;
        infile file-specification;
        input year quantity;
    run;

Which one of the following completes the program and produces a non-missing value for the variable TOTQUANTITY in the last observation of the output data set?
A. totquantity + quantity;
B. totquantity = sum(totquantity + quantity);
C. totquantity 0;
   sum totquantity;
D. retain totquantity 0;
   totquantity = totquantity + quantity;

48. A raw data file is listed below:

    10-20-30
    squash 1.10
    apples 2.25
    juice 1.69

The following SAS program is submitted using the raw data file above:

    data groceries;
        infile file-specification;
        input item $ cost;
    run;

Which one of the following completes the program and produces a grand total for all COST values?
A. grandtot = sum cost;
B. grandtot = sum(grandtot,cost);
C. retain grandtot 0;
   grandtot = sum(grandtot,cost);
D. grandtot = sum(grandtot,cost);
   output grandtot;

49. The following SAS program is submitted:

    data work.total;
        set work.salary(keep = department wagerate);
        by department;
        if first.department then payroll = 0;
        payroll + wagerate;
        if last.department;
    run;

The SAS data set WORK.SALARY, currently ordered by DEPARTMENT, contains 100 observations for each of 5 departments.
Which one of the following represents how many observations the WORK.TOTAL data set contains?
A. 5
B. 20
C. 100
D. 500

50. The following SAS program is submitted:

    data work.total;
        set work.salary(keep = department wagerate);
        by department;
        if first.department then payroll = 0;
        payroll + wagerate;
        if last.department;
    run;

The SAS data set named WORK.SALARY contains 10 observations for each department, currently ordered by DEPARTMENT.
Which one of the following is true regarding the program above?
A. The BY statement in the DATA step causes a syntax error.
B. FIRST.DEPARTMENT and LAST.DEPARTMENT are variables in the WORK.TOTAL data set.
C. The values of the variable PAYROLL represent the total for each department in the WORK.SALARY data set.
D. The values of the variable PAYROLL represent a total for all values of WAGERATE in the WORK.SALARY data set.

ANSWERS :
 1: d   11: b   21: d   31: b   41: d
 2: a   12: a   22: b   32: d   42: a
 3: c   13: b   23: c   33: c   43: b
 4: d   14: b   24: b   34: b   44: d
 5: d   15: c   25: a   35: d   45: d
 6: b   16: b   26: a   36: c   46: c
 7: b   17: b   27: c   37: d   47: a
 8: b   18: d   28: b   38: d   48: c
 9: d   19: d   29: d   39: a   49: a
10: d   20: c   30: c   40: b   50: c

Base SAS vs. SAS Enterprise Guide
Case Study
Background to case 1
You work in the retail industry and have recently started a loyalty program for your customers. A study
conducted on a retail bank says that customers whose total purchases reach $1,000 in the 3rd month (the
T+2th month) are the customers who will eventually purchase more than $30,000. You want to focus your
loyalty campaign on these customers.
You have two datasets. The first has the entire list of customer IDs with their date of first purchase. The
second has the customer ID with monthly purchases for each year-month. The first purchase can be a
non-financial transaction, which might not appear in table 2.
You need to identify customers who make more than $1,000 of purchases in the 3rd month from the first
purchase.
Table 1 : Table 2 :

Solution to Case 1
This is a classic case where Base SAS clearly beats SAS EG. This section shows a simple solution for
this case study.

* Creating a macro for each month;

%macro fetch_data (next_mon = , third_mon = );
/* Customers whose first purchase falls before &next_mon */
data create_list;
set table_1;
if first_pur < &next_mon;
run;

proc sort data = create_list out=list; by customer_id; run;

proc sort data = table_2 out=purchase; by customer_id; run;

/* Keep the &third_mon purchases of the listed customers */
data fetch_purchase;
merge list(in=a) purchase(in=b);
by customer_id;
if a;
if yearmonth = &third_mon;
run;

/* FORCE lets APPEND proceed even if the two data sets differ in structure */
proc datasets nolist;
append base=final_dataset data = fetch_purchase force;
run;
quit;
%mend fetch_data;

%fetch_data (next_mon = '01Feb2012'd , third_mon = 201204);
%fetch_data (next_mon = '01Feb2013'd , third_mon = 201304);
%fetch_data (next_mon = '01Mar2012'd , third_mon = 201205);
%fetch_data (next_mon = '01Mar2013'd , third_mon = 201305);

* Identifying the customers with purchase above $1000 in 3rd month;

data shortlisted;
set final_dataset;
if sales ge 1000;
run;
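
As a rough alternative to calling the macro once per month, the target month can be derived from each customer's first purchase date with INTNX. The sketch below is only an illustration under stated assumptions: the sample rows are made up, FIRST_PUR is assumed to be a SAS date, YEARMONTH is assumed to be a numeric value such as 201203, and the offset of 2 assumes that "3rd month" means T+2 (change it if your definition differs, as the macro calls above suggest it might).

```sas
/* Hypothetical sample data; column names follow the case study. */
data table_1;
    input customer_id first_pur :date9.;
    format first_pur date9.;
    datalines;
1 15JAN2012
2 03FEB2012
;
run;

data table_2;
    input customer_id yearmonth sales;
    datalines;
1 201203 1500
1 201204 200
2 201204 800
;
run;

/* INTNX moves the first purchase date forward two months;           */
/* PUT with YYMMN6. renders it as YYYYMM and INPUT makes it numeric. */
proc sql;
    create table shortlisted_sql as
    select a.customer_id, b.yearmonth, b.sales
    from table_1 as a
         inner join table_2 as b
         on a.customer_id = b.customer_id
    where b.yearmonth = input(put(intnx('month', a.first_pur, 2), yymmn6.), 6.)
      and b.sales ge 1000;
quit;
```

With these sample rows only customer 1 qualifies: the first purchase in January 2012 maps to 201203, where sales are 1500.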
Background to case 2
You work in the banking industry. You want to analyze the transaction dataset and find the
median transaction amount for each customer. This is the amount above which we will want to pay
the customer for stretching. The higher the dollar value of transactions, the cheaper the total cost of
transactions. You need to make a list of all customers with their floored median transaction amount (if
there are 5 transactions, we want the 2nd lowest transaction and not the 3rd, and if there is only 1
transaction, remove the customer from the list).
The only dataset you have is unique on transaction ID. It also has the customer ID and the amount of the
transaction.
Table 1

Solution to Case 2
The solution to this problem is tiresome on SAS EG because there is no median function on SQL
routines after grouping data. SQL routines are the foundation of data handling in SAS EG. But this
becomes quite easy on Base SAS. Let's see how this can be done easily on Base SAS.
proc sql;
create table work.summarize as
select count(*) as trans_nos, customer_id
from work.table1
group by customer_id;
quit;

proc sort data = table1; by customer_id;run;

proc sort data = summarize; by customer_id;run;

data add_total_trans;
merge table1 (in=a) summarize (in=b);
by customer_id;
median_no = floor(trans_nos/2);
drop trans_nos;
run;

proc sort data = add_total_trans; by customer_id amount;run;

/* Number the transactions within each customer before the subsetting IF, */
/* otherwise the counter never advances past the first rejected row.      */
data final_list;
set add_total_trans;
by customer_id amount;
if first.customer_id then n = 0;
n + 1;
if n = median_no;
run;
The solution in base SAS for this question is not only effective but also time efficient.
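
The same floored-median pick can also be sketched in a single DATA step with a double DoW loop. This is an alternative illustration, not the approach used above; the sample rows and the names SORTED, CNT and TRANS_ID are assumptions made for the example.

```sas
/* Hypothetical transactions: customer 101 has five, customer 202 has one. */
data table1;
    input trans_id customer_id amount;
    datalines;
1 101 50
2 101 20
3 101 90
4 101 10
5 101 70
6 202 40
;
run;

proc sort data=table1 out=sorted;
    by customer_id amount;
run;

data final_list;
    /* First pass over the BY group: count its transactions. */
    do cnt = 1 by 1 until (last.customer_id);
        set sorted;
        by customer_id;
    end;
    median_no = floor(cnt / 2);
    /* Second pass over the same rows: output the floored-median one. */
    do n = 1 to cnt;
        set sorted;
        if n = median_no then output;
    end;
    keep customer_id amount;
run;
```

For customer 101 the sorted amounts are 10, 20, 50, 70, 90, so row 2 (amount 20) is written. Customer 202 has a single transaction, MEDIAN_NO is 0, and no row is written.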
End Notes
Both Base SAS and EG have their own pros and cons. The best strategy is to use both.
If you want to build a traditional query, use SAS EG to generate the code automatically. Then copy this
code into Base SAS and generalize it with macros. Macros add a new dimension to the
code: they help you generalize it and avoid hard-coded values.
Have you faced any other SAS problem in analytics interview? Are you facing any specific problem
with SAS codes? Do you think this provides a solution to any problem you face? Do you think there
are other methods to solve the problems discussed in a more optimized way? Do let us know your
thoughts in the comments below.
1. Merging data in SAS :
Merging datasets is one of the most important steps for an analyst. Merging can be done through both
the DATA step and PROC SQL. People usually ignore the difference between the methods SAS uses in
these two steps, because in most cases the two routines create the same output. Let's look at the
following example :

Problem Statement : In this example, we have 2 datasets. The first table gives the product holding for a
particular household. The second table gives the gender of each customer in these households. You
need to find out whether a product is male biased or neutral. A male biased product is a product
bought by males more than females. You can assume that the product bought by a household belongs
to each customer of that household.
Thought process: The first step of this problem is to merge the two tables. We need a Cartesian product
of the two tables in this case. After getting the merged dataset, all you need to do is summarize the
merged dataset and find the bias.

Code 1
Proc sort data = PROD out =A1; by household;run;
Proc sort data = GENDER out =A2; by household;run;
Data MERGED;
merge A1(in=a) A2(in=b);
by household;
if a AND b;
run;
Code 2 :
PROC SQL;
Create table work.merged as
select t1.household, t1.type,t2.gender
from prod as t1, gender as t2
where t1.household = t2.household;
quit;
Will both the codes give the same result?
The answer is NO. As you might have noticed, the two tables have a many-to-many mapping. A DATA step
MERGE pairs observations one by one within each BY group, so for a Cartesian product we can only use
PROC SQL. Apart from many-to-many tables, the results of merging with the two steps will be exactly
the same.
Why do we use the DATA MERGE step at all?
The DATA MERGE step is usually much faster than PROC SQL. For big datasets, except those having a
many-to-many mapping, always use DATA MERGE.
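
A minimal sketch of the difference, using made-up rows: household 1 holds two products and has two members. The dataset names PROD and GENDER and the columns HOUSEHOLD, TYPE and GENDER follow the code above; the values and the output names MERGED_DATASTEP and MERGED_SQL are assumptions for illustration.

```sas
/* Hypothetical many-to-many input: 2 products x 2 members in household 1. */
data prod;
    input household type $;
    datalines;
1 Loan
1 Card
;
run;

data gender;
    input household gender $;
    datalines;
1 M
1 F
;
run;

/* DATA step MERGE pairs rows by position within the BY group: 2 rows. */
data merged_datastep;
    merge prod(in=a) gender(in=b);
    by household;
    if a and b;
run;

/* PROC SQL builds every product-member combination: 2 x 2 = 4 rows. */
proc sql;
    create table merged_sql as
    select t1.household, t1.type, t2.gender
    from prod as t1, gender as t2
    where t1.household = t2.household;
quit;
```

The DATA step version also writes a note to the log that the MERGE statement has more than one data set with repeats of BY values, which is a useful warning sign for this situation.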
2. Transpose data-sets :
When working on transactions data, we frequently transpose datasets to analyze data. There are two
kinds of transposition. First, transposing from wide structure to narrow structure. Consider the following
example :

Following are the two methods to do this kind of transposition :


a. DATA STEP :
data transposed;set base;
array Qtr{3} Q1-Q3;
do i = 1 to 3;
Period = cat('Qtr',i);
Amount = Qtr{i};
/* write only the non-missing quarters */
if Amount ne . then output;
end;
drop Q1-Q3 i;
run;
b. PROC TRANSPOSE :
proc transpose data = base out = transposed
(rename=(Col1=Amount) where=(Amount ne .)) name=Period;
by cust; run;
In this kind of transposition, both methods are equally good. PROC TRANSPOSE, however, takes
less time because it uses indexing to transpose.
Second, narrow to wide structure. Consider an opposite of the last example.

For this kind of transposition, the data step becomes very long and time consuming. The following is a
much shorter way to do the same task:
Proc transpose data=transposed out=base (drop=_name_) prefix=Q;
by cust;
id period;
var amount;
run;
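
To see both directions on concrete rows, here is a small round trip. It is a sketch on assumed data: the variable names CUST and Q1-Q3 follow the examples above, while the values and the dataset names NARROW and WIDE are made up. No PREFIX= option is used in the second step because PERIOD already holds the full names Q1, Q2 and Q3.

```sas
/* Hypothetical wide data: one row per customer, one column per quarter. */
data base;
    input cust Q1 Q2 Q3;
    datalines;
1 100 . 300
2 50 60 70
;
run;

/* Wide to narrow: one row per customer and quarter. */
proc transpose data=base out=narrow(rename=(col1=Amount)) name=Period;
    by cust;
    var Q1-Q3;
run;

/* Narrow to wide: the values of PERIOD become the column names again. */
proc transpose data=narrow out=wide(drop=_name_);
    by cust;
    id Period;
    var Amount;
run;
```

WIDE should match BASE again, including the missing Q2 value for customer 1.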
3. Passing values from one routine to other:
Imagine a scenario where we want to compare the total marks scored by two classes. The final output
should simply be the name of the class with the higher score. The scores of the two classes are stored
in two separate tables.
There are two methods of solving this. First, append the two tables and sum the total marks for
each of the classes. But if the number of students is very large, appending the two tables just
multiplies the operation time. Hence, we need a method to pass a value from one
table to another. Try the following code:

DATA _null_;set class_1;
total + marks;
call symputx ('class1_tot',total);
run;
DATA _null_;set class_2;
total + marks;
call symputx ('class2_tot',total);
run;
DATA results;
if &class1_tot > &class2_tot then better_class = 1;
else if &class1_tot < &class2_tot then better_class = 2;
else better_class = 0;
run;
The CALL SYMPUTX routine creates a macro variable which can be passed between various routines and
thus gives us an opportunity to link datasets.
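
One small refinement, sketched on made-up marks: the END= option on SET flags the last observation, so CALL SYMPUTX runs once per table instead of once per row. The dataset names CLASS_1 and CLASS_2 and the macro variable names follow the code above; the values and the flag name LAST_ROW are assumptions.

```sas
/* Hypothetical marks for two small classes. */
data class_1;
    input marks @@;
    datalines;
70 85 90
;
run;

data class_2;
    input marks @@;
    datalines;
60 95 80
;
run;

/* END= sets LAST_ROW to 1 on the final observation only. */
data _null_;
    set class_1 end=last_row;
    total + marks;
    if last_row then call symputx('class1_tot', total);
run;

data _null_;
    set class_2 end=last_row;
    total + marks;
    if last_row then call symputx('class2_tot', total);
run;

/* Write both totals to the log to confirm the macro variables exist. */
%put NOTE: class1_tot=&class1_tot class2_tot=&class2_tot;
```

The final macro variable values are the same as with the per-row call, because the last call overwrites the earlier ones in either version.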
4. Using where and if :
WHERE and IF are both used for sub-setting. Most of the time, WHERE and IF can be used
interchangeably in a data step for sub-setting. But when sub-setting is done on a newly created variable,
only the IF statement can be used. For instance, consider the following two programs,
Code 1 : Code 2 :
data a;set b; data a;set b;
z= x+y; z= x+y;
if z < 10; where z < 10;
run; run;
Code 2 will give an error in this case, because where cannot be used for sub-setting data based on a
newly created variable.
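
The two statements can also be combined in one step, each doing the job it is suited for. The sketch below uses made-up rows; the dataset and variable names follow the programs above.

```sas
/* Hypothetical input rows. */
data b;
    input x y;
    datalines;
1 2
4 9
7 1
;
run;

data a;
    set b;
    where x < 5;   /* WHERE filters on X, which already exists in B */
    z = x + y;
    if z < 10;     /* subsetting IF filters on the newly computed Z */
run;
```

Here WHERE drops the third row before it is read into the step, and the subsetting IF then drops the second row because its Z is 13, leaving one observation in A.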
End Notes :
These codes come directly from my cheat sheet. What is special about these 4 codes is that together
they give me a quick glance at almost all the statements and options used in SAS. If you were able to
solve all the questions covered in this article, we think you are up for the next level. You can read the
second part of this article here ( https://www.analyticsvidhya.com/blog/2014/04/tricky-base-sas-interview-questions-part-ii/ ) .
The second part of the article will have tougher and lengthier questions
as compared to those covered in this article.
