Paper 70
device-type indicates the type of device. It is optional and defaults to DISK.

'external-name' is the name of the file on the host system. The quotes are required.

host-options specify any options that vary from operating system to operating system. Carefully read the SAS Companion for your operating system to understand the meaning and usage of these options.

You may use FILENAME in conjunction with operating system definitions to specify information about the file not covered by the operating system. In general, one or the other alone is sufficient.

fileref is the fileref assigned to the required external file. The fileref must be assigned before the DATA step by using an operating system definition or FILENAME statement.

'external-file' specifies the name of the required external file. This form is equivalent to specifying the external file with a FILENAME statement.

CARDS (or CARDS4) indicates that the data immediately follows the CARDS statement at the end of the DATA step. In Release 6.07 or later, DATALINES (or DATALINES4) may be specified instead.

options specify SAS options to control reading the file or to provide information about the file. Commonly used options will be described in the INFILE STATEMENT OPTIONS section.

Examples
The following examples are functionally
equivalent.
CMS
FILEDEF SALESDAT DISK NOVEMBER SALES A
Beginning Tutorials
The FILENAME statement assigns the fileref VENDORS before the DATA step, and the INFILE points to fileref VENDORS. If the numeric data were in a packed format, formatted input would be required as shown in this example:

FILENAME vendors 'external-name';
DATA purchasd.fruit;
   INFILE vendors;
   INPUT
      vendor $ 1-20
      apples pd5.
      pears pd5.
   ;

Variable Lengths
An important consideration when deciding which method of input to use (list, column, formatted, or named) is the desired lengths of the variables in the resulting SAS data set. SAS determines

Although acceptable in our small example, these problems become material when reading large files. This section provides the tools to overcome these and other problems.

Input Pointer Controls
SAS maintains two pointers, the column pointer and the line pointer, to track what raw data will be read during the execution of an INPUT statement. You can change these pointers to re-read data, change the order in which data fields are read, or handle logical records that are defined by multiple physical records. Use of these pointer controls also makes the INPUT statement more self-documenting.

COLUMN pointer controls:

@expression
start = 1;
INPUT
   @start vendor $CHAR20.
   apples 5.
   pears 5.
;

INPUT
   @1 vendor $20.
   +5 pears 5.
   +(-10) apples 5.
;

Note in the last example that PEARS is read
before APPLES by using the +5 and +(-10)
column pointer controls. Normally, we prefer the
form shown in the first example. It clearly shows
the starting location, length, and informat of each
variable.
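
The two INPUT forms above can be tried side by side with in-stream data. The following is only a sketch under stated assumptions: the data set names FRUIT_ABS and FRUIT_REL, the sample records, and the layout (VENDOR in columns 1-20, APPLES in columns 21-25, PEARS in columns 26-30) are hypothetical and are not taken from the paper's data.

```sas
/* Hypothetical layout: VENDOR cols 1-20, APPLES cols 21-25,  */
/* PEARS cols 26-30. The data lines must start in column 1.   */

/* Form 1: absolute start, fields read in file order.         */
DATA fruit_abs;
   INFILE DATALINES;
   DROP start;
   start = 1;
   INPUT
      @start vendor $CHAR20.
             apples 5.
             pears  5.
   ;
   DATALINES;
ACME PRODUCE          120   75
GREEN VALLEY FARMS     60  210
;
RUN;

/* Form 2: relative moves. +5 skips APPLES so PEARS is read   */
/* first, then +(-10) backs up to column 21 to read APPLES.   */
DATA fruit_rel;
   INFILE DATALINES;
   INPUT
      @1     vendor $20.
      +5     pears  5.
      +(-10) apples 5.
   ;
   DATALINES;
ACME PRODUCE          120   75
GREEN VALLEY FARMS     60  210
;
RUN;

/* Both steps should yield the same values for each variable. */
PROC COMPARE BASE=fruit_abs COMPARE=fruit_rel;
RUN;
```

If the two forms are equivalent for this layout, PROC COMPARE should report no unequal values. Only the order of the variables in the two data sets differs.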
#expression
/
Line Hold Specifiers
Line hold specifiers are used to maintain the position of the line and column pointers on the current line in the external file through multiple INPUT statements or multiple iterations of a single data step. Placed at the end of the INPUT statement, they instruct SAS not to read a new record when the next INPUT statement is executed. This capability is the key element of techniques used to read more complex files and to improve efficiency.

@ (trailing at-sign) tells SAS to keep this record current until either an INPUT is executed without a trailing @ or trailing @@, or until this iteration of the DATA step is completed.

In the following example, only TYPE A observations are written. Since the variables B, C, and D are read only when TYPE is A, wasted processing is avoided.

DATA trash;
   INFILE trash;
   INPUT type $ @;
   IF type = 'A' THEN DO;
      INPUT b c d;
      OUTPUT;
   END;
RUN;

@@ (double trailing at-sign) tells SAS to keep this record current through successive iterations of the DATA step. The line will be released when the first of three events occurs: the column pointer moves past the end of the record; an INPUT statement is executed without a @ or @@; or when the DATA step iteration ends and the last executed INPUT statement did not have a @@. For example, the following program reads multiple observations from each input record:

DATA trash;
   INFILE trash;
   INPUT type $ size @@;
RUN;

WARNING: When using @ or @@, care must be taken to avoid infinite loops. For example, the following program may result in an infinite loop:

you use @ to hold the input record across multiple INPUT statements within the same iteration of the DATA step, you must execute an INPUT without a line hold specifier to input the next record within the same iteration of the DATA step (although the record will be automatically released at the end of the DATA step iteration). If you use @@ to hold the input record across multiple iterations of the DATA step, you must execute an INPUT or INPUT @ statement to release the current record.

Grouping Variables and Informats
You may group variables and informats to reduce the size of the INPUT statement. This technique, illustrated by Program 3 at the end of this paper, is particularly useful when you are reading into arrays or numbered variables. SAS recycles the informat list until the variable list is exhausted.

Comparison of Methods
We prefer to use formatted input with column and line pointer controls. Some types of data (e.g. packed decimal, signed numeric, hexadecimal, and dates) can be read only with formatted input. Data errors do not cascade beyond the element being read. Confusion related to default processing is avoided. Finally, the code helps document the data structure.

INFILE STATEMENT OPTIONS
The INFILE statement has many options, some specific to the host operating system and some generic to any SAS application. In this section, we will explore the more commonly used SAS options.

END=variable sets variable to 1 (true) when the current record is the last record in the file. This option is frequently used for efficiency purposes. Imbedding the INPUT statement in a DO loop reduces DATA step overhead.

DATA stuff;
   INFILE injunk END=nomore;
   DO UNTIL(nomore);
      INPUT ... ;
      OUTPUT;
   END;
RUN;
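
The trailing @ and @@ examples under Line Hold Specifiers can be tried without an external file by reading in-stream data. This is a minimal sketch, not the paper's own test case: the data set names TYPE_A and PAIRS and the sample records are hypothetical.

```sas
/* Trailing @: read TYPE first and hold the record, then read */
/* B, C, and D from the same record only when TYPE is A.      */
DATA type_a;
   INFILE DATALINES;
   INPUT type $ @;
   IF type = 'A' THEN DO;
      INPUT b c d;
      OUTPUT;
   END;
   DATALINES;
A 1 2 3
B 4 5 6
A 7 8 9
;
RUN;

/* Trailing @@: hold the record across DATA step iterations   */
/* so that each TYPE/SIZE pair becomes its own observation.   */
DATA pairs;
   INFILE DATALINES;
   INPUT type $ size @@;
   DATALINES;
A 10 B 20 C 30
D 40 E 50
;
RUN;
```

With these sample records, TYPE_A should contain two observations (the two TYPE A records) and PAIRS should contain five, one per TYPE/SIZE pair.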
The INPUT statement reads the data into the remaining variables by using grouped format lists. Grouped format lists are a very compact way to read in repeating fields because the format lists are recycled until all of the variables are exhausted. Grouped format lists consist of two lists, each enclosed by parentheses: the first is the list of variables and the second is the format list to be recycled. The format lists can also include column pointer controls, vital in this case due to the intermixing of the data.

Program 5: Using Array Variables
Array variables may be referenced directly or by index, as this example illustrates.

DATA sales invalid;
   array fund {8} $3;
   array actdate {8} 8;
   array amount {8} 8;
   drop i;
   input policy $char5.
         accounts 3. @ ;
      end;
      output sales;
   end;
   else
   do;
      output invalid;
   end;
cards;
12345001xxx0101199912345
12345002yyy0202199954321zzz0303199967890
run;

Program 6: Multiple Records Per Observation
This program code reads a policy information file that spans multiple physical records.

INFILE multirec n=4;
INPUT
   #1 @7  policy $8.
      @28 issuedte yymmdd8.
   #2 @33 agent $40.
   #4 @65 state $3.
;

The #1, #2, and #4 line pointer controls move the pointer to those lines before the next part of the INPUT statement is executed. Line 3 is read from the physical file but is not used.

Program 7: Comma Delimited, Quoted Text File
This data format is frequently called CSV (comma separated values). Although this example uses a comma as the delimiter, any other set of delimiters may be used to separate the data values. Note that consecutive delimiters in the file are needed when a value is missing, as COUNT is in the second record.

DATA sugi24;
   INFILE datalines dlm=',' dsd;
   INPUT
      name $
      count
      footwear $
      method $
   ;
DATALINES;
"JOHN",123,"SHOES",CAR
"JOE",,"SANDALS","TRAIN"
;

DATA names;
   INFILE datalines;
   INPUT
      len 2.
      first $varying15. len
      len 2.
      last $varying15. len
   ;
DATALINES;
04JOHN10HUNGERFORD
;

CONCLUSION
It is hoped that this paper has provided a good overview of the more commonly used SAS features for reading external files. Please obtain and read the SAS Institute publications appropriate for your system. NESUG and SUGI papers and the SAS-L Internet distribution list are also good sources of SAS programming tips.

Please remember that there are often multiple ways to solve any particular programming problem. Take the time to experiment with different techniques to improve your skills.

The author may be reached via e-mail at clinton.rickards@pharma.com.

SAS® is a registered trademark of SAS Institute Inc., Cary, NC, USA

REFERENCES
SAS Language: Reference, Version 6.06, First Edition, SAS Institute, Inc., Cary, NC

SAS Technical Report P-222, Changes and Enhancements to Base SAS Software, Release 6.07, SAS Institute, Inc., Cary, NC

SAS Technical Report P-242, SAS Software: Changes and Enhancements, Release 6.08, SAS Institute, Inc., Cary, NC