- Use another statistical package that supports both formats and have it convert the dataset
- Write your own conversion program in/as:
  - Stata (slow, portable)
  - Mata (faster, portable)
  - Plugin (very fast, not portable, dependent on Stata's bit-width)
  - Standalone program (very fast, not portable, independent of Stata's bit-width)
This can be automated with an .ado wrapper similar to USESAS by Dan Blanchette, which requires SAS to be installed in order to import data into Stata. These are not true readers, since they require SPSS or SAS to be installed (with license costs, etc.).
DBMS/Copy
Both support command-line parameters for converting in batch mode and can thus be wrapped for use with Stata; see, e.g., STCMD by Roger Newson.
(as of July 13, 2008)
USESPSS
USESPSS is a new Stata command for reading SPSS data (*.sav files). It is a true reader: it does not require any other software (other than the Windows operating system).
Free
Implemented as a plugin, with portions of the code (e.g. file decompression) written in assembler for performance. Note: the SPSS format documentation has not been released, and only fragmentary information is available on the Internet.
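The slides do not show how usespss decompresses files, so the following is only an illustration. It is a C sketch of the bytecode compression used in *.sav files as described in the PSPP project's reverse-engineered format documentation: data arrive in blocks of eight one-byte opcodes, each followed by any uncompressed 8-byte values those opcodes call for. The function name sav_decompress and the output layout (one 8-byte element per value) are hypothetical; the bias (normally 100) would come from the file header.

```c
#include <float.h>
#include <stddef.h>
#include <string.h>

/* Opcodes of the .sav bytecode compression as described by PSPP:
   0 = padding, 1..251 = small number (code - bias), 252 = end of data,
   253 = uncompressed 8-byte value follows, 254 = eight spaces,
   255 = system-missing (-DBL_MAX). */
enum { SAV_PAD = 0, SAV_EOF = 252, SAV_RAW = 253, SAV_SPACES = 254, SAV_SYSMIS = 255 };

/* Decodes compressed case data in[0..len) into 8-byte elements in out.
   Stops at the end-of-data code, at the end of input, or once max_elems
   elements have been written. Returns the number of elements written,
   or (size_t)-1 if a raw value is cut off by the end of input. */
size_t sav_decompress(const unsigned char *in, size_t len, double bias,
                      unsigned char (*out)[8], size_t max_elems)
{
    size_t pos = 0, n = 0;

    while (pos + 8 <= len) {
        const unsigned char *ops = in + pos; /* block of 8 opcodes */
        pos += 8;                            /* raw values follow the block */

        for (int i = 0; i < 8; i++) {
            unsigned char op = ops[i];

            if (op == SAV_PAD)
                continue;
            if (op == SAV_EOF || n == max_elems)
                return n;

            if (op == SAV_RAW) {
                if (pos + 8 > len)
                    return (size_t)-1;
                memcpy(out[n], in + pos, 8); /* copied as-is, file byte order */
                pos += 8;
            } else if (op == SAV_SPACES) {
                memset(out[n], ' ', 8);
            } else {
                double v = (op == SAV_SYSMIS) ? -DBL_MAX : (double)op - bias;
                memcpy(out[n], &v, sizeof v);
            }
            n++;
        }
    }
    return n;
}
```

Small integers, blank strings and missing values each cost one byte instead of eight, which is why compressed files are much smaller and why fast decompression matters for a reader.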
USESPSS Features
- Reads *.sav files originating from both the Windows and UNIX versions of SPSS (LoHi and HiLo byte orders)
- Supports both compressed and uncompressed SPSS files
- Preserves variable and value labels
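One way to tell LoHi files from HiLo files, offered here as an assumption rather than as usespss's actual method, is the check PSPP documents: the 32-bit layout code at offset 64 of the header must equal 2 or 3, so whichever byte order produces one of those values is the file's byte order. The enum and function names below are hypothetical, and only the "$FL2" signature is accepted.

```c
#include <stdint.h>
#include <string.h>

/* "LoHi" = little-endian, "HiLo" = big-endian. */
typedef enum { SAV_ORDER_UNKNOWN = 0, SAV_LOHI, SAV_HILO } sav_order;

static uint32_t u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint32_t u32_be(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* header: the first len bytes of the file (at least 68 are needed). */
sav_order sav_detect_order(const unsigned char *header, size_t len)
{
    if (len < 68 || memcmp(header, "$FL2", 4) != 0)
        return SAV_ORDER_UNKNOWN;

    /* The layout code at offset 64 is 2 or 3 in the file's own byte order. */
    uint32_t le = u32_le(header + 64);
    uint32_t be = u32_be(header + 64);

    if (le == 2 || le == 3)
        return SAV_LOHI;
    if (be == 2 || be == 3)
        return SAV_HILO;
    return SAV_ORDER_UNKNOWN;
}
```

Once the order is known, every multi-byte number read afterwards (counts, doubles, label lengths) is decoded with the matching routine, so the same reader handles files written on either platform.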
USESPSS Syntax
usespss can be used like any other command: at the command line, in users' .do files, and in .ado programs.
Memory Tradeoff
Stata and plugins share the same address space.
As a consequence, plugins can read Stata's data directly (if they know where it is located) and call Stata's subroutines (if exposed). However, the more memory is allocated for Stata data, the less memory is available to plugins, because the size of the address space is limited (typically 2GB on a 32-bit Windows system). In other words, plugins compete for memory with each other and with Stata.
Like Stata itself, usespss attempts to load the whole data file into memory; this speeds up the two-pass processing (1st pass: optimization of the storage types; 2nd pass: the actual conversion). However, when the user loads the SPSS data, any Stata data in memory is discarded anyway, so Stata's memory allocation can be temporarily decreased within usespss.ado.
It is important to do this when working with large files; otherwise the plugin will not be able to allocate enough memory to load the SPSS data file.
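The first pass can be pictured as choosing, for every numeric variable, the smallest Stata storage type that holds all of its values exactly. The C sketch below is a hypothetical illustration, not usespss's algorithm: it uses the non-missing ranges Stata documents for byte, int and long, treats the SPSS system-missing value (-DBL_MAX) as fitting any type, and assumes that non-integers go to float when they survive a round trip through float and to double otherwise.

```c
#include <float.h>
#include <stddef.h>

/* Stata numeric storage types, smallest first. */
typedef enum { ST_BYTE, ST_INT, ST_LONG, ST_FLOAT, ST_DOUBLE } stata_type;

/* Approximate largest magnitude Stata documents for float. */
#define STATA_FLOAT_MAX 1.70141173319e38

/* Hypothetical first pass: scan one numeric column and return the smallest
   type that stores every non-missing value exactly. */
stata_type smallest_type(const double *col, size_t n)
{
    int seen = 0, fits_long = 1, fits_float = 1;
    double lo = 0.0, hi = 0.0;

    for (size_t i = 0; i < n; i++) {
        double x = col[i];
        if (x == -DBL_MAX)          /* system-missing becomes ".", fits any type */
            continue;
        if (!seen || x < lo) lo = x;
        if (!seen || x > hi) hi = x;
        seen = 1;
        /* Integer within Stata's long range? The range test runs first
           so that the cast to long long is always defined. */
        if (!(x >= -2147483647.0 && x <= 2147483620.0) || x != (double)(long long)x)
            fits_long = 0;
        /* Exact after a round trip through float? The range test runs
           first so that the cast to float cannot overflow. */
        if (!(x >= -STATA_FLOAT_MAX && x <= STATA_FLOAT_MAX) || (double)(float)x != x)
            fits_float = 0;
    }

    if (!seen)
        return ST_BYTE;             /* all values missing */
    if (fits_long) {
        if (lo >= -127.0 && hi <= 100.0) return ST_BYTE;
        if (lo >= -32767.0 && hi <= 32740.0) return ST_INT;
        return ST_LONG;
    }
    return fits_float ? ST_FLOAT : ST_DOUBLE;
}
```

The second pass would then allocate each Stata variable with the chosen type and copy the values, which is why keeping the whole SPSS file in memory between the passes pays off.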
Memory Use
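Consider the following code:
    set mem 800m
    usespss using mydata.sav, lowmemory(10) memory(800)
[Figure: memory layout up to the address-space limit (e.g. 2GB), showing free memory and Stata's allocation reduced to 10m from the point where usespss.ado starts to the point where it ends]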
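As a rough worked example, assuming that lowmemory(10) shrinks Stata's data area to 10m while the plugin runs, and taking the 2GB limit as 2048 MB, the room left for the plugin is approximately as follows. The real figures are smaller, since the same address space also holds Stata's own code, the plugin code and any fragmentation.

```latex
\begin{align*}
\text{Stata keeps its 800m:}\quad & 2048\,\text{MB} - 800\,\text{MB} = 1248\,\text{MB}\\
\text{while lowmemory(10) is in effect:}\quad & 2048\,\text{MB} - 10\,\text{MB} = 2038\,\text{MB}
\end{align*}
```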
DESSPSS
desspss is a new Stata command that describes the contents of an SPSS system file (*.sav). It:
- does not destroy the data in memory;
- works much faster than
    usespss using filename.sav, saving(filename.dta) describe
  because no optimization or conversion is actually performed, but it does not list the variable types (these are determined only after optimization);
- saves all descriptive information in r().
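To show why describing a file is so much cheaper than converting it, here is a hypothetical C sketch that extracts descriptive fields from the fixed 176-byte *.sav header alone, without touching any case data. The offsets follow the PSPP project's description of the format; the struct and function names are illustrative, and the byte order is assumed to have been determined beforehand (big_endian = 0 for LoHi, 1 for HiLo). Variable names and labels live in later dictionary records, which a describe-only reader would scan in the same way.

```c
#include <stdint.h>
#include <string.h>

/* Descriptive fields of the fixed 176-byte .sav header (PSPP's layout). */
typedef struct {
    int32_t layout_code;       /* 2 or 3 */
    int32_t nominal_case_size; /* 8-byte elements per case */
    int32_t compression;       /* 0 = none, 1 = bytecode */
    int32_t weight_index;      /* 0 if the file is unweighted */
    int32_t ncases;            /* -1 if the writer did not record it */
    double  bias;              /* compression bias, normally 100 */
    char    label[65];         /* file label, trailing spaces removed */
} sav_header;

static uint32_t get_u32(const unsigned char *p, int big_endian)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v = (v << 8) | p[big_endian ? i : 3 - i];
    return v;
}

static uint64_t get_u64(const unsigned char *p, int big_endian)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[big_endian ? i : 7 - i];
    return v;
}

/* Fills *h from buf without reading any case data. Returns 0 on success,
   -1 if buf is too short, lacks the "$FL2" signature, or the layout code
   does not match the given byte order. */
int sav_read_header(const unsigned char *buf, size_t len, int big_endian,
                    sav_header *h)
{
    if (len < 176 || memcmp(buf, "$FL2", 4) != 0)
        return -1;

    h->layout_code = (int32_t)get_u32(buf + 64, big_endian);
    if (h->layout_code != 2 && h->layout_code != 3)
        return -1;

    h->nominal_case_size = (int32_t)get_u32(buf + 68, big_endian);
    h->compression       = (int32_t)get_u32(buf + 72, big_endian);
    h->weight_index      = (int32_t)get_u32(buf + 76, big_endian);
    h->ncases            = (int32_t)get_u32(buf + 80, big_endian);

    uint64_t bits = get_u64(buf + 84, big_endian);
    memcpy(&h->bias, &bits, sizeof h->bias); /* assumes IEEE 754 doubles */

    memcpy(h->label, buf + 109, 64);         /* space-padded in the file */
    h->label[64] = '\0';
    for (int i = 63; i >= 0 && h->label[i] == ' '; i--)
        h->label[i] = '\0';
    return 0;
}
```

A Stata-side wrapper could then return such fields in r(), which matches the slide's point that desspss leaves the data in memory untouched.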
Demonstration:
Embedded artificially created dataset in SPSS format:
Questions?