
1. Dataplot

Dataplot is a free, public-domain, multi-platform (Unix, VMS, Linux, Windows 95/98/ME/XP/NT/2000, etc.) software system for scientific visualization, statistical analysis, and non-linear modeling. The target Dataplot user is the researcher and analyst engaged in the characterization, modeling, visualization, analysis, monitoring, and optimization of scientific and engineering processes. The original version was released by James J. Filliben in 1978, with continual enhancements to the present.

Authors: James J. Filliben and Alan Heckert, Statistical Engineering Division, Information Technology Laboratory, National Institute of Standards and Technology, with the Tcl/Tk GUI by Robert R. Lipman, formerly of the Mathematical and Computational Sciences Division. Project co-sponsor: HPCC / SIMA.

Some of the online documentation uses the Portable Document Format (PDF). If you do not already have a PDF reader installed for your browser, a number of free PDF readers are available.

2. BioEstat

Descriptive and multivariate analysis. (Note: the software is available only in Portuguese.)

3. Instat+

Instat is a general statistical package. It is simple enough to be useful in teaching statistical ideas, yet has the power to assist research in any discipline that requires the analysis of data. Instat began life on a BBC microcomputer. It was first used on a training course on 'statistics in agriculture' held in Sri Lanka during 1983. The BBC micro version was marketed commercially from mid-1985, with the DOS version for PCs becoming available in 1987. From 1994 Instat was free of charge. Updated DOS versions were released in 1996 and 1997. Instat has been used widely in the UK and elsewhere by a range of companies, research institutes, schools, colleges, universities and private individuals. At Reading it has been used extensively on training courses run by the SSC. It has also been used in many countries on statistics courses and on courses related to health, agriculture and climatology.

'Instat+' (i.e. the Windows version of Instat) has been developed mainly because of its continued use for the analysis of climatic data. Funding from the UK Met Office for a new climatic version, supplemented by support from the SSC and the efforts of other friendly collaborators, led to the Windows version, which was first used on training courses in 1999.

4. MacAnova

MacAnova is a free, open source, interactive statistical analysis program for Windows, Macintosh, and Linux written by Gary W. Oehlert and Christopher Bingham, both of the School of Statistics, University of Minnesota. In spite of its name, MacAnova is not just for Macintosh computers and not just for doing analysis of variance. MacAnova has many capabilities, but its strengths are analysis of variance and related models, matrix algebra, time series analysis (time and frequency domain), and (to a lesser extent) univariate and multivariate exploratory statistics. The current version is 5.05 release 1. Core MacAnova has a functional, command-oriented interface, but an increasing number of capabilities are available through a menu/dialog/mouse type interface. Although the language and syntax are S-like (for those familiar with S, S-Plus or R), MacAnova is not S or R. There is extensive documentation for MacAnova available from its Web pages. A recent addition is an HTML version of the information in all help files.

5. MicrOsiris

MicrOsiris is a comprehensive statistical and data management package for Windows developed by Van Eck Software and freely distributed. Derived from OSIRIS IV, a statistical and data management package developed and used at the University of Michigan, MicrOsiris includes special techniques for data mining (SEARCH) and analysis of nominal- and ordinal-scaled data (MNA, MCA), and an interface to IVEware. The MicrOsiris IVEware interface command, IVEWARE, invokes the Srcware version of IVEware (installed with MicrOsiris), which can:

- Perform single or multiple imputations of missing values using the Sequential Regression Imputation Method.
- Perform a variety of descriptive and model-based analyses accounting for such complex design features as clustering, stratification and weighting.
- Perform multiple imputation analyses for both descriptive and model-based survey statistics.

The command returns an imputed data set for further analysis in MicrOsiris.

MicrOsiris accepts data from SPSS, SAS, STATA, and Excel as well as from other sources, including UNESCO IDAMS datasets and older OSIRIS datasets from ICPSR. MicrOsiris requires less than 12 MB on disk (32-bit version), including the manual, sample data files, IVEware, and the Statistical Decision Tree, and loads in less than 2 MB of memory (additional memory is allocated as needed, depending on the number of variables used). It will run under a Windows virtual machine on a Mac and on Ubuntu/Linux under Wine.

6. R

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

The R environment

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes:

- an effective data handling and storage facility,
- a suite of operators for calculations on arrays, in particular matrices,
- a large, coherent, integrated collection of intermediate tools for data analysis,
- graphical facilities for data analysis and display either on-screen or on hardcopy, and
- a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

The term "environment" is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented. R can be extended (easily) via packages. There are about eight packages supplied with the R distribution, and many more are available through the CRAN family of Internet sites, covering a very wide range of modern statistics.

R has its own LaTeX-like documentation format, which is used to supply comprehensive documentation, both online in a number of formats and in hardcopy.

7. Tanagra

TANAGRA is free data mining software for academic and research purposes. It offers several data mining methods from the areas of exploratory data analysis, statistical learning, machine learning and databases. The project is the successor of SIPINA, which implements various supervised learning algorithms, especially the interactive and visual construction of decision trees. TANAGRA is more powerful: it contains some supervised learning methods but also other paradigms such as clustering, factorial analysis, parametric and nonparametric statistics, association rules, and feature selection and construction algorithms.

TANAGRA is an "open source project": every researcher can access the source code and add their own algorithms, provided they agree to and comply with the software distribution license.

The main purpose of the Tanagra project is to give researchers and students easy-to-use data mining software that conforms to current norms of software development in this domain (especially in the design of its GUI and the way it is used) and allows the analysis of either real or synthetic data.

The second purpose of TANAGRA is to offer researchers an architecture that lets them easily add their own data mining methods and compare their performance. TANAGRA acts more as an experimental platform that lets them concentrate on the essentials of their work, sparing them the unpleasant part of programming this kind of tool: the data management.

The third and last purpose, aimed at novice developers, is to disseminate a possible methodology for building this kind of software. They can take advantage of free access to the source code to see how this sort of software is built, which problems to avoid, the main steps of the project, and which tools and code libraries to use. In this way, Tanagra can be considered a pedagogical tool for learning programming techniques.

TANAGRA does not currently include what makes up the strength of commercial software in this domain: a wide set of data sources, direct access to data warehouses and databases, data cleansing, interactive use, and so on.

8. ViSta

ViSta (The Visual Statistics System) is software written by Forrest W. Young, Professor of Psychometrics at the University of North Carolina at Chapel Hill. As its website puts it, this software helps you to see what your data seem to say. Its strength lies in visualizations that are highly dynamic and very interactive, showing you multiple views of your data simultaneously. ViSta is open source (but to contribute to it you have to get a password from the author) and is available for Windows, Macintosh and Unix. ViSta is extensible, since it is open to new contributors (programs written in Fortran, C and XlispStat are accessible from within ViSta).

ViSta can perform various kinds of analysis, some of them through plugins that you can download from its website. In particular, ViSta can carry out univariate analysis and various kinds of multivariate analysis (for example, principal component analysis). ViSta is mainly useful for teaching univariate and multivariate statistics courses, but it can also be used in research. Although its functions are expected to be extended in the future (the latest stable version is still 6.4), ViSta must be regarded as software with some significant limits. At present, its interaction with software such as Excel is also difficult.

The ViSta graphical interface has a Windows-style menu at the top, but you can also enter commands on a command line in the lower part of the screen. The workspace is divided between the workmap, where the various steps of the analysis are listed (and from which you can get both numerical results and graphical visualizations), and the datasheet, where you can enter the data to process.

9. WinIDAMS

WinIDAMS is software written by the UNESCO Secretariat with the cooperation of experts from different countries. It is available free of charge on request (you fill in a form on the website, after which you are given the exact path from which to download the software and the manual). A project called OpenIdams is expected to begin in the near future, to make the software open source and therefore extensible through external contributions. WinIDAMS is available only for the Windows operating system (hence its name). Besides English, it is also available in French and Spanish.

The main analysis methods WinIDAMS can carry out are:

- regression analysis
- analysis of variance
- discriminant analysis
- cluster analysis
- principal component analysis
- correspondence analysis

The first step in carrying out an analysis in WinIDAMS is to define the application environment (specifying the name of the application you want to create and the folder paths of the input and output files). Then you create the dictionary file (dic extension), in which you specify the variables of the dataset to analyse. The next step is to create the data file (dat extension), in which you enter the values of the observations to analyse. Finally, you build the setup file (set extension) by typing the list of instructions to execute (these depend on the kind of analysis and can be found in the manual). When execution is finished, an output file containing all the results is loaded.

10. Demetra

Seasonal adjustment (SA) is an important step of the official statistics business architecture, and harmonisation of practices has proved to be a key element of the quality of the output. In this spirit, since the 1990s, Eurostat has been playing a role in the promotion, development and maintenance of a software solution (Demetra), freely available, for seasonal adjustment in line with established best practices.

In 2008, the ESS (European Statistical System) guidelines on SA were endorsed by the CMFB and the SPC as a framework for seasonal adjustment of PEEIs and other ESS and ESCB economic indicators. The ESS guidelines cover all the key steps of the seasonal and calendar adjustment process and represent an important step towards the harmonisation of seasonal and calendar adjustment practices within the ESS and in Eurostat. A common policy for the seasonal and calendar adjustment of all infra-annual statistics will improve the quality and comparability of the national data as well as enhance the overall quality of European statistics, to the extent that proper SA tools exist and are available.

The SA Steering Group (the Eurostat-ECB high-level group of experts from NSIs and NCBs which produced the ESS Guidelines for seasonal adjustment) is promoting the development of a flexible software solution for SA to be used within the ESS. The group has turned its attention to the object-oriented technologies used by the R&D Unit of the Department of Statistics of the National Bank of Belgium to develop a series of prototype tools for SA. This has been considered an adequate framework for the cooperative development of a new generation of sustainable SA tools, enabling the implementation of the ESS guidelines and replacing the previous Demetra, whose maintenance and sustainability are in question.

Demetra+ is a family of modules on seasonal adjustment, which are based on the two leading algorithms in that domain (TRAMO&SEATS / X-12-ARIMA). TRAMO&SEATS (TRAMO, "Time series Regression with ARIMA noise, Missing values and Outliers", and SEATS, "Signal Extraction in ARIMA Time Series", developed by Agustín Maravall and Víctor Gómez) and X-12-ARIMA (developed by David Findley and Brian Monsell) are two different methods to seasonally adjust a time series.

Both methods can be divided into two main parts: a pre-adjustment step, which removes the "deterministic" component of the series by means of a regression model with ARIMA noise, and the decomposition part itself. The two methods use a very similar approach in the first part of the processing but they differ completely in the decomposition part. Their comparison is often difficult, even for the modelling step. In particular, their diagnostics focus on different aspects and their outputs take completely different forms.

One of the main features of Demetra+ is to normalize, as much as possible, the different methods. It tries to improve the comparability of the two methods by using, as much as possible, a common set of diagnostics and presentation tools. That fundamental choice implies that a number of routines of both methods have been rewritten in Demetra+. Compared to the original programs, that can lead to small discrepancies in diagnostics or in peripheral information, which should not alter the general "message" provided by the algorithms. Under no circumstances should the main results of the original programs (seasonally adjusted series...) be affected by that solution.

11. Draco

Draco is an open-source econometric solution, providing functionality available in popular statistics packages without high purchasing and licensing costs. Draco presents a familiar, yet powerful, user interface to improve efficiency.

The user interface of Draco is similar to spreadsheet applications, presenting all data to the user for all stored variables. All functionality is accessed via the modern interface. Data can quickly be visualized thanks to the Approximatrix Openchart2 plotting library. Approximatrix provides full commercial support for Draco. Support customers are able to provide direct input on the direction of Draco development. Furthermore, those who purchase commercial support can access extended content at the Draco Support Site. Learn more about full support for Draco and consider purchasing support to ensure the continued development of Draco.

12. EasyReg Int.

EasyReg is software written by Herman J. Bierens of Pennsylvania State University for conducting various kinds of econometric estimation. EasyReg is written in Visual Basic 5 and is available only for the Windows operating system. EasyReg does not include a tutorial (but for each analysis method there are useful guided tours, which are included in the setup file or can be downloaded separately). However, it is very easy to run, and it can be useful both in teaching (an option lets you select your own econometric level: undergraduate, intermediate or advanced) and in empirical economic research. It was originally designed by the author to promote his own econometric techniques, but he then chose to build more general software that could perform every kind of econometric analysis.

EasyReg has a Windows-style user interface in which you select different options. One limitation is that EasyReg is not extensible in any way: you can use only the modules built by its author. Nevertheless, EasyReg can be considered a complete tool for econometric analysis and can be compared with some commercial software. In particular, EasyReg can carry out the following analyses:

- Univariate and multivariate linear regression analysis
- Nonlinear regression analysis
- Maximum likelihood
- Linear generalized method of moments
- Nonlinear generalized method of moments estimation
- Discrete dependent variables modeling (Probit and Logit)
- ARIMA estimation
- Johansen's cointegration analysis

EasyReg can import Excel files saved in csv format.

13. Gretl

Gretl was created by Allin Cottrell on the basis of source code written by Professor Ramu Ramanathan of the University of California. Gretl is available for all the main operating systems (Linux, Mac and Windows) and can be downloaded free of charge under the GNU license (the links to the different versions are on the home page of the Gretl website). Although it cannot be considered general-purpose statistical software (its main functions are time series analysis, regression analysis and various econometric tests), it is very useful, thanks also to its integration with R and with two other statistical packages used in seasonal adjustment: TRAMO/SEATS and X-12-ARIMA.

The software includes a large database of example data, and additional data can be downloaded directly from the Gretl website. There is also a manual in PDF format (downloadable separately) which describes all the kinds of analysis you can carry out in Gretl. Gretl has a user interface with a menu from which you can open files or select the various modules to perform analyses. Gretl files are in XML format, but the software can also import files saved by spreadsheets. Gretl is extensible, thanks to its internal scripting language and its open source distribution.

14. JMulTi

JMulTi is interactive software designed for univariate and multivariate time series analysis. It has a Java graphical user interface that uses an external engine for statistical computations.

Implemented features include VAR/VEC modelling but also methods that are not yet in widespread use. A full account of implemented methods is available in the features section.

15. Matrixer

Matrixer is econometric software written by Alexander Tsyplakov. It can be downloaded directly from the home page of its website, it is available only for the Windows operating system, and you can use it free of charge but only for non-commercial purposes. Matrixer is particularly useful for teaching econometrics, but it can also be used for applied research. Its main functions are various regression methods (linear, nonlinear, logit, probit, nonparametric), econometric tests, and descriptive statistics. The user interface of Matrixer has a Windows-style menu with selectable options, but commands can also be entered on a command line in a window at the bottom of the screen. The software is particularly suited to working with vectors and matrices (hence its name), which can be imported from external sources (Matrixer can open both text files and files with the csv extension) or built easily in an editor.

16. TSW

Freeware statistics and econometrics software.

17. Biogeme

Biogeme is open source freeware designed for the estimation of discrete choice models. It allows the estimation of the parameters of the following models:

- Logit
- Binary probit
- Nested logit
- Cross-nested logit
- Multivariate Extreme Value models
- Discrete and continuous mixtures of Multivariate Extreme Value models
- Models with nonlinear utility functions
- Models designed for panel data
- Heteroscedastic models

Object-oriented Software Package designed for the Maximum Likelihood Estimation of Generalized Extreme Value Models
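
For readers unfamiliar with the simplest model in the list above, here is a minimal sketch of how multinomial logit choice probabilities follow from the utilities of the alternatives. It is plain Python, not Biogeme code, and it does not use Biogeme's API. The function name `logit_probabilities` and the utility values are hypothetical, chosen only for illustration.

```python
import math


def logit_probabilities(utilities):
    """Return multinomial logit choice probabilities for a list of utilities.

    P(i) = exp(V_i) / sum_j exp(V_j). The maximum utility is subtracted
    first for numerical stability; this leaves the probabilities unchanged.
    """
    v_max = max(utilities)
    exp_v = [math.exp(v - v_max) for v in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]


# Hypothetical utilities for three alternatives (illustrative values only).
probs = logit_probabilities([1.0, 0.5, -0.2])
print(probs)
```

Estimating such a model means finding the utility parameters that maximize the likelihood of the observed choices. The sketch covers only the probability formula, not the estimation step.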

18. Matvec

Matvec is an object-oriented language originally developed by Tianlin Wang under the supervision of Rohan Fernando. Written in C++, its capabilities have been enhanced by a number of researchers. Currently Matvec's capabilities range from matrix and vector manipulation to the analysis of linear and generalized linear mixed models.

19. MX

Mx is a matrix algebra interpreter and numerical optimizer for structural equation modeling and other types of statistical modeling of data. It is being redeveloped as OpenMx, an open source project in collaboration with the University of Virginia and Argonne National Laboratory.

20. Tetrad

TETRAD is a program which creates, simulates data from, estimates, tests, predicts with, and searches for causal and statistical models. The aim of the program is to provide sophisticated methods in a friendly interface requiring very little statistical sophistication of the user and no programming knowledge. It is not intended to replace flexible statistical programming systems such as Matlab, S-Plus or R. Tetrad is freeware that performs many of the functions in commercial programs such as Netica, Hugin, LISREL, EQS and other programs, and many discovery functions these commercial programs do not perform.

Tetrad is unique in the suite of principled search ("exploration," "discovery") algorithms it provides (for example, its ability to search when there may be unobserved confounders of measured variables, to search for models of latent structure, and to search for linear feedback models) and in the ability to calculate predictions of the effects of interventions or experiments based on a model. All of its search procedures are "pointwise consistent": they are guaranteed to converge almost certainly to correct information about the true structure in the large sample limit, provided that structure and the sample data satisfy various commonly made (but not always true!) assumptions.

Tetrad is limited to models of categorical data (which can also be used for ordinal data), to linear models ("structural equation models") with a Normal probability distribution, and to a very limited class of time series models.

The Tetrad programs describe causal models in three distinct parts or stages:

- a picture, representing a directed graph specifying hypothetical causal relations among the variables;
- a specification of the family of probability distributions and kinds of parameters associated with the graphical model; and
- a specification of the numerical values of those parameters.

The program and its search algorithms have been developed over several years with support from the National Aeronautics and Space Administration and the Office of Naval Research. Joseph Ramsey has implemented most of the program, with substantial assistance from Frank Wimberly. Executable and Source code for all versions of Tetrad IV, and this manual, are copyrighted, 2004, by Clark Glymour, Richard Scheines, Peter Spirtes and Joseph Ramsey. The program may be freely downloaded and used without permission of copyright holders, who reserve the right to alter the program at any time without notification.

21. WinBUGS

The BUGS (Bayesian inference Using Gibbs Sampling) project is concerned with flexible software for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) methods. The project began in 1989 in the MRC Biostatistics Unit and led initially to the 'Classic' BUGS program, and then on to the WinBUGS software developed jointly with the Imperial College School of Medicine at St Mary's, London. Development is now focussed on the OpenBUGS project.

The programs are reasonably easy to use and come with a wide range of examples. There is, however, a need for caution. A knowledge of Bayesian statistics is assumed, including recognition of the potential importance of prior distributions, and MCMC is inherently less robust than analytic statistical methods. There is no in-built protection against misuse.

22. Ade4

Ade 4 was written by professors Jean Thioulouse, Daniel Chessel, Sylvain Dolédec and Jean-Michel Olivier of the University of Lyon, France, and its development is supported by contracts with the French Ministère de l'Environnement and the French National Center for Scientific Research (CNRS). Ade 4 is mostly used for environmental data analysis, but it can also be used in other scientific disciplines. In particular, in multivariate statistics, Ade 4 can perform the following analysis methods:

- correspondence analysis
- principal component analysis
- discriminant analysis
- canonical correspondence analysis
- many regression methods

It has strong graphical display capabilities: you can build graphs easily from a data matrix and then change these visualizations in various ways.

Ade 4 is made up of several stand-alone applications, called modules, each of which can perform a range of analyses, and a set of graphical applications that show various kinds of visualizations for each analysis. Each module has various options for the kind of analysis to perform and the data to enter. The outputs are written to a text file. All the modules are managed through an interface, from which you can also see the current working directory and access various data sets included in the Ade-Data database.

On the Ade 4 website, besides the various modules and the user interface, you can also download all the documentation (in PDF format), which explains the different analysis methods. You can open this documentation from the user interface, too. Ade 4 is available for Windows and Macintosh (there is no Linux version), and it can be downloaded free of charge: download all the modules (or only those you need) from the "SeparateModules..." folder, and the interface from the "MetaCard" folder.

23. Antaeus

Antaeus is a utility that lets you create and explore data plots with ease. These plots can be used to confirm or deny the merits of statistical analyses performed in other applications. They can also be used to find patterns in multivariate data that can only be discovered through visual investigation. Antaeus is meant to supplement statistical analyses by allowing investigators to study the plots that correspond to their formulations, but no statistics are used within Antaeus itself.

The basic plot type used everywhere in Antaeus is the scatter plot, supplemented by its high-density counterpart, the sunflower plot (see The Sunflower Plot). These plots are supported by quantile plots and histograms, which are used to see the data of a single variable. Scatter plots are used to look for the relationship between two variables, but as the number of variables increases, more and more scatter plots become necessary to visualize all the possible interrelationships. It may be helpful to remember that data plots are themselves a functional use of fundamental mathematical principles (read Seeing Your Data for a clarification of this statement).

A screenshot (not reproduced here) shows a scatter plot from the Weather Data demo cube, installed with Antaeus, in the Single Scatter Plot SV (SynchroView). This data consists of 12 variables (measures) and 2 dimensions (date and location) defined by a data table containing 11,458 records, which results in a basic set of 144 different scatter plots. But any scatter plot can be greatly modified by using subsets of the data records, defined in terms of measure and dimension values. Each subset results in a different set of basic scatter plots. Also, the data points in a scatter plot can be separated, using color, into groups corresponding to different values of a dimension (see Dimensions and Separation).
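
The figure of 144 is consistent with counting every ordered pair of the 12 measures, that is, a full 12 × 12 scatter plot matrix including the diagonal. The following is a minimal Python sketch of that count under this assumed interpretation. The function name `scatter_plot_pairs` and the measure names are hypothetical placeholders, not the actual Weather Data variables or any part of Antaeus.

```python
from itertools import product


def scatter_plot_pairs(measures):
    """List every (x, y) axis pairing in a full scatter plot matrix."""
    return list(product(measures, repeat=2))


# Hypothetical placeholder names standing in for the 12 measures.
measures = [f"measure_{i}" for i in range(1, 13)]
pairs = scatter_plot_pairs(measures)
print(len(pairs))  # 144
```

The same count grows quadratically: adding measures to a cube enlarges the matrix in both directions at once.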

To explore this diversity in a logical manner, it becomes necessary to explore data plots in an organized system. Antaeus uses a structure called the data cube, which is generated from a flat-file data table delimited by commas, tabs, or semicolons. A data cube ("cube" for short) is entirely independent of the data table it was created from, and these cubes can be freely exchanged between users. The cube is represented in the interface by a virtual scatter plot matrix within which you navigate the universe of possible scatter plots. The "organized system" consists of supporting SVs that enable you to modify the logical structure of the cube to provide new ways of asking questions about your data. As an example, there is an SV that lets you create new measures as mathematical functions of existing measures (see Functions: Defining new measures). A screenshot of the Scatter Plot Matrix SV (not reproduced here) shows an 8x8 matrix, from the Iris Data Extended demo cube, using 4 new measures defined as simple ratios of the original 4 measures.

Another SV lets you create new dimensions by partitioning the data points of a scatter plot into mutually disjoint groups of records (see Partitions: Defining new dimensions). A screenshot of the Scatter-Scatter Plot SV (not reproduced here), which displays a double plot from the GHCN Station List demo cube, illustrates what you can do by creating a new dimension. In this case, a dimension was created from the Elevation measure with seven values for seven strata of elevation, six of which are separated. Also, one of the cube's subsets, likewise created from the Elevation measure, is applied as a brush.


Antaeus exploits the ever-increasing memory and speed of contemporary PCs by allowing data tables and cubes to be very large. This may result in frustration when working interactively with such files, but several mechanisms are provided for dealing with this (see Navigating Large Cubes). All the plots generated by Antaeus are very highly finished and do not require any user interaction to achieve this level of quality. You can size the plots by dragging the edges of their windows, and you have extensive control over the palettes used to color them, but all their construction details are handled proactively. Antaeus is also a platform from which plots may be published. Any plot may be saved to clipboard or file (when too large for the clipboard) as a Windows Enhanced Metafile (EMF). These can be embedded in reports, papers, and presentations created in Microsoft Office products such as Word, PowerPoint and Publisher. Support for EMF files is also increasing among non-Microsoft publication suites. These sophisticated programs are empowered by the use of plots they cannot possibly construct themselves.
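The plot counts mentioned above (144 basic scatter plots for 12 measures, and an 8x8 matrix once 4 ratio measures are added to 4 original ones) follow from pairing every measure with every measure. The following Python sketch only illustrates that bookkeeping; it is not Antaeus code, and the measure names and the sample record are hypothetical.

```python
from itertools import product

def scatter_plot_pairs(measures):
    """Return every ordered (x, y) pairing of measures, one per matrix cell."""
    return list(product(measures, repeat=2))

def add_ratio_measures(record, pairs):
    """Return a copy of a record extended with ratio measures named 'a/b'."""
    extended = dict(record)
    for numerator, denominator in pairs:
        extended[f"{numerator}/{denominator}"] = record[numerator] / record[denominator]
    return extended

# Hypothetical record with four original measures (names are illustrative only).
record = {"sepal_length": 5.1, "sepal_width": 3.5,
          "petal_length": 1.4, "petal_width": 0.2}
ratios = [("sepal_length", "sepal_width"), ("petal_length", "petal_width"),
          ("sepal_length", "petal_length"), ("sepal_width", "petal_width")]
extended = add_ratio_measures(record, ratios)

print(len(scatter_plot_pairs(range(12))))        # cells for 12 measures
print(len(scatter_plot_pairs(list(extended))))   # cells for 4 original + 4 ratio measures
```

Because the pairing is quadratic in the number of measures, each new derived measure adds a full row and column to the matrix, which is why navigation aids become important for large cubes.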

24. Arc Freeware Statistical Analysis Tool for Regression Problems

25. Assistat The software Assistat was developed by Professor Doctor Francisco de Assis of the Department of Agricultural Engineering of the Center of Technology and Natural Resources of the Federal University of Campina Grande (UFCG), Brazil. Freeware for Regression and Variance Analysis and Statistical Tests.

26. Epidata Software for basic statistical analysis, graphs, and comprehensive data management. EpiData Entry is used for simple or programmed data entry and data documentation. Entry handles simple forms or related systems, with optimised documentation and error detection features, e.g. double entry verification, list of ID numbers in several files, codebook overview of data, date added to backup, and encryption procedures. EpiData Analysis performs basic statistical analysis, graphs, and comprehensive data management, e.g. descriptive statistics, SPC charts, recoding data, labelling values and variables, and defining missing values.

27. Epi Info Software for Epidemiological Statistics. Physicians, nurses, epidemiologists, and other public health workers lacking a background in information technology often have a need for simple tools that allow the rapid creation of data collection instruments and data analysis, visualization, and reporting using epidemiologic methods. Epi Info, a suite of lightweight software tools, delivers core ad-hoc epidemiologic functionality without the complexity or expense of large, enterprise applications. Epi Info is easily used in places with limited network connectivity or limited resources for commercial software and professional IT support. Epi Info is flexible, scalable, and free while enabling data collection, advanced statistical analyses, and geographic information system (GIS) mapping capability. Since its initial release, Epi Info users have self-registered in over 181 countries covering all continents including Antarctica. Epi Info has been translated into more than 13 languages. More than one million users are estimated.

28. EzANOVA Software to illustrate the basics of Analysis of Variance.

29. Factor Software for Exploratory Factor Analysis. Factor is a program developed to fit the Exploratory Factor Analysis model. Below we describe the methods used.

Univariate and multivariate descriptives of variables:

Univariate mean, variance, skewness, and kurtosis
Multivariate skewness and kurtosis (Mardia, 1970)
Var charts for ordinal variables

Dispersion matrices:

User defined tipo matrix
Covariance matrix
Pearson correlation matrix
Polychoric correlation matrix (Polychoric algorithm: Olsson, 1979a, 1979b; Tetrachoric algorithm: Bonett & Price, 2005) with smoothing algorithm (Devlin, Gnanadesikan, & Kettenring, 1975; Devlin, Gnanadesikan, & Kettenring, 1981)

Procedures for determining the number of factors/components to be retained:

MAP: Minimum Average Partial Test (Velicer, 1976)
PA: Parallel Analysis (Horn, 1965)
Optimal PA: an implementation of Parallel Analysis that is computed on the same type of correlation matrix (i.e., Pearson or polychoric correlation) and the same type of underlying dimensions (i.e., components or factors) as defined for the whole analysis (Timmerman & Lorenzo-Seva, 2011)
Hull method for selecting the number of common factors: this method aims to find a model with an optimal balance between model fit and number of parameters (Lorenzo-Seva & Timmerman, 2011)

Factor and component analysis:

PCA: Principal Component Analysis
ULS: Unweighted Least Squares factor analysis (also MINRES and PAF)
EML: Exploratory Maximum Likelihood factor analysis
MRFA: Minimum Rank Factor Analysis (ten Berge & Kiers, 1991)
Schmid-Leiman second-order solution (1957)
Factor scores (ten Berge, Krijnen, Wansbeek, & Shapiro, 1999)
Person fit indices (Ferrando, 2009)

In ULS factor analysis, the Heywood case correction described in Mulaik (1972, page 153) is included: when an update has sum of squares larger than the observed variance of the variable, that row is updated by constrained regression using the procedure proposed by ten Berge and Nevels (1977).

Some of the rotation methods to obtain simplicity are:

Quartimax (Neuhaus & Wrigley, 1954)
Varimax (Kaiser, 1958)
Weighted Varimax (Cureton & Mulaik, 1975)
Orthomin (Bentler, 1977)
Direct Oblimin (Clarkson & Jennrich, 1988)
Weighted Oblimin (Lorenzo-Seva, 2000)
Promax (Hendrickson & White, 1964)
Promaj (Trendafilov, 1994)
Promin (Lorenzo-Seva, 1999)
Simplimax (Kiers, 1994)

Some of the indices used in the analysis are:

Test on the dispersion matrix: Determinant, Bartlett's test and Kaiser-Meyer-Olkin (KMO)
Goodness of fit statistics: Chi-Square; Non-Normed Fit Index (NNFI; Tucker & Lewis); Comparative Fit Index (CFI); Goodness of Fit Index (GFI); Adjusted Goodness of Fit Index (AGFI); Root Mean Square Error of Approximation (RMSEA); and Estimated Non-Centrality Parameter (NCP)
Reliabilities of rotated components (ten Berge & Hofstee, 1999)
Simplicity indices: Bentler's Simplicity index (1977) and Loading Simplicity index (Lorenzo-Seva, 2003)
Mean, variance and histogram of fitted and standardized residuals. Automatic detection of large standardized residuals.
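Several of the goodness of fit statistics listed above can be derived from the chi-square of the fitted model and of a null (baseline) model. The Python sketch below uses the common textbook definitions of RMSEA, CFI and NNFI; it is an illustration under those assumptions, not Factor's own code, and Factor may use slightly different conventions (for example N rather than N - 1). The numeric inputs are hypothetical.

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (textbook form using n - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index from the fitted model and the null model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null

def nnfi(chi2_model, df_model, chi2_null, df_null):
    """Non-Normed Fit Index (Tucker & Lewis)."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)

# Hypothetical values: model chi-square 30 on 20 df, null chi-square 528 on 28 df, N = 201.
print(round(rmsea(30, 20, 201), 3))
print(round(cfi(30, 20, 528, 28), 3))
print(round(nnfi(30, 20, 528, 28), 3))
```

All three indices penalize the excess of chi-square over its degrees of freedom, so a model whose chi-square is close to its df scores well on each of them.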

30. G7 Software for Regression Analysis. G7 is an econometric regression and model-building program for Windows. It is designed for estimation of regression equations with annual, quarterly, or monthly data. G7 takes its name from Carl Friedrich Gauss, the originator of the method of least squares. With G7 you can:

Build and use data banks. Thousands of regularly updated economic time series, in the form of G data banks, are available through Inforum and EconData. You can easily build banks of your own data prepared in spreadsheet programs, drawn from other data banks, or typed in a convenient free-form format for input to G7.
Transform variables with algebraic formulas or with a variety of functions including logarithms, exponentials, powers, cumulation of stocks, previous-peaks, random numbers, conversion of monthly to quarterly series or of quarterly to annual series, and interpolation from annual to quarterly series or from quarterly to monthly series.
Estimate ordinary least squares regressions.
Employ the Hildreth-Lu procedure to deal with autocorrelated errors.
Do seemingly unrelated regression and stacked regression with constraints across equations.
Apply conventional two-stage or three-stage least squares.
Apply soft constraints (also called stochastic constraints, mixed or Bayesian estimation, or generalized ridge regression) on parameter values.
Estimate distributed lags with a generalization of the Almon technique and other methods. Plot distributed lag weights.
Estimate and forecast with equations involving both lagged values of the dependent variable and moving average error terms (ARIMA techniques). Calculate autocorrelation and partial autocorrelation functions.
Estimate non-linear regression equations with two different algorithms.
Estimate proportions models with multinomial regression algorithms.
Save the results of estimations in files which can be combined into a model by Build, G7's model-building partner.
Graph series and the results of fitting equations. G7 graphs data with up to seven series on the screen at one time, either with a uniform scale for all series or with a different scale for each series. It does semi-logarithmic graphs with proper marking of the vertical axis. It can make line graphs, bar graphs, and scatter graphs. It allows annotation of the graphs with both words and lines before printing.

In addition, help files are included along with the G7 manual, sample and demo scripts, and other documentation.

Although G7 can respond to directly typed commands, the serious user will want to build files of commands with a screen editor and execute these files. As G7 operates, it builds a data bank which can later be permanently saved. Into this bank it can pull series from other source data banks. For each regression, G7 automatically provides the standard error of estimate (SEE), the mean absolute percentage error (MAPE), R², the autocorrelation coefficient of the residuals (rho), the Durbin-Watson statistic, the number of observations, the number of degrees of freedom, the period of estimation, and the SEE and MAPE for forecasting one period ahead taking account of the autocorrelation of the residuals. If a regression is being tested beyond the period of fit, it shows the SEE and MAPE in this period. For each independent variable, it shows the regression coefficient, the marginal explanatory value, the t-statistic, the F-statistic, the elasticity at the sample mean, the beta coefficient, the mean, and the derivatives of all of the other independent variables with respect to any selected independent variable. You can also display the correlation coefficient matrix for all the variables. In addition, Chow tests of homogeneity and the Jarque-Bera test of normality are available. After regression, the residuals and the leverage variables are available.
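To make the regression summary statistics named above concrete, here is a minimal Python sketch that fits a one-variable ordinary least squares regression and computes SEE, MAPE, R², rho and the Durbin-Watson statistic from the residuals. It is not G7 code: the function names are hypothetical, the data are made up, and the exact formulas G7 uses (for example whether SEE applies a degrees-of-freedom adjustment, or how rho is estimated) are assumptions here.

```python
import math

def ols_fit(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def regression_summary(x, y):
    """Return common fit statistics for a simple regression of y on x."""
    a, b = ols_fit(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    n = len(y)
    mean_y = sum(y) / n
    ssr = sum(e * e for e in resid)               # sum of squared residuals
    sst = sum((yi - mean_y) ** 2 for yi in y)     # total sum of squares
    return {
        "intercept": a,
        "slope": b,
        # Assumption: SEE without a degrees-of-freedom adjustment.
        "see": math.sqrt(ssr / n),
        "mape": 100 * sum(abs(e / yi) for e, yi in zip(resid, y)) / n,
        "r2": 1 - ssr / sst,
        # One common estimator of first-order residual autocorrelation.
        "rho": sum(resid[t] * resid[t - 1] for t in range(1, n)) / ssr,
        "dw": sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / ssr,
    }

# Made-up annual data for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
for name, value in regression_summary(x, y).items():
    print(f"{name}: {value:.4f}")
```

A Durbin-Watson value near 2 suggests little first-order autocorrelation, while values toward 0 or 4 point to positive or negative autocorrelation respectively, which is the situation the Hildreth-Lu procedure is meant to address.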

While G7 is intended primarily for time series analysis, it can be adapted for cross sectional data, and techniques such as logit, probit, and Tobit analysis can be performed via the non-linear regression capability. The current version of G7 is available for Windows platforms. G7 is now available for free download. On this site, there is also a G7 Reference, a G7 Tutorial, and Clopper Almon's book The Craft of Economic Modeling, which offers an extensive guide to economic modeling with G7.

How to Obtain G7

Inforum maintains G7 and makes it available for free on this site. The easiest way to obtain it is from this document. Click on the G7 icon at the top of the page, or click on the hyperlink below to download the installation program. Here is what you will need:

PDG - This is a self-extracting ZIP file of the G7 program and other Inforum software. After downloading this file, run it by double-clicking on it, or by typing its name at a DOS prompt. Do this in a temporary directory. One of the files that will be extracted is SETUP.EXE. Next, run this setup program, and the InstallShield session will begin. This will load the software into the default location of C:\PDG. See the Readme files that are included in the installation for further instructions. You may also want to create a directory called \GBANKS. Under this directory, you can create subdirectories for each of the databanks you download from EconData.

To start G7, click either on the G7 icon that was added to the desktop or the icon added to the Start menu. You may also simply type "G7" at the DOS command line. In response to the first window that opens, find or create a G.CFG file and then hit the {Enter} (return) key. To access the Help menu, choose Help from the top of the G7 window. A G7 Tutorial is available, and Chapter 3 of The Craft of Economic Modeling provides an alternative introduction. There is also a G7 Reference installed in the DOC subdirectory below the PDG directory, if you choose to install it. To observe G7 in action, as well as get a tutorial on its use, download and run a G7 Demo.

In addition to the PDG software package, you also can find the latest updates to G7 and other programs on the Downloads page. If you already have installed the PDG package, you simply may download these programs to your PDG directory to update your software.

31. Mondrian Statistical Data Visualization System. Mondrian is a general purpose statistical data-visualization system. It features outstanding interactive visualization techniques for data of almost any kind, and has particular strengths, compared to other tools, for working with Categorical Data, Geographical Data and Large Data. All plots in Mondrian are fully linked, and offer many interactions and queries. Any case selected in a plot in Mondrian is highlighted in all other plots. Currently implemented plots comprise Histograms, Boxplots y by x, Scatterplots, Barcharts, Mosaicplots, Missing Value Plots, Parallel Coordinates/Boxplots, SPLOMs and Maps.

Mondrian works with data in standard tab-delimited or comma-separated ASCII files and can load data from R workspaces. There is basic support for working directly on data in databases (please email for further info). Mondrian is written in Java and is distributed as a native application (wrapper) for MacOS X and Windows. Linux users need to start the jar file.

32. OpenEpi Epidemiologic Statistics for Public Health. OpenEpi provides statistics for counts and measurements in descriptive and analytic studies, stratified analysis with exact confidence limits, matched pair and person-time analysis, sample size and power calculations, random numbers, sensitivity, specificity and other evaluation statistics, R x C tables, chi-square for dose-response, and links to other useful sites. OpenEpi is free and open source software for epidemiologic statistics. It can be run from a web server or downloaded and run without a web connection. A server is not required. The programs are written in JavaScript and HTML, and should be compatible with recent Linux, Mac, and PC browsers, regardless of operating system. A new tabbed interface avoids popup windows except for help files. Test results are provided for each module so that you can judge reliability, although it is always a good idea to check important results with software from more than one source. Links to hundreds of Internet calculators are provided. The programs have an open source license and can be downloaded, distributed, or translated. Some of the components from other sources have licensing statements in the source code files. Licenses referred to are available in full text at OpenSource.org/licenses. OpenEpi development was supported in part by a grant from the Bill and Melinda Gates Foundation to Emory University, Rollins School of Public Health.
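As an illustration of the kind of 2 x 2 table statistics mentioned above (sensitivity, specificity, odds ratio), the following Python sketch computes them with a Woolf (log-based) approximate confidence interval. It is not OpenEpi code and does not reproduce its exact confidence limits; the function names and the counts are hypothetical.

```python
import math
from statistics import NormalDist

def diagnostic_stats(tp, fp, fn, tn):
    """Sensitivity and specificity from true/false positive/negative counts."""
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}

def odds_ratio(a, b, c, d, confidence=0.95):
    """Odds ratio for the table [[a, b], [c, d]] with a Woolf (log-based) interval.

    a, b: exposed cases and exposed non-cases
    c, d: unexposed cases and unexposed non-cases
    All four cells must be non-zero for this approximation.
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return estimate, estimate * math.exp(-z * se_log), estimate * math.exp(z * se_log)

# Hypothetical counts for illustration.
print(diagnostic_stats(tp=90, fp=5, fn=10, tn=95))
print(odds_ratio(20, 80, 10, 90))
```

The log-based interval is only an approximation and degrades with small cell counts, which is one reason exact limits are offered by dedicated tools.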

33. PAST Data Analysis Package aimed at Paleontology. PAST is a free, easy-to-use data analysis package originally aimed at paleontology but now also popular in many other fields. It includes common statistical, plotting and modelling functions:

-type data entry form

percentile, ternary, survivorship, spindle, matrix, surface and normal probability plots

Axis, robust) with bootstrapping and permutation, Generalized Linear Model including logit regression, lin-log (exponential), log-log (allometric), polynomial, logistic, von Bertalanffy, Michaelis-Menten, sum-of-sines, smoothing splines, LOESS smoothing, Gaussian (species packing), multiple regression.

Chi-squared w. permutation test, Fisher's exact, Kolmogorov-Smirnov, Mann-Whitney, Shapiro-Wilk, Jarque-Bera, Spearman's Rho and Kendall's Tau tests with permutation, correlation and partial correlation, polyserial correlation, covariance, contingency tables, one-way and two-way ANOVA, one-way ANCOVA, Kruskal-Wallis test, sign test, Wilcoxon signed rank test with exact test, Friedman test, Fligner-Killeen test for coefficients of variation, mixture analysis, survival analysis (Kaplan-Meier curves, logrank and other tests), risk difference/risk ratio/odds ratio with tests.

- and sample-based rarefaction. Capture-recapture richness estimators. Renyi diversity profiles, SHE analysis, beta diversity.

-series, log-normal, broken stick.

bootstrapping etc.), Principal Coordinates (19 distance measures), Non-metric Multidimensional Scaling (19 distance measures), Detrended Correspondence Analysis, Canonical Correspondence Analysis, Cluster analysis (UPGMA, single linkage, Ward's method and neighbour joining, 19 distance measures, two-way clustering, bootstrapping), k-means clustering, seriation, discriminant analysis, one-way MANOVA, one-way and two-way ANOSIM and NPMANOVA, Hotelling's T², paired Hotelling's T², Mahalanobis-distance permutation, Mardia's multivariate normality, Box's M, Canonical Variates Analysis, multivariate allometry with bootstrapping, Mantel test, SIMPER, Imbrie & Kipp factor analysis, Modern Analog Technique, two-block Partial Least Squares.

autocorrelation, cross-correlation, wavelet transform, short-time Fourier transform, Walsh transform, runs test, Markov chains. Mantel correlogram and periodogram. ARMA, Box-Jenkins intervention analysis. Parks-McClellan filtering. Point events analysis. Solar forcing model.

Chi-squared, Watson's U², Watson-Williams, Mardia-Watson-Wheeler, circular kernel density estimation, angular mean with CI, rose plots, circular correlation), kernel density estimation of point density, point distribution statistics (nearest neighbour and Ripley's K), point alignment detection, coordinate transformations (WGS84, UTM etc.), spatial autocorrelation (Moran's I), Fourier shape analysis, elliptic Fourier shape analysis, Hangle shape analysis, eigenshapes, landmark analysis with Bookstein and Procrustes fitting (2D and 3D), thin-plate spline transformation grids with expansions and principal strains, partial warps and scores, relative warps and scores, centroid size from landmarks, size removal by Burnaby's method.

Branch-and-bound and heuristic algorithms, Wagner, Fitch and Dollo characters. Bootstrap, strict and majority rule consensus trees. Consistency and retention indices. Three stratigraphic congruency indices with permutation tests. Cladograms and phylograms.

stratigraphy with the methods of Unitary Associations, Ranking-Scaling (RASC), Appearance Event Ordination and Constrained Optimization (CONOP). Confidence intervals on stratigraphic ranges.

Thin-plate spline and kriging with three semivariogram models.

Included in the distribution are real data sets for educational use, together with extensive documentation and case studies. PAST has been tested under Windows XP, Vista and Windows 7.

34. PQRS Tool for Calculating Probabilities and Quantiles associated with a large Number of Probability Distributions. PQRS (Probabilities, Quantiles and Random Samples) is a tool for calculating probabilities and quantiles associated with a large number of probability distributions. In addition, random samples can be drawn and stored to a file. Quantiles and probabilities are displayed and edited in their natural position relative to the probability (density) graph. This makes PQRS very easy to use.

PQRS was originally designed as a tool for students using computer aided instruction of statistics in order to make printed tables obsolete.

Compared to printed statistical tables PQRS gives you:

more distributions,
higher precision (up to 12 decimals),
results for any parameter value (within a certain range) and for any specified x-value or probability.

Compared to similar computer programs PQRS gives you in addition:

the probability (density) graph,
the graph of the cumulative distribution function,
formulas for the probability (density) function, mean and variance,
Greek symbols for parameters like alpha, beta, mu and sigma.

Additional advantages:

quantiles are also given for discrete distributions,
left side probabilities P(X < value), right side probabilities P(X > value) as well as P(X = value) are simultaneously displayed (the latter only if non-zero),
random samples of any size can be drawn and written to a file.
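The three operations in the name PQRS (probabilities, quantiles, random samples) can be illustrated with Python's standard library. This sketch is independent of PQRS itself; the distribution parameters are arbitrary examples, and `binomial_probabilities` is a hypothetical helper written for this illustration.

```python
from math import comb
from statistics import NormalDist

# Continuous case: a normal distribution with example parameters.
dist = NormalDist(mu=100, sigma=15)
left = dist.cdf(130)                   # left side probability P(X < 130)
right = 1 - left                       # right side probability P(X > 130)
quantile = dist.inv_cdf(0.975)         # x-value with 97.5% of the mass to its left
sample = dist.samples(5, seed=42)      # reproducible random sample of size 5

def binomial_probabilities(n, p, k):
    """Return P(X < k), P(X = k) and P(X > k) for a binomial(n, p) variable."""
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    return sum(pmf[:k]), pmf[k], sum(pmf[k + 1:])

print(left, right, quantile)
print(sample)
print(binomial_probabilities(10, 0.5, 5))
```

For a discrete distribution the three probabilities sum to one and P(X = value) is non-zero, which is why a tool of this kind displays all three side by side.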

35. PSPP Program for Statistical Analysis of Sampled Data. PSPP is a program for statistical analysis of sampled data. It is a Free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions. The most important of these exceptions is that there are no time bombs; your copy of PSPP will not expire or deliberately stop working in the future. Neither are there any artificial limits on the number of cases or variables which you can use. There are no additional packages to purchase in order to get advanced functions; all functionality that PSPP currently supports is in the core package. PSPP can perform descriptive statistics, T-tests, linear regression and non-parametric tests. Its backend is designed to perform its analyses as fast as possible, regardless of the size of the input data. You can use PSPP with its graphical interface or the more traditional syntax commands. A brief list of some of the features of PSPP follows:

Supports over 1 billion cases. Supports over 1 billion variables. Syntax and data files are compatible with SPSS. Choice of terminal or graphical user interface. Choice of text, postscript or html output formats. Inter-operates with Gnumeric, OpenOffice.Org and other free software. Easy data import from spreadsheets, text files and database sources. Fast statistical procedures, even on very large data sets. No license fees. No expiration period. No unethical end user license agreements. Fully indexed user manual. Free Software; licensed under GPLv3 or later. Cross platform; Runs on many different computers and many different operating systems.

PSPP is particularly aimed at statisticians, social scientists and students requiring fast convenient analysis of sampled data.

36. Simfit Simulation, Curve Fitting, Statistics and Plotting. Simfit is a free, open-source Windows/Linux package for simulation, curve fitting, statistics, and plotting, using a library of models or user-defined equations. It can be used in: 1. biology (nonlinear growth curves); 2. ecology (Bray-Curtis dendrograms); 3. psychology (factor analysis); 4. physiology (membrane transport); 5. pharmacology (dose response curves); 6. pharmacy (pharmacokinetics); 7. immunology (ligand binding); 8. biochemistry (calibration); 9. biophysics (enzyme kinetics); 10. epidemiology (survival analysis); 11. medical statistics (meta analysis); 12. chemistry (chemical kinetics); 13. physics (dynamical systems); and 14. mathematics (numerical analysis). Clipboard data and spreadsheet export files can be analyzed, and macros to interface with Microsoft Office are provided.

37. Zaitun Time Series Zaitun Time Series is free and open source software designed for statistical analysis of time series data. It provides an easy way to model and forecast time series. Zaitun Time Series is free software: it can be used for any purpose, including commercial use.

Simple and easy to use

The interface is designed with ease of use in mind. It consists of three main views (project view, variable view, and result view), which simplify the management of the time series data and of the analysis or forecasting results.

Sophisticated analysis models

Zaitun Time Series provides several statistical and neural network models, and graphical tools that will make your work on time series analysis easier.

Statistics and Neural Networks Analysis: Trend Analysis, Decomposition, Moving Average, Exponential Smoothing, Multiple Regression, Correlogram, Neural Networks.

Graphical Tools: Time Series Plot, Actual and Predicted Plot, Actual and Forecasted Plot, Actual vs Predicted Plot, Residual Plot, Residual vs Actual Plot, Residual vs Predicted Plot.
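Two of the smoothing methods listed above, moving average and exponential smoothing, are simple enough to sketch in a few lines of Python. This is a generic illustration of the techniques, not Zaitun Time Series code; the function names and the sample series are hypothetical.

```python
def moving_average(series, window):
    """Trailing moving average; the first value covers series[0:window]."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing, initialised with the first observation."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Made-up monthly values for illustration.
sales = [12, 15, 14, 18, 21, 19, 24]
print(moving_average(sales, window=3))
print(exponential_smoothing(sales, alpha=0.5))
```

A larger window or a smaller alpha gives a smoother series that reacts more slowly to recent changes, which is the main trade-off when using either method for forecasting.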

A simple way to do data forecasting

Data forecasting can be done easily, just by clicking the Forecasted option and filling in the number of forecasting steps. The forecasting result can be shown graphically just by clicking the Actual and Forecasted option.

Stock market data

Zaitun Time Series can handle stock market data. It provides a stock data type that helps visualize stock market data in a candlestick graph. Zaitun Time Series also has a feature to import live stock market data from online stock data providers such as Yahoo Finance. This feature is especially helpful for users who want to analyze stock data movements and predict stock values for the next few days.

The Developer Team

Rizal Zaini Ahmad Fathony (Pekanbaru, Indonesia): core development, programmer
Suryono Hadi Wiboowo (Flores, Indonesia): programmer
Khaerul Anas (Semarang, Indonesia): programmer
Lia Amelia (Palembang, Indonesia): test and documentation, site administration, marketing

Previous Team Members: Almaratul Sholihah, Muhamad Fuad Hasan, Rismawaty, Aris Wijayanto, Dewi Andriyanti, and Wawan Kurniawan.

38. Minitab Minitab Inc. delivers software and services for quality improvement and statistics education. For more than 35 years, thousands of distinguished organizations in more than 100 countries have turned to Minitab for tools that help quality initiatives yield bottom-line benefits. Businesses trust Minitab Statistical Software to analyze their data, and more than 4,000 colleges and universities use it to teach statistics. Companies worldwide use Quality Companion to manage their improvement projects and Quality Trainer to learn statistics online.

Minitab's customers have also come to rely on our outstanding services, including training and free technical support.

39. Lisrel Software tools for statistical data analysis, useful for work in a variety of fields such as:

Statistics (including survey research and informatics)
Behavioral and social sciences (such as psychology, sociology, psychiatry, criminal science, family studies, political science, developmental research, anthropology, or social work)
Medical research (including nursing, pharmacy, epidemiology, gerontology, kinesiology, sport science, and other fields)
Education (in administration, policy studies, test analysis, counseling, and more)
Business research (marketing, management, economics, organization)
Environmental science (including resource administration and longitudinal research)
Other diverse research areas, such as language studies, engineering, law, chemistry, or urban planning

40. SAS SAS helps organizations anticipate and optimize business opportunities. We do this through advanced analytics that turn data about customers, performance, financials and more into meaningful information. The result? Fact-based decisions for undeniable bottom line impact: this is how we transform the way our customers do business. Driven to solve customer business problems, privately-held SAS works closely with customers in all stages of research and development to ensure that they get the most out of new offerings. More than 12,000 employees across more than 400 SAS offices provide local, hands-on support for customers whenever and wherever they need it. Customer loyalty, evidenced by long-standing SAS user communities and a strong renewal history, echoes this customer partnership and our shared commitment to a successful future. What truly differentiates SAS is its "creative capital," in the words of Chief Executive Officer Jim Goodnight. The technical and domain expertise of SAS employees allows the company to serve nearly all industries in multiple cutting-edge analytical capacities, including cloud and high-performance computing, in-database processing and taking full advantage of the value hidden in unstructured data.

41. Octave GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments. It also provides extensive graphics capabilities for data visualization and manipulation. Octave is normally used through its interactive command line interface, but it can also be used to write non-interactive programs. The Octave language is quite similar to Matlab so that most programs are easily portable.

42. Gnumeric The Gnumeric spreadsheet is part of the GNOME desktop environment: a project to create a free, user friendly desktop environment. The goal of Gnumeric is to be the best possible spreadsheet. We are not attempting to clone existing applications. However, Gnumeric can read files saved with other spreadsheets and we offer a customizable feel that attempts to minimize the costs of transition.

43. Scilab Scilab is recognized as having educational value. Scilab and its module for high schools have been granted the RIP label ("Reconnu d'intérêt pédagogique", recognized as having educational value) by the committee decision of June 15, 2011. Awarded by the French Department of Education, this label is designed to guide teachers in selecting tools that meet the needs and requirements of the educational system.

1. Free Mat FreeMat is a free environment for rapid engineering and scientific prototyping and data processing. It is similar to commercial systems such as MATLAB from Mathworks, and IDL from Research Systems, but is Open Source. FreeMat is available under the GPL license.

44. Euler Math Toolbox The Euler Mathematical Toolbox is software written and maintained by R. Grothmann, professor of mathematics at the University of Eichstätt.

Powerful, versatile, mature software for numerical and for symbolic computations including an infinite arithmetic. Used in many schools and universities for teaching and research. Similar to Matlab, but has its own style and a different syntax. Supports symbolic mathematics using the open algebra system Maxima. For Windows, or under Linux in Wine. Exports to HTML, PNG, SVG and more formats. Friendly, easy to use user interface. Extensive help, documentation, reference and examples. Free and open source.

45. Daniel's XL Toolbox Microsoft Excel add-in that helps with analyzing and presenting data.

46. Pop Tools PopTools is an add-in for PC versions of Microsoft Excel (version 97 and up) that helps with the analysis of matrix population models and simulation of stochastic processes. It was originally written to analyse ecological models, but has much broader application. It has been used for studies of population dynamics, financial modelling, risk analysis, and calculation of bootstrap and resampling statistics. When installed, PopTools adds a new menu item to Excel's main menu, and also adds over a hundred new worksheet functions. The routines include array formulas for matrix decompositions (Cholesky, QR, singular values, LU), eigenanalysis (eigenvalues and real eigenvectors of square matrices) and formulas for generation of random variables (e.g., Normal, binomial, gamma, exponential, Poisson, logNormal). Many of these functions depend on Jean DeBord's TPMath library, a Pascal numerical library which has been compiled into a DLL accessed via Excel. Also included in PopTools are routines for iterating spreadsheets. These make it possible to run Monte Carlo simulations, conduct randomisation tests (including the Mantel test) and calculate bootstrap statistics. Some facilities are available for function minimisation and parameter estimation using maximum likelihood techniques, and there are a number of auditing and other tools that the author finds useful in his everyday work. PopTools requires no programming knowledge, but to fully utilise the package you need some knowledge of matrix algebra, and some understanding of probability and statistics. It is therefore most suitable for those who have done some undergraduate statistics.
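A typical analysis of a matrix population model finds the long-run growth rate, which is the dominant eigenvalue of the projection matrix, and the stable stage distribution, which is the matching eigenvector. The Python sketch below does this by power iteration for a small, made-up two-stage matrix. It is a generic illustration, not PopTools code, and the function names are hypothetical.

```python
def project(matrix, vector):
    """Advance the population one time step: matrix times vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def dominant_eigenvalue(matrix, iterations=200):
    """Estimate the growth rate and stable stage distribution by power iteration."""
    vector = [1.0] * len(matrix)
    growth = 0.0
    for _ in range(iterations):
        projected = project(matrix, vector)
        growth = sum(projected) / sum(vector)   # total population ratio per step
        total = sum(projected)
        vector = [v / total for v in projected] # renormalise to proportions
    return growth, vector

# Made-up two-stage model: first row holds fecundities,
# 0.5 is the survival rate from stage 1 to stage 2.
leslie = [[1.0, 2.0],
          [0.5, 0.0]]
growth_rate, stable_stages = dominant_eigenvalue(leslie)
print(growth_rate, stable_stages)
```

A growth rate above 1 means the modelled population increases in the long run, and a rate below 1 means it declines. Power iteration is used here only because it needs no external libraries; it assumes the matrix has a single dominant eigenvalue.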

47.XL Statistic Set of Microsoft Excel Workbooks for Statistical Analysis of Data.

48.Weka Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

49.Rosetta ROSETTA is a toolkit for analyzing tabular data within the framework of rough set theory. ROSETTA is designed to support the overall data mining and knowledge discovery process: From initial browsing and preprocessing of the data, via computation of minimal attribute sets and generation of if-then rules or descriptive patterns, to validation and analysis of the induced rules or patterns. ROSETTA is intended as a general-purpose tool for discernibility-based modelling, and is not geared specifically towards any particular application domain. ROSETTA offers a highly intuitive GUI environment where data-navigational abilities are emphasized. The GUI is highly object-oriented in that all manipulable objects are represented as individual GUI items, each with their own set of context-sensitive menus. The computational kernel is also available as a command-line program, suitable for being invoked from, e.g., Perl or Python scripts.

50.Rapid Miner Use RapidMiner and explore your data! Simplify the construction of analysis processes and the evaluation of different approaches. Try to find the best combination of preprocessing and learning steps or let RapidMiner do that automatically for you. More than 400 data mining operators can be used and almost arbitrarily combined. The setup is described by XML files which can easily be created with a graphical user interface (GUI). This XML-based scripting language turns RapidMiner into an integrated development environment (IDE) for machine learning and data mining. RapidMiner follows the concept of rapid prototyping, leading very quickly to the desired results. Furthermore, RapidMiner can be used as a Java data mining library. The development of most of the RapidMiner concepts started in 2001 at the Artificial Intelligence Unit of the University of Dortmund. Several members of the unit started to implement and realize these concepts, which led to a first version of RapidMiner in 2002. Since 2004, the open-source version of RapidMiner (GPL) has been hosted by SourceForge. Since then, a large number of suggestions and extensions by external developers have also been incorporated into RapidMiner. Today, both the open-source version and a closed-source version of RapidMiner are maintained by Rapid-I.

51.Adam Soft ADaMSoft is free and open-source statistical software developed in Java. It is multilingual and multiplatform. It contains data management methods and data mining techniques, and it offers several facilities for creating dynamic reports or storing documents. Starting with release 2.0.0, ADaMSoft also contains a new system, called the Strategies Support System, that helps the user find the best strategy for an analytical problem. Furthermore, using its Web Application server, it is possible to use ADaMSoft over the Internet. This makes it possible to generate web pages dynamically and to build a web architecture that can be used to analyze and/or share information.

52.Statistic 101 Statistics101 is a giftware computer program that interprets and executes the simple but powerful Resampling Stats programming language. The original Resampling Stats language and computer program were developed by Dr. Julian Simon and Peter Bruce as a new way to teach statistics to social science students. Of course, social science students aren't the only ones who can benefit. Anyone who wants to learn statistics will find that the resampling approach helps in understanding statistical concepts from the simplest to the most difficult. In addition, professionals who want to use resampling, bootstrapping, or Monte Carlo simulations will find Statistics101 helpful. The history, description, and application of the resampling method to a vast range of statistical problems are described fully in Dr. Simon's book Resampling: The New Statistics.

53.Rundom Pro Statistical Application supporting frequently used Classical and Resampling Methods.

54.PS PS is an interactive program for performing power and sample size calculations that may be downloaded for free. It can be used for studies with dichotomous, continuous, or survival response measures. The alternative hypothesis of interest may be specified either in terms of differing response rates, means, or survival times, or in terms of relative risks or odds ratios. Studies with dichotomous or continuous outcomes may involve either a matched or independent study design. The program can determine the sample size needed to detect a specified alternative hypothesis with the required power, the power with which a specific alternative hypothesis can be detected with a given sample size, or the specific alternative hypotheses that can be detected with a given power and sample size. The PS program can produce graphs to explore the relationships between power, sample size and detectable alternative hypotheses. It is often helpful to hold one of these variables constant and plot the other two against each other. The program can generate graphs of sample size versus power for a specific alternative hypothesis, sample size versus detectable alternative hypotheses for a specified power, or power versus detectable alternative hypotheses for a specified sample size. Linear or logarithmic scales may be used for either axis. Multiple curves can be plotted on a single graph.
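The relationship between sample size, power and detectable difference can be illustrated with a short Python sketch for the simplest case: comparing two independent means with a known common standard deviation, using the textbook normal approximation. This is not the algorithm PS itself uses, which the description above does not specify; the function names and the example values (difference 5, standard deviation 10) are assumptions.

```python
import math
from statistics import NormalDist

_z = NormalDist()  # standard normal distribution

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Sample size per group to detect a mean difference `delta` (two-sided test)."""
    z_alpha = _z.inv_cdf(1 - alpha / 2)
    z_beta = _z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

def power_for_n(n, delta, sigma, alpha=0.05):
    """Approximate power with `n` subjects per group (ignores the far rejection tail)."""
    z_alpha = _z.inv_cdf(1 - alpha / 2)
    return _z.cdf(abs(delta) / (sigma * math.sqrt(2 / n)) - z_alpha)

n = n_per_group(delta=5, sigma=10)
print(n, round(power_for_n(n, delta=5, sigma=10), 3))
```

Holding `delta` and `sigma` fixed and evaluating `power_for_n` over a range of `n` gives the kind of sample-size-versus-power curve the entry describes.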

55.AM AM is a statistical software package for analyzing data from complex samples, especially large-scale assessments such as the National Assessment of Educational Progress (NAEP) and the Third International Mathematics and Science Study (TIMSS). From its origin as a specialized tool for analyzing large-scale assessment data, AM has evolved into a more generalized and growing tool for analyzing data from complex samples in general. Originally, AM was developed to estimate regression models through marginal maximum likelihood (MML). Because large-scale assessments are often low-stakes assessments for students, students are usually asked to respond to only a few items; each student sees only part of the whole test. Otherwise, they would be unlikely to expend real effort on any items. As a result, individual test scores are subject to substantial measurement error, which would bias many statistical estimates. Rather than assign each student an error-filled score, MML procedures represent each student's proficiency as a probability distribution over all possible scores. MML procedures use these probability distributions in the estimation process. Another characteristic of large-scale assessments has led to a wider applicability of AM: they almost always draw a sample from a complex design. AM automatically provides appropriate standard errors for complex samples using a Taylor-series approximation. This happens automatically even when new procedures are added to the software. Over time, the software has grown to offer a set of non-MML statistics, including regression, probit, logit, cross-tabs, and other statistics that are useful for survey data in general. The American Institutes for Research is committed to keeping AM available as a free and growing tool for the research community. Visit this web site for further information, updates, and technical support.

56.X Pro Package specializing in Exact Parametric Statistical Methods.

57.Winstat Winstats provides access to scatter plots, curve fitting, histograms, statistical data, and standard theoretical probability distributions. It performs many statistical tests and calculates confidence intervals. It simulates dealing cards, rolling dice, sampling candy, taking random walks, and tossing darts, needles and coins. There are two least-squares demos and a confidence-interval demo.

58.Statistical Lab The Statistical Lab is an explorative and interactive tool designed both to support education in statistics and to provide a tool for the simulation and solution of statistical problems. The graphical user interface is designed to make complex statistical relations easy to understand. It connects and displays data frames, frequency tables, random numbers or matrices in a user-friendly statistical worksheet, allowing users to run calculations, conduct analyses and perform multiple simulations and manipulations.

59.Statist Statist is a small and portable statistics program written in C. It is terminal-based, but can utilise gnuplot for plotting purposes. It is simple to use and can be run in scripts. Big datasets are handled reasonably well on small machines.

60.Statext Statistics in Text Mode.

61.StatEasy StatEasy is software that is distributed free of charge and may be freely redistributed and installed on any number of PCs, provided this is not done for commercial purposes.

StatEasy was created at a university in 1996 by Agostino Di Ciaccio and Simone Borra to allow students to use free statistical software on their home PCs, and both to disseminate and to develop new methods of statistical analysis. In particular, the software contains no especially notable exercise modules oriented toward the teaching of statistics.

62.StatCalc StatCalc is a PC calculator that computes table values of 34 statistical distributions. It also computes moments and many other statistics; see the table of contents.

63.SSP Software for Descriptive Statistics.

64.SLGallery Statistical Distribution Graphs and Values.

65.Mortpak Lite Mortpak-Lite: the UN Software package for mortality measurement : interactive software for the IBM-PC and Compatibles. This report contains the working manual for MORTPAK, a software package for demographic measurement in developing countries, with special emphasis on mortality measurement.

66.Eviews EViews (Econometric Views) is a statistical package for Windows, used mainly for time-series oriented econometric analysis. It is developed by Quantitative Micro Software (QMS), now a part of IHS. EViews can be used for general statistical analysis and econometric analyses, such as cross-section and panel data analysis and time series estimation and forecasting. EViews combines spreadsheet and relational database technology with the traditional tasks found in statistical software, and uses a Windows GUI. This is combined with a programming language which displays limited object orientation. EViews relies heavily on a proprietary and undocumented file format for data storage. However, for input and output it supports numerous formats, including databank format, Excel formats, PSPP/SPSS, DAP/SAS, Stata, RATS, and TSP. EViews can access ODBC databases. EViews file formats can be partially opened by gretl.

67.Curve Expert The CURVE EXPERT software is useful for easily creating trendline curves along with their equations.

68.Kyplot KYPLOT An Integrated Environment for Data Analysis and Visualization.

69.OpenStat

When the program starts, it displays the data grid. Across the top are the main menus which, when clicked, drop down a list of sub-menus. Below that are some boxes which report the current grid row, column, number of cases, number of variables, the American Standard Code for Information Interchange (ASCII) code for a character entered in a grid cell, and a status indicator. There is also a box for editing the contents of a previously entered cell value. Below the "grid" where data are entered is a button that, when pressed, adds another variable column (with a default type of floating point values). There is also a box which indicates the current name of the file. This changes after the file has been saved.

The Files Menu: A variety of options exist for saving and opening data files. The preferred method is to use the file extension .TEX, which saves not only the data from the grid but also the definition of the variables in the grid. Tab files are useful for importing data from other programs (for example Excel files) or for exporting a file to another program.

The Variables Menu: The typical user, in creating a new data file, will select the "Define" option. This option lets the user specify the name of a variable (grid column), the type of data in the variable (floating point, integer, string, etc.), the number of decimal fractions, and a value representing a missing value. One can also sort the data in the grid in ascending or descending order of one of the variables. Occasionally one will need to transform the values of a variable (for example into normally distributed values) or perform a mathematical transformation such as the log of the values. The user can also combine values of several variables to create a new variable or enter an equation to combine and transform multiple variables. If a file contains one variable of values and another variable containing the frequency count of values in the previous variable, one can construct a new file that contains each of the values in one column. Finally, a large file may be split into several files for different analyses, or several files may be merged.

The Edit Menu: A typical user will often need to insert a new row or column in the data grid, delete a row or column of the grid, copy and paste row or column data, etc. The Edit menu provides a variety of tools for modifying data in the grid. Excel users, for example, may copy a block of data from an Excel file and paste that block of data into the OpenStat grid. Occasionally a user will need to recode values in one of his or her variables. The recode option provides this capability. In some statistics programs, data which represent group membership may consist of strings such as "Male" or "Female". OpenStat requires group codes to be integer values. An option exists for creating integer codes from a variable containing string codes. If you are a user from a country that uses the comma (,) to separate decimal fractions rather than the period (.), you are going to be using the European standard for coding numbers. If you load a file using the USA standard (period separator) you may need to switch the coding to the European standard (comma separator).

The Analyses Menu: Under the Analyses main menu are listed a number of major sub-menus. Many of the descriptive and analysis procedures produce graphical output; for example, the Descriptives menu can produce a plot of X versus Y.

Clearly, there are too many procedures to describe here. OpenStat contains a large variety of parametric, nonparametric, multivariate, measurement, statistical process control, financial and other procedures. One can also simulate a variety of data for tests, theoretical distributions, multivariate data, etc.

70.Zelig Zelig is a single, easy-to-use program that can estimate, help interpret, and present the results of a large range of statistical methods. It literally is "everyone's statistical software" because Zelig uses (R) code from many researchers. We also hope it will become "everyone's statistical software" for applications, and we have designed it so that anyone can use it or add their methods to it. Zelig comes with detailed, self-contained documentation that minimizes startup costs for Zelig and R, automates graphics and summaries for all models, and, with only three simple commands required, generally makes the power of R accessible for all users. Zelig also works well for teaching, and is designed so that scholars can use the same program with students that they use for their research. Zelig adds considerable infrastructure to improve the use of existing methods. It generalizes the program Clarify (for Stata), which translates hard-to-interpret coefficients into quantities of interest; combines multiply imputed data sets (such as output from Amelia) to deal with missing data; automates bootstrapping for all models; uses sophisticated nonparametric matching commands which improve parametric procedures (via MatchIt); allows one-line commands to run analyses in all designated strata; automates the creation of replication data files so that you (or, if you wish, anyone else) can replicate the results of your analyses (hence satisfying the replication standard); makes it easy to evaluate counterfactuals (via WhatIf); and allows conditional population and superpopulation inferences. Zelig includes many specific methods, based on likelihood, frequentist, Bayesian, robust Bayesian, and nonparametric theories of inference.

71.HLM HLM Software for Hierarchical Linear Modeling. In social research and other fields, research data often have a hierarchical structure. That is, the individual subjects of study may be classified or arranged in groups which themselves have qualities that influence the study. In this case, the individuals can be seen as level-1 units of study, and the groups into which they are arranged are level-2 units. This may be extended further, with level-2 units organized into yet another set of units at a third level and with level-3 units organized into another set of units at a fourth level. Examples of this abound in areas such as education (students at level 1, teachers at level 2, schools at level 3, and school districts at level 4) and sociology (individuals at level 1, neighborhoods at level 2). It is clear that the analysis of such data requires specialized software. Hierarchical linear and nonlinear models (also called multilevel models) have been developed to allow for the study of relationships at any level in a single analysis, while not ignoring the variability associated with each level of the hierarchy. The HLM program can fit models to outcome variables that generate a linear model with explanatory variables that account for variations at each level, utilizing variables specified at each level. HLM not only estimates model coefficients at each level, but it also predicts the random effects associated with each sampling unit at every level. While commonly used in education research due to the prevalence of hierarchical structures in data from this field, it is suitable for use with data from any research field that have a hierarchical structure. This includes longitudinal analysis, in which an individual's repeated measurements can be nested within the individuals being studied. In addition, although the examples above imply that members of this hierarchy at any of the levels are nested exclusively within a member at a higher level, HLM can also provide for a situation where membership is not necessarily "nested", but "crossed", as is the case when a student may have been a member of various classrooms during the duration of a study period. The HLM program allows for continuous, count, ordinal, and nominal outcome variables and assumes a functional relationship between the expectation of the outcome and a linear combination of a set of explanatory variables. This relationship is defined by a suitable link function, for example, the identity link (continuous outcomes) or logit link (binary outcomes).
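The role of the link function can be shown with a few lines of Python. The sketch below covers only the last step described above, mapping a linear combination of explanatory variables to the expected outcome under the identity or logit link. It performs no multilevel estimation and does not call HLM; the function names and coefficient values are hypothetical.

```python
import math

def logit(p):
    """Logit link: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse logit: maps a linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))

def expected_outcome(coefs, x, link="identity"):
    """Expected outcome for predictors `x` given coefficients `coefs`."""
    eta = sum(b * xi for b, xi in zip(coefs, x))  # linear predictor
    if link == "identity":   # continuous outcomes
        return eta
    if link == "logit":      # binary outcomes
        return inv_logit(eta)
    raise ValueError(f"unsupported link: {link}")

# Hypothetical coefficients: intercept 1.0 and slope 0.5, with x = (1, 2).
print(expected_outcome([1.0, 0.5], [1, 2]))            # identity link
print(expected_outcome([1.0, 0.5], [1, 2], "logit"))   # logit link
```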

72.Ministep MINISTEP Software for Multiple-Choice, Rating Scale and Partial Credit Rasch Analysis. MINISTEP is a reduced version of WINSTEPS. It has complete WINSTEPS functionality.

73.ESTA + Software for Descriptive Statistics. This is a descriptive statistics program that can analyze and plot data distributed in one-dimensional or two-dimensional variables.

74.IRRISTAT IRRISTAT Software for Basic Statistical Analysis of Experimental Data aimed primarily at the Analysis of Data from Agricultural Field Trials. IRRISTAT is statistical software developed by IRRI. It can be obtained free of charge by downloading it from the IRRI website at: http://www.irri.org/science/software/irristat.asp. IRRISTAT is freeware that may be used and distributed free of charge as long as it is not used for commercial purposes. The application runs on various 32-bit Windows-based operating systems.

Because this program was written for plant breeders and agronomists, IRRISTAT has several advantages over other statistical software. These include modules for creating experimental designs commonly used in plant breeding programs, such as lattice designs and split-plot designs. The main modules and facilities provided include:

1. Data management with a spreadsheet
2. Text editor
3. Summary Statistics and Scatterplot Graphics
4. Analysis of Variance
5. Regression and Correlation
6. Single Site Analysis of Plant Breeding Variety Trials
7. Cross site and AMMI analysis
8. Pattern analysis of GxE Interaction
9. Quantitative trait loci analysis
10. Randomization and layout of experimental designs
11. Display of linear forms for general factorial EMS
12. Generation of coefficients for orthogonal polynomials

In addition, the software includes modules for performing a range of statistical analyses, from the simple to the complex, such as mixed model analysis based on REML (Restricted Maximum Likelihood). IRRISTAT also includes a module for QTL analysis. The software comes with a Help menu and a manual. Although the Help menu is somewhat inadequate, the information it provides is quite practical. The IRRISTAT manual is written simply and is easy to understand, and it includes worked examples and exercises. For data analysis purposes, IRRISTAT is therefore a good choice: besides being freeware, it is practical and simple.

75.PAMCOMP PAMCOMP (Person-years And Mortality COMputation Program) is a free application for calculating person-years and standardised mortality ratios (SMRs). The program was developed with Visual Basic 6.0 (SP5) and Visual C++ (SP5) and runs under Windows 95/98/2000/ME/NT and XP. The calculation of person-years allows flexible stratification by sex, and self-defined and unrestricted calendar periods and age groups. Furthermore, it is possible to lag person-years to account for latency periods. The SMR computation includes calculation of 90%, 95%, and 99% confidence intervals. Import and export filters for standard personal computer file formats (such as ASCII, dBase, MS-Excel, Paradox and MS-Access) are available to import cohort and reference data and to export distributions of person-years and deaths. In addition, importing from external ODBC data sources is possible.
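As a rough illustration of the SMR calculation, the Python sketch below computes expected deaths from stratum-specific person-years and reference rates, then the SMR (observed divided by expected) with an approximate confidence interval. The interval uses Byar's approximation to the Poisson limits, which is an assumption here: the description above does not say which method PAMCOMP uses. The function names, strata and rates are invented for the example.

```python
import math
from statistics import NormalDist

def expected_deaths(person_years, reference_rates):
    """Expected deaths: sum over strata of person-years times reference rate."""
    return sum(py * rate for py, rate in zip(person_years, reference_rates))

def smr_with_ci(observed, expected, level=0.95):
    """SMR = observed / expected, with Byar's approximate confidence limits."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    smr = observed / expected
    if observed > 0:
        lower = observed * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3
    else:
        lower = 0.0  # no deaths observed: the lower limit is zero
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3
    return smr, lower / expected, upper / expected

# Hypothetical cohort: three age strata with person-years and reference death rates.
expected = expected_deaths([1000, 2000, 1500], [0.002, 0.005, 0.010])
smr, lower, upper = smr_with_ci(observed=40, expected=expected)
print(f"expected = {expected:.1f}, SMR = {smr:.2f} ({lower:.2f} to {upper:.2f})")
```

Calling `smr_with_ci` with `level=0.90` or `level=0.99` gives the other two interval widths mentioned in the entry.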

76.REGRESS + Tool for Univariate Mathematical Modeling.

General
1. Simple (univariate) mathematical modeling
2. Data:
   -- deterministic (regression)
   -- stochastic (random variates)
3. Up to 2,147,483,647 points (minimum 7)
4. Robust goodness-of-fit testing
5. Bootstrap confidence intervals (90, 95, and 99 percent) for
   -- parameters
   -- stochastic-model goodness-of-fit metrics
6. Bootstrap methodology (where appropriate):
   -- BCa technique (state-of-the-art)
   -- percentile technique
   -- tunable precision
7. Optional "freezing" of initial estimate(s) for any parameter(s)
8. Choice of optimization criterion
9. No hidden assumptions anywhere:
   -- no approximations, apart from those common to sampling and bootstrapping generally
   -- no data transformations of any kind
10. "Smart" dialogs to run unattended and/or in the background
11. Textfile input (may contain unlimited comments)
12. Both text and graphical output
13. One keystroke makes a plot (may be saved as a PICT)
14. Extensive documentation (in PDF):
   -- Tutorial (50 pp.)
   -- Users' Guide (36 pp.)
   -- Technical Details and References (8 pp.)
   -- Appendix A: A Compendium of Common Probability Distributions (120 pp., separate volume)
   -- Appendix B: Error Messages (4 pp.)
15. Lots of sample datafiles

Deterministic Modeling
16. Models: y = f(x), with 1 to 10 parameters
   -- 22 Built-in families of models
   -- Gaussian-Lorentzian model for spectral peaks (see Example #5)
   -- "User-defined" model
17. Special Simulated Annealing mode to help find initial parameter estimates
18. Optimization criteria:
   -- Least-squares
   -- Minimum average deviation
19. [Optional] Weights (for the dependent variable, y)
20. [Optional] Listing of fitted data and residuals

Stochastic Modeling
21. 56 Built-in distributions:
   -- 30 Continuous
   -- 17 Continuous binary mixtures
   -- 5 Discrete
   -- 4 Discrete binary mixtures
22. Optimization criteria:
   -- Maximum-likelihood (all)
   -- Minimum Kolmogorov-Smirnov statistic (continuous variates)
   -- Minimum Chi-square (discrete variates)
23. [Optional] Discrete input may be grouped
24. [Optional] Creation of samples of random variates (textfiles, see Example #10)

77.STATTUCINO Software for Descriptive Statistics. Features:

-- Frequencies
-- Summary Statistics: Means, minimum, maximum, standard deviation
-- T test
-- Paired T test
-- Correlations
-- Anova
-- Regression Analysis
-- Logistic Regression Analysis
-- Import csv files
-- Import zohosheet files
-- Save results in HTML
-- Online help files
-- Export data in csv files (to Excel)

78.SmartPLS SmartPLS is a software application for (graphical) path modeling with latent variables (LVP). The partial least squares (PLS) method is used for the LVP analysis in this software.

In the download area, the first beta-version is accessible (free of charge). A registration is required! The following new features are presented in the new release SmartPLS 2.0 (beta):

a completely reengineered software application using the JAVA Eclipse Platform, the option to easily extend the functionalities of SmartPLS by JAVA Eclipse Plug-ins, and a SmartPLS community to discuss all software and PLS related topics with other users and experts.

79.RSTUDIO RStudio is a new integrated development environment (IDE) for R. RStudio combines an intuitive user interface with powerful coding tools to help you get the most out of R.

Productive: RStudio brings together everything you need to be productive with R in a single, customizable environment. Its intuitive interface and powerful coding tools help you get work done faster.

Runs Everywhere: RStudio is available for all major platforms including Windows, Mac OS X, and Linux.

80. SCIGRAPHICA SciGraphica, developed by Adrian E. Feiguin, is a scientific application for data analysis and technical graphics. It has similarities with Sigmaplot and aims to be a clone of the popular commercial (and expensive) application "Microcal Origin". It provides full plotting features for 2D, 3D and polar charts. The aim is to obtain a fully-featured, cross-platform, user-friendly, self-growing scientific application. It is free and open-source, released under the GPL license. Main features:

-- You can plot functions and manipulate data in worksheets.
-- You can open several worksheets and plots and work with them interactively and at the same time.
-- The plots are fully configurable using a control panel dialog.
-- The look and feel is completely WYSIWYG.
-- Production/publication quality PostScript output.
-- You can interact with the plots by double-clicking, dragging and moving objects with the mouse.
-- Native XML file format.
-- You can insert Python expressions in the worksheets.
-- Terminal with command-line Python interface for interacting with plots and worksheets.

It is completely programmed in C from scratch, using the GTK+ and GtkExtra libraries, and released under the GPL agreement. Data manipulation and fitting features are on the roadmap. Binaries are currently available for several Linux platforms.

81.FITYK fityk by Marcin Wojdyr is a free GPL-licensed peak fitting program for Linux, Windows and MacOS X. For optimization it has built-in algorithms for the Levenberg-Marquardt gradient-based method, the Nelder-Mead downhill simplex method and Genetic Algorithms. Each of these methods has a set of adjustable parameters, for greater flexibility.

82.DAP a statistics and graphics package developed by Susan Bassein for Unix and Linux systems, with commonly-needed data management, analysis, and graphics (univariate statistics, correlations and regression, ANOVA, categorical data analysis, logistic regression, and nonparametric analyses). Provides some of the core functionality of SAS, and is able to read and run many (but not all) SAS program files. Dap is freely distributed under a GNU-style license.

83.PINT Pint is a program for Power analysis IN Two-level designs (for determination of standard errors and optimal sample sizes in multilevel designs with 2 levels). It was written by Tom Snijders, Roel Bosker, and Henk Guldemond. The newest (Windows) version is 2.11 (April 2003). This program calculates approximate standard errors for estimates of fixed effect parameters in hierarchical linear models with two levels.

84.FIASCO FIASCO is a collection of software designed to analyze fMRI data using a series of processing steps. The input is the raw data, and the output is a set of statistical brain maps showing regions of neural activation.

85.SISA Simple Interactive Statistical Analysis for PC (DOS) from Daan Uitenbroek. An excellent collection of individual DOS modules for several statistical calculations, including some analyses not readily available elsewhere.


86.STATISTICAL SOFTWARE by Paul W. Mielke Jr. -- a large collection of executable DOS programs (and Fortran source). Includes: Matrix occupancy, exact g-sample empirical coverage test, interactions of exact analyses, spectral decomposition analysis, exact mrbp (randomized block) analyses, exact multi-response permutation procedure, Fisher's Exact for cross-classification and goodness-of-fit, Fisher's combined p-values (meta analysis), largest part's proportion, Pearson-Zelterman, Greenwood-Moran and Kendall-Sherman goodness-of-fit, runs tests, multivariate Hotelling's test, least-absolute-deviation regression, sequential permutation procedures, LAD regression, principal component analysis, matched pair permutation, r by c contingency tables, r-way contingency tables, and Jonckheere-Terpstra.

87.SYSTAT Powerful statistical software ranging from the most elementary descriptive statistics to very advanced statistical methodology. Novices can work with its friendly and simple menu-dialog; statistically-savvy users can use its intuitive command language. Carry out very comprehensive analysis of univariate and multivariate data based on linear, general linear, and mixed linear models; carry out different types of robust regression analysis when your data are not suitable for conventional multiple regression analysis; compute partial least-squares regression; design experiments, carry out power analysis, do probability calculations on many distributions and fit them to data; perform matrix computations. Provides Time Series, Survival Analysis, Response Surface Optimization, Spatial Statistics, Test Item Analysis, Cluster Analysis, Classification and Regression Trees, Correspondence Analysis, Multidimensional Scaling, Conjoint Analysis, Quality Analysis, Path Analysis, etc. A 30-day evaluation version is available for free download.

88.STATLETS a 100% Pure Java statistics program. Should run on any platform (PC, Mac, Unix) that supports Java. The free Academic Version is limited to 100 cases by 10 variables.

89.WINKS (Windows KWIKSTAT) WINKS (Windows KWIKSTAT) -- a full-featured, easy-to-use stats package with statistics (means, standard deviations, medians, etc.), histograms, t-tests, correlation, chi-square, regression, nonparametrics, analysis of variance (ANOVA), probability, QC plots, cpk, graphs, life tables, time series, crosstabs, and more. Works on Windows XP (as well as Windows 2000, NT, 98, ME and 95.) Comes in Basic and Professional editions. Evaluation version available for download.

90. STUDYRESULT StudyResult -- (30-day free trial) General statistics package for: paired & unpaired t-test, one-way ANOVA, Fisher's exact, McNemar's, Chi2, Chi2 homogeneity, life table & survival analysis, Wilcoxon rank-sum & signed-rank, sign test, bioequivalence testing, correlation & regression coefficient tests. Special features for interpreting summary data found in publications (p-values & conf. intervals from summary statistics, converts p-values to CI's & vice versa, what observed results are needed to get a significant result, estimates from publications needed for sample size calculations). Includes equivalence and non-inferiority testing for most tests.

91. STATGRAPHICS Plus STATGRAPHICS Plus (for Windows) -- over 250 statistical analyses: regression, probit, enhanced logistic, factor effects plots, automatic forecasting, matrix plots, outlier identification, general linear models (random and mixed), multiple regression with automatic Cochrane-Orcutt and Box-Cox procedures, Levene's, Friedman's, Dixon's and Grubbs' tests, Durbin-Watson p-values and 1-variable bootstrap estimates, enhanced 3D charts. For Six Sigma work: gage linearity and accuracy analysis, multi-vari charts, life data regression for reliability analysis and accelerated life-testing, long-term and short-term capability assessment estimates. Two free downloads are available: full-function but limited-time (30 days), and unlimited-time but limited-function (no Save, no Print, not all analyses).

92. PRISM Prism -- from GraphPad Software. Performs basic biostatistics, fits curves and creates publication quality scientific graphs in one complete package (Mac and Windows). Windows demo is fully-functional for 30 days, then disables printing, saving and exporting; Mac demo always disables these functions.

93. CoSTAT CoStat -- an easy-to-use program for data manipulation and statistical analysis, from CoHort Software. Use a spreadsheet with any number of columns and rows of data: floating point, integer, date, time, degrees, text, etc. Import ASCII, Excel, MatLab, S+, SAS, Genstat, Fortran, and others. Has ANOVA, multiple comparisons of means, correlation, descriptive statistics, analysis of frequency data, miscellaneous tests of hypotheses, nonparametric tests, regression (curve fitting), statistical tables, and utilities. Has an auto-recorder and macro programming language. Callable from the command line, batch files, shell scripts, pipes, and other programs; can be used as the statistics engine for web applications. Free time-limited demo available.

94. G*POWER 3 G*Power 3 -- a very general Power Analysis program for Windows and Macintosh. Performs exact analysis for 6 types of correlation tests, 3 types of bivariate regression tests, 1-group and 2-group comparison of means tests (parametric and non-parametric), 4 types of multiple regression tests, logistic regression, Poisson regression, ordinary and repeated-measures ANOVAs, ANCOVAs, MANOVAs, multivariate T2 and MANOVAs, 8 types of tests of proportions (McNemar, Fisher, etc.), 1-group and 2-group variance tests, and completely generic tests involving the binomial, normal, t, chi-square, and F distributions. Computes power, sample sizes, alpha, beta, and alpha/beta ratios. Has a comprehensive web-based tutorial and reference manual.

95. STATMATE GraphPad StatMate takes the guesswork out of evaluating how many data points you'll need for an experiment, and makes it easy for you to quickly calculate the power of an experiment to detect various hypothetical differences. Its wizard-based format leads you through the necessary steps to determine the tradeoffs in terms of risks and costs. There is no learning curve with StatMate because it is self-explanatory. All the documentation you need is built right into the program.

Why sample size matters

Many experiments and clinical trials are run with too few subjects. An underpowered study is wasted effort if even substantial treatment effects go undetected. When planning a study, therefore, you need to choose an appropriate sample size. Your decision depends upon a number of factors, including how scattered you expect your data to be, how willing you are to risk mistakenly finding a difference by chance, and how sure you must be that your study will detect a difference, if it exists.

StatMate shows you the tradeoffs

Some programs ask how much statistical power you desire and how large an effect you are looking for and then tell you what sample size you should use. The problem with this approach is that often you can't really know this in advance. You want to design a study with very high power to detect very small effects and with a very strict definition of statistical significance. But doing so requires lots of subjects, more than you can afford. StatMate 2 shows you the possibilities and helps you to understand the tradeoffs in terms of risk and cost so you can make sound sample-size and power decisions.

What about power?

You also need to know if your completed experiments have enough power. If an analysis results in a "statistically significant" conclusion, it's pretty easy to interpret. But interpreting "not statistically significant" results is more difficult. It's never possible to prove that a treatment had zero effect, because tiny differences may go undetected. StatMate shows you the power of your experiment to detect various hypothetical differences.
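The tradeoff between effect size, scatter, significance level and power described above can be sketched with the textbook normal approximation for comparing two group means. This is only an illustration and is not StatMate's own method; the function names, the default alpha of 0.05 and the default power of 0.80 are assumptions chosen for the example.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Subjects needed per group to detect a mean difference `delta`
    when both groups have standard deviation `sigma` (two-sided test,
    normal approximation; hypothetical helper for illustration)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


def approximate_power(n_per_group, delta, sigma, alpha=0.05):
    """Approximate power of a completed two-group experiment.
    The negligible contribution of the opposite tail is ignored."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    standard_error = sigma * sqrt(2 / n_per_group)
    return _STD_NORMAL.cdf(delta / standard_error - z_alpha)


if __name__ == "__main__":
    # Smaller differences need many more subjects for the same power.
    for delta in (0.25, 0.5, 1.0):
        n = sample_size_per_group(delta, sigma=1.0)
        print(f"difference {delta}: {n} subjects per group")
    # Power of a completed experiment with 20 subjects per group.
    print(f"power with n=20: {approximate_power(20, 0.5, 1.0):.2f}")
```

Running the loop for several hypothetical differences gives the kind of table of possibilities the text refers to: halving the difference you want to detect roughly quadruples the required sample size.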

96. Scientific Calculator Scientific Calculator -- the ScienCalc program contains high-performance arithmetic, trigonometric, hyperbolic and transcendental calculation routines. All the function routines therein map directly to Intel 80387 FPU floating point machine instructions.

97. DISTRIBUTIONS Distributions -- a Windows program for the analysis of discrete single-dimension distributions. The program is based on various manipulations of the Poisson, binomial and hypergeometric distributions. Available are the probability of an observed number of cases given a certain null hypothesis, the calculation of exact Poisson, binomial or hypergeometric confidence intervals, the exact and approximate size of a population using catch-recatch methodologies, the full analysis of a Poisson distributed rate ratio, and Fieller analysis; two versions of the negative binomial distribution can also be used in various ways. Besides the exact procedures there are also various approximate procedures available. From the Downloads section of the QuantitativeSkills web site.

98. MULTINOMIAL This Windows program provides an exact alternative to the Chi-square goodness-of-fit test, testing for a difference between an observed and an expected distribution in a one-dimensional array. For example, the test can be used to compare the distribution of diseases in a certain locality with an expected distribution on the basis of national or international experiences using an ICD classification. In a two-category array the multinomial test provides a two-sided solution for the Binomial test. For example, Multinomial {10 20 0.20 0.80} gives the two-sided probability (0.105) for the single-sided Binomial {0.20 10 30} probability (0.061). The multinomial allows you to work with empty '0' observation cells, although you must have an expectation about a cell. From the Downloads section of the QuantitativeSkills web site.
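The two-category example above can be checked with a short sketch. It assumes that the two-sided probability is the sum of the probabilities of all outcomes that are no more likely than the observed one; the program's documentation is not quoted here, so that rule and the function names are assumptions. Under that assumption the sketch gives values close to the quoted 0.061 and 0.105.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k events in n trials with event probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sided_upper(k, n, p):
    """Single-sided probability of observing k or more events."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


def two_sided_exact(k, n, p):
    """Two-sided probability: sum over every outcome that is no more
    probable than the observed one (assumed rule, see lead-in)."""
    cutoff = binomial_pmf(k, n, p) * (1 + 1e-9)  # tolerance for float ties
    return sum(
        q
        for q in (binomial_pmf(i, n, p) for i in range(n + 1))
        if q <= cutoff
    )


if __name__ == "__main__":
    # 10 observations in the first cell, 20 in the second,
    # expected proportions 0.20 and 0.80.
    print(f"one-sided: {one_sided_upper(10, 30, 0.20):.3f}")
    print(f"two-sided: {two_sided_exact(10, 30, 0.20):.3f}")
```

With more than two categories the same rule applies, but the sum runs over all possible tables of counts rather than over a single count.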

99. TABLES Tables -- a Windows program for the analysis of tables with up to 2*7 and 3*3 cells. The program allows for exact and approximate statistics to be calculated for traditional, ordinal and agreement tables. Fisher exact, Number Needed to Treat, Proportional Reduction in Error Statistics, Normal Approximations, four different Chi-squares, Gamma, Odds-ratio, t-tests and Kappa are among the many statistical procedures available. From the Downloads section of the QuantitativeSkills web site.

100. MOREPOWER MorePower -- another well-implemented power/sample-size calculator for any ANOVA design, for 1- and 2-sample t-tests, and for 1- and 2-sample binomial testing (sign test, chi-square test).

101. EQPLOT EqPlot -- Equation graph plotter program plots 2D graphs from equations. The application comprises algebraic, trigonometric, hyperbolic and transcendental functions.

102. BLOCKTREAT BlockTreat -- a Java program that implements a very general Monte Carlo procedure that performs non-parametric tests (based on random permutations, not ranks) for block and treatment tests, tests with matching, k-sample tests, and tests for independence between any two random variables. Designs may be incomplete and unbalanced, or even have supernumerary entries. The tests are "exact", in the Monte-Carlo sense -- they can be made as accurate as desired by specifying enough random shuffles.

103. PCP (Pattern Classification Program) PCP (Pattern Classification Program) -- a machine-learning program for supervised classification of patterns (vectors of measurements). PCP implements: Fisher's linear discriminant, dimensionality reduction using SVD, PCA, feature subset selection, Bayes error estimation, parametric classifiers (linear and quadratic), L-S (pseudo-inverse) linear discriminant, k-Nearest Neighbor, neural networks (Multi-Layer Perceptron), SVM, model selection for SVM, cross-validation, and bagging (committee) classification. Supports interactive (keyboard-driven menus) and batch processing.

104. PEPI PEPI -- a collection of 43 small DOS / Windows programs that perform a large assortment of statistical tests. They can be downloaded individually, or as a single ZIP file. (A new Windows version is being developed; a test version is available for download.) They were written to accompany the book Computer Programs for Epidemiologic Analyses: PEPI v. 4.0, by Abramson and Gahlinger, which is available for purchase. A freely accessible article describing the new features of WinPEPI is also available.

The programs include: p-value adjustments for multiple significance tests; Attributable and Prevented Fractions: Case-Control Studies; Analysis of 2 x 2 Tables; Chi-square Tests of Association; Combining Measures of Association or Probabilities; Confidence Intervals; Aids to Use of Pearson's Correlation Coefficients; Difference Between Rates, Proportions or Means; Direct Standardization; Exact Test for a 2 x K Table; Tests for Goodness of Fit; Fitting of Poisson and Binomial Distributions; Appraisal of Frequency Distribution; Indirect Standardization; Agreement Between Categorical Ratings; Life Table Analysis; Logistic Regression Analysis (Unconditional and Conditional); Wilcoxon-Mann-Whitney Test and Related Procedures; Extended Mantel-Haenszel Procedure: Trend Analysis; Multiple Matched Controls; Correcting for Misclassification in 2 x 2 Tables; Analysis of Paired Samples; Poisson Probability: Observed vs Expected Events; Poisson Regression Analysis; Power of a Test Comparing Two Proportions or Means; Probability and Inverse Probability Values: Z, t, Chi Square, F; Procedures using Random Numbers; Association Between Ordinal-Scale Variables; Comparison of Two Rates or Proportions; Comparison of Person-Time Incidence Rates; Power and Sample Size for Regression and Correlation Analyses; Comparison of Several Related Samples; Sample Size for Estimation of Proportion, Rate, or Mean; Sample Sizes for Comparison of Two Samples; Internal Consistency of a Scale; Screening and Diagnostic Tests; Seasonal Variation; Smoothing of Curves and Median Polish Procedure; Kaplan-Meier Life Table Analysis, Log-rank and Logit-rank Tests; Calculation of Elapsed Time; Trend Analysis and Multiple Comparisons, and two special calculators: WHATIS and WHATS.

105. EASY SAMPLE EasySample -- a tool for statistical sampling. Supports several types of attribute and variable sampling and includes a random number generator and standard deviation calculator. Has a consistent, easy-to-use interface. Results may be saved or read in CSV (spreadsheet compatible) or XML (Internet compatible) file formats or printed.

106. GROCER Grocer -- a free econometrics toolbox that runs under Scilab. It contains most standard econometric capabilities: ordinary least squares, autocorrelated models, instrumental variables, nonlinear least squares, limited dependent variables, robust methods, specification tests (multicollinearity, autocorrelation, heteroskedasticity, normality, predictive failure,...), simultaneous equations methods (SUR, two and three stage least squares,...), VAR, VECM, VARMA and GARCH estimation, the Kalman filter and time varying parameters estimation, unit root tests (ADF, KPSS,...) and cointegration methods (CADF, Johansen,...), HP, Baxter-King and Christiano-Fitzgerald filters.

It also contains some rare and useful features: a pc-gets device that performs automatic general to specific estimations, and a contributions device that provides contributions of exogenous variables to an endogenous one for any dynamic equation. Has a rough interface with Excel and, unlike Gauss or Matlab, deals with true time-series objects.

107. BIOMAPPER Biomapper -- a kit of GIS and statistical tools designed to build habitat suitability (HS) models and maps for any kind of animal or plant. Deals with: preparing ecogeographical maps for use as input for ENFA (e.g. computing frequency of occurrence map, standardisation, masking, etc.); exploring and comparing them by means of descriptive statistics (distribution analysis, etc.); computing the Ecological Niche Factor Analysis and exploring its output; and computing and evaluating a Habitat Suitability map.

108. ROC Curve ROC Curves -- a set of downloadable programs and Excel spreadsheets to calculate and graph various kinds of ROC (Receiver Operator Characteristic) curves.

109. BKD: Bayesian Knowledge Discoverer BKD: Bayesian Knowledge Discoverer -- a computer program able to learn Bayesian Belief Networks from (possibly incomplete) databases. Based on a new estimation method called Bound and Collapse. Developed within the Bayesian Knowledge Discovery project. See also the commercial product, called Bayesware Discoverer, available free for non-commercial use.

110. BPP (Binomial Probability Program) The Binomial Probability program (BPP) is a menu driven program which performs a variety of functions related to the success/failure situation. Given the probability of occurrence for a specific event, this program calculates the probability that EXACTLY, NO MORE THAN, or AT LEAST a certain number of events occur in a given number of trials for all possible outcomes, and will generate plots for each of these. The program allows the user to repeatedly combine probabilities in series or in parallel, and at any time will show a trail of the calculations which led to the current probability value. Other program capabilities are the calculation of probabilities from input data, Gaussian approximation, and the generation of a mean time between failure (MTBF) table for various levels of confidence. Up to 2200 trials may be run, limited by IBM PC BASIC memory utilization. It is assumed that the user is familiar with the theory behind the binomial probability distribution.

111. Weibull Trend Toolkit Weibull Trend Toolkit -- Fits a Weibull distribution function (like a normal distribution, but more flexible) to a set of data points by matching the skewness of the data. (Windows)

112. TURNER Macintosh software for interactively analysing multidimensional discrete data. Applies interactive paradigms from exploratory graphical data analysis to the concise treatment of categorical data, typically arranged in two- or multi-way contingency tables. In addition to standard features for categorical data such as Pearson's chi-squared test and log-linear models, it offers the whole goodness-of-fit family of power divergence statistics and the N-value. Interactive contingency tables provide the user with the facility of easily switching between all two-dimensional views of multivariate data. All displays dealing with the same data set are fully linked and may be interacted with directly.

113. QUEST QUEST is a binary-split decision tree algorithm for classification and data mining developed by Wei-Yin Loh (University of Wisconsin-Madison) and Yu-Shan Shih (National Chung Cheng University, Taiwan). QUEST stands for Quick, Unbiased and Efficient Statistical Tree. The objective of QUEST is similar to that of the CART(TM) algorithm described in the book, Classification and Regression Trees, by Breiman, Friedman, Olshen and Stone (1984). [CART is a registered trademark of California Statistical Software, Inc.] The major differences are:

QUEST uses an unbiased variable selection technique by default
QUEST uses imputation instead of surrogate splits to deal with missing values
QUEST can easily handle categorical predictor variables with many categories

If there are no missing values in the data, QUEST can optionally use the CART algorithm to produce a tree with univariate splits.

114. CRUISE CRUISE is a statistical decision tree algorithm for classification (also called supervised learning) developed by Hyunjoong Kim (Yonsei University, Korea) and Wei-Yin Loh (University of Wisconsin-Madison, USA). It is a much-improved descendant of an older algorithm called FACT. CRUISE stands for Classification Rule with Unbiased Interaction Selection and Estimation. CRUISE is unique among classification tree algorithms in possessing the following properties:

It splits each node into as many subnodes as the number of classes in the response variable
It has negligible bias in variable selection
It has several ways to deal with missing values
It can detect local interactions between pairs of predictor variables

115. AMELIA A program for substituting reasonable values for missing data (called "imputation").

A collection of MS-DOS programs from the Downloads section of the QuantitativeSkills web site:

Hypergeometric -- calculates the hypergeometric probability distribution to evaluate hypotheses in relation to sampling without replacement in small populations.
Binomial -- calculates probabilities for sampling with replacement in small populations or without replacement in very large populations. Can be used to approximate the hypergeometric distribution. The binomial is probably the best known discrete distribution.
Poisson -- calculates probabilities for samples which are very large in an even larger population. Is used to approximate the binomial distribution; try to compare it with the binomial! The distribution is more often used in a completely different way, for the analysis of how rare events, such as accidents, accumulate for a single individual. For example, you can use it to estimate your chances of getting one, two, three or more accidents in any one year considering that on average people get 'U' accidents per year.
Negative binomial -- Also used to study accidents. It is a more general case than the Poisson: it considers the situation in which accidents cluster differently in subgroups of the population. However, the theoretical properties of this distribution and the possible relationship to real events are not well known.
Negative binomial -- Another version of the negative binomial; this one is used to do the marginal distribution of binomials (try it!). Often used to predict the termination of real time events. An example is the probability of terminating listening to a non-answering phone after n rings.
Multinomial -- Same as the multinomial above, this one for DOS computers.
Fisher -- Is used to calculate the exact p-value in 2*2 tables. It is o.k. for one-sided testing but not so exact for two-sided testing, where there are different theories about how to do it. The sum of small p-values is the most used method, but there does not seem to be a good rationale for that. Use the Fisher exact instead of the Chi-square when you have a small value in one cell or a very uneven marginal distribution.
SPRT -- This method of analysis is not often used, which is a pity because it is actually quite good. It is based on the case of phenomena being observed, tested, or data collected, sequentially in time. The testing or data collection is stopped as soon as the proportion of positive or negative events or outcomes, relative to the total number observed, crosses some upper or lower limit. Was originally developed to keep the costs of 'destructive' testing low. Is sometimes used in medical trials to monitor the amount of negative side effects and to decide if the trial should be stopped because the number of side effects is considered unacceptably high.
Chi-square -- Calculates the Chi-square and some other measures for two-dimensional tables.
CASRO -- Calculates response rates according to different procedures. The CASRO (Council of American Survey Research Organizations) procedure is the 'accepted' procedure for surveys.
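The stopping rule described for the SPRT item in the list above can be sketched with Wald's sequential probability ratio test for yes/no outcomes. This is a generic textbook version and not the code of the QuantitativeSkills program; the function name, the hypothesised rates p0 and p1, and the error rates alpha and beta are assumptions made for the example.

```python
from math import log


def sprt_bernoulli(outcomes, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for a sequence of True/False outcomes.

    p0 is the acceptable event rate, p1 (> p0) the unacceptable one.
    Returns (decision, number of observations used)."""
    upper = log((1 - beta) / alpha)   # crossing it favours rate p1
    lower = log(beta / (1 - alpha))   # crossing it favours rate p0
    log_ratio = 0.0
    n = 0
    for n, event in enumerate(outcomes, start=1):
        if event:
            log_ratio += log(p1 / p0)
        else:
            log_ratio += log((1 - p1) / (1 - p0))
        if log_ratio >= upper:
            return "accept_h1", n     # stop: event rate looks like p1
        if log_ratio <= lower:
            return "accept_h0", n     # stop: event rate looks like p0
    return "continue", n              # no limit crossed yet


if __name__ == "__main__":
    # Hypothetical side-effect monitoring: 10% is acceptable, 30% is not.
    patients = [False, True, False, True, True, False, True, True]
    print(sprt_bernoulli(patients, p0=0.10, p1=0.30))
```

Because the limits are checked after every observation, the test can stop as soon as the evidence is strong enough in either direction, which is what keeps the number of destructive tests or exposed patients low.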

116. DATA PREPARATOR