File STAT00.WU

C APRIL 10 1975
HASSLE STAT PACK
LARS PALMER
AB HASSLE
FACK
431 20 MOLNDAL 1
SWEDEN
HASSLE STAT PACK GENERALS

HASSLE STATISTICAL PACKAGES CONSIST OF THREE DISTINCT COMPONENTS:

A) TWO SIMPLE STATISTICAL PROGRAMS FOR ELEMENTARY STATISTICAL ANALYSES,
   WRITTEN IN BASIC (STAT1.BA AND STAT2.BA).

B) A MORE ADVANCED AND COMPLETE STATISTICAL PACKAGE WRITTEN IN FOCAL.

C) THE MOST COMPLETE OF THE PACKAGES: A LARGE FORTRAN IV PACKAGE
   CONSISTING OF ONE LARGE PROGRAM FOR THE STATISTICAL ANALYSES AND
   SEVERAL ACCESSORY PROGRAMS FOR MANIPULATING THE DATA MATRIX AND FOR
   ANALYSES OF A MORE COMPLEX NATURE THAT CANNOT BE COVERED BY THE MAIN
   PROGRAM.

THIS INTRODUCTION COVERS SOME GENERAL ASPECTS THAT ARE RELEVANT TO ALL
OF THE PROGRAMS. ANY DIVERGENCES FROM THIS INTRODUCTION ARE SPECIFIED
UNDER THE INDIVIDUAL PROGRAMS.

A. THE DATA FORMAT

AS IN ALL STATISTICAL ANALYSES WE START WITH A DATA MATRIX CONSISTING
OF A NUMBER OF ROWS AND A NUMBER OF COLUMNS. THE ROWS ARE INDIVIDUAL
OBSERVATIONS ON THE SAME OR DIFFERENT INDIVIDUALS. THE COLUMNS ARE
DIFFERENT TYPES OF OBSERVATIONS. AS A SIMPLE EXAMPLE: LABEL THE COLUMNS
LENGTH, HEIGHT, WEIGHT AND LABEL THE ROWS A, B, C ETC. FOR EACH
INDIVIDUAL.

THE NUMBER OF COLUMNS CAN BE AS LOW AS ONE, IN WHICH CASE WE CAN ONLY
CALCULATE PARAMETERS OF THIS ONE VARIABLE. THE MAXIMUM NUMBER DIFFERS
BETWEEN THE PROGRAMS. IF WE HAVE SEVERAL COLUMNS WE CAN, BESIDES THE
PARAMETERS, ALSO CALCULATE STATISTICAL MEASURES BETWEEN THE COLUMNS.

B. MISSING DATA

ALL ROUTINES (EXCEPT THE BASIC PROGRAMS) HANDLE MISSING DATA. AS A
CONVENTION IN ALL THE PROGRAMS, MISSING DATA ARE SET TO ZERO. AS BOTH
FOCAL AND FORTRAN ARE ABLE TO HANDLE VERY SMALL NUMBERS, THERE IS NO
OBJECTION TO USING 10E-15 INSTEAD OF A TRUE ZERO VALUE. THE FOCAL AND
FORTRAN PROGRAMS COMPENSATE FOR THE MISSING VALUES IN ALL ROUTINES.
THIS COMPENSATION IS DONE IN ONE OF THREE DIFFERENT WAYS:

1. WHEN CALCULATING PARAMETERS, ALL ZERO VALUES IN THE COLUMN ARE
   SKIPPED.

2. WHEN COMPARING TWO COLUMNS, E.G. BY T-TESTS, THE OBSERVATIONS IN
   BOTH COLUMNS ARE SKIPPED IF THERE IS A MISSING VALUE IN EITHER
   COLUMN.

3. IN CERTAIN CASES, E.G. TWO-WAY ANALYSIS OF VARIANCE, THE WHOLE ROW
   IS SKIPPED IF THERE IS A SINGLE MISSING VALUE IN ANY COLUMN.

C. END CODES

WHEN INPUTTING DATA TO THE PROGRAMS IT IS NECESSARY TO SIGNAL TO THE
PROGRAM THAT THE END OF DATA IS REACHED. THIS IS DONE BY GIVING AN END
CODE. THE END CODE IS IN MOST CASES A VERY LARGE NUMBER, BUT OTHER
CONVENTIONS MAY BE USED IN SOME CASES. IN THE DATA FILE, TOO, EXTREMELY
LARGE NUMBERS ARE USED TO SIGNAL END OF DATA TO THE PROGRAM WHEN DOING
INPUT FROM THE FILE. THESE NUMBERS ARE INSERTED AUTOMATICALLY BY THE
PROGRAMS AND CAN THEREFORE APPEAR ON A LISTING OF A DATA FILE EVEN
THOUGH THEY WERE NOT INPUT FROM THE KEYBOARD.

D. DATA FILES

THE FOCAL AND FORTRAN PROGRAMS WERE WRITTEN WITH THE GOAL OF PROVIDING
PROGRAMS THAT COULD BE USED FOR A LARGE NUMBER OF STATISTICAL ANALYSES
EASILY REQUESTED FROM THE KEYBOARD AND THAT AT THE SAME TIME COULD
PRESERVE THE DATA ONCE THEY HAD BEEN INPUT. FOR THIS REASON THE
PROGRAMS CREATE AND READ FROM DATA FILES. THE CREATION OF THE FILES IS
DESCRIBED IN THE INDIVIDUAL PROGRAM MANUALS.

THE FILES ARE NOT COMPATIBLE, I.E. IT IS NOT POSSIBLE TO CREATE A FILE
WITH THE FOCAL PACKAGE AND READ IT WITH THE FORTRAN PACKAGE OR VICE
VERSA. HOWEVER, THE STRUCTURE OF THE FILES IS REASONABLY SIMILAR AND IT
WOULD NOT BE VERY DIFFICULT TO WRITE A CONVERSION PROGRAM BETWEEN THE
FILES. SUCH A PROGRAM IS, HOWEVER, NOT INCLUDED IN THE PACKAGE.
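AS AN ILLUSTRATION OF THE THREE WAYS OF COMPENSATING FOR MISSING DATA
(SECTION B) AND OF THE END CODE CONVENTION (SECTION C), THE FOLLOWING
IS A MINIMAL SKETCH IN PYTHON. IT IS NOT A TRANSCRIPTION OF THE FOCAL
OR FORTRAN ROUTINES. THE FUNCTION NAMES, THE SAMPLE MATRIX AND THE END
CODE VALUE 1.0E30 ARE ASSUMPTIONS MADE FOR THE EXAMPLE; THE TEXT ONLY
STATES THAT THE END CODE IS A VERY LARGE NUMBER.

```python
MISSING = 0.0      # convention from the text: a stored zero marks a missing value
END_CODE = 1.0e30  # assumed sentinel; the text only says "a very large number"


def column_values(column):
    """Way 1: keep only the non-missing values of a single column."""
    return [x for x in column if x != MISSING]


def pairwise_complete(col_a, col_b):
    """Way 2: keep only the rows where both of two columns have a value."""
    return [(a, b) for a, b in zip(col_a, col_b)
            if a != MISSING and b != MISSING]


def complete_rows(matrix):
    """Way 3: keep only the rows with no missing value in any column."""
    return [row for row in matrix if all(x != MISSING for x in row)]


def read_until_end_code(values, end_code=END_CODE):
    """Collect input values up to, but not including, the first end code."""
    result = []
    for v in values:
        if v >= end_code:
            break
        result.append(v)
    return result


# Hypothetical data matrix with columns LENGTH, HEIGHT, WEIGHT.
data = [
    [1.2, 3.4, 5.0],
    [0.0, 2.9, 4.1],      # LENGTH missing
    [1.5, 0.0, 6.3],      # HEIGHT missing
    [1.1, 3.0, 10e-15],   # a true zero WEIGHT entered as a very small number
]

length = [row[0] for row in data]
height = [row[1] for row in data]

print(column_values(length))              # LENGTH without its missing value
print(pairwise_complete(length, height))  # rows usable for a two-column test
print(len(complete_rows(data)))           # rows usable for a two-way analysis
print(read_until_end_code([1.0, 2.0, END_CODE, 3.0]))
```

THE LAST ROW OF THE SAMPLE MATRIX SHOWS WHY A VERY SMALL NUMBER IS
USEFUL: IT SURVIVES ALL THREE FILTERS, WHEREAS AN EXACT ZERO WOULD BE
TREATED AS MISSING.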
E. HEADINGS

IN THE FORTRAN PACKAGE THERE IS ALSO A CAPABILITY OF INCLUDING HEADINGS
FOR THE MATERIAL. THE DATA FILE HOLDS ONE HEADING (72 CHARACTERS LONG)
FOR THE WHOLE MATERIAL AND ONE HEADING (18 CHARACTERS) FOR EACH OF THE
SUBGROUPS. THESE HEADINGS ARE COPIED ONTO THE OUTPUT LIST BY THE
STATISTICAL PROGRAM. IN SOME CASES, E.G. IN THE ANOVA PROGRAM, THE
HEADINGS ARE USED FOR SPECIAL PURPOSES.

F. VARIOUS ANALYSES IN THE PACKAGES

ALL THREE PROGRAM PACKAGES ARE BASED ON A LIST OF DIFFERENT STATISTICAL
ANALYSES WHICH HAS BEEN KEPT AS FAR AS POSSIBLE THE SAME IN ALL THE
PROGRAMS. THE LIST IS COMMENTED ON HERE, AND SOME GENERAL COMMENTS ARE
MADE ON THE FORMULAS USED IN THE VARIOUS ANALYSES.

NUMBER  B F F  (B F F = BASIC, FOCAL, FORTRAN)

-1      X X X  LISTS THE INPUT DATA. IN SOME CASES ALSO CONTAINS
               CORRECTION ROUTINES.

0       X X X  ALWAYS FETCHES NEW DATA, EITHER FROM THE KEYBOARD OR
               FROM THE INPUT DATA FILE. THE ACTUAL INTERPRETATION OF
               ZERO VARIES SLIGHTLY BETWEEN THE PROGRAMS.

1       X X X  CALCULATION OF MEANS, STANDARD DEVIATION AND STANDARD
               ERROR ACCORDING TO STANDARD FORMULAS. THE DIVISOR FOR
               THE STANDARD DEVIATION IS IN ALL CASES N-1.

2       X X X  T-TESTS. THREE FORMULAS FOR T-TESTS ARE IMPLEMENTED IN
               THE PROGRAM. THESE ARE:
               A) STANDARD T-TEST COMPARING THE MEANS OF THE TWO
                  COLUMNS, CALCULATED ACCORDING TO FORMULA (I).
               B) T-TEST WITH THE PRESUMPTION THAT THE VARIANCES IN THE
                  DIFFERENT GROUPS ARE NOT EQUAL (FORMULA II).
               C) T-TEST ON PAIRED VALUES. IF ONE VALUE IN A ROW IS
                  MISSING, THE ROW IS SKIPPED IN THE CALCULATION
                  (FORMULA III).

3       X X X  REGRESSION LINES CALCULATED ACCORDING TO STANDARD
               FORMULAS (SEE REF. S&R IN SECTION G, FROM WHICH THE TEST
               DATA FOR REGRESSION LINES ARE TAKEN).

4       - X X  THE NORMAL REGRESSION LINE PRESUMES THAT X IS MEASURED
               WITHOUT ERROR. IF THIS IS NOT THE CASE, I.E. IF X AND Y
               ARE BOTH RANDOM VARIABLES, IT IS NOT POSSIBLE TO USE THE
               NORMAL REGRESSION LINE. THERE ARE SOME SPECIAL WAYS OF
               SOLVING THIS PROBLEM. THE ONE USED HERE IS THE SO-CALLED
               BARTLETT'S METHOD OF REGRESSION. SEE REF. S&R IN SECTION
               G, FROM WHICH THE TEST DATA ARE ALSO TAKEN.

5       - X X  CORRELATIONS EXPRESSED AS A CORRELATION MATRIX. IF THERE
               ARE MISSING DATA THE CORRELATION IS CALCULATED ONLY FROM
               THE COMPLETE ROWS.

6       - X X  ANALYSIS OF VARIANCE, IN THE TEXT USUALLY ABBREVIATED TO
               ANOVA.

7       - - X  SCHEFFE CONTRASTS. THE SCHEFFE CONTRASTS IN BOTH THE
               FORTRAN AND FOCAL PROGRAMS ARE AN ADAPTATION OF THE
               PROGRAM DECUS FOCAL NO 8-66. THE SCHEFFE CONTRASTS ARE
               CALCULATED AS A COMPLETE MATRIX OF COLUMNS AGAINST EACH
               OTHER FOR THE GIVEN LEVELS OF SIGNIFICANCE. FOR FURTHER
               DETAILS SEE THE FOCAL AND FORTRAN PROGRAMS (SEE ALSO
               ANALYSIS 36).

7       - X -  COCHRAN'S AND BARTLETT'S TESTS.

8       - X X  TWO-WAY ANALYSIS OF VARIANCE, CALCULATED ACCORDING TO
               NORMAL FORMULAS. THE FOCAL PROGRAM REFUSES TO ACCEPT
               ANALYSIS 8 IF THERE ARE MISSING DATA IN THE MATRIX. THE
               FORTRAN PROGRAM SOLVES THIS PROBLEM BY SKIPPING EVERY
               ROW IN WHICH THERE IS A MISSING VALUE. UNSYMMETRIC
               TWO-WAY ANALYSIS OF VARIANCE CAN BE SOLVED BY THE
               PROGRAM UNSYM, WHICH IS INCLUDED IN THE FORTRAN PACKAGE.

13      - X X  WILCOXON MATCHED-PAIRS SIGNED-RANKS TEST.
               PROGRAMMED EXACTLY FROM SIEGEL (SEE REF. S IN SECTION G)
               FOR BOTH THE LARGE AND THE SMALL CASES, WITH CORRECTIONS
               FOR TIES (TEST DATA SIEGEL PAGES 82 AND 79
               RESPECTIVELY).

14      - X X  MANN-WHITNEY U-TEST. THE COMMENTS IN 13 APPLY HERE TOO
               (TEST DATA SIEGEL PAGE 122).

15      - X X  KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BY RANKS.
               PROGRAMMED FROM SIEGEL (TEST DATA SIEGEL PAGE 187).

16      - X X  SPEARMAN RANK CORRELATION, PROGRAMMED DIRECTLY FROM
               SIEGEL (TEST DATA SIEGEL PAGE 206).

17      - X X  FRIEDMAN TWO-WAY NON-PARAMETRIC ANALYSIS OF VARIANCE
               (TEST DATA SIEGEL PAGE 171).

18      - - X  NUMERICAL INTEGRATION OF LABORATORY DATA. THE
               INTEGRATION IS PERFORMED BOTH BY A METHOD OF OVERLAPPING
               PARABOLAS AND BY THE TRAPEZOIDAL METHOD.

19      - - X  CALCULATES MEDIANS AND RANKS OF THE GIVEN COLUMN.

20      - - X  CROSS TABULATION WITH ONE COLUMN AS X AXIS AND ONE AS
               Y AXIS. COUNTS HOW MANY OBSERVATIONS FALL IN EACH
               INTERVAL.

21      - - X  SCATTER PLOT. ACTUALLY THE SAME INFORMATION AS IN 20,
               GIVEN AS A PLOT OF X AGAINST Y.

21      - X -  REARRANGEMENT OF THE DATA MATRIX.

22      - X X  VARIOUS CONVERSIONS, SUCH AS CONVERTING ONE COLUMN TO
               THE LOGARITHMS OF ANOTHER GIVEN COLUMN.
36      - X -  SCHEFFE CONTRASTS (SEE 7).

37      - X -  CHI-SQUARE.

47-49   - X -  FILE MANIPULATION.

48      - - X  COMMENTS.

49      - - X  FLAG SETTINGS.

50      - X X  END OF ANALYSIS. EXITS THE PROGRAM.
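ANALYSES 1 AND 2 IN THE LIST ABOVE CAN BE ILLUSTRATED BY THE FOLLOWING
SKETCH IN PYTHON. FORMULAS (I), (II) AND (III) ARE NOT REPRODUCED IN
THIS INTRODUCTION, SO THE USUAL TEXTBOOK FORMS OF THE POOLED-VARIANCE,
UNEQUAL-VARIANCE AND PAIRED T STATISTICS ARE ASSUMED HERE; THEY MAY
DIFFER IN DETAIL FROM WHAT THE PROGRAMS COMPUTE. THE FUNCTION NAMES ARE
HYPOTHETICAL. ONLY THE N-1 DIVISOR AND THE SKIPPING OF ROWS WITH A
MISSING (ZERO) VALUE IN THE PAIRED TEST ARE TAKEN FROM THE TEXT.

```python
import math


def mean_sd_se(xs):
    """Mean, standard deviation (divisor N-1) and standard error of the mean."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return mean, sd, sd / math.sqrt(n)


def t_pooled(xs, ys):
    """Two-sample t assuming equal variances; returns (t, degrees of freedom)."""
    n1, n2 = len(xs), len(ys)
    m1, s1, _ = mean_sd_se(xs)
    m2, s2, _ = mean_sd_se(ys)
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def t_unequal(xs, ys):
    """Two-sample t statistic without assuming equal variances."""
    n1, n2 = len(xs), len(ys)
    m1, s1, _ = mean_sd_se(xs)
    m2, s2, _ = mean_sd_se(ys)
    return (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def t_paired(xs, ys):
    """Paired t; a row is skipped if either value is missing (zero)."""
    diffs = [x - y for x, y in zip(xs, ys) if x != 0 and y != 0]
    mean, _, se = mean_sd_se(diffs)
    return mean / se, len(diffs) - 1


# Made-up numbers, chosen only so the results are easy to check by hand.
print(mean_sd_se([2, 4, 4, 4, 5, 5, 7, 9]))
print(t_pooled([1, 2, 3], [2, 3, 4]))
print(t_unequal([1, 2, 3], [2, 3, 4]))
print(t_paired([1, 2, 3, 4], [2, 4, 5, 0]))  # last row skipped: Y is missing
```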
G. REFERENCES

THE FOLLOWING ARE REFERRED TO AT VARIOUS PLACES, AND DATA FROM THESE
BOOKS ARE USED AS TEST DATA IN THE STATDA.DA FILE.

S. SIEGEL (S): NONPARAMETRIC STATISTICS FOR THE BEHAVIORAL SCIENCES,
   1956 (ANALYSES NOS. 11-17)
R. SOKAL & F. J. ROHLF (S&R): BIOMETRY, 1969 (ANALYSES NOS. 4, 6, 8)
W. DIXON & F. MASSEY (D): INTRODUCTION TO STATISTICAL ANALYSIS, 1969
   (ANALYSES NOS. 2, 3, 5)
P. DAVIS & P. RABINOWITZ (D&R): NUMERICAL INTEGRATION, 1967
   (ANALYSIS NO. 18)
DECUS FOCAL 8-66 (ANALYSES NOS. 7 AND 36)

H. TEST DATA

FOR THE FOCAL AND FORTRAN PROGRAMS THERE IS A TEST DATA FILE WHICH
CONTAINS DATA TO TEST ALMOST ALL FUNCTIONS OF THE PROGRAMS. THE TEST
DATA ARE TAKEN FROM VARIOUS SOURCES:

ANALYSES 3, 4     S&R
ANALYSES 6, 8     D
ANALYSES 12-17    SIEGEL

I. MAIN DIFFERENCES

THE PRINCIPAL DIFFERENCE BETWEEN THE THREE PROGRAMS - THE BASIC, FOCAL
AND FORTRAN PROGRAMS - LIES IN THE IMPLEMENTATION OF THE HANDLING OF
THE DATA.

THE BASIC PACKAGE CONSISTS OF SOME VERY SIMPLE PROGRAMS THAT READ THE
DATA FROM THE KEYBOARD BUT DO NOT SAVE THE DATA. ONLY THE SUMS AND
PRODUCT SUMS REQUIRED FOR THE ANALYSES ARE ACCUMULATED; THE VARIOUS
STATISTICS ARE THEN CALCULATED FROM THESE ON REQUEST.
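THE APPROACH OF THE BASIC PROGRAMS, WHICH KEEP ONLY SUMS AND PRODUCT
SUMS RATHER THAN THE DATA THEMSELVES, CAN BE SKETCHED AS FOLLOWS IN
PYTHON. THIS IS AN ILLUSTRATION OF THE GENERAL TECHNIQUE UNDER ASSUMED
NAMES (THE CLASS RunningSums AND ITS METHODS ARE HYPOTHETICAL), NOT A
LISTING OF STAT1.BA OR STAT2.BA.

```python
import math


class RunningSums:
    """Accumulates N, sum(x), sum(y), sum(x*x), sum(y*y), sum(x*y) only."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def add(self, x, y):
        """Fold one observation pair into the sums; the pair is not stored."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def mean_x(self):
        return self.sx / self.n

    def sd_x(self):
        """Standard deviation of x with divisor N-1, from the sums alone."""
        return math.sqrt((self.sxx - self.sx ** 2 / self.n) / (self.n - 1))

    def regression(self):
        """Least-squares line y = a + b*x; returns (a, b)."""
        sxy_c = self.sxy - self.sx * self.sy / self.n  # corrected product sum
        sxx_c = self.sxx - self.sx ** 2 / self.n       # corrected sum of squares
        b = sxy_c / sxx_c
        a = self.sy / self.n - b * self.sx / self.n
        return a, b


# Made-up points lying exactly on y = 1 + 2x.
sums = RunningSums()
for x, y in [(1, 3), (2, 5), (3, 7), (4, 9)]:
    sums.add(x, y)

print(sums.mean_x(), sums.sd_x(), sums.regression())
```

THE ADVANTAGE IS THAT STORAGE DOES NOT GROW WITH THE NUMBER OF
OBSERVATIONS. THE DRAWBACKS ARE THAT INDIVIDUAL VALUES CANNOT BE LISTED
OR CORRECTED AFTERWARDS, AND THAT SUBTRACTING LARGE SUMS CAN LOSE
PRECISION WHEN THE SPREAD IS SMALL COMPARED WITH THE MEAN.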
THE FOCAL PROGRAM IS MORE SOPHISTICATED. IT FETCHES DATA FROM THE
KEYBOARD, SAVES THE DATA IN CORE, PERFORMS ANALYSES ON THEM ON REQUEST
AND WRITES THEM TO THE OUTPUT FILE. DUE TO THIS CONSTRUCTION AND TO THE
FOCAL LANGUAGE, THE FOCAL PROGRAM IS LIMITED TO ABOUT 100 VARIABLES AND
IS SLOW IN EXECUTION. THERE ARE, HOWEVER, CASES WHERE THE FOCAL PROGRAM
IS DEFINITELY TO BE PREFERRED TO THE FORTRAN PACKAGE, E.G. IN
EDUCATIONAL ENVIRONMENTS WHERE MANY PEOPLE WITH LITTLE COMPUTER
TRAINING USE STATISTICAL PACKAGES. IT IS DEFINITELY SIMPLER TO USE THAN
THE FORTRAN PACKAGE.

THE FORTRAN PACKAGE IS IMPLEMENTED SLIGHTLY DIFFERENTLY. IT CONSISTS OF
THREE PROGRAMS:

1) OUTLAY, WHICH READS DATA FROM THE KEYBOARD AND CREATES THE DATA
   FILE.

2) KSORT, WHICH IS AVAILABLE FOR VARIOUS REORGANIZATIONS OF THE DATA
   FILE AND PERFORMS CONVERSIONS OF THE FILE (WHICH ARE MORE COMPLEX
   THAN THOSE IN THE FOCAL PACKAGE).

3) STAT, THE STATISTICAL PACKAGE.

THIS ARRANGEMENT IS ON THE WHOLE NOT SUITABLE FOR DATA INPUT FROM THE
KEYBOARD BUT IS SPECIFICALLY DESIGNED FOR FILE INPUT. THIS MEANS THAT
AT LEAST TWO AND SOMETIMES THREE PROGRAMS HAVE TO BE INVOKED TO DO A
STATISTICAL ANALYSIS, WHICH, AT LEAST FOR THE UNFAMILIAR COMPUTER USER,
IS MORE DIFFICULT THAN USING THE FOCAL PROGRAM. HOWEVER, THE ADVANTAGES
OF THE FORTRAN PROGRAM ARE A MUCH LARGER REPERTOIRE OF STATISTICAL
ANALYSES, A MUCH LARGER DATA AREA (AT THE EXPENSE OF UTILIZING 12 K OF
CORE) AND MUCH FASTER EXECUTION. THE LATTER IS TRUE IN A NON-FPP
CONFIGURATION AND OF COURSE MUCH MORE SO IN A SYSTEM HAVING A FLOATING
POINT PROCESSOR.


