A More Informative Approach

by J. C. W. Rayner, D. J. Best, P. B. Brockhoff and G. D. Rayner.

This website is an additional resource for users of the book
*Nonparametrics for Sensory Science: A More Informative Approach*, recently released by
Blackwell Publishing.

Specifically, this website will contain updated errata for the book as well as the latest versions of software written by the authors to implement their techniques. This software is illustrated by application to selected examples from the book.

This website, as well as the software and information contained within or referred to by this website, is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. This website may be changed or updated at any time.

To run software used to perform statistical analyses from the book, scroll down to the third heading
**Examples** below where detailed instructions are provided.

Please address any comments, errors or suggestions about this site to pbb@imm.dtu.dk

We have chosen to use R as the language/environment for developing our statistical software. It is straightforward to download and install R for a wide variety of computer systems. Alternatively, a remote instance of R called Rweb can be used that does not require R to be installed on your computer.

R is a very powerful system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. The R-Project homepage contains all the information you will need to download, set up, and use R on most PCs.

If you prefer, instead of setting up R on your computer you can go to the Rweb site and type or paste your R commands there. If you are trying to run one of the examples below, DON'T FORGET TO INCLUDE THE PROGRAM DEFINITION CODE AHEAD OF THE EXAMPLE CODE each time you run an analysis. Each time you press the "submit" button on the Rweb page, the computer hosting Rweb starts R, runs your input, and then closes R again, so no input or results are remembered from one submission to the next.

R is very similar to the S language/environment. Many statisticians will have heard of the value-added version of S sold by Insightful Corporation as S-PLUS (see the Insightful S-PLUS page for further information). Most programs in S-PLUS can be ported to R with only cosmetic changes, if any. The R-Project homepage contains very detailed information about the differences between R and S-PLUS.

The following is a selection of examples from the book
*Nonparametrics for Sensory Science: A More Informative Approach*
where the results have been generated using programs developed in the R language/environment.
Each example contains a link to both the R code used (generally entering the data and applying
programs to it) as well as a separate link showing the results that should be generated if this
code text is pasted into an R dialog window (along with the program definitions of course).

Each of these examples uses programs/scripts that R needs to know about before it can perform the analysis. Make sure the PROGRAM DEFINITIONS are pasted into your R session dialog window before attempting to run the examples below.

- First click here to open another webpage for Rweb. Once you have opened the Rweb page you will need to move between this webpage and the one you have just opened. Locate the big box near the top of the Rweb webpage and the "Submit" button underneath it. DON'T PRESS THIS BUTTON until the very end of ALL these instructions!
- Now click here to open another page for the program definitions, select all the text in this page, copy this program definition text, then move to the Rweb webpage you opened earlier, and paste the text into the big box near the top of the Rweb webpage. Once you have done this you can close the webpage you have just now opened (the one containing program definitions text).
- Next choose one of the examples below, and click on the relevant code link for this example (DON'T FORGET you need to press the "Back" button on your web browser to get back to these instructions). Copy all the example code text, then move to the Rweb webpage you opened earlier, and paste the text into the big box near the top of the Rweb webpage (take care to paste it BELOW the text you pasted there earlier).
- Finally, press the "Submit" button on the Rweb webpage (located underneath the big box near the top of the Rweb webpage where you have been pasting text). After a short delay, all the code which you pasted in earlier is printed in a window with your analysis results following it - you will probably need to scroll down to see the end.
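
The need to paste the program definitions above the example code in every submission can be illustrated with a small sketch. It does not use the book's programs: `my_summary`, the marks and `run_submission` are hypothetical stand-ins, and a fresh environment is used here only to imitate the fresh R process that Rweb starts for each submission.

```r
# Hypothetical stand-ins for the two pieces of text you paste into Rweb.
definitions <- "my_summary <- function(x) c(n = length(x), mean = mean(x))"
example <- paste(
  "marks <- c(52, 61, 47, 70)",
  "result <- my_summary(marks)",
  sep = "\n"
)

# Imitate one submission: the code runs in a fresh environment that does not
# remember anything from earlier submissions.
run_submission <- function(code) {
  env <- new.env(parent = globalenv())
  eval(parse(text = code), envir = env)
  env
}

# Example code on its own: the program is unknown, so R reports an error.
alone <- tryCatch(run_submission(example),
                  error = function(e) conditionMessage(e))

# Definitions pasted ABOVE the example code: the analysis runs.
both <- run_submission(paste(definitions, example, sep = "\n"))
print(alone)
print(both$result)
```

The same ordering applies in a local R session, except that there the definitions only need to be pasted once per session.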

- 7.3 Examination Mark Example code and result
- 7.5 Cordial Drink Preference Example code and result
- 7.7 Radioactive Counts Example code and result
- 7.8 Milk Bacteria Example code and result
- 7.9 Fat and Protein Content Example code and result

Note that for a few of the examples above the data x is transformed to x*10 or x+1 before analysis (e.g. examples 3.6, 4.6.2, 6.2 and 6.3). This is because the rank table program requires whole numbers (integers greater than zero) as input.
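
As a minimal sketch of why these transformations are harmless, the following uses made-up values (not data from the book) and a hypothetical helper `is_positive_whole` to check that the transformed data are whole numbers greater than zero and that the ranks, including ties, are unchanged.

```r
# Hypothetical helper: TRUE if every value is a whole number greater than zero.
is_positive_whole <- function(v) all(v > 0 & abs(v - round(v)) < 1e-8)

frac <- c(1.5, 0.7, 2.3, 0.7)   # made-up scores with one decimal place
zero <- c(0, 3, 1, 2)           # made-up whole-number scores that include zero

frac10 <- round(frac * 10)      # x*10; round() removes floating-point noise
zero1  <- zero + 1              # x+1 shifts the zero to one

print(is_positive_whole(frac))    # not acceptable as input
print(is_positive_whole(frac10))  # acceptable after x*10
print(is_positive_whole(zero1))   # acceptable after x+1

# Both transformations are increasing, so ranks and ties are preserved.
print(all(rank(frac) == rank(frac10)))
print(all(rank(zero) == rank(zero1)))
```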

Similarly, for example 4.2.1 the text explains that the ranks are inverted - that is, the lowest rank of 1 is given to the highest score and the highest rank of 4 is given to the lowest score. In addition, the input data x is fractional. To produce rank equivalent whole number data, the original data x is transformed to 10*(20-x) prior to analysis.
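
The effect of the 10*(20-x) transformation can be checked on a small made-up example. The four scores below are hypothetical (not the data of example 4.2.1) and are assumed to lie below 20 with at most one decimal place, so that the transformed values are positive whole numbers.

```r
x <- c(17.5, 12.0, 19.5, 14.5)   # hypothetical scores for four products
y <- 10 * (20 - x)               # transformed input for the rank table program

print(y)         # whole numbers greater than zero
print(rank(y))   # rank 1 now belongs to the highest original score
print(rank(x))   # ordinary ranks of the original scores, for comparison

# With four products, the inverted rank is 5 minus the ordinary rank.
print(all(rank(y) == 5 - rank(x)))
```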

In section 3.5 of the book, where the ties in the Tomato example data are randomly assigned, in addition to the ties noted in the text (for consumers 4 and 12) there is also a tie for consumer 22 between the Florade and Momotaro varieties. For this consumer, the tie was eliminated by ranking Momotaro above Florade. Note that in the output above for this example both the broken-tie data and the original data are analysed, and the two sets of results do not materially disagree.

In section 6.2 of the book (table 6.5 dealing with wine example 6.2) the total number of degrees of freedom is mis-stated. As the output above for wine example 6.2 shows, because the relevant U matrix for this data is not of full rank, 4 degrees of freedom are "lost" in the analysis. Refer to the discussion near the end of section 3.6 and the paper:

Brockhoff, P. B., Best, D. J. & Rayner, J. C. W. (2004). Partitioning Anderson's Statistic for Tied Data. Journal of Statistical Planning and Inference, No. 121, pp. 93-111.

In section 6.6 the book incorrectly states that "For the hot chips data S takes the value 14.9 on 8 degrees of freedom with a p-value 0.06...". As the example 6.7 output shows, the correct value of the CMH Extended Stuart Test Statistic for this data is 12.18, which with 8 degrees of freedom corresponds to a p-value of 0.14. The conclusion in the text is largely unchanged, though once again the Anderson statistic is more sensitive.
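
Both p-values quoted in this correction can be checked directly against the chi-squared distribution with 8 degrees of freedom. This is only an arithmetic check using base R's `pchisq`, not the CMH program itself; the variable names are illustrative.

```r
df <- 8

# Upper-tail probabilities of the chi-squared distribution with 8 df.
p_corrected <- pchisq(12.18, df = df, lower.tail = FALSE)  # corrected statistic
p_book      <- pchisq(14.9,  df = df, lower.tail = FALSE)  # value printed in the book

print(round(c(corrected = p_corrected, book = p_book), 2))
```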

In section 7.3, the Examination Mark dataset example, the statistic V_{3} is given as negative when
it should be positive. In addition, the p-value of 0.045 given for S_{4} is in fact correct: it was
obtained via parametric bootstrap rather than from the χ^{2}_{4} distribution, because S_{4} has
the χ^{2}_{4} distribution only asymptotically.
For the small sample size in this example (n=20) the asymptotics are not yet reliable.

About halfway through section 7.5 (just after figure 7.2) two orthogonal polynomials are defined;
the first should be g_{1}(i) rather than g_{i}(i).

In the third paragraph of section 7.6 the text reads
"To get a statistic with an approximate chi-squared distribution with m-q degrees of freedom we should use
(X_{P}^{2}-Sum[V_{r}^{2},r=1,...,m-1]).
Often Sum[V_{r}^{2},r=1,...,m-1] is negligible, but this needs checking for each data set."
In these expressions the upper limit of the sums is meant to be q rather than m-1, so the expressions
should be (X_{P}^{2}-Sum[V_{r}^{2},r=1,...,q]) and Sum[V_{r}^{2},r=1,...,q].
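
Written out in conventional notation, the corrected statement is as follows. This only restates the sentence quoted above with q as the upper limit of the sum; the symbols are those of the book.

```latex
X_P^2 - \sum_{r=1}^{q} V_r^2
\quad \text{is approximately } \chi^2_{m-q} \text{ distributed,}
\qquad \text{where } \sum_{r=1}^{q} V_r^2 \text{ is often negligible.}
```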

In section 7.7 the correct value of the residual is R=4.48 (as shown in example 7.7) rather than 3.49. For this case, where the parameter estimate is essentially found by fitting location, we recommend "attributing" the zero df component to location (as in the second GOF analysis shown in example 7.7). This produces a p-value of 0.81141559, though the conclusion is essentially unchanged wherever the component is attributed.

In section 7.8 the X_{P}^{2}=6.859 value has a p-value of 0.65 rather than 0.81
as in the text (see output for example 7.8).
Also a few expected values in table 7.2 (bacterial cell counts) are incorrect.
This table should be:

| Count | Observed | Expected |
|-------|----------|----------|
| 0 | 56 | 60.88 |
| 1 | 104 | 90.76 |
| 2 | 80 | 85.16 |
| 3 | 62 | 64.12 |
| 4 | 42 | 42.32 |
| 5 | 27 | 25.58 |
| 6 | 9 | 14.49 |
| 7 | 9 | 7.85 |
| 8 | 5 | 4.09 |
| 9 | 3 | 2.07 |
| 10 | 2 | 1.02 |
| 11 | 1 | 1.66 |
| Total | 400 | 400 |
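
As a consistency check, the Pearson statistic can be recomputed from the observed and expected counts in the corrected table. The sketch below is plain arithmetic in base R, not the GOF program used for example 7.8. The degrees of freedom (9 and 11) are an assumption: they are not stated here, and were chosen only because they reproduce the two p-values quoted for section 7.8.

```r
# Observed and expected counts copied from the corrected table 7.2.
observed <- c(56, 104, 80, 62, 42, 27, 9, 9, 5, 3, 2, 1)
expected <- c(60.88, 90.76, 85.16, 64.12, 42.32, 25.58,
              14.49, 7.85, 4.09, 2.07, 1.02, 1.66)

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected.
x2 <- sum((observed - expected)^2 / expected)
print(round(x2, 2))   # close to the quoted X_P^2 = 6.859

# ASSUMED degrees of freedom, inferred from the quoted p-values only.
p_df9  <- pchisq(6.859, df = 9,  lower.tail = FALSE)
p_df11 <- pchisq(6.859, df = 11, lower.tail = FALSE)
print(round(c(df9 = p_df9, df11 = p_df11), 2))
```

The small difference between the recomputed statistic and 6.859 comes from the expected counts being rounded to two decimal places.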

In section 7.9 the components V_{r,s} are defined incorrectly -
the divisor should be sqrt(n) rather than n, with the sum taken over the observations i, to give
V_{r,s}=Sum[g_{r}(y_{i,1})g_{s}(y_{i,2})/sqrt(n),i=1,...,n].

We welcome information about potential misprints and other errors. Please email details to pbb@imm.dtu.dk

Last updated: 5 April, 2005.