Below are some thoughts about the design of statistical dialogs. They are by no means definitive, or even correct.

A GUI should be as simple as possible, but no simpler.

There is definite value in resisting the urge to cram every possible eventuality or user need into a single dialog. A UI should be as simple and streamlined as possible, with commonly used elements at the forefront and little-used tweaks hidden (perhaps in a sub-dialog).

That said, going too simple can paradoxically increase complexity. If you narrow a single dialog to one very specific function, for example loading an SPSS dataset, then you will be forced, later down the road, to create a separate dialog for each function that was left out (e.g. for Stata, SAS, and CSV data files). This can lead to a proliferation of menu items where only one was needed (Load Data).
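
As a rough illustration of one Load Data entry point serving several formats, the sketch below picks a loader from the file extension. It is written in Python purely for illustration; the loader names and the extension table are hypothetical placeholders, not Deducer's actual code.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the name of a loader routine.
# None of these names come from Deducer; they only illustrate the dispatch.
LOADERS = {
    ".sav": "load_spss",
    ".dta": "load_stata",
    ".sas7bdat": "load_sas",
    ".csv": "load_csv",
}


def pick_loader(filename):
    """Return the loader name for a file, based on its extension."""
    ext = Path(filename).suffix.lower()
    if ext not in LOADERS:
        raise ValueError(f"Unsupported data format: {ext or '(none)'}")
    return LOADERS[ext]
```

Supporting a new format then means adding one entry to the table rather than one more menu item.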

Organize implemented procedures by task.

Think about a user sitting down to use your software. What are they trying to accomplish? What is their goal?

A function call is not a goal. Users don't sit down to do a t-test. A t-test is the chosen procedure to accomplish the task of comparing two distributions. t-tests, Mann-Whitney, Kolmogorov-Smirnov, etc., all relate to the same task of comparing two distributions, so they should be put in the same dialog.

Don’t cover all tasks, but covered tasks should be (almost) comprehensively covered.

It is better to cover one task very well than to cover 10 tasks poorly. It takes about 10 minutes to build a dialog that creates a scatter plot of two variables. It is something quite different to add options for:

  • Plotting symbols
  • Color
  • Paneling (grid and wrap)
  • Regression lines (smooth, polynomial and linear)
  • Labels / Title
  • Transparency
  • Preview

Don’t restrict the user to analyzing one variable at a time

If the user is likely to want to perform the same action on several variables, don't make them go through the dialog once for every variable. Design the dialog so that the action can be applied to multiple variables at once. If the action is an analysis, format the results into a nice table (see multi.test).
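
The idea of running one analysis over many variables and collecting the results in a single table can be sketched as follows. This is a minimal Python illustration, not the implementation of multi.test: the Welch t statistic stands in for whatever analysis the dialog runs, and the function names and data are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic for two independent samples."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def multi_table(group_a, group_b, variables):
    """Run the same comparison on every variable and return table lines."""
    lines = [f"{'variable':<10}{'mean A':>10}{'mean B':>10}{'t':>10}"]
    for name in variables:
        a, b = group_a[name], group_b[name]
        lines.append(
            f"{name:<10}{mean(a):>10.2f}{mean(b):>10.2f}{welch_t(a, b):>10.2f}"
        )
    return lines


# Made-up data for illustration only.
group_a = {"height": [170, 172, 168, 175], "weight": [65, 70, 68, 72]}
group_b = {"height": [165, 167, 169, 166], "weight": [60, 66, 63, 61]}
table = multi_table(group_a, group_b, ["height", "weight"])
print("\n".join(table))
```

The user selects the variables once, and each one becomes a row in the same table.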

Make output human readable

Format results into easy to read tables.

Don’t make the user come to you

Test your GUI on multiple platforms in multiple consoles.

  • Cross-platform is good
  • All consoles should be supported

Don’t hide the console or otherwise get in the way

You are not the owner of R; the user is.

Help should be easy to find.

Add help buttons. Feel free to add pages to this manual for your dialogs. The password for editing is the primes between 4 and 12 with no spaces. I hate spammers.

Try resizing your dialog.

Don't be SPSS prior to 2006. Let the user resize your dialog.

Dialogs MUST have memory

If a dialog does not remember its settings from the last time it was run, it is nearly useless. Data analysis is an iterative process; it is very rare that the user will specify exactly the right set of options the first time.
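
One way to picture dialog memory is a small store that hands back the last-used settings when a dialog reopens. The sketch below is a Python illustration under that assumption; the class, the dialog name, and the setting keys are all hypothetical and do not describe how Deducer implements this.

```python
from copy import deepcopy


class DialogMemory:
    """Remembers the last-used settings of each dialog, keyed by dialog name."""

    def __init__(self):
        self._saved = {}

    def restore(self, dialog_name, defaults):
        """Return last-used settings, falling back to defaults on first run."""
        settings = deepcopy(defaults)
        settings.update(deepcopy(self._saved.get(dialog_name, {})))
        return settings

    def remember(self, dialog_name, settings):
        """Store a copy of the settings when the user runs the dialog."""
        self._saved[dialog_name] = deepcopy(settings)


memory = DialogMemory()
defaults = {"variables": [], "test": "t-test", "alpha": 0.05}

first = memory.restore("two_sample", defaults)   # first run: defaults
first["test"] = "Mann-Whitney"                   # user changes an option
memory.remember("two_sample", first)             # user presses Run

second = memory.restore("two_sample", defaults)  # reopening the dialog
```

On the second opening the dialog starts from the user's previous choices, so refining an analysis takes one change rather than a full re-entry.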



Analysis design considerations

In the creation of a statistical GUI, the analysis philosophy of the author is necessarily imposed on the software. By choosing what to include, what options to make default, and where to place those options, the author guides the default behavior of the user. Below are some decisions that I made that have a bearing on how Deducer is used. Many of these are open to debate.

All analyses should have a visualization.

Humans are visual creatures and best understand data when it is presented in a visual manner. This helps both in the understanding of results, and in the diagnosis of possible assumption violations.

Mid p-values are better than standard p-values for exact and Monte Carlo tests

Standard 'exact' p-values are slightly conservative. The mid p-value is a minor modification that maintains an alpha level closer to the nominal level.
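
To make the modification concrete: the mid p-value counts only half the probability of the observed outcome, instead of all of it. The Python sketch below shows this for an upper-tailed exact binomial test, which is chosen here only as a simple example; the function names are illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_p_values(k, n, p0=0.5):
    """Upper-tailed standard and mid p-values for k successes in n trials."""
    p_equal = binom_pmf(k, n, p0)
    p_greater = sum(binom_pmf(i, n, p0) for i in range(k + 1, n + 1))
    standard = p_greater + p_equal      # P(X >= k)
    mid = p_greater + 0.5 * p_equal     # P(X > k) + 0.5 * P(X = k)
    return standard, mid


standard, mid = exact_p_values(8, 10)
```

For 8 successes in 10 trials with p0 = 0.5, this gives a standard p-value of about 0.055 and a mid p-value of about 0.033.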

Type II SSQ > Type III SSQ

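R users commonly get Type II sums of squares (the default of Anova() in the car package; base R's anova() reports sequential, i.e. Type I, sums of squares), whereas most other packages (SAS, SPSS) use Type III.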

You probably don’t want to use a hypothesis test to detect assumption violation

If you have a small sample size, you have very little power to detect even major violations. If you have a large sample size, you will almost surely find a statistically significant violation, even if its magnitude is small.
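
The arithmetic behind this can be sketched with a deliberately simplified stand-in. The Python snippet below assumes a two-sided z-test and a sample that shows exactly a given departure, measured in standard deviations; a real assumption test (for normality, say) uses a different statistic, but its dependence on sample size follows the same pattern.

```python
from math import erfc, sqrt


def two_sided_p(effect_in_sd, n):
    """Two-sided z-test p-value when the sample shows exactly this effect."""
    z = effect_in_sd * sqrt(n)          # test statistic grows with sqrt(n)
    return erfc(abs(z) / sqrt(2))


# A large departure (0.5 SD) with only 10 observations.
p_small_sample = two_sided_p(0.5, 10)

# A trivial departure (0.02 SD) with 100,000 observations.
p_large_sample = two_sided_p(0.02, 100_000)
```

Under these assumptions the large departure is not significant at the 0.05 level, while the trivial one is overwhelmingly significant.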