Date: Tue, 12 Aug 1997 23:49:10 GMT
From: Ed <lukrite@REMOVESPAM.dillons.co.uk>
Sender: "SPSSX(r) Discussion"
Organization: UUNet UK server (post doesn't reflect views of UUNet UK)
Subject: Monte-Carlo simulation, am I doing it correctly??? Please help, thank you.
To: STAT-L@VM1.MCGILL.CA, EDSTAT-L@JSE.STAT.NCSU.EDU

I have a question about the use of a Monte Carlo simulation and whether I am using it correctly. My knowledge of stats is somewhat sketchy, so please bear with me.

Say that you have 100 numbers, normally distributed about a mean of zero. The null hypothesis is that the 100 numbers are random. You then build a model that tries in some way to predict these numbers. For each number it predicts either positive or negative.

The model makes 100 predictions (obviously): 50 positive and 50 negative. As far as sign (+/-) is concerned, the model is right half the time, 50:50 (as you might expect from a random predictor).

However, you notice that when the model predicts a positive number and is right, that number is a large (relatively speaking) positive number, and when it is wrong the 'wrong' number is a 'small' negative number. The opposite is the case for the negative predictions (the correct negative predictions are 'large' negative numbers and the incorrect negative predictions are 'small' positive numbers). This seems to suggest that while the model cannot predict the sign of the numbers, it is having some success predicting the magnitude.

You're trying to find out how much better the model predicts the larger numbers than would be expected by chance.

This is what I did: With Excel I simulated 10,000 sets of random 'predictions' of the numbers, constrained so that there were always 50 positive predictions and 50 negative ones (as this is what the model did). On each run through the data I computed the sum of all the numbers that were given a negative prediction, and the same for all the numbers given a positive prediction.

I then took the absolute value of those two sums and added them together to get a 'score'. (The higher the 'score', the more large numbers were correctly predicted, randomly in this case, for both -ve and +ve predictions.) I did this 10,000 times.
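To make the procedure concrete, here is a minimal sketch of it in Python rather than Excel. It is not the spreadsheet I actually used: the function names, the seed and the sample data are all made up for illustration, and the 100 numbers are drawn from a standard normal purely as a stand-in.

```python
import random

def score(values, signs):
    """|sum of values predicted +ve| + |sum of values predicted -ve|."""
    pos_sum = sum(v for v, s in zip(values, signs) if s > 0)
    neg_sum = sum(v for v, s in zip(values, signs) if s < 0)
    return abs(pos_sum) + abs(neg_sum)

def simulate_scores(values, n_pos, n_runs=10_000, seed=0):
    """Score n_runs random sign assignments with exactly n_pos positives."""
    rng = random.Random(seed)
    signs = [1] * n_pos + [-1] * (len(values) - n_pos)
    scores = []
    for _ in range(n_runs):
        rng.shuffle(signs)  # random 'predictions', +ve/-ve split held fixed
        scores.append(score(values, signs))
    return scores

# Hypothetical stand-in data: 100 draws from a normal with mean zero.
data_rng = random.Random(1)
values = [data_rng.gauss(0.0, 1.0) for _ in range(100)]

null_scores = simulate_scores(values, n_pos=50)
```

The list null_scores then holds the 10,000 random scores that go into the histogram.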

I then made a histogram of these scores and plotted the model's score on it, finding that the model's result was above the 90th percentile, more than 2 standard deviations above the mean. The histogram looked like a nice normal distribution.
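Instead of reading the position off the graph, the comparison could also be computed directly. The following is only a sketch under my own assumptions: the function name empirical_position is made up, the four simulated scores and the observed score are toy values, and the add-one form of the p-value is one common convention, not something I did in Excel.

```python
import statistics

def empirical_position(null_scores, observed):
    """Locate an observed score within a list of simulated scores."""
    n = len(null_scores)
    # Fraction of simulated scores strictly below the observed one.
    frac_below = sum(s < observed for s in null_scores) / n
    # One-sided empirical p-value, counting the observed score itself.
    p_value = (1 + sum(s >= observed for s in null_scores)) / (n + 1)
    # Distance from the simulated mean in simulated standard deviations.
    z = (observed - statistics.mean(null_scores)) / statistics.stdev(null_scores)
    return frac_below, p_value, z

# Toy numbers only, to show the call.
frac_below, p_value, z = empirical_position([1.0, 2.0, 3.0, 4.0], 3.5)
```

The fraction below and the p-value do not rely on the scores being normally distributed, whereas the standard-deviation figure is only easy to interpret if they roughly are.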

My question is, is this experiment correctly 'specified'?

Also, if the model had, for example, made 40 positive predictions and 60 negative predictions, should the Monte Carlo simulation constrain the random 'predictions' to 40 +ve and 60 -ve in order to make a fair comparison with the model's predictions?
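If the split does need to match, one way I can see to guarantee it is to shuffle the model's own list of predictions, which keeps whatever +ve/-ve counts the model produced. This is a sketch of that idea only, with a made-up function name and a made-up 40/60 prediction list; whether it is the right constraint is exactly what I am asking.

```python
import random

def shuffled_predictions(model_signs, rng):
    """Return a random reordering of the model's own +1/-1 predictions."""
    shuffled = list(model_signs)  # copy, so the original order is kept
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical model output: 40 positive and 60 negative predictions.
model_signs = [1] * 40 + [-1] * 60
rng = random.Random(0)
draw = shuffled_predictions(model_signs, rng)
```

Every draw produced this way has 40 +ve and 60 -ve predictions, so each random 'predictor' makes the same number of each call as the model did.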

Any help would be appreciated. Note that the numbers I am actually using are not necessarily normally distributed.

Ed

