|Date: ||Tue, 12 Aug 1997 23:49:10 GMT|
|Sender: ||"SPSSX(r) Discussion" <SPSSX-L@UGA.CC.UGA.EDU>|
|From: ||Ed <lukrite@REMOVESPAM.DILLONS.CO.UK>|
|Organization: ||UUNet UK server (post doesn't reflect views of UUNet UK)|
|Subject: ||Monte-Carlo simulation, am I doing it correctly??? Please help,
IF REPLYING TO THIS MESSAGE PLEASE REMOVE THE 'REMOVESPAM. ' LINE FROM
MY EMAIL ADDRESS.
I have a question regarding the use of a Monte Carlo simulation
and whether I am using it correctly. My knowledge of stats is somewhat
sketchy, so please bear with me.
Say that you have 100 numbers, the distribution of which is normal,
about a mean of zero. The null hypothesis is that the 100 numbers are
random. You then build a model that tries in some way to predict these
numbers. It makes predictions of positive or negative.
The model makes 100 predictions (obviously): 50 positive predictions
and 50 negative predictions. As far as sign (+/-) is concerned, the
model is right half the time, 50:50 (as you might expect from a
random predictor).
However, you notice that when the model predicts a positive number and
it is right, that number is a large (relatively speaking) positive
number, and when it is wrong the 'wrong' number is a 'small' negative
number. The opposite is the case for the negative predictions (the
correct neg predictions are 'large' negative numbers and the incorrect
negative predictions are 'small' positive numbers). This seems to
suggest that while the model cannot predict the sign of the numbers,
it is having some success predicting the magnitude.
The aim is to find out whether the model is predicting the larger
numbers better than would be expected by chance.
This is what I did:
With Excel I simulated 10,000 random sets of 'predictions' of the
numbers, constrained so that there were always 50 positive predictions
and 50 negative ones (as this is what the model did). On each run
through the data I computed the sum of all the numbers given a negative
prediction and the same for all the numbers given a positive prediction.
I then took the absolute value of each of those two sums and added them
together to get a 'score'. The higher the 'score', the more large
numbers were correctly predicted (randomly, in this case) for both -ve
and +ve predictions. I did this 10,000 times.
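
The procedure described above can be sketched in Python roughly as
follows. This is only an illustration under stated assumptions, not the
original Excel workbook: the function names, the seeds and the stand-in
data (100 normal draws with mean zero) are all hypothetical.

```python
import random

def score(values, predicted_positive):
    """|sum of numbers predicted +ve| + |sum of numbers predicted -ve|."""
    pos_sum = sum(v for v, p in zip(values, predicted_positive) if p)
    neg_sum = sum(v for v, p in zip(values, predicted_positive) if not p)
    return abs(pos_sum) + abs(neg_sum)

def simulate_scores(values, n_positive=50, n_runs=10_000, seed=0):
    """Scores of random 'predictions' with a fixed count of +ve calls."""
    rng = random.Random(seed)
    labels = [True] * n_positive + [False] * (len(values) - n_positive)
    scores = []
    for _ in range(n_runs):
        rng.shuffle(labels)  # random assignment, still n_positive positives
        scores.append(score(values, labels))
    return scores

# Stand-in data (assumption): 100 draws from a normal with mean zero.
data_rng = random.Random(1)
values = [data_rng.gauss(0.0, 1.0) for _ in range(100)]

null_scores = simulate_scores(values)  # 10,000 random-prediction scores
```

Shuffling a fixed list of 50 True and 50 False labels is one simple way
to keep the 50:50 constraint on every run.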
I then made a histogram of these scores and plotted the result of the
model on it, finding that the model's result was above the 90th
percentile, more than 2 standard deviations above the mean. The graph was a nice normal
My question is, is this experiment correctly 'specified'?
Also, if the model had, for example, made 40 positive predictions and
60 negative predictions, should the Monte Carlo simulation constrain the
random 'predictions' to 40 +ve and 60 -ve in order to make a fair
comparison with the model's predictions?
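
To make the 40/60 case concrete, here is a hedged Python sketch of one
way such a constrained comparison could be set up: the random
'predictions' are produced by shuffling the model's own +/- labels, so
every run has the same counts as the model. The function names, the
seeds, the stand-in data and the 40/60 model output are hypothetical,
and this is not offered as a verdict on whether the constraint is the
right choice.

```python
import random

def score(values, labels):
    """|sum of numbers predicted +ve| + |sum of numbers predicted -ve|."""
    pos = sum(v for v, is_pos in zip(values, labels) if is_pos)
    neg = sum(v for v, is_pos in zip(values, labels) if not is_pos)
    return abs(pos) + abs(neg)

def null_fraction_at_or_above(values, model_labels, n_runs=10_000, seed=0):
    """Fraction of random labellings scoring at least as high as the model.

    The random labellings keep the model's own number of +ve and -ve calls.
    """
    rng = random.Random(seed)
    observed = score(values, model_labels)
    shuffled = list(model_labels)  # copy, so the model's labels are untouched
    hits = 0
    for _ in range(n_runs):
        rng.shuffle(shuffled)
        if score(values, shuffled) >= observed:
            hits += 1
    return hits / n_runs

# Stand-in data and model output (assumptions for illustration only).
data_rng = random.Random(2)
values = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
model_labels = [True] * 40 + [False] * 60  # 40 +ve and 60 -ve predictions

fraction = null_fraction_at_or_above(values, model_labels)
```

A small fraction would mean that few random labellings with the same
40/60 split score as high as the model does.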
Any help would be appreciated. Note that the values for the numbers
that I am actually using are not necessarily normally distributed.