
February 15, 2005



Those who ignore discipline despise themselves, but whoever heeds correction gains understanding.

also also

Stern discipline awaits those who leave the path,
but those who hate correction will die.


Can't liberals just admit when they are wrong and get over it? That has to be the most Super-Clintonesque mea culpa since the great controversy over what the meaning of IS is. If obfuscation were an Olympic sport, you'd have the gold for sure.


skippy--would you like to try being more specific?


mea culpa. I still love you both and await each and every blog.


Unlike Skippy,

I congratulate you on your courageous attempt at fact-finding, appreciate the honesty you displayed above, and look forward to future posts despite my own particular view of political thought as articulated by those on the Left...

Thanks, TJ and Carla.


I think Skippy is saying if you don't get it right the first time you abrogate your right to free speech. In the illustrious words of Bill O'Reilly, "just shut up!"

This is akin to a meme often expressed on the right about voting: if the voting machine kicks out your vote, sorry you blew it, better luck next time.


Perhaps I should add: And some on the Right...


TJ and Carla,

Thanks for running the regression and all your work.

Why don't you just publish the correlation coefficients and R2, and let the stat-savvy reader (not that there are very many, lol) take a look?

Same for the regression coefficients and R2. A graph of the regression line with the data points would be the most informative. The slope of the line would show how strongly variance was related to county size. I'd like to see the regression with the King County data point also. I guess I'm just not convinced that its inclusion invalidates the regression model. If you provided an Excel spreadsheet with the values, we might be able to plot the variances against the regression line and see how far out King was. So how many standard errors away was the King data point? More than 2?

After the big discussion today on Horsesass, I'm coming around to the conclusion that any failure of King or any other county to reconcile the variances, as required by statute or regulation, is not in and of itself any grounds for an election challenge. Any error by an election official must be such as to have changed the election result, and post-election accounting clearly couldn't do that, no matter how embarrassing or inept it might have been.

Only if the variances can be shown to be convincing evidence of improper votes would they be relevant. And while it's possible that the overvote variances were improper votes, it's just as likely, if not much more likely, that they were due to human error by election workers in failing to post names or to barcode pollbooks. So unless more shows up, the variances are not sufficient evidence that illegal votes occurred to be relevant.


Could you please remove the - conservative from this site's name? That would be like the NRA adding Liberal into their name. Which, btw, I firmly and financially support. :)


Tom if you'd like to fisk me on my conservatism, please do. I'm sure my gun collection is bigger than yours.


Chew2, the adjusted r2 was .909. The r2 was something higher, and I think r was .957. To me, it was essentially a draw between the straight linear regression line, and a Lowess sort of upside-down hook, as to which was the best fit. The former did a lot better at the bottom, the latter at the top.

I've got it in SPSS format; I'll convert to HTML and add it to the correction post.

Thanks to all the SoundPolitics folks for stopping by; even if it was for Schadenfreude purposes. You all ought to riffle through some of Zap's posts if you're looking for something more ideologically correct (although not if you're jonesin' for Hugh Hewitt.)


That beeping sound you hear is the reporter backing into an overintellectualized correction, retraction and apology. In the future, try a more simplified, direct approach after you have erroneously charged the next windmill. Just say you were wrong and move on.



You said:

"Chew2, the adjusted r2 was .909. The r2 was something higher, and I think r was .957."

This was for the Pearson correlation. What about the regression, especially the Beta slope coefficient? If I recall correctly a simple regression with only 1 independent variable is mathematically the same as a Pearson correlation, except for the addition of the "a" term (y intercept), which as I also recall just adds 1 degree of freedom, so you lose a little bit of information. So your R2 for the regression should be nearly identical to the Pearson R2.
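A minimal Python sketch of the equivalence described above, using made-up toy numbers rather than the county data. For a regression with a single predictor, the standardized slope (beta) should equal Pearson's r, and the regression R2 should equal r squared. The function names and the `xs`/`ys` values are assumptions for illustration only.

```python
import math

def _sums(xs, ys):
    """Return means plus centered sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    _, _, sxx, syy, sxy = _sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def simple_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, standardized beta, R2).
    """
    mx, my, sxx, syy, sxy = _sums(xs, ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1 - ss_res / syy
    beta = slope * math.sqrt(sxx / syy)  # slope rescaled to SD units
    return slope, intercept, beta, r2

# Hypothetical toy data, NOT the county figures.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

r = pearson_r(xs, ys)
slope, intercept, beta, r2 = simple_regression(xs, ys)
print(f"r={r:.4f} beta={beta:.4f} r*r={r * r:.4f} R2={r2:.4f}")
```

With the figures quoted in this thread, r = .957 squares to about .916, which fits an adjusted R2 of .909 sitting slightly below it.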

BTW, I think you guys were being way overly apologetic. You got one data point wrong, based on initial misinformation from Spokane. You corrected it. Your overall results and conclusions were not affected by this one data point. That is, the variance was strongly related to county size.


chew 2--
the beta coefficient was .957, same as the original r, correct. The rest of the test results are here:

And to you and Rusty--
Ordinarily a generally minor error wouldn't rate quite such a production, I agree. But three things prompted it:

1) The error changed a key conclusion of the analysis--namely, that King wasn't the worst performing county, in absolute terms.

2) In a series that refers to another researcher and critiques his methods and conclusions, it's pretty counterproductive if you don't have all of your OWN ducks in a row.

3) When Sharkansky screws up, the correction is buried in text and almost never acknowledged again. We're better than that. We're not trying to further a political agenda (at least I'm not); we're trying to reach conclusions based on good information.


TJ and Carla,

Kudos to you then for being so forthright about correcting your errors.

And thanks for posting the stat results. I realize that I'm not sure what they all mean now!!!

But I did want to point out one thing about King County. What we want to know is whether the amount of variance for King, 1860 or whatever, is a lot more than what we would expect taking into account its large size.

If I'm reading the results right, the regression estimate for the King variance is 1654.49, which is 205.51 less than the actual variance of 1860. The standard deviation of the estimate is 453.074. So the residual, or difference between the actual and the predicted, is somewhat less than 1/2 of a standard deviation. I don't have a statistical table handy, but I'm betting that this is pretty damn low. That is, the King variance is not statistically significantly different from what we would predict based on its size. (Granted, this is for the regression that included King.)


Thanks for looking at the numbers. If I'm reading your notes and following the output correctly, the standard deviation of the estimate is 453, when done with King. What I understand to be the proper course is to eliminate outliers more than 2 SD away from the mean of the rest of the cases. I didn't take the time to compute the mean, because the next highest variance is 282, which means King is over 5 SD away from the next largest case, much less the mean. So I eliminated King and ran them again.

What you're suggesting is that the estimate of King's variance in the model is away from the actual result by less than half of a SD, but that's when King is in the mix, so I'm not sure that's appropriate.

Am I following?



Yes you are reading me right.

I've forgotten the rule of thumb for when to eliminate an observation. But let's say you're right, that it's 2 std. dev. (I think there were some other tests one could do also, but can't remember what they are.)

The "residual" is the difference between the regression predicted value and the actual value for y.

The printout says the standard deviation of the residuals is 138.026. So since the residual for King is 205.51, it is about 1.5 std. dev. away, and doesn't exceed the 2 std. dev. cut-off. So it looks like you could leave King in.
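The arithmetic behind those two comparisons can be checked with a short Python sketch. The four numbers are the ones quoted in this thread; the function and constant names are made up for illustration, and the 2.0 cutoff is only the rule of thumb under discussion, not an established test.

```python
def residual_in_sd_units(actual, predicted, sd):
    """Express a residual (actual - predicted) as a multiple of sd."""
    return (actual - predicted) / sd

# Figures quoted in the comments above (regression that includes King).
KING_ACTUAL = 1860.0
KING_PREDICTED = 1654.49
SD_OF_ESTIMATE = 453.074    # the "standard deviation of the estimate"
SD_OF_RESIDUALS = 138.026   # standard deviation of the residuals
CUTOFF = 2.0                # rule-of-thumb threshold, an assumption

residual = KING_ACTUAL - KING_PREDICTED
vs_estimate = residual_in_sd_units(KING_ACTUAL, KING_PREDICTED, SD_OF_ESTIMATE)
vs_residuals = residual_in_sd_units(KING_ACTUAL, KING_PREDICTED, SD_OF_RESIDUALS)

print(f"residual: {residual:.2f}")
print(f"vs. SD of estimate:  {vs_estimate:.2f} (over cutoff: {abs(vs_estimate) > CUTOFF})")
print(f"vs. SD of residuals: {vs_residuals:.2f} (over cutoff: {abs(vs_residuals) > CUTOFF})")
```

Either way the ratio comes out under 2, roughly 0.45 against the first figure and 1.49 against the second.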

You didn't print out the residuals for the regression without King. But you could print them out and then do the same thing I did for those residuals also. Except that you would have to hand-calculate the predicted value for the King variance. This would be your Beta coefficient (8.238E-04) multiplied by the X value (# of voters), plus the Y intercept value (-24.431).
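That hand calculation is easy to script. In this sketch the slope and intercept are the two values quoted above; the voter count is a loudly hypothetical placeholder, since King's actual number of voters isn't given in this thread, so the printed prediction is not a real result. The function names are also made up.

```python
SLOPE = 8.238e-04     # coefficient quoted above
INTERCEPT = -24.431   # Y intercept quoted above

def predicted_variance(voters):
    """Predicted variance from the fitted line: slope * x + intercept."""
    return SLOPE * voters + INTERCEPT

def residual(actual_variance, voters):
    """Actual minus predicted; positive means more variance than the line predicts."""
    return actual_variance - predicted_variance(voters)

# PLACEHOLDER ONLY: substitute the county's real voter count here.
HYPOTHETICAL_VOTERS = 1_000_000

print(f"predicted variance at {HYPOTHETICAL_VOTERS:,} voters: "
      f"{predicted_variance(HYPOTHETICAL_VOTERS):.3f}")
```

Once the real voter count is plugged in, `residual(1860, voters)` gives the number to compare against the standard deviation of the residuals from the no-King regression.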

The residual for King should be a lot larger though since you didn't include King in the regression model. But it's possible it could still be less than 2 standard deviations away from its predicted value which is my cut off for statistical significance vs. just chance (95% confidence level). But you'd probably want to use a lower level.

One funny thing is that the R2 for the regression without King is lower than the one with King. So King seems to add a lot of information.


chew2--since I've provided analyses both with and without King, and either way there appears to be a strong relationship, I'll let it lie. However, I did want to comment on your last sentence: my perception is that the lower r2 without King validates the correctness of eliminating King from the analysis. The reason you'd want to take King out, is because its outlier status is so great as to positively skew the correlation value. The further away the outliers are, the stronger the correlation appears. So when I got over .900 as an r2, I was suspicious. The fact that the r2 went down when the outlier was tossed, suggests that I was probably right to exclude it. And the fact that it's still a very high value, suggests there is indeed a relationship.
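A small Python sketch of the effect described here, on invented toy data (not the county numbers): a single point far from the rest, but roughly in line with the trend, can push r toward 1. It only shows that the effect can happen, not that it did happen with King. All names and values below are assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical small cases with a moderately noisy upward trend.
small_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
small_y = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

# One hypothetical case far away from the rest, roughly on the same trend.
outlier_x, outlier_y = 60.0, 60.0

r_without = pearson_r(small_x, small_y)
r_with = pearson_r(small_x + [outlier_x], small_y + [outlier_y])

print(f"r without the far point: {r_without:.3f}")
print(f"r with the far point:    {r_with:.3f}")
```

In this toy case r is noticeably higher with the far point included, which is the pattern of a correlation dropping once the outlier is removed.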

So, have we bored everyone else to tears yet?



I misread your proposed test. I gather you said eliminate a data point if it is more than "2 standard deviations" from the mean of the other data values. I don't think that is a correct test, because then you would force all your data to be close together around the mean value, which isn't required for a regression or any other statistical test.

I think my interpretation makes more statistical sense. The residuals are supposed to be distributed normally with constant variance for a regression model to work properly. What you need to worry about is a data point whose residual is much larger than the other residuals. So testing the King residual against the standard deviation of all the residuals makes some statistical sense to see how influential it is. And 2 standard deviations is the 95% confidence level cut-off.
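To make the difference between the two rules concrete, here is a Python sketch on invented toy data (not the county figures). One rule flags a case whose raw value is more than 2 SD from the mean of the other cases; the other flags a case whose residual is more than 2 SD of all the residuals. Function names and data are assumptions. One caveat: a point far out on the x axis pulls the fitted line toward itself, so its plain residual tends to look small, which is one reason there are more careful influence diagnostics than this.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def flagged_by_raw_rule(values, index, cutoff=2.0):
    """Flag a case whose raw value is > cutoff SDs from the mean of the others."""
    others = values[:index] + values[index + 1:]
    z = (values[index] - statistics.mean(others)) / statistics.stdev(others)
    return abs(z) > cutoff

def flagged_by_residual_rule(xs, ys, index, cutoff=2.0):
    """Flag a case whose residual is > cutoff SDs of all the residuals."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return abs(residuals[index]) > cutoff * statistics.stdev(residuals)

# Hypothetical data: five small cases plus one very large case on the trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 50.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.1, 50.0]
big = 5  # index of the large case

print("raw-value rule flags it:", flagged_by_raw_rule(ys, big))
print("residual rule flags it: ", flagged_by_residual_rule(xs, ys, big))
```

Here the large case is thrown out by the raw-value rule simply for being large, while the residual rule keeps it because it sits close to the fitted line.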

We might be a little over our heads here. :-)



"The fact that the r2 went down when the outlier was tossed, suggests that I was probably right to exclude it."

Trying to remember my stat is hard, but this sounds like a legitimate argument, that King as a really large value might have had too much influence on your results. Although I think there are additional, more accurate tests that you could perform to see whether that is indeed the case, other than looking at the change in R2.

However, you also lose the information provided by the only really large county in your sample. What you needed were other large counties in the sample. Perhaps from other states?

The one stat book I still have around says you don't always discard an influential observation, since it may contain very valuable information. But then goes on to discuss considerations which are too complicated to get into here, and which I don't understand anymore.

In any case good job.

The comments to this entry are closed.
