A reporting error in Friday's co-published article on county "discrepancies" in the contested Washington election requires several things: a correction, a retraction, an apology, a restatement of the facts and a firming of the premise laid out in the original article.
The first county that provided information to TJ at AlsoAlso on the subject of ballot discrepancies was Spokane, whose Paul Brandt responded immediately and specifically. Noticing that the discrepancy was the more unusual "voters without ballots," TJ called back and confirmed the numbers which ended up in our original article. We then moved on to collect similar information from other counties across the state.
During today's presentation by Dean Logan in front of King County Council, a by-county comparison of discrepancies listed Spokane with what appeared to be a total of 77. In checking the information, it was discovered that what was initially reported as a 976-ballot shortage was an unadjusted number. The county recorded 1,052 absentee ballots mailed after the deadline, but awarded "voter credit" to those voters so as to maintain them as active registrants. Stripping these "courtesy credits" is necessary to attempt an apples to apples comparison of discrepancy. King County has not used the credit system since 2002. To repeat, the initially reported discrepancy for Spokane was accurate, but we failed to fully refine our question to them, so as to receive the appropriate information.
Spokane's adjusted variance ratio is .04%, much lower than the originally reported .47%. They are in fact one of the counties more able to fully reconcile their tables, particularly given their population size relative to the others. AlsoAlso and Preemptive Karma regret the error, and retract the conclusion that another county had a higher rate of comparable discrepancy than King. So far in our analysis, none has. Allegations of anyone ignoring data for partisan reasons have also been redacted from the original text.
Interestingly, when told over the phone of King's report about Spokane, the Spokane representative clucked a bit and said something to the effect of, "See, that's why we don't recommend trying to compare these things..." Logan's presentation to Council stressed the non-regulatory, unusual circumstance of a "reconciliation" conducted 1/3 on poll night, the rest several weeks later. We have discovered in this exercise the folly of an overly serious look at the ballot/voter variances. To do so is to admit that they are legally or administratively relevant. Comments to the originally posted article have in fact stressed this theme. We clearly have some smart folks here.
Another comment that came up was the request to run a linear regression on the relationship we discussed: the hypothesis that the number of total ballots has a material effect on ballot variance. In light of the changed information, King's totals of both ballots and discrepancies were so far ahead of the rest of the pack that we knew they would distort our attempts to see an honest correlation, making the relationship appear much stronger than perhaps it is. So we re-ran the Pearson and two non-parametric correlations, and then added a simple linear regression, but without King County's numbers, to be more confident of the results. The adjusted r-squared value remained a very strong .909, meaning roughly speaking that the number of ballots accounts for about 90% of the variance in ballot/voter discrepancy. The relationship was significant at better than the 1-in-1,000 level; that is, there is less than a 1 in 1,000 chance that a relationship this strong would show up at random. So despite the change in discrepancy for Spokane, our original conclusions still appear valid, strengthened now by extra research. Those of you who are interested in such things can find the results of the tests here.
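For readers who want to see why one very large county can distort a correlation, here is a minimal sketch. The county figures in it are invented for illustration and are not the numbers from our analysis; `pearson_r` and `fit_line` are small helper functions written for this example, not the SPSS procedures we actually ran.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# HYPOTHETICAL counties: (total ballots, absolute discrepancy). Not real data.
small_counties = [
    (10_000, 12), (20_000, 5), (30_000, 30),
    (40_000, 10), (50_000, 28), (60_000, 22),
]
big_county = (900_000, 1_800)  # one very large, King-like outlier (invented)

def summarize(rows):
    """Return (r, r squared, slope, intercept) for a list of (x, y) rows."""
    xs, ys = zip(*rows)
    r = pearson_r(xs, ys)
    slope, intercept = fit_line(xs, ys)
    return r, r * r, slope, intercept

r_without, r2_without, _, _ = summarize(small_counties)
r_with, r2_with, _, _ = summarize(small_counties + [big_county])

print(f"without the large county: r = {r_without:.3f}, r^2 = {r2_without:.3f}")
print(f"with the large county:    r = {r_with:.3f}, r^2 = {r2_with:.3f}")
```

In this made-up data the small counties are only loosely related, yet adding the single large point pushes r close to 1. That is the effect we were guarding against by re-running the tests without King.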
It's certainly unpleasant--to be lauded in print by one such as Orcinus as having followed journalist-worthy practices--and not 48 hours later have to print a retraction. We're not journalists, we're not paid, and we're not subject experts. We merely engage ourselves in a subject in order to present an educated review. But we hope if we are honest and forthright with our missteps, you will stick with us as we continue. Zap and Kevin will surely have some useful and illustrative Scripture here; please leave your own verses or other comment below.
Update 11AM 2/16--
Additionally, it's been pointed out that Yakima's discrepancy is more ballots than voters, not the other way around. That was a simple error of transcription. As the original article indicated, we were not concerned with the direction of the discrepancy in this analysis, since mathematically and functionally it makes no difference.
--TJ and Carla
Those who ignore discipline despise themselves, but whoever heeds correction gains understanding.
also also
Stern discipline awaits those who leave the path,
but those who hate correction will die.
Posted by: Zap | February 15, 2005 at 14:56
Can't liberals just admit when they are wrong and get over it? That has to be the most Super-Clintonesque mea-culpa since the great controversy over what the meaning of IS is. If obfuscation were an Olympic sport, you'd have the gold for sure.
Posted by: Skippy | February 15, 2005 at 15:36
skippy--would you like to try being more specific?
Posted by: Torridjoe | February 15, 2005 at 15:52
mea culpa. I still love you both and await each and every blog.
Posted by: swatter | February 15, 2005 at 16:05
Unlike Skippy,
I congratulate you on your courageous attempt at fact-finding, appreciate the honesty you displayed above, and look forward to future posts despite my own particular view of political thought as articulated by those on the Left...
Thanks, TJ and Carla.
Posted by: marks | February 15, 2005 at 16:18
I think Skippy is saying if you don't get it right the first time you abrogate your right to free speech. In the illustrious words of Bill O'Reilly, "just shut up!"
This is akin to a meme often expressed on the right about voting: if the voting machine kicks out your vote, sorry you blew it, better luck next time.
Posted by: John | February 15, 2005 at 16:20
Perhaps I should add: And some on the Right...
Posted by: marks | February 15, 2005 at 16:31
TJ and Carla,
Thanks for running the regression and all your work.
Why don't you just publish the correlation coefficients and R2, and let the stat savvy reader (not that there are very many lol) take a look.
Same for the regression coefficients and R2. A graph of the regression line with the data points would be the most informative. The slope of the line would show how strongly variance was related to county size. I'd like to see the regression with the King county data point also. I guess I'm just not convinced that its inclusion invalidates the regression model. If you provided an excel spreadsheet with the values, we might be able to plot the variances against the regression line and see how far out King was. So how many standard errors away was the King data point? More than 2?
After the big discussion today on Horsesass, I'm coming around to the conclusion that any failure of King or any other county to reconcile the variances, as required by statute or regulation, is not in and of itself any grounds for an election challenge. Any error by an election official must be such as to have changed the election result and post election accounting clearly couldn't do that, no matter how embarrassing or inept it might have been.
Only if the variances can be shown to be convincing evidence of improper votes would they be relevant. And while it's possible that the overvote variances were improper votes, it's just as likely, if not much more likely that it was due to human error by election workers in failing to post names or to barcode pollbooks. So unless more shows up, the variances are not sufficient evidence that illegal votes occurred to be relevant.
Posted by: chew2 | February 15, 2005 at 21:17
Could you please remove the - conservative from this site's name? That would be like the NRA adding Liberal into their name. Who btw, I firmly and financially support. :)
Posted by: Tom | February 15, 2005 at 22:53
Tom if you'd like to fisk me on my conservatism, please do. I'm sure my gun collection is bigger than yours.
Posted by: Zap | February 15, 2005 at 23:59
Chew2, the adjusted r2 was .909. The r2 was something higher, and I think r was .957. To me, it was essentially a draw between the straight linear regression line, and a Lowess sort of upside-down hook, as to which was the best fit. The former did a lot better at the bottom, the latter at the top.
I've got it in SPSS format; I'll convert to HTML and add it to the correction post.
Thanks to all the SoundPolitics folks for stopping by; even if it was for Schadenfreude purposes. You all ought to riffle through some of Zap's posts if you're looking for something more ideologically correct (although not if you're jonesin' for Hugh Hewitt.)
Posted by: Torrid | February 16, 2005 at 00:16
That beeping sound you hear is the reporter backing into an overintellectualized correction, retraction and apology. In the future, try a more simplified, direct approach after you have erroneously charged the next windmill. Just say you were wrong and move on.
Posted by: Rusty | February 16, 2005 at 07:26
TJ,
You said:
"Chew2, the adjusted r2 was .909. The r2 was something higher, and I think r was .957."
This was for the Pearson correlation. What about the regression, especially the Beta slope coefficient? If I recall correctly a simple regression with only 1 independent variable is mathematically the same as a Pearson correlation, except for the addition of the "a" term (y intercept), which as I also recall just adds 1 degree of freedom, so you lose a little bit of information. So your R2 for the regression should be nearly identical to the Pearson R2.
BTW. I think you guys were being way overly apologetic. You got one data point wrong, based on initial misinformation from Spokane. You corrected it. Your overall results and conclusions were not affected by this one data point. That is, the variance was strongly related to county size.
Posted by: chew_2 | February 16, 2005 at 08:00
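The relationship described in the comment above can be checked directly. With a single independent variable, the regression R2 equals the squared Pearson r exactly, and the standardized slope equals r. The sketch below uses invented data points, not the county figures. The sample size of 14 passed to `adjusted_r2` is an assumption: the thread never states how many counties were in the regression, and 14 is used only because it reproduces the quoted .909 from r = .957.

```python
import math

# HYPOTHETICAL (x, y) pairs for illustration only; not the county data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)           # Pearson correlation
slope = sxy / sxx                        # unstandardized regression slope
intercept = my - slope * mx              # y intercept (the "a" term)
std_beta = slope * math.sqrt(sxx / syy)  # standardized slope ("Beta")

# R squared computed from the regression residuals, not from r.
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - ss_res / syy

def adjusted_r2(r2, n_obs, n_predictors=1):
    """Adjusted R squared: penalizes R squared for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(f"r = {r:.4f}, standardized beta = {std_beta:.4f}")
print(f"r^2 = {r * r:.4f}, regression R^2 = {r_squared:.4f}")
# n_obs=14 is an ASSUMED sample size, not a figure given in the thread.
print(f"adjusted R^2 for r = .957, n = 14: {adjusted_r2(0.957 ** 2, 14):.3f}")
```

The gap between R2 and adjusted R2 comes only from the sample-size penalty, so it shrinks as the number of counties grows.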
chew 2--
the beta coefficient was .957, same as the original r, correct. The rest of the test results are here:
http://alsoalso.typepad.com/also_also/files/REVISEDSTATS.HTM
And to you and Rusty--
Ordinarily a generally minor error wouldn't rate quite such a production, I agree. But three things prompted it:
1) The error changed a key conclusion of the analysis--namely, that King wasn't the worst performing county, in absolute terms.
2) In a series that refers to another researcher and critiques his methods and conclusions, it's pretty counterproductive if you don't have all of your OWN ducks in a row.
3) When Sharkansky screws up, the correction is buried in text and almost never acknowledged again. We're better than that. We're not trying to further a political agenda (at least I'm not); we're trying to reach conclusions based on good information.
Posted by: Torridjoe | February 16, 2005 at 08:57
TJ and Carla,
Kudos to you then for being so forthright about correcting your errors.
And thanks for posting the stat results. I realize that I'm not sure what they all mean now!!!
But I did want to point out one thing about King County. What we want to know is whether the amount of variance for King, 1860 or whatever, is a lot more than what we would expect taking into account its large size.
If I'm reading the results right, the regression estimate for the King variance is 1654.49 which is 205.51 less than the actual variance of 1860. The standard deviation of the estimate is 453.074. So the residual or difference between the actual and the predicted is somewhat less than 1/2 of a standard deviation. I don't have a statistical table handy, but I'm betting that this is pretty damn low. That is the King variance is not statistically significantly different than what we would predict based on its size. (Granted this is for the regression that included King.)
Posted by: chew2 | February 16, 2005 at 12:23
chew2--
Thanks for looking at the numbers. If I'm reading your notes and following the output correctly, the standard deviation of the estimate is 453, when done with King. What I understand to be the proper course, is to eliminate outliers more than 2 SD away from the mean of the rest of the cases. I didn't take the time to compute the mean, because the next highest variance is 282, which means King is over 5 SD away from the next largest case, much less the mean. So I eliminated King and ran them again.
What you're suggesting is that the estimate of King's variance in the model is away from the actual result by less than half of a SD, but that's when King is in the mix, so I'm not sure that's appropriate.
Am I following?
Posted by: Torridjoe | February 16, 2005 at 13:00
TJ,
Yes you are reading me right.
I've forgotten the rule of thumb for when to eliminate an observation. But let's say you're right, that it's 2 st. dev. (I think there were some other tests one could do also, but can't remember what they are.)
The "residual" is the difference between the regression predicted value and the actual value for y.
The printout says the standard deviation of the residuals is 138.026. So since the residual for King is 205.51, it is about 1.5 std. dev. away, and doesn't exceed the 2 std. dev. cut-off. So it looks like you could leave King in.
You didn't print out the residuals for the regression without King. But you could print them out and then do the same thing I did for those residuals also. Except that you would have to hand calculate the predicted value for the King variance. This would be your Beta coefficient (8.238E-04) multiplied times the X value (# of voters), then add the Y intercept value ( -24.431).
The residual for King should be a lot larger though since you didn't include King in the regression model. But it's possible it could still be less than 2 standard deviations away from its predicted value which is my cut off for statistical significance vs. just chance (95% confidence level). But you'd probably want to use a lower level.
One funny thing is that the R2 for the regression without King is lower than the one with King. So King seems to add a lot of information.
Posted by: chew_2 | February 16, 2005 at 15:39
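The arithmetic in the comment above can be sketched as follows. The residual figures and coefficients are the ones quoted in the thread; `predict_variance` is a helper name made up for this example, and the 100,000-voter input is a hypothetical round number, not King County's actual voter count.

```python
# Figures quoted in the thread for the regression that INCLUDED King County.
actual = 1860.0
predicted = 1654.49
sd_estimate = 453.074   # "standard deviation of the estimate", as quoted
sd_residuals = 138.026  # "standard deviation of the residuals", as quoted

# Residual = actual value minus the value the regression line predicts.
residual = actual - predicted
ratio_estimate = residual / sd_estimate
ratio_residuals = residual / sd_residuals

print(f"residual = {residual:.2f}")
print(f"residual / SD of estimate  = {ratio_estimate:.2f}")
print(f"residual / SD of residuals = {ratio_residuals:.2f}")
print("exceeds the 2 SD rule of thumb:", ratio_residuals > 2)

# Coefficients quoted for the regression WITHOUT King County.
SLOPE = 8.238e-04      # the slope the comment calls the Beta coefficient
INTERCEPT = -24.431    # the y intercept

def predict_variance(n_voters):
    """Predicted ballot/voter variance for a county with n_voters voters."""
    return SLOPE * n_voters + INTERCEPT

# HYPOTHETICAL voter count, used only to show the calculation.
example_voters = 100_000
print(f"predicted variance at {example_voters:,} voters: "
      f"{predict_variance(example_voters):.3f}")
```

To repeat the outlier check for the regression without King, one would plug King's actual voter count into `predict_variance`, subtract the result from 1860, and compare that residual with the standard deviation of the residuals from that second regression, which is not printed in the thread.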
chew2--since I've provided analyses both with and without King, and either way there appears to be a strong relationship, I'll let it lie. However, I did want to comment on your last sentence: my perception is that the lower r2 without King validates the correctness of eliminating King from the analysis. The reason you'd want to take King out, is because its outlier status is so great as to positively skew the correlation value. The further away the outliers are, the stronger the correlation appears. So when I got over .900 as an r2, I was suspicious. The fact that the r2 went down when the outlier was tossed, suggests that I was probably right to exclude it. And the fact that it's still a very high value, suggests there is indeed a relationship.
So, have we bored everyone else to tears yet?
Posted by: Torridjoe | February 16, 2005 at 16:48
TJ,
I misread your proposed test. I gather you said eliminate a data point if it is more than "2 standard deviations" from the mean of the other data values. I don't think that is a correct test, because then you would force all your data to be close together around the mean value, which isn't required for a regression or any other statistical test.
I think my interpretation makes more statistical sense. The residuals are supposed to be distributed normally with constant variance for a regression model to work properly. What you need to worry about is a data point whose residual is much larger than the other residuals. So testing the King residual against the standard deviation of all the residuals makes some statistical sense to see how influential it is. And 2 standard deviations is the 95% confidence level cut off.
We might be a little over our heads here. :-)
Posted by: chew2 | February 16, 2005 at 17:04
ps.
"The fact that the r2 went down when the outlier was tossed, suggests that I was probably right to exclude it."
Trying to remember my stat is hard, but this sounds like a legitimate argument, that King as a really large value might have had too much influence on your results. Although I think there are additional more accurate tests that you could perform to see whether that is indeed the case, other than looking at the change in R2.
However, you also lose the information provided by the only really large county in your sample. What you needed were other large counties in the sample. Perhaps from other states?
The one stat book I still have around says you don't always discard an influential observation, since it may contain very valuable information. But then goes on to discuss considerations which are too complicated to get into here, and which I don't understand anymore.
In any case good job.
Posted by: chew2 | February 17, 2005 at 15:56