By popular demand, an extended discussion of the gory analytical details...
confuseling wrote: You asked if anyone wanted to see your working. Yes please.
BACKGROUND
The GR vote was structured as a rank-ordering of five choices. These are ordinal data and need to be treated as such. To be clear: I'm saying that any attempt to demote these data to nominal, or to promote them to interval, is an act of intellectual dishonesty, deliberate or otherwise.
The five choices can be broken down into two discrete and non-overlapping subsets: three choices that take a technical position on systemd/gratuitous init coupling (S/GIC), represented as:
(1) MUST not
(2) SHOULD not
(3) MAY
And two choices that do not take a technical position, summarized as:
(4) "I don't want to talk about this."
(5) "Let's talk about this some more."
Again, each vote was a ranking of all five options, which means that each DD's vote can be represented as a vector in 5-dimensional space.
In case it isn't obvious, both #4 and #5 are identical in that they are talking about the individual's perceived need for dialog, be it further or any. But they do not affect that person's relative preferences for #1-#3. At all.
Thus #4 and #5 are said to be orthogonal to #1-#3. To build an analytical dataset, we separate out the individual's opinions about the need for dialog from his/her opinions about S/GIC (and thanks to their orthogonality, without altering the relative rankings of either subset in any way).
CLEANUP
Now, the resulting data is going to be harder to read if we don't take a moment to re-rank the votes. Re-ranking also does not affect the data in any way. Watch.
Take two example vectors:
12345
14532
When we separate out the editorial opinions about dialog, we are left with:
123
145
In case it isn't obvious at first glance, these two votes express identical rankings of the three technical choices (that is: MUST not/SHOULD not/MAY). Because these data are ordinal, not interval, re-ranking the data doesn't alter the results of any appropriate analytical procedure. So our vectors now read:
123
123
tl;dr This is an entirely cosmetic change that does not affect the analysis in any way.
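For anyone who wants to replicate the cleanup step, here is a minimal Python sketch. It assumes each ballot is stored as a 5-character string in which character i is the rank given to option i (so the first three characters are the ranks for MUST not, SHOULD not and MAY), and that those three ranks are present and distinct. The function name technical_code is made up for illustration.

```python
def technical_code(ballot: str) -> str:
    """Project a 5-digit ballot onto options 1-3 and re-rank them as 1..3.

    Assumed format: character i is the rank given to option i+1.
    Assumes the three technical ranks are present and distinct.
    """
    ranks = [int(ch) for ch in ballot[:3]]   # ranks for MUST not, SHOULD not, MAY
    ordered = sorted(ranks)                  # lowest rank = most preferred
    # Replace each rank by its position (1..3) among the three technical ranks.
    return "".join(str(ordered.index(r) + 1) for r in ranks)


print(technical_code("12345"))  # 123
print(technical_code("14532"))  # 123
```

Both example vectors collapse to the same code, which is the whole point: only the relative order of the three technical choices survives.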
Update July 2016
The original post provided a full summary of the analytical results, but lacked sufficient procedural detail to allow replication. In an effort to permit any motivated investigator to run/replicate this analysis, the following portion of this post has been updated and expanded.
NOTE: Small differences in inclusion/exclusion criteria have altered the total count of votes slightly. But the results remain unaffected.
Begin updated content
A total of 483 GR votes were cast, but not all of those votes are analytically usable.
Examples of unusable votes include:
- Votes that "ranked" multiple technical choices identically. (A "vote" for everything is the same as a "vote" for nothing.)
- Votes that failed to assign an explicit rank to each of the three technical choices (rendering ordinal analysis impossible).
- NOTE: Although some might argue that "no vote" is basically the same as "low vote," replicable analysis is not and cannot be based on individual claims of clairvoyance. If the person actually casting the vote can't be bothered to express a simple, clear preference among three distinct choices, then there's really no point in trying to guess what s/he "really" meant, much less in trying to compare, for instance, a 42413 vote to a -2213 vote.
In all, some 125 of the GR votes (26% of the total) were unusable under these criteria.
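The exclusion criteria above can be sketched as a simple filter. This is an illustration only: it assumes the same 5-character ballot strings as the examples in this post, with "-" standing for an unranked option (as in the -2213 example), and the helper name is_usable is hypothetical.

```python
def is_usable(ballot: str) -> bool:
    """True if options 1-3 each carry an explicit rank and no two ranks tie."""
    technical = ballot[:3]                      # ranks for MUST not, SHOULD not, MAY
    if any(ch not in "12345" for ch in technical):
        return False                            # at least one option left unranked
    return len(set(technical)) == 3             # reject identical ranks


TOTAL_VOTES = 483   # figures quoted in this post
UNUSABLE = 125

print(is_usable("42413"))                 # False (options 1 and 3 are tied)
print(is_usable("-2213"))                 # False (option 1 is unranked)
print(is_usable("14532"))                 # True
print(f"{UNUSABLE / TOTAL_VOTES:.0%}")    # 26%
```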
ON WITH THE SHOW
The easiest way (at least for me) to think about the re-ranked data is as a series of three-digit codes that tells us the individual voter's preference among the three technical options.
Each digit of the 3-digit code represents the individual's preference among the three alternatives. The first digit position is the relative ranking of "MUST not," the second digit is the relative ranking of "SHOULD not," and the third digit is the ranking for "MAY." So a code of 123 would correspond to an extreme contra-S/GIC position (MUST not/SHOULD not/MAY), and a code of 321 would represent the opposite pole (MAY/SHOULD not/MUST not).
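If the digit-position convention is hard to keep straight, a small decoder helps. This is a sketch only; the names OPTIONS and decode are invented for the example.

```python
OPTIONS = ("MUST not", "SHOULD not", "MAY")


def decode(code: str) -> list[str]:
    """Turn a 3-digit code into the voter's options, most preferred first."""
    # Pair each digit (a rank) with the option at that position, then sort by rank.
    ranked = sorted(zip(code, OPTIONS))
    return [name for _, name in ranked]


print("/".join(decode("123")))  # MUST not/SHOULD not/MAY
print("/".join(decode("321")))  # MAY/SHOULD not/MUST not
```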
Aside: In case it's not obvious, we've just seamlessly deobfuscated the data by reducing what was once a huge 5-D matrix down to a handful of discrete points. Dimensionality reduction is a fascinating topic for anyone who's actually enjoying reading this.
Okay, so there are six possible "states" for each vote, but some of those "states" seem nonsensical or contradictory on their face. For example, a vote of either 132 (MUST not/MAY/SHOULD not) or 231 (MAY/MUST not/SHOULD not) would seem to be best explained by a typo. Happily, these contradictions occur in only 8 votes (6 for 132 and 2 for 231); because they are so few in number, the 8 affected data points may be included in or excluded from the analysis without affecting the ultimate results. (For simplicity's sake, this analysis excludes these 8 points.)
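The bookkeeping here is easy to check. The sketch below enumerates the six possible codes, drops the two contradictory ones, and recomputes the size of the analytical dataset from the counts quoted in this post; the variable names are illustrative.

```python
from itertools import permutations

# All six possible orderings of the three technical choices.
ALL_CODES = ["".join(p) for p in permutations("123")]

# Contradictory codes and their vote counts, as quoted in this post.
EXCLUDED = {"132": 6, "231": 2}

REMAINING = [code for code in ALL_CODES if code not in EXCLUDED]

TOTAL_VOTES = 483
UNUSABLE = 125
analytical = TOTAL_VOTES - UNUSABLE - sum(EXCLUDED.values())

print(ALL_CODES)    # ['123', '132', '213', '231', '312', '321']
print(REMAINING)    # ['123', '213', '312', '321']
print(analytical)   # 350
```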
Ultimately, the 350 votes in the analytical dataset can be represented using the remaining four points. Let's have a look:
As I've grown tired of repeating, the two "extremes" (the ??3s and the ??1s) are easily visible in the data, and they are nearly identical in size. They serve mostly just to cancel each other out. As is often the case in political elections, it's the folks in the middle who end up making the decision.
In this instance, those "moderates" are the 312s: folks who think that S/GIC is a Genuinely Bad Idea, but who simultaneously think that a(ny) blanket prohibition is also a Genuinely Bad Idea. (In my current mood, I do not concede their point, not even grudgingly.)
But here's the thing and there's just no getting around it: the 312s chose SHOULD NOT as their first choice. They pointedly and explicitly say that S/GIC is a bad idea; their only quibble is over Just How Bad.
To recap the quote from above:
The 312s basically wrote: ...a requirement for a non-default init system will mean the software will be unusable for most Debian users and should normally be avoided.
(Emphasis added)
"This is a bad idea and should be avoided in all but the most exceptional circumstances" is in no way a pro-S/GIC stance, nor is it even slightly equivocal on the question. And any attempt to suggest otherwise is nothing less than an egregious attempt to distort/misrepresent the historical facts. And that's exactly what this thread exists to combat.
On technical merits, S/GIC lost, 206-to-144. Really. Which is precisely why "we don't even need to talk about this" was pure cowardice.
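A rough sketch of how that split can be reproduced from the coded votes. It rests on one loudly stated assumption: a vote counts as "contra" when its first choice is MUST not or SHOULD not, and as "pro" only when MAY is ranked first. The per-code counts are not repeated here, so the sketch only classifies the codes and checks the two totals quoted above; the function name stance is hypothetical.

```python
def stance(code: str) -> str:
    """Classify a 3-digit code by its first choice (assumed grouping rule)."""
    may_rank = code[2]                  # third digit is the rank given to MAY
    return "pro" if may_rank == "1" else "contra"


for code in ("123", "213", "312", "321"):
    print(code, stance(code))

CONTRA, PRO = 206, 144                  # totals quoted in this post
print(CONTRA + PRO)                     # 350
print(f"{CONTRA / (CONTRA + PRO):.1%}") # 58.9%
```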
QE-friggin-D
Thus endeth the lesson.
End updated content
Now, @confuseling, for the third time... about that billion-dollar value proposition? I think I've been pretty patient about it. But it has been over a month since I first asked, and it's getting to the point where continued lack of response seems like you tacitly admitting that, y'know, there isn't one. Maybe publish your current draft just to shut me up. Or hey, point me to a quantitative cost justification published/produced by someone else. I just want to see the data, I'm not picky about authorship.
(Sincere and legitimate analytical questions responded to. Trolling of any stripe pointedly ignored.)
Edit: For the benefit of folks who popped into this post from the link I added to the initial post, here's a handy "return" link back:
http://forums.debian.net/viewtopic.php? ... 52#p570367