I have personally seen one of the Nightforce scopes that "passed" this test "fail" on a competitor's rifle. Is that definitive? I also saw an Arken fail that same day, as well as rings from a company we all love here. Using my sample size of one, the drop test is wrong. But that would be a ridiculous statement, since my sample size is one! Virtually every scope that has "passed" has failures listed here on RS. Even a never-dropped SWFA had a reticle failure. Hopefully not the Maven 1.2, because I have several of them now.

The godfather of the drop test once replied in a snarky way, comparing his testing to the FAA and the NTSB. It was not worthy of a reply because it was ridiculous. My son works in aerospace, and the button you push to recline your seat gets more thorough, CONSISTENT testing than scopes do here. The NTSB tests multiple vehicles in exactly the same way. We have all seen the cars on the track hitting the barrier. They all impact in exactly the same way. They test several IDENTICALLY. One is not tested on a Tuesday in June at 80 degrees with a stick on the accelerator, driven off a cliff at Bucks ranch, and the other tested in January at 10 degrees. I don't know, but maybe the grease or lube is sticky when cold. Do we know the answer to that? It probably isn't, but do we know? If so, tell us, and tell us why we know that.

Have we all forgotten "your groups are too small"? The same tester talks fancy about the WES calculator, how we "all suck," and about group sizes over 20-shot groups. You are going straight to HE double toothpicks if you dare think he may not have been perfect. Do you think if a tester shot a 20-shot group rather than a 10-shot group it would be a larger group? Statistically the group would be larger (a quick sketch below shows why) and would invalidate the two separate 10-shot groups used in the drop test as a comparison. It's a shame, because the basic premise is great, and if we were not locked in emotionally and started controlling variables, the big hitters would pay attention.
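As a rough illustration of that 10- vs 20-shot point, here is a minimal simulation (my own sketch, assuming impacts follow a simple 2D normal distribution; the sigma value, shot counts, and group counts are arbitrary placeholders, not measurements of any rifle) showing that a 20-shot group is expected to measure larger than a 10-shot group from the exact same setup, simply because more shots sample more of the dispersion:

```python
# Minimal simulation: expected extreme spread (largest center-to-center
# distance) grows with shot count, assuming impacts follow a 2D normal
# distribution. Illustrative only, not a claim about any specific rifle.
import numpy as np

rng = np.random.default_rng(0)

def mean_extreme_spread(n_shots, n_groups=5000, sigma=1.0):
    """Average extreme spread over many simulated groups of n_shots."""
    spreads = []
    for _ in range(n_groups):
        pts = rng.normal(0.0, sigma, size=(n_shots, 2))
        # max pairwise distance within the group
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        spreads.append(d.max())
    return np.mean(spreads)

print("10-shot mean ES:", round(mean_extreme_spread(10), 2))
print("20-shot mean ES:", round(mean_extreme_spread(20), 2))
# Typical output: the 20-shot groups average noticeably larger,
# even though nothing about the rifle, ammo, or shooter changed.
```

The takeaway is only about the shape of the math: comparing a 20-shot group to a 10-shot group is not apples to apples, which is the variable-control point being argued here.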
Heck, the drop test may even be spot on! Right here there was a post about Maven and its already legendary 1.2 on a podcast, and they (Maven) dismissed the test, and feelings were hurt. They also said the internals were the same as the 1.0, which did not do well in testing. What?! No way, for F's sake! I was standing at a competition as a judge with the Nightforce and Leupold reps when a competitor asked them about the drop test. They both chuckled, ignored it, and gave it no time. Remember, Leupold uses the "Punisher" to replicate impacts and Nightforce uses the rubber pad we have all seen. What they both share is that they use a collimator to see if there is a shift. This eliminates the human element, BC variation from projectile to projectile, ES, SD, and so on.
It does not make me crazy when someone does not like Leupold. It makes me crazy when someone sells gear that has been treating them well because a "hunter" who shoots his 10 rounds a year says scope A failed while he was sighting in his high-power magnum.
There's a lot in this post all packed together, but I think you are off target in a few places. Many of these points are also brought up by posters above.
1) First, you're reading ideas into my quoted post that aren't there. I said the scope evals show LESS than what many people give them credit for. I am saying they are NOT designed to be definitive--there's a reason they are called "evals" not "tests", and that has been made explicitly clear many times. So I'm not sure why you are asking if anything is "definitive". Of course it's not definitive. Of course people have seen Nightforce, or whatever scopes, fail--no one has ever said that scopes that passed the eval never fail, nor have they said that scopes that failed the eval will always fail. The ONLY claim has been that scopes that pass seem to correlate with scopes that fail less in heavy use, and scopes that fail seem to correlate with scopes that fail more in heavy use--no attempt has been made to quantify a failure rate beyond "likely higher" or "likely lower", and even this has been hedged by attempting to verify with multiple examples, crowd-sourced or otherwise. Individuals may make statements more encompassing than this, but those statements are not supported by the tests contained in the scope eval forum--I think it's important to separate the actual evals, and what they can show, from the broad "colloquial" statements made by people attempting to generalize from them. My post that you quoted was actually saying that a Leupold scope that doesn't fail in use is not a "contradiction" of the eval results, nor are the eval results claiming that no one can possibly have a Leupold scope that functions properly...and at the same time, a person with a scope that functions properly doesn't "refute" the eval results.
It is not only possible, but highly likely, that both things can be true at the same time--this is exactly what you'd expect.
2) The NTSB crash tests on automobiles don't test a statistically significant sample of cars--they use a couple of examples to test the design, assuming that virtually all the cars off the production line will behave similarly, and they extrapolate those few results to the millions of cars that will be sold over the next several years. Tests are useless if they are too cumbersome to be realistic, or too cumbersome to be performed at all.
As someone above said, "the perfect is the enemy of the good"--if we don't test something at all because we'll only accept a "perfect" answer, then we are worse off than if we evaluate semi-objectively and take those results with a caveat. All standardized testing I have been involved with--and that includes EN, UIAA, and ASTM testing of life-safety equipment, ski equipment, climbing equipment, avalanche safety equipment, fabric testing, etc.--makes concessions that involve extrapolating from smaller sample sizes, frequency of testing, how they test, etc., in order to make the tests practical. Extrapolating from a smaller sample size is common practice even among "truly standardized" tests, and it does not render a test invalid. Statistically speaking, if you are looking for a one-in-a-million failure, the odds of getting that failure on that particular tested example should be extremely low, while the odds of a pass should be extremely high. If you DO get a failure, especially if you get two in a row, that statistically suggests your assumption that failures are rare is likely wrong--that's a critical difference.
If a scope is truly going to fail at a low rate, then what are the ODDS of it failing twice in a row? When someone says they saw a brand XX scope fail at a match, we should all yawn. But when the same person has seen multiple failures, that's different and becomes more significant. When you get ten people in a room and more than one of them has sent in multiple of the same scope for problems...even though the total number is low, based on the odds of any one scope failing, it is entirely correct from a statistical point of view to flag that as "likely to be a legit problem".
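To put rough numbers on that (my own back-of-the-envelope sketch; the failure rates and scope counts below are made-up placeholders, not measurements of any brand), here is the binomial arithmetic for how unlikely repeated failures are if the true rate really is low:

```python
# Rough odds check: how likely is it to see 2+ failures in a small
# personal sample if the true failure rate really is low?
# Failure rates below are made-up placeholders, not measured values.
from math import comb

def prob_at_least_k_failures(p_fail, n_scopes, k):
    """P(at least k failures among n_scopes independent scopes)."""
    return 1.0 - sum(
        comb(n_scopes, i) * p_fail**i * (1 - p_fail)**(n_scopes - i)
        for i in range(k)
    )

for p in (0.01, 0.05, 0.30):
    print(f"true rate {p:.0%}: P(2+ failures out of 3 owned) = "
          f"{prob_at_least_k_failures(p, 3, 2):.4f}")
# If the true rate were ~1%, seeing two failures out of three scopes
# would be a ~0.03% coincidence; if that keeps happening, the
# "failures are rare" assumption is probably the thing that's wrong.
```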
3) "Quick and dirty" experiments are used all the time and have real value despite their drawbacks, because you can still draw conclusions, identify problems, and do it all without the unrealistically high demands of a true standardized testing procedure. Small sample sizes and semi-scientific experiments have legitimate value and are a critical part of legitimate "testing" used in many industries worldwide by real, live engineers.
The OP posted that his Leupold worked based on the fact that he was able to make a shot on a pig, the title of the thread using that one example to poke fun at the common perception that there is a high failure rate with Leupold scopes. The post I quoted was, in essence, also claiming that because their scope had been used and worked, it wasn't possible there was a high failure rate at all. In both cases we have an anecdotal claim (cumulatively a bunch of them), with zero quantification, that their scopes worked as expected, with the unsaid part of the post clearly understood to be "because their one or a couple of examples worked, there can't possibly be any widespread problem with any of them". That's just not how the world works, though. First, we have zero info to suggest that they actually measured anything--let alone using a large sample size of shots--all we know is that "my gear worked to shoot a pig/deer/goat/chupacabra/whatever". But the critical part that is missing is that NONE of this is definitive. The most unrealistic result in all of this would be that either 100% of examples of a brand fail for all users, or that no examples fail for any users. Both cases are so unrealistic as to be off the table.
The ONE thing we should all EXPECT to see is some variability, with some users reporting problems and some users reporting no problems. So my quoted post was an attempt to say this: the fact that some people don't have problems doesn't indicate one way or another whether there is a high failure rate; and if there is indeed a high failure rate, it doesn't at all mean that all examples will fail.
All I'm saying is that it's entirely possible for there to be BOTH a high failure rate AND for some people's scopes to work fine--and that it's not just possible, it's highly likely that both of those things would be true at the same time.