Drop Test Pass/Fail

This is how I feel about these tests. No one is saying that every scope will pass or fail based on the testing of one. But if the failure rate is low then it is unlikely that the one you test will fail. Probabilities.

It's not the test or tester's stated intention; however, the general voice of the forum parrots results in a way that seems to suggest any scope that passes has superpowers. Those scopes are often recommended by folks who haven't even used them and/or certainly haven't tested them themselves.

It’s a lot to put on a test of one.
 
Wow, that's a little extreme, and you're comparing apples to oranges. We aren't talking about planes crashing.

1. One catastrophic failure can be meaningful, but it depends on context.
  • If the circumstances of failure are well-understood and clearly linked to a design flaw, then yes—one failure can reveal a critical issue.
    Example: A structural wing failure on a test flight due to design, not pilot error or extreme weather.
  • But if the cause is unclear, or the failure could be due to user error, outlier conditions, or even a fluke, then you do need more data before making strong claims about the overall design.
2. Catastrophic outcomes demand higher scrutiny, not necessarily conclusions.

You can absolutely justify immediate concern or action after one serious failure, but whether you declare the entire system flawed depends on whether you’ve isolated the root cause.
  • In your Boeing example, even a single crash would ground the prototype pending investigation.
  • But whether the entire plane design is scrapped would depend on the outcome of that investigation, not just the crash itself.
3. Practical vs. statistical significance.

You might be saying: “I don’t need a statistical study to trust my gut when something fails spectacularly.” That’s totally valid from a consumer or reviewer standpoint. If a piece of gear breaks badly on its first trip—especially under normal use—you’re justified in being skeptical and even in warning others.

But from an engineering or scientific standpoint, people would want repeatability and causal clarity before calling the design a failure across the board.

Just saying that, from a data standpoint, you can't draw anything conclusive from this.
I see where you are coming from, but it is important to understand that the conclusions Form draws are often accompanied by many other examples from other testing and from running the shoot2hunt schools.
 
Before Form's tests I just took the approach of calling up some dealers I know and asking them which scopes have the highest RMA warranty claims. Not only was there a definite trend that matches the empirical evidence gathered here, but certain models also had much higher RMA rates than other models within the same brand. I don't know if the dealers I spoke with are seeing tens of thousands of scopes, but they are certainly selling in the hundreds or perhaps thousands from a given maker and know these things.
 
@fwafwow , I hear what you are saying. I guess it's the ability to quickly compare different aspects, as a tool to identify what to look for in the evals, that is helpful. I know for a fact a lot of people don't read that much--I do, but even I get lost in it sometimes, get distracted, etc. Example--the Credo 3-9. This scope may have been a fail (I don't recall what counts as a fail vs. a "partial pass")--but looking at this, it sure would be a shame if the Credo 3-9 and the Leupold VX3 were both simply recorded as "fail". Is this format too complicated? I personally think it provides additional clarity, allows comparisons at a glance, and helps target what to look into deeper, while still offering a very quick, easy-to-digest format.

View attachment 639033

Pretty cool to see the Arken passed. I could see it wearing out, but dang, for $350 the two I have are pretty nice.

Ha, I guess I shouldn't drop the Athlon I have on the 10/22.


 
Before Form's tests I just took the approach of calling up some dealers I know and asking them which scopes have the highest RMA warranty claims. Not only was there a definite trend that matches the empirical evidence gathered here, but certain models also had much higher RMA rates than other models within the same brand. I don't know if the dealers I spoke with are seeing tens of thousands of scopes, but they are certainly selling in the hundreds or perhaps thousands from a given maker and know these things.
This is not a safe assumption to make unless the retailer you are talking to is actually looking at the failure RATE, i.e. the ratio of total failures to total units sold, based on actual data. It highlights the difference between where you need a ratio to say something and where you really don't, so it's worth pointing out in this context.
I guarantee most customer service people at a retailer are not going to be looking at data; they are going to base their reaction to this question on their feel for what they see the most. At best they are going to track returns, but rarely will they compare that to total sales to get a rate. Very few retailers actually track return rates.
Example: one of our customer service people came to me a month or so ago saying that, based on the warranty replacements they had provided recently, they thought there was a high return rate on a specific product we make, and they were worried there was a problem we needed to address. So we sat down and looked at it with numbers. They had indeed offered several warranty replacements in quick succession, more so than for any other item. However, it was right at the "later-middle" part of the season, when the majority of use of a winter ski-season item was going to happen (i.e. the period of highest breakage), and based only on that season's sales combined with the total number of returns for the year, the return rate was less than 1%. This is the whole reason for compiling data and relying on it, while NOT relying on gut reaction: unless you have the big-picture visibility, you don't have enough info to say one way or the other.
Since it always comes up, also remember that despite that^^, it's still true that if you objectively evaluate 1 specimen, it is unlikely to fail unless there is a high failure rate, and if you get 2 failures in a row it is almost a certainty that there is an extremely high rate of failure... because with a low failure rate of even 5% (which is MUCH higher than the rate at which the products I'm familiar with fail), you still have a 95% NON-FAILURE rate, so the odds of getting 2 failures in a row are extremely low. Perhaps someone can calculate the actual odds of failing 2 consecutive evaluations if a product has a 5% failure rate, 10%, 20%, etc. Suffice to say it's EXTREMELY unlikely if the failure rate is truly low.
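For anyone who wants to check those numbers, here is a quick back-of-the-envelope sketch in Python. It assumes each eval is an independent sample of the same model, and the failure rates are purely hypothetical:

```python
# Odds of k consecutive failures, assuming each eval is an independent
# draw and the true per-unit failure rate is p (hypothetical rates).

def consecutive_failure_prob(p: float, k: int = 2) -> float:
    """Probability that k independently sampled units all fail."""
    return p ** k

for p in (0.01, 0.05, 0.10, 0.20):
    print(f"failure rate {p:.0%}: "
          f"2 in a row = {consecutive_failure_prob(p, 2):.4%}, "
          f"3 in a row = {consecutive_failure_prob(p, 3):.5%}")

# failure rate 1%: 2 in a row = 0.0100%, 3 in a row = 0.00010%
# failure rate 5%: 2 in a row = 0.2500%, 3 in a row = 0.01250%
# failure rate 10%: 2 in a row = 1.0000%, 3 in a row = 0.10000%
# failure rate 20%: 2 in a row = 4.0000%, 3 in a row = 0.80000%
```

So even at a 20% failure rate, two consecutive failures should only happen about 1 time in 25 if you grab units at random.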
 
It's not the test or tester's stated intention; however, the general voice of the forum parrots results in a way that seems to suggest any scope that passes has superpowers. Those scopes are often recommended by folks who haven't even used them and/or certainly haven't tested them themselves.

Good point, but forum parrots are good for clicks and web traffic. Especially when they fly off to other forums.

And who doesn't want to be part of a clique? 🤪
 
Sheeple much…? This is interesting data, but you can hardly draw any conclusions from it.

How many scopes were tested for each make and model? This is a biased sample, and nothing conclusive can be drawn from a spreadsheet of pass or fail.

Once again, this data is useless with such a small sample size. You can't test 1, 2, or even 10 scopes and say that it is representative of the entire population; you need hundreds if not thousands of samples for each scope. People should definitely be testing this stuff for themselves and not believing everything they read just because someone tested a few scopes.

I understand your frustration, but I think you are highlighting aspects that are outside the scope of what Formi has laid out.

Large sample sizes are obviously not realistic for this program, and if you go back and read some of Formi's posts he's been careful to call these scope "evals" and not "tests". That was smart, and I would say highly commendable.

He did that well before it even became a thing on Rokslide (i.e., he posted it on other forums). Many members believe that "drop testing" started at Rokslide - it had been going on well before Rokslide was ever created.

Anyway, I would just think of "drop tests" as a gut-check, spot test, or simply a consumer's proof of concept (prior to incorporating a scope onto a platform) rather than a scientific study.

How is it useful?

You're looking for solid designs. If one sample of a certain model passes, you assume that the actual mechanical design is sound. At least for that application. You perform ongoing monitoring, and ideally add more samples. That's pretty straightforward.

However, if one sample of a certain model fails, then it gets more complicated. If it confirms a previous bias, one may see little value in doing more evaluations. Especially if the track record from other reliable sources supports that decision.

If subsequent samples continue to fail, then why continue? Root cause might be design and/or build quality, but who cares?

If instead some of the samples pass and some fail, then you need to look into the build quality. Obviously the design is sound, given that some samples passed.

That's where statistics becomes most relevant - characterizing that one particular model that has some fails. We'll need sufficient samples, repeatable methods, thorough documentation, blah, blah, and blah. But is it worth it, to determine a failure rate? Maybe, but I think time/money would be better invested in other designs.
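
For a rough sense of what "sufficient samples" actually buys, here's a quick Python sketch using the exact one-sided binomial (Clopper-Pearson) bound; the sample counts are only illustrative, not tied to any real eval program:

```python
# If you eval n units of a model and see zero failures, how high could
# the true failure rate still plausibly be? Exact one-sided 95%
# Clopper-Pearson upper bound for 0 failures in n trials. Sample sizes
# below are illustrative only.

ALPHA = 0.05  # 95% confidence

def upper_bound_zero_failures(n: int, alpha: float = ALPHA) -> float:
    """Upper confidence bound on the failure rate after 0 failures in n evals."""
    return 1.0 - alpha ** (1.0 / n)

for n in (1, 2, 5, 10, 30, 100):
    print(f"0 failures in {n:>3} evals -> true rate could still be up to "
          f"{upper_bound_zero_failures(n):.1%}")

# 0 failures in   1 evals -> true rate could still be up to 95.0%
# 0 failures in   2 evals -> true rate could still be up to 77.6%
# 0 failures in   5 evals -> true rate could still be up to 45.1%
# 0 failures in  10 evals -> true rate could still be up to 25.9%
# 0 failures in  30 evals -> true rate could still be up to 9.5%
# 0 failures in 100 evals -> true rate could still be up to 3.0%
```

In other words, a few clean passes narrow things down only a little, and actually pinning a rate down takes far more units than anyone here is going to buy, which is why we end up leaning on track records.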

Limitations...

Unfortunately, we are somewhat stuck relying on track records. So if a certain model passes, ideally with multiple samples, and we have corroborating information from other reliable sources, then it's about as good as it gets without more time/money.

Some people may remember when Frank Galli stated that Nightforce failed the least, Leupold the most, and S&B somewhere in between. How many samples, under what conditions, and what measures? Who knows, but his observation can still be valuable, even taken with a grain of salt. Just remember, things can change!

There's a group of people that would trust their life with a DMR/HDMR anywhere in the world, even today with newer options. I'd be interested in failure rates, infant mortality, and other measures, but some of those scopes saw absolute hell and created fans for life. I can't quantify that, but respect the track record and sources. Especially given the application.
 
How is it useful?

You're looking for solid designs. If one sample of a certain model passes, you assume that the actual mechanical design is sound. At least for that application. You perform ongoing monitoring, and ideally add more samples. That's pretty straightforward.

However, if one sample of a certain model fails, then it gets more complicated. If it confirms a previous bias, one may see little value in doing more evaluations. Especially if the track record from other reliable sources supports that decision.

If subsequent samples continue to fail, then why continue?
I may be misreading your post, but I think both you and the person you are responding to are missing a critical point: regardless of outcome, MOST individual scopes are likely to react similarly in a relatively standardized eval, so a single data point does tell you something useful, two tells you quite a lot, and for my purposes that level of confidence is plenty.

A scope with a 1% failure rate has a 99% chance of passing on any individual test. So with lots of tests, only 1 out of every 100 will fail, i.e. the odds of having that scope fail on your first eval are about 1%. The chance of a scope with a 1% failure rate failing 2 CONSECUTIVE tests is 1% of 1%, or 0.01%, i.e. you would expect 2 consecutive failures once every 10,000 evals. Even at a 10% failure rate (i.e. 1 out of every 10 scopes fails), the odds of getting 2 consecutive failures are only 1%, i.e. you might expect that to happen by chance once every HUNDRED evals.

Which means if you get 2 consecutive failures the probability alone makes it very likely that scope model has a MUCH higher failure rate than 1%, and probably also much higher than 10%. And those are not good odds in my book.
With that^^ in mind, you’ll notice that in several of the most contentious evals two examples were checked (and both failed). You’ll also note that in almost all cases of passing evals there are numerous crowd-sourced “confirmation” evals.

So yes, evaluating 1 individual that results in a failure isn't a certainty by any stretch, but it tells you what is LIKELY to happen if you tested a lot more examples. And if you evaluate 2 samples and both fail? Yes, you could just be really, really, really, really REALLY "lucky". But the odds are against that. In this case the odds very strongly suggest that scope model will fail at a very high rate, and since I have other good options I DON'T NEED certainty; that probability is plenty for me to stay away until I have better info. With nothing to lose and everything to gain, someone tell me why exactly I should care about certainty in this case, anyway? Sorry, it's on the scope manufacturer to convince me at this point. I'm not the FAA, I'm just a guy looking to decrease my chances of a scope failure.
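
To put a rough number on "MUCH higher", here's a quick Python sketch of the lowest true failure rate that is still consistent with 2-for-2 failures, assuming the two samples are independent draws (the confidence levels are just for illustration):

```python
# If all n independently sampled units fail, the exact one-sided lower
# confidence bound on the true failure rate p is the smallest p with
# p**n >= alpha, i.e. p = alpha**(1/n). Here n = 2 (two failed samples).

def lower_bound_all_fail(n: int, alpha: float) -> float:
    """Lower confidence bound on the failure rate when all n sampled units fail."""
    return alpha ** (1.0 / n)

for conf in (0.90, 0.95, 0.99):
    lb = lower_bound_all_fail(2, 1.0 - conf)
    print(f"{conf:.0%} confident the true failure rate is at least {lb:.0%}")

# 90% confident the true failure rate is at least 32%
# 95% confident the true failure rate is at least 22%
# 99% confident the true failure rate is at least 10%
```

So two straight failures are hard to square with the single-digit failure rates you would hope for, which is the whole point.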
 
I may be misreading your post, but I think both you and the person you are responding to are missing a critical point: regardless of outcome, MOST individual scopes are likely to react similarly in a relatively standardized eval, so a single data point does tell you something useful, two tells you quite a lot, and for my purposes that level of confidence is plenty.

A scope with a 1% failure rate has a 99% chance of passing on any individual test. So with lots of tests, only 1 out of every 100 will fail, i.e. the odds of having that scope fail on your first eval are about 1%. The chance of a scope with a 1% failure rate failing 2 CONSECUTIVE tests is 1% of 1%, or 0.01%, i.e. you would expect 2 consecutive failures once every 10,000 evals. Even at a 10% failure rate (i.e. 1 out of every 10 scopes fails), the odds of getting 2 consecutive failures are only 1%, i.e. you might expect that to happen by chance once every HUNDRED evals.

Which means if you get 2 consecutive failures the probability alone makes it very likely that scope model has a MUCH higher failure rate than 1%, and probably also much higher than 10%. And those are not good odds in my book.
With that^^ in mind, you’ll notice that in several of the most contentious evals two examples were checked (and both failed). You’ll also note that in almost all cases of passing evals there are numerous crowd-sourced “confirmation” evals.

So yes, evaluating 1 individual that results in a failure isn't a certainty by any stretch, but it tells you what is LIKELY to happen if you tested a lot more examples. And if you evaluate 2 samples and both fail? Yes, you could just be really, really, really, really REALLY "lucky". But the odds are against that. In this case the odds very strongly suggest that scope model will fail at a very high rate, and since I have other good options I DON'T NEED certainty; that probability is plenty for me to stay away until I have better info. With nothing to lose and everything to gain, someone tell me why exactly I should care about certainty in this case, anyway? Sorry, it's on the scope manufacturer to convince me at this point. I'm not the FAA, I'm just a guy looking to decrease my chances of a scope failure.
Haha just to be clear—I wasn’t mad when I posted this. I was honestly just trying to stir the pot a little and see what kind of thoughtful responses it would bring out—a bit of friendly trolling. And yours (along with the previous replies) definitely delivered—really solid points and a well-reasoned take.

I do agree that two failures in a row can raise serious concerns, and I totally respect the logic of minimizing risk when there are other solid options available. That said, I think where we might differ is in how much confidence to place in such a small sample size.

What happens if you test two more scopes and both pass? Suddenly you're at 50/50, and things start to look a lot less clear. That’s the tricky part—when the sample size is that small, a few extra data points in either direction can completely change the story. Without a broader set of data, it’s hard (at least for me) to feel confident we’re seeing a reliable pattern versus just randomness or variability.

So I totally get your stance—and in many ways, it’s a smart and cautious approach. I just lean more toward holding off on judgment until there’s more consistent evidence across a wider set. Either way, I really appreciate your perspective—it’s helping push the discussion in a good direction.
 
This is not a safe assumption to make unless the retailer you are talking to is actually looking at the failure RATE, i.e. the ratio of total failures to total units sold, based on actual data. It highlights the difference between where you need a ratio to say something and where you really don't, so it's worth pointing out in this context.
I guarantee most customer service people at a retailer are not going to be looking at data; they are going to base their reaction to this question on their feel for what they see the most. At best they are going to track returns, but rarely will they compare that to total sales to get a rate. Very few retailers actually track return rates.
Example: one of our customer service people came to me a month or so ago saying that, based on the warranty replacements they had provided recently, they thought there was a high return rate on a specific product we make, and they were worried there was a problem we needed to address. So we sat down and looked at it with numbers. They had indeed offered several warranty replacements in quick succession, more so than for any other item. However, it was right at the "later-middle" part of the season, when the majority of use of a winter ski-season item was going to happen (i.e. the period of highest breakage), and based only on that season's sales combined with the total number of returns for the year, the return rate was less than 1%. This is the whole reason for compiling data and relying on it, while NOT relying on gut reaction: unless you have the big-picture visibility, you don't have enough info to say one way or the other.
Since it always comes up, also remember that despite that^^, it's still true that if you objectively evaluate 1 specimen, it is unlikely to fail unless there is a high failure rate, and if you get 2 failures in a row it is almost a certainty that there is an extremely high rate of failure... because with a low failure rate of even 5% (which is MUCH higher than the rate at which the products I'm familiar with fail), you still have a 95% NON-FAILURE rate, so the odds of getting 2 failures in a row are extremely low. Perhaps someone can calculate the actual odds of failing 2 consecutive evaluations if a product has a 5% failure rate, 10%, 20%, etc. Suffice to say it's EXTREMELY unlikely if the failure rate is truly low.
Thanks for sharing that detailed perspective—it’s a really important distinction between anecdotal impressions and data-driven conclusions. I absolutely agree that relying solely on gut feeling or limited warranty return counts without context can lead to misleading assumptions.

That said, I’d argue that while comprehensive failure rates based on total sales are ideal, they’re often not available to most consumers or even many retailers. In those cases, individual evaluations and small sample testing do serve as meaningful proxies—especially when we’re dealing with products where failures have a significant impact.

Regarding the math behind consecutive failures: yes, the probability of two consecutive failures happening by chance if the failure rate is truly low is small. But real-world conditions are rarely ideal or perfectly random. Manufacturing inconsistencies, handling damage, or even batch-specific issues can increase localized failure likelihood without reflecting the full product line's overall failure rate.

Also, when you test two samples that both fail in independent evaluations, it raises a red flag precisely because it’s unlikely to be coincidence—but it doesn’t automatically mean the entire model or production run has a catastrophic failure rate. It means more investigation is warranted, not necessarily that you have a definitive failure rate.

So while I agree the best approach is large-scale, data-driven failure rates, I think it’s important not to dismiss small-sample testing outright—especially when buyers don’t have access to comprehensive stats and need to make real-time decisions about risk.

Ultimately, both approaches are valuable—data for big-picture confirmation, and small samples as early warning signals. And honestly, the best thing people can do is test it for themselves. I think we all need to stop taking everything at face value and get hands-on experience where possible. That’s how you really know what you’re dealing with.
 
So I totally get your stance—and in many ways, it’s a smart and cautious approach. I just lean more toward holding off on judgment until there’s more consistent evidence across a wider set. Either way, I really appreciate your perspective—it’s helping push the discussion in a good direction.

So while I agree the best approach is large-scale, data-driven failure rates, I think it’s important not to dismiss small-sample testing outright—especially when buyers don’t have access to comprehensive stats and need to make real-time decisions about risk.
I think we're actually agreeing more than disagreeing. I am saying that the small-scale evals done here ARE valuable, both because the probability suggests they will align with larger-scale testing, AND because in the (very unlikely) event they don't, I have nothing to lose by staying away from scopes that fail an eval. Assuming the tests are consistent (which I think is clearly the biggest variable in this case), even at 50:50 the odds still suggest very strongly that there is a problem with that scope being able to reliably sustain heavier use without failing. Obviously larger-scale testing would provide a more quantitative look; I just don't think that's necessary to accomplish what I want, and as you said, no manufacturer is sharing this info and very few entities even track it in a usable way (and they aren't sharing it either). So as consumers we have the choice of paying attention to an eval like this in its context, or going with zero objective info at all. I don't see any other option.
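
To put the 50:50 scenario in perspective, here's a quick Python sketch of how often you'd expect to see 2 failures out of 4 samples if the true failure rate really were low (simple binomial model; the rates are hypothetical):

```python
from math import comb

# Probability of exactly 2 failures among 4 independently sampled units,
# for a few assumed true failure rates (hypothetical values).

def prob_k_of_n(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k failures in n evals at failure rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

for p in (0.01, 0.05, 0.10, 0.30, 0.50):
    print(f"true failure rate {p:.0%}: P(2 failures out of 4) = {prob_k_of_n(2, 4, p):.2%}")

# true failure rate 1%: P(2 failures out of 4) = 0.06%
# true failure rate 5%: P(2 failures out of 4) = 1.35%
# true failure rate 10%: P(2 failures out of 4) = 4.86%
# true failure rate 30%: P(2 failures out of 4) = 26.46%
# true failure rate 50%: P(2 failures out of 4) = 37.50%
```

So even a 2-of-4 split would be an odd thing to see from a model with a genuinely low failure rate; the likelihood math points toward something well above the single digits.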

I've personally had 4 scopes lose zero consistently by around 2 MOA. 3 were the same brand/model in different magnifications. It was infuriating and has given me a chip on my shoulder on the topic, but it has also made me very willing to rely on this probability, because it aligns precisely with my personal experience and because I have nothing to lose by erring on the side of caution. IMO the ONLY people who have anything to gain by dismissing the evals without a huge data set are the manufacturers making unreliable scopes.
 
This is not a safe assumption to make unless the retailer you are talking to is actually looking at the failure RATE, i.e. the ratio of total failures to total units sold, based on actual data. It highlights the difference between where you need a ratio to say something and where you really don't, so it's worth pointing out in this context.

I understand. Certain vendors just sell a lot more scopes than others, so even identical failure rates can appear much larger simply because they outsell others 10:1 and get noticed. But because there is no way the average shooter can access a sample size of thousands of scopes, some use of anecdotes and reputation needs to be considered.

I went to an F-Class match once, and at the time every single rifle at the match ran NF scopes. It was 100% by my count in the rack and on the firing line. I thought sponsorship by NF surely must be the reason. I asked the shooters why they were the most common scope, and the answers were always along the lines of "they're the most reliable in our experience," and no, they weren't sponsored. Of course NF scopes do have issues, but the shooters who spend a lot of time and money traveling and shooting just weren't interested in taking the risk. None of these shooters are tracking the kind of data we're talking about here, but I don't think ignoring the collective wisdom is a good idea.
 
I understand. Certain vendors just sell a lot more scopes than others, so even identical failure rates can appear much larger simply because they outsell others 10:1 and get noticed. But because there is no way the average shooter can access a sample size of thousands of scopes, some use of anecdotes and reputation needs to be considered.

I went to an F-Class match once, and at the time every single rifle at the match ran NF scopes. It was 100% by my count in the rack and on the firing line. I thought sponsorship by NF surely must be the reason. I asked the shooters why they were the most common scope, and the answers were always along the lines of "they're the most reliable in our experience," and no, they weren't sponsored. Of course NF scopes do have issues, but the shooters who spend a lot of time and money traveling and shooting just weren't interested in taking the risk. None of these shooters are tracking the kind of data we're talking about here, but I don't think ignoring the collective wisdom is a good idea.
Sure, but context doesn't just matter, it's everything. "The collective wisdom" writ large would steer everyone to whatever is most popular, and my personal experience says that's a mistake. It also contradicts what those F-Class folks in your example say. There has to be some consideration of the source AND WHAT'S BEHIND THAT SOURCE. In your example you are talking about a high-level competitive venue that is highly equipment-dependent: it is scored, it is precise, and the result is visible to anyone who is there in person. That's a very different proposition than suggesting people rely heavily on an internet forum where "mine has never lost zero" passes for wisdom, WITHOUT any detailed info on whether that was actually checked and how it was checked, or whether an individual handles and uses their equipment in a way that is similar to you. In other words, there is no MEASURE of any claim. To me, you are entirely correct; it's just that the context required to vet any claim is often impossible to get. Sponsorship and the prize table also cloud some competitive venues. Most of the visible people in the conversation have a financial interest in the outcome. The result is that the "common wisdom" accessible to most people is highly suspect in most cases. Hence the desire for something objective that cuts through the context.
 