This article follows a previous study on insurance fraud by Dionne and Belhadji (1996) and uses the same data bank. Eighteen companies, representing 70% of the Quebec automobile insurance market in 1994, contributed to the survey. Claim adjusters randomly reopened 2,509 closed files, or 2,772 coverages, to evaluate the significance of insurance fraud. The study was financed by the Insurance Bureau of Canada.
Results from this study showed that 3 to 6.4% of all claim payments contained fraud, representing 28 to 61 million dollars in 1994-1995. This evaluation is a lower bound, since it was limited to observed fraud only. The authors' definition of fraud included build-up, opportunistic fraud, and planned fraud (see Weisberg and Derrig, 1993 for a detailed discussion of different fraud definitions; for recent studies on insurance fraud, see the "References" section).
The objective of this paper is to apply a statistical method to estimate the total fraud level in the industry. In the data, investigators found 19 established fraud cases out of the 2,772 coverages, and 123 suspected cases, each rated on a suspicion scale from 1 to 10, where 10 means that the case was judged to have a probability of being fraudulent close to one.
Problems and method
In standard statistical evaluations of a ratio, both the numerator and the denominator are perfectly observable, and a subset of the population is enough to obtain a robust estimator of the sought proportion. When we evaluate the significance of fraud in a given market, however, the major problem is one of estimation.
We cannot easily compute the proportion of fraud over all coverages because the numerator of this proportion is hidden information: we do not know its value with certainty, even in the sample. Consequently, we have to resort to a count data estimator of a hidden phenomenon. The major statistical problem associated with these estimators is their lack of robustness.
We therefore assume that the detection process of fraud follows a Bin(n,p), where "n" and "p" are unknown parameters. We estimate these parameters by the Method of Moments. Since there are two parameters, at least two moments are needed, E[X] and Var[X]. Because the variance is computed between groups, more than one group is needed.
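Under the Binomial assumption, E[X] = np and Var[X] = np(1 - p). Equating these to the sample mean x̄ and the sample variance s² of the per-group counts gives p̂ = 1 - s²/x̄ and n̂ = x̄/p̂ = x̄²/(x̄ - s²). The sketch below implements these two formulas; the function name `binomial_mom` is hypothetical, and using the second central moment (dividing by K rather than K - 1) is our assumption, not a detail stated in the article.

```python
from statistics import fmean, pvariance


def binomial_mom(counts):
    """Method-of-moments estimates (n_hat, p_hat) for X ~ Bin(n, p).

    counts: detected-fraud counts, one per set S1..SK (hypothetical input format).
    Returns None when no admissible solution exists (s2 >= mean gives p_hat <= 0).
    """
    if len(counts) < 2:
        raise ValueError("need at least two sets to estimate a variance")
    mean = fmean(counts)       # first moment: estimates n * p
    s2 = pvariance(counts)     # second central moment (divides by K): estimates n * p * (1 - p)
    if mean <= 0 or s2 >= mean:
        return None            # no solution with 0 < p_hat <= 1
    p_hat = 1.0 - s2 / mean
    n_hat = mean / p_hat       # equivalently mean**2 / (mean - s2)
    return n_hat, p_hat
```

With the made-up counts [3, 4, 2, 5, 3, 4], n̂ is about 4.74; changing the single count 5 to 6 moves n̂ to about 6.37. Because n̂ has x̄ - s² in its denominator, small changes in the between-group variance produce large swings, which illustrates the lack of robustness mentioned above.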
For that reason, the data must be randomly partitioned into a number of sets S1, S2, …, SK. There is a trade-off in the choice of the number of sets (K). When K is large, the moments are more stable and precise. But as K increases, it becomes more difficult to maintain the Binomial assumption that each set follows the same Bin(n,p). The "p" parameter does not change, but the more groups we have, the fewer elements each group contains, and hence the less we can claim that each group contains the same number "n" of total fraud cases.
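A random partition into K sets can be sketched as follows. We assume, hypothetically, that each coverage is represented by a 0/1 flag (1 = fraud detected under a given detection assumption); the coverages are shuffled and dealt round-robin into K sets, so set sizes differ by at most one. The function name `partition_counts` is our own.

```python
import random


def partition_counts(flags, k, rng=None):
    """Randomly split coverage-level fraud flags into k sets; return detections per set.

    flags: sequence of 0/1 indicators, one per coverage (hypothetical representation).
    """
    if not 1 <= k <= len(flags):
        raise ValueError("k must be between 1 and the number of coverages")
    rng = rng or random.Random()
    shuffled = list(flags)
    rng.shuffle(shuffled)
    # Deal the shuffled coverages round-robin: coverage i goes to set i mod k.
    counts = [0] * k
    for i, flag in enumerate(shuffled):
        counts[i % k] += flag
    return counts
```

For example, with 19 flagged coverages out of 2,772 and k = 6, each set holds 462 coverages and the six counts always sum to 19; only their spread changes from one partition to the next.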
We first present the results of one experiment done with six sets of 462 coverages. The reported figures are averages over a thousand estimations with the method described above. Individual estimates are unstable, but once a hundred or so are averaged, the results become much more reliable.
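The repeated-estimation experiment can be sketched as below: partition the coverages into K sets, apply the method of moments, and average over many repetitions. Several details are our assumptions rather than the article's: draws with no admissible estimate (s² ≥ x̄) are skipped, the total number of fraud cases is taken as K·n̂ (each set being assumed to contain the same n), and all function names are hypothetical.

```python
import random
from statistics import fmean, pvariance


def binomial_mom(counts):
    """Method-of-moments (n_hat, p_hat) for Bin(n, p); None if inadmissible."""
    mean, s2 = fmean(counts), pvariance(counts)
    if mean <= 0 or s2 >= mean:
        return None
    p_hat = 1.0 - s2 / mean
    return mean / p_hat, p_hat


def averaged_estimate(flags, k=6, repetitions=1000, seed=0):
    """Average MoM estimates over repeated random partitions of the coverages.

    Returns (mean n_hat per set, mean p_hat, implied total k * n_hat,
    number of admissible draws), or None if no draw was admissible.
    Skipping inadmissible draws is an assumption; the article does not say
    how such draws were handled.
    """
    rng = random.Random(seed)
    shuffled = list(flags)
    n_hats, p_hats = [], []
    for _ in range(repetitions):
        rng.shuffle(shuffled)
        # Interleaved slices of the shuffled list form k random, balanced sets.
        counts = [sum(shuffled[j::k]) for j in range(k)]
        est = binomial_mom(counts)
        if est is not None:
            n_hats.append(est[0])
            p_hats.append(est[1])
    if not n_hats:
        return None
    n_bar = fmean(n_hats)
    return n_bar, fmean(p_hats), k * n_bar, len(n_hats)
```

Called as `averaged_estimate([1] * 19 + [0] * 2753, k=6, repetitions=1000)`, this mirrors the six-sets-of-462 design. Because single draws with s² close to x̄ give very large n̂, the average can be dominated by a few draws, which is one reason many repetitions are needed.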
The second and third columns present the claim adjusters' results, taken directly from the data bank, and the corresponding fraud proportions. These figures are the observed cardinalities of set E plus set (S and F). For example, under the first detection assumption, only 19 cases in the data bank were Established as fraudulent, which yields a fraud proportion of 0.69% (19/2,772). These two columns reproduce the results presented in Dionne and Belhadji (1996).