Isolation Forest (iForest) stands out as a widely-used unsupervised anomaly detector, primarily owing to its remarkable runtime efficiency and superior performance in large-scale tasks. Despite its widespread adoption, a theoretical foundation explaining iForest's success remains unclear. This paper focuses on the inductive bias of iForest, which theoretically elucidates under what circumstances and to what extent iForest works well. The key is to formulate the growth process of iForest, where the split dimensions and split values are randomly selected. We model the growth process of iForest as a random walk, enabling us to derive the expected depth function, which is the outcome of iForest, using transition probabilities. The case studies reveal key inductive biases: iForest exhibits lower sensitivity to central anomalies while demonstrating greater parameter adaptability compared to $k$-Nearest Neighbor. Our study provides a theoretical understanding of the effectiveness of iForest and establishes a foundation for further theoretical exploration.
翻译:暂无翻译