Khiops is an open-source machine learning tool designed for mining large multi-table databases. It is based on a unique Bayesian approach that has attracted academic interest, with more than 20 publications on topics such as variable selection, classification, decision trees and co-clustering. It provides a predictive measure of variable importance using discretisation models for numerical data and value clustering for categorical data. The proposed classification/regression model is a naive Bayesian classifier incorporating variable selection and weight learning. For multi-table databases, it performs propositionalisation by automatically constructing aggregates. Khiops is suited to the analysis of large databases with millions of individuals, tens of thousands of variables and hundreds of millions of records in secondary tables. It is available in many environments, both as a Python library and through a user interface.
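To make the idea of propositionalisation concrete, the following minimal sketch flattens a one-to-many secondary table into per-individual aggregate variables. It is an illustration in plain Python only, not Khiops' own algorithm or API: the function `propositionalise`, the tables `customers` and `purchases`, and the choice of count, mean and max as aggregates are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def propositionalise(main_rows, secondary_rows, key, value):
    """Flatten a one-to-many secondary table into per-individual aggregates.

    Hypothetical helper for illustration; Khiops constructs and selects
    its aggregates automatically and does not expose this function.
    """
    # Group the secondary records by the key of the individual they belong to.
    grouped = defaultdict(list)
    for record in secondary_rows:
        grouped[record[key]].append(record[value])

    flat = []
    for row in main_rows:
        values = grouped.get(row[key], [])
        flat.append({
            **row,
            f"count_{value}": len(values),
            # Mean and max are undefined for individuals with no records.
            f"mean_{value}": mean(values) if values else None,
            f"max_{value}": max(values) if values else None,
        })
    return flat


# Hypothetical main table (one row per individual) and secondary table.
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
purchases = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]

features = propositionalise(customers, purchases, key="customer_id", value="amount")
for row in features:
    print(row)
```

Each individual ends up with a single flat row, which is the form a naive Bayesian classifier expects. In this sketch the aggregates are fixed by hand, whereas the abstract describes them as being constructed automatically.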