基于最小公共祖先推断有向无环图与系统发育网络 (Inferring DAGs and Phylogenetic Networks from Least Common Ancestors)

A least common ancestor (LCA) of two leaves in a directed acyclic graph (DAG) is a vertex that is an ancestor of both leaves and has no proper descendant that is also their common ancestor. LCAs capture hierarchical relationships in rooted trees and, more generally, in DAGs. In 1981, Aho et al. introduced the problem of determining whether a set of pairwise LCA constraints on a set $X$, of the form $(i,j)<(k,l)$ with $i,j,k,l\in X$, can be realized by a rooted tree whose leaf set is $X$, such that whenever $(i,j)<(k,l)$, the LCA of $i,j$ is a descendant of that of $k,l$. They also presented a polynomial-time algorithm, BUILD, to solve this problem. However, many such constraint systems cannot be realized by any tree, prompting the question of whether they can be realized by a more general DAG. We extend Aho et al.'s framework from trees to DAGs, providing both theoretical and algorithmic foundations for reasoning about LCA constraints in this broader setting. Given a collection $R$ of LCA constraints, we define its $+$-closure $R^+$, capturing additional LCA relations implied by $R$. Using $R^+$, we construct a canonical DAG $G_R$ and prove that $R$ is DAG-realizable if and only if it is realized by $G_R$. We further adapt this construction to phylogenetic networks, defining a canonical network $N_R$ and prove that it is regular, i.e., it coincides with the Hasse diagram of its underlying set system. Finally, we show that for any DAG-realizable $R$, its classical closure - comprising all LCA constraints that hold in every DAG realizing $R$ - coincides with its $+$-closure. All constructions are computable in polynomial time, and we provide explicit algorithms for each.

翻译：在有向无环图（DAG）中，两个叶节点的最小公共祖先（LCA）是同时为这两个叶节点祖先的顶点，且不存在任何真后代也是它们的公共祖先。LCA 刻画了有根树乃至更一般的 DAG 中的层次关系。1981 年，Aho 等人提出了如下问题：对于集合 $X$ 上形如 $(i,j)<(k,l)$（其中 $i,j,k,l\\in X$）的成对 LCA 约束集合，能否通过一个叶集为 $X$ 的有根树实现，使得每当 $(i,j)<(k,l)$ 时，$i,j$ 的 LCA 是 $k,l$ 的 LCA 的后代。他们还提出了一个多项式时间算法 BUILD 来解决该问题。然而，许多此类约束系统无法由任何树实现，这引发了它们能否由更一般的 DAG 实现的问题。我们将 Aho 等人的框架从树扩展到 DAG，为在这一更广泛背景下推理 LCA 约束提供了理论和算法基础。给定 LCA 约束集合 $R$，我们定义其 $+$-闭包 $R^+$，以捕获由 $R$ 隐含的额外 LCA 关系。利用 $R^+$，我们构造了一个规范 DAG $G_R$，并证明 $R$ 是 DAG 可实现的当且仅当它由 $G_R$ 实现。我们进一步将此构造推广到系统发育网络，定义了一个规范网络 $N_R$，并证明它是正则的，即与其底层集合系统的哈斯图一致。最后，我们证明对于任何 DAG 可实现的 $R$，其经典闭包（包含在所有实现 $R$ 的 DAG 中都成立的 LCA 约束）与其 $+$-闭包一致。所有构造均可在多项式时间内计算完成，我们为每个步骤提供了明确的算法。