基于多层概要结构的数据流的频繁项集发现算法
Finding Frequent Items of Data Streams Based on Hierarchical Sketch
-
摘要: 利用一类基于异或运算的两两相互独立的哈希函数族,实现了对多层结构流数据进行“概括”的概要数据结构.应用该多层概要数据结构,实现了面向数据流的多层频繁项集的动态近似查找算法.实验结果表明,该算法以亚线性的时间和空间消耗,在统计意义上达到了几乎100%的查找和估计精确度.Abstract: A new hierarchical sketch data structure was implemented,which summarized the hierarchical structure in stream data and the key component of which was a pairwise independent family of hash functions based on exclusive-or operator.Applying the hierarchical sketch,an algorithm that finds hierarchical frequent items over data streams dynamically and approximately was implemented.The algorithm has sub-linear time and space costs and its accuracy is almost perfect in statistical meaning.
下载: