D2 histosketch: discriminative and dynamic similarity-preserving sketching of streaming histograms
Published in:
- IEEE Transactions on Knowledge and Data Engineering. - 2018, p. 1–1
English
Histogram-based similarity has been widely adopted in many machine learning tasks. However, measuring histogram similarity is a challenging task for streaming histograms, where the elements of a histogram are observed one after the other in an online manner. The ever-growing cardinality of histogram elements over the data streams makes any similarity computation inefficient in that case. To tackle this problem, we propose in this paper D2HistoSketch, a similarity-preserving sketching method for streaming histograms to efficiently approximate their Discriminative and Dynamic similarity. D2HistoSketch can fast and memory-efficiently maintain a set of compact and fixed-size sketches of streaming histograms to approximate the similarity between histograms. To provide high-quality similarity approximations, D2HistoSketch considers both discriminative and gradual forgetting weights for similarity measurement, and seamlessly incorporates them in the sketches. Based on both synthetic and real-world datasets, our empirical evaluation shows that our method is able to efficiently and effectively approximate the similarity between streaming histograms while outperforming state-of-the-art sketching methods. Compared to full streaming histograms with both discriminative and gradual forgetting weights in particular, D2HistoSketch is able to dramatically reduce the classification time (with a 7500x speedup) at the expense of a small loss in accuracy only (about 3.25%).
-
Faculty
- Faculté des sciences et de médecine
-
Department
- Département d'Informatique
-
Language
-
-
Classification
-
Computer science and technology
-
License
-
License undefined
-
Identifiers
-
-
Persistent URL
-
https://folia.unifr.ch/unifr/documents/307782
Statistics
Document views: 30
File downloads: