METRICS, SIMILARITY, AND PROXIMITY

Approaches

1. Traditional metrics (distances, by Freshe's axioms)
2. "Minimal cost" transformation: one object into another object (edit disctance)
3. Common part of the initial (compared) objects/structures (e.g., subgraph, substructure, subsequence)
4. Vectro-like proximity

Applied Topics

  • Classification Theory/Clustering
  • Data Envelopment Analysis (DEA)
  • Comparison of Complex Technical Systems
  • Comparison of genoms, classification hierarchies (Mathematical Biology)
  • Comparison of chemical structures
  • Matching of information fragments (Information Engineering)
  • Image processing
  • Comparison of architectural decisions (Architecture)
  • Comparison of companies (economical/financial analysis)
  • Comparison of personnel (e.g., in personnel management)
  • Aggregation of modular solutions

    Journals

  • J. of Classification
  • J. of Algorithms
  • SIAM J. on Computing
  • Theoretical Computer Science
  • Eur. J. of Operational Research
  • Information Processing Letters
  • Pattern Recongnition
  • Pattern Recongnition Letters
  • IEEE Trans. SMC, Part A
  • IEEE Trans. PAMI
  • Algorithmica
  • J. of the ACM
  • Discrete Applied Mathematics
  • Foundations of Computing and Decision Science

    Bibliography

  • Books
    1. J.-P. Barthelemy, and A. Guenoche, Trees and Proximity Representation, Chichester: J.Wiley & Sons, 1991.
    2. D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge Univ. Press, New York, 1997.
    3. M.Sh. Levin, Combinatorial Enineering of Decomposable Systems, Dordrecht: Kluwer, 1998 (Chapter 4)
    4. B.G. Mirkin, Mathematical Classification and Clustering, Boston: Kluwer, 1996.
    5. P.A. Pevzner, Computational Molecular Biology: An Algorithmic Approach. Bradford Books, 2000.
    6. G. Valiente, Algorithms on Trees and Graphs. Springer, Berlin, 2002.

  • Surveys
    1. L. Bergroth, H. Hakonen, T. Raita, A survey of longest common subsequence algorithms. In: Proc. of the Seventh Int. Symp. on String Processing and Information Retrieval SPIRE'00, 39-48, 2000.
    2. P. Bille, A survey on edit distance and related problems. Theoretical Computer Science, 337(1-3), 217-239, 2005.
    3. X. Jiang, H. Bunke, J. Csirik, Median strings: A review. In: M. Last, A.A. Kandel, H. Bunke, (Eds.), Data Mining in Time Series Databases. World Scientific, 173-192, 2004.

  • Basic Papers
    1. T. Akatsu, M.M. Halldorsson, On the approximation of largest common subtrees and largest common point sets. Theoretical Computer Science, 233(1-2), 33-50, 2000.
    2. A. Amir, D. Keselman, Maximum agreement subtree in a set of evolutionary trees: metrics and efficient algorithms. SIAM J. Comput., 26(6), 1656-1669, 1997.
    3. M. Arnold, E. Ohlebusch, Linear time algorithms for generalizations of the longest common substring problem. Algorithmica, 60(4), 806-818, 2011.
    4. K.P. Bogart, Preference structures I: Distance between Transitive Preference Relations. J. Math. Soc., 3, 49-67, 1973.
    5. H. Bunke, On relation between graph edit disctance and maximum common subgraph. Pattern Recong. Lett., 18(8), 689-694, 1997.
    6. H. Bunke, K. Shearer, A graph disctance metric based on the maximal common subgraph. Patern Recogn. Lett., 19(3-4), 255-259, 1998.
    7. W. Chen, New algorithm for ordered tree-to-tree correction problem. J. of Algorithms, 40(2), 135-158, 2001.
    8. L.P. Chew, K. Kedem, D.P. Hutenlocher, J. Kleinberg, Fast detection of geometric substructure in protein. J. of Comput. Biology, 6(3-4), 313-325, 1999.
    9. R. Connor, F. Simeoni, M. Iakovos, R. Moss, A bounded distance metric for comparing tree structure. Information Systems, 36(4), 748-764, 2011.
    10. W.D. Cook, L.M. Seiford, Priority ranking and consensus formation. Management Science, 24(16) 1721-1732, 1978.
    11. W.D. Cook, L.M. Seiford, M.Kress, A General Framework for Distance-Based Consensus in Ordinal Ranking Models. European Journal of Operational Research, 96(2) 392-397, 1996.
    12. L. Eghe, C. Michel, String similarity measures for ordered sets of documents in information retrieval. Information Processing and Management, 38(6), 823-848, 2003.
    13. M. Farach, T. Przytycka, M. Thorup, On the agreement of many trees. Inf. Process. Lett., 55(6), 297-301, 1995.
    14. M. Ferrer, E. Valveny, F. Serratosa, K. Riesen, H. Bunke, Generalized median graph computation for means of graph embedding in vector space. Pattern Recognition, 43(4), 1642-1655, 2010.
    15. C.R. Finden, A.D. Gordon, Obtaining common pruned trees. J. Classif., 2(1) 255-276, 1985.
    16. A.D. Gordon, Constructing dissimilarity measures. J. of Classification, 3(2), 335-348, 1986.
    17. J.C. Gower, Measures of similarity, dissimilarity, and distance. In: Encyclopedia of Stat. Sci., 5, S. Kotz, N.L. Johnson (Eds.), Wiley, New York, 397-405, 1985.
    18. J.C. Gower, Measures of Similarity, Dissimilarity, and Distance. In: S. Kotz, and N.L. Johnson, (eds.), Encyclopedia of Statistical Sciences, Vol. 5, New York: J. Wiley & Sons, pp. 397-405, 1985.
    19. S. Hannenhalli, P. Pevzner, Transforming men into mice (polynomial algorithm for genomic distance problem). 36th Ann. Symp. on Foundations of Computer Science FOCS 1995, Milwaukee, Wisconsin, IEEE Computer Society, 581-592, 1995.
    20. E. Herrera-Viedma, F. Herrera, F. Chiclana, A consensus model for multiperson decision making with different preference structures. IEEE Trans. SMC, Part A, 32(3), 394-402, 2002.
    21. L. Hubert, and P. Arabie, Comparing Partitions. J. of Classification, Vol. 2, No. 3, pp. 193-218, 1985.
    22. X. Jiang, A. Munger, H. Bunke, On median graphs: properties, algorithms, and applications. IEEE Trans. PAMI, 23(10), 1144-1151, 2001.
    23. T. Kohonen, Median strings. Pattern Recognition Letters, 3(5), 309-313, 1985.
    24. V.I. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals. Cybernetics and Control Theory, 10(8), 707-710, 1966
    25. M.Sh. Levin, Towards comparison of decomposable systems. In: Data Science, Classification, and Related Methods, Springer-Verlag, Tokyo, pp. 154-161, 1998.
    26. D. Maier, The complexity of some problems on subsequences and supersequences. J. of the ACM, 25(2), 322-336, 1978.
    27. B.G. Mirkin, and L.B. Chernyi, On a Distance Measure Between Partitions of a Finite Sets, Automation and Remote Control, 31/5, 786-792, 1970.
    28. A.M. Rappoport, Measurement of Distance Between Weighted Graphs of Expert Judgment. In: Multicriteria Choice for Solving of Ill-structured Problems. Inst. for System Analysis, Moscow, No. 5, pp. 97-108, 1978 (in Russian)
    29. S.M. Selkow, The tree-to-tree editing problem. Information Processing Letters, 6(6), 184-186, 1977.
    30. K.-C. Tai, The tree-to-tree correction problem. J. of the ACM, 26(3), 422-433, 1979.
    31. V.G. Timkovsky, Complexity of common subsequence and supersequence problems. Cybernetics and Systems Analysis, 25(5), 565-580, 1989.
    32. A. Tversky, Features of Similarity, Psychological Review, 84(4) 327-352, 1977.
    33. R.A. Wagner, M.J. Fisher, The string-to-string correction problem. J. of the ACM, 21(1), 168-173, 1974.
    34. W.D. Wallis, P. Shoubridge, M. Kraetz, D. Ray, Graph distances using graph union. Pattern Recognition Letters, 22(6-7), 701-704, 2001.
    35. K. Zhang, A constrained edit-distance between unordered labeled trees. Algorithmica, 15(3), 205-222, 1996.
    36. K. Zhang, J.T.L. Wang, D. Shasha, On the editing distance between undirected acyclic graphs. Int. J. Foundations of Computer Science, 7(1), 43-57, 1996.

  • Papers
    1. M.S. Bansal, D. Fernandez-Baca, Computing distances between partial rankings. Information Processing Letters, 109(4), 238-241, 2009.
    2. M.P. Beck, B.W. Lin, Some heuristics for the consensus ranking problem. Computers and Operations Research, 10(1), 183, 1-7, 1983.
    3. V. Berry, F. Nicolas, Maximum agreement and compatible supertrees. In: Proc. of the 15th Annual Symp. on Combinatorial Pattern Matching CPM-2004, LNCS 3109, Springer, Berlin, pp. 205-219, 2004.
    4. C. Elzinga, H. Wang, Z. Lin, Y. Kumar, Concordance and consensus. Information Sciences, 181(12), 2529-2549, 2011.
    5. S. Guillernot, F. Nocolas, Solving the maximum agreement subtree and the maximum compatible tree problems on many bounded degree trees. In: M. Lewenstein, G. Valiente (Eds.), Proc. of the Int. Conf. on Combinatorial Pattern Matching CPM 2006, LNCS 4009, Springer, 165-176, 2006.
    6. M.T. Hallett, C. McCartin, A faster FPT algorithm for the maximum agreement forest problem. Theory of Computing Systems}, 41(3), 539-550, 2007.
    7. T. Jiang, M. Li, On the approximation of shortest common supersequences and longest common subsequences. SIAM J. on Computing, 24(5), 1122-1139, 1995.
    8. T. Jiang, V.G. Timkovsky, Shortest consistent superstring computable in polynomial time. Theor. Comput. Sci., 143(1), 113-122, 1995.
    9. T. Jiang, L. Wang, K. Zhang, Alignment of trees - an alternative to tree edit. Theoretical Computer Science, 143(1), 137-148, 1995.
    10. M.Sh. Levin, Vector-like proximity estimates for structures. In: All-Russian Conf. "Problems and Methods of Decision Making in Organizational Management Systems", Moscow, Inst. for System Analysis, 116-117, 1988 (in Russian).
    11. M.Sh. Levin, Common part of preference relations, Foundations of Computing and Decision Sciences, 28(4), 223-246, 2003.
    12. A. Marzal, E. Vidal, Computation of normalized edit distance and applications. IEEE Trans. PAMI, 15(9), 926-932, 1993.
    13. A. Robles-Kelly, E.R. Hancock, string edit distance, random walks and graph matching. Int. J. of Pattern Recognition and Artificial Intelligence, 18(3), 315-327, 2004.
    14. E.M. Rodrigues, M.F. Sagot, Y. Wakabayashi, The maximum agreement forest problem: Approximation algorithms and computational experiments. Theoretical Computer Science, 374(1-3), 91-110, 2007.
    15. D. Sasha, K. Zhang, Simple fast algorithms for the editing distance between trees and related problems. SIAM J. on Computing, 18(6), 1245-1262, 1989.
    16. C. Solnon, J.-M. Jolion, Generalized vs set median string for histogram based distances: Algorithms and classification results in image domain. In: F. Escolano, M. Vento (Eds.), Proc. of the 6th Workshop on Graph Based Representation in Pattern Recognition GbPrP 2007, LNCS 4538, Springer, 404-414, 2007.
    17. E. Tanaka, K. Tanaka, The tree-to-tree editing problem. Int. J. Pattern Recogn. and Art. Intell., 2(2), 221-240, 1988.
    18. J. Tarhio, E. Ukkonen, A greedy approximation algorithm for constructing shortest common superstrings. Theor. Comput. Sci., 57(1), 131-145, 1988.
    19. Gabriel Valiente An efficient bottom-up distance between trees. In: Proc. of the 8th Int. Symp. String Processing Information Retrieval SPIRE 2001, IEEE Computer Science Press, Laguna de San Rafael, Chile 212-219, 2001.
    20. W.D. Wallis, P. Shoubridge, M. Kraetz, D. Ray, Graph distances using graph union. Pattern Recognition Letters, 22(6-7), 701-704, 2001.
    21. J.T.-L. Wang, K. Zhang, Finding similar consensus between trees: An algorithm and distance hierarchy. Pattern Recognition, 34(1), 127-137, 2001.
    22. C. Whidden, R.G. Beiko, N. Zeh, Fast FPT algorithms for computing rooted agreement forest: Theory and experiments. In: Proc. of the 9th Int. Symp. on Experimental Algorithms SEA 2010, LNCS 6049, Springer, 141-153, 2010.

  • Electronic Preprints
    1. T. Kelsey, L. Kotthoff, The exact closest string problem as a constraint satisfaction problem. Electronic preprint, CoRR. abs./1005.0089, 2010.
    2. M.Sh. Levin, Aggregation of composite solutions: strategies, models, examples. Electronic preprint. 72 pp., Nov. 29, 2011.
    http://arxiv.org/abs/1111.6983 [cs.SE]

    3. M.Sh. Levin, Multiset estimates and combinatorial synthesis. Electronic preprint. 30 pp., May 9, 2012.
    http://arxiv.org/abs/1205.2046 [cs.SY]

    4. C. Whidden, R.G. Beiko, N. Zeh, Fixed parameter and approximation algorithms for maximum agreement forests. Electronic preprint, 12 Aug. 2011, arXiv: 1108.2664v1 [q-bio.PE]

  • PhD Theses
    1. Bruno T. Messmer, Efficient Graph Matching Algorithms for Preprocessed Graphs. PhD Thesis, Inst. of Comp. Science and Applied Mathematics, University of Bern, 1995.