Arkanjo 0.2
A tool for finding duplicated functions in codebases
Similarity_Table Class Reference

Represents a similarity graph between functions (paths).

#include <similarity_table.hpp>

Public Member Functions

 Similarity_Table (double _similarity_threshold)
 Constructs with custom similarity threshold.
 
 Similarity_Table ()
 Constructs with default similarity threshold.
 
void load ()
 
void update_similarity (double new_similarity_threshold)
 Updates similarity threshold.
 
double get_similarity (const Path &path1, const Path &path2)
 Gets similarity between two paths.
 
bool is_similar (const Path &path1, const Path &path2)
 Checks if two paths are similar.
 
const std::vector< Path > & get_path_list () const
 Gets list of all known paths.
 
int get_number_lines_in_pair (const Path &path1, const Path &path2)
 
std::vector< Path > get_similar_path_to_the_reference (const Path &reference)
 Gets paths similar to reference path.
 
std::vector< std::tuple< double, Path, Path > > get_all_path_pairs_and_similarity_sorted_by_similarity ()
 Gets all similar path pairs with scores, sorted.
 
std::vector< std::pair< Path, Path > > get_all_similar_path_pairs_sorted_by_similarity ()
 Gets all similar path pairs, sorted by similarity.
 
std::vector< std::pair< Path, Path > > get_all_similar_path_pairs_sorted_by_line_number ()
 Gets all similar path pairs, sorted by line count.
 
std::vector< Cluster > get_clusters ()
 Generate clusters of similar functions using DFS on the similarity graph.
 
std::vector< ClusterInfo > get_clusters_info (bool sorted)
 Returns detailed information about all clusters found in the similarity table.
 

Detailed Description

Represents a similarity graph between functions (paths).

Each node corresponds to a function (identified by a Path). Stores pairs of similar functions with their similarity scores.

Internally:

  • paths stores all known functions.
  • path_id maps a Path to its unique node ID.
  • similarity_graph is an adjacency list representation: node -> [(neighbor_id, similarity), ...]
  • similarity_table stores pairwise similarity for fast lookup.

Graph interpretation:

  • Nodes = functions
  • Edges = similarity >= threshold (filtered at query time)
Note
The graph is undirected.
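
The four internal members listed above can be pictured with a minimal sketch. This is not Arkanjo's actual implementation: `PathSketch` is a stand-in for the real `Path` class, the struct and its method names are hypothetical, and returning 0.0 for an unknown pair is an assumption made only for this example.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for Arkanjo's Path class (assumption: a plain string is enough here).
using PathSketch = std::string;

// Hypothetical layout mirroring the members described above.
struct SimilarityGraphSketch {
    double threshold = 0.0;
    std::vector<PathSketch> paths;                                      // all known functions
    std::map<PathSketch, int> path_id;                                  // Path -> node ID
    std::vector<std::vector<std::pair<int, double>>> similarity_graph;  // node -> [(neighbor_id, similarity)]
    std::map<std::pair<int, int>, double> similarity_table;             // (low_id, high_id) -> similarity

    // Returns the node ID of p, creating a new node if p is unknown.
    int find_or_add(const PathSketch& p) {
        auto it = path_id.find(p);
        if (it != path_id.end()) return it->second;
        int id = static_cast<int>(paths.size());
        paths.push_back(p);
        path_id.emplace(p, id);
        similarity_graph.emplace_back();
        return id;
    }

    // Records an undirected edge: both adjacency lists get an entry,
    // and the lookup table stores the pair once under an ordered key.
    void add_pair(const PathSketch& a, const PathSketch& b, double similarity) {
        int u = find_or_add(a);
        int v = find_or_add(b);
        if (u == v) return;  // ignore self-pairs in this sketch
        similarity_graph[u].emplace_back(v, similarity);
        similarity_graph[v].emplace_back(u, similarity);
        similarity_table[{std::min(u, v), std::max(u, v)}] = similarity;
    }

    // Table lookup; returns 0.0 when the pair is unknown (assumption).
    double get_similarity(const PathSketch& a, const PathSketch& b) const {
        auto ia = path_id.find(a);
        auto ib = path_id.find(b);
        if (ia == path_id.end() || ib == path_id.end()) return 0.0;
        auto key = std::make_pair(std::min(ia->second, ib->second),
                                  std::max(ia->second, ib->second));
        auto it = similarity_table.find(key);
        return it == similarity_table.end() ? 0.0 : it->second;
    }

    // The threshold is applied when the question is asked, not when the pair is stored.
    bool is_similar(const PathSketch& a, const PathSketch& b) const {
        return get_similarity(a, b) >= threshold;
    }
};
```

Storing every pair and comparing against the threshold only in `is_similar` is one way to read "filtered at query time": the threshold can change without rebuilding the graph.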

Definition at line 99 of file similarity_table.hpp.

Constructor & Destructor Documentation

◆ Similarity_Table() [1/2]

Similarity_Table::Similarity_Table ( double _similarity_threshold)
explicit

Constructs with custom similarity threshold.

Parameters
_similarity_threshold Initial threshold value

Definition at line 50 of file similarity_table.cpp.

◆ Similarity_Table() [2/2]

Similarity_Table::Similarity_Table ( )
explicit

Constructs with default similarity threshold.

Definition at line 53 of file similarity_table.cpp.

Member Function Documentation

◆ get_all_path_pairs_and_similarity_sorted_by_similarity()

std::vector< std::tuple< double, Path, Path > > Similarity_Table::get_all_path_pairs_and_similarity_sorted_by_similarity ( )

Gets all similar path pairs with scores, sorted.

Returns
vector<tuple<double,Path,Path>> Similar pairs with scores

Definition at line 105 of file similarity_table.cpp.
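
As an illustration of the return shape, the following sketch filters scored pairs by a threshold and sorts them by score. The sort direction is not stated in this reference, so descending order is an assumption; `PathName`, `ScoredPair`, and `similar_pairs_sorted` are hypothetical names, and `PathName` stands in for the real `Path` class.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Stand-in for Arkanjo's Path class (assumption).
using PathName = std::string;
using ScoredPair = std::tuple<double, PathName, PathName>;

// Keeps pairs whose score reaches the threshold, then orders them
// from most to least similar (descending order is an assumption).
std::vector<ScoredPair> similar_pairs_sorted(std::vector<ScoredPair> pairs,
                                             double threshold) {
    pairs.erase(std::remove_if(pairs.begin(), pairs.end(),
                               [threshold](const ScoredPair& p) {
                                   return std::get<0>(p) < threshold;
                               }),
                pairs.end());
    // stable_sort keeps the input order of pairs with equal scores.
    std::stable_sort(pairs.begin(), pairs.end(),
                     [](const ScoredPair& a, const ScoredPair& b) {
                         return std::get<0>(a) > std::get<0>(b);
                     });
    return pairs;
}
```

Putting the score first in the tuple lets a caller read it with `std::get<0>` and the two paths with `std::get<1>` and `std::get<2>`.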

◆ get_all_similar_path_pairs_sorted_by_line_number()

std::vector< std::pair< Path, Path > > Similarity_Table::get_all_similar_path_pairs_sorted_by_line_number ( )

Gets all similar path pairs, sorted by line count.

Returns
vector<pair<Path,Path>> Similar path pairs

Definition at line 146 of file similarity_table.cpp.

◆ get_all_similar_path_pairs_sorted_by_similarity()

std::vector< std::pair< Path, Path > > Similarity_Table::get_all_similar_path_pairs_sorted_by_similarity ( )

Gets all similar path pairs, sorted by similarity.

Returns
vector<pair<Path,Path>> Similar path pairs

Definition at line 118 of file similarity_table.cpp.

◆ get_clusters()

std::vector< Cluster > Similarity_Table::get_clusters ( )

Generate clusters of similar functions using DFS on the similarity graph.

Only edges with similarity >= threshold are considered.

Returns
vector of Cluster objects

Definition at line 168 of file similarity_table.cpp.
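
The clustering idea can be sketched as a connected-components search: a depth-first traversal that follows only edges whose similarity reaches the threshold. The code below is a simplified illustration, not the code in similarity_table.cpp: it returns node IDs instead of `Cluster` objects, `AdjacencyList` and `find_clusters` are hypothetical names, and dropping single-node components is an assumption.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical adjacency list: node -> [(neighbor_id, similarity), ...]
using AdjacencyList = std::vector<std::vector<std::pair<int, double>>>;

// Returns connected components as lists of node IDs, following only
// edges with similarity >= threshold. Uses an explicit stack for the DFS.
std::vector<std::vector<int>> find_clusters(const AdjacencyList& graph,
                                            double threshold) {
    std::vector<bool> visited(graph.size(), false);
    std::vector<std::vector<int>> clusters;

    for (std::size_t start = 0; start < graph.size(); ++start) {
        if (visited[start]) continue;

        std::vector<int> component;
        std::vector<int> stack;
        stack.push_back(static_cast<int>(start));
        visited[start] = true;

        while (!stack.empty()) {
            int node = stack.back();
            stack.pop_back();
            component.push_back(node);
            for (const auto& edge : graph[node]) {
                int neighbor = edge.first;
                double similarity = edge.second;
                if (similarity >= threshold && !visited[neighbor]) {
                    visited[neighbor] = true;
                    stack.push_back(neighbor);
                }
            }
        }

        // Assumption: a function with no similar neighbor is not a cluster.
        if (component.size() > 1) clusters.push_back(std::move(component));
    }
    return clusters;
}
```

Because the threshold is checked per edge during traversal, the same stored graph yields coarser or finer clusters as the threshold changes.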

◆ get_clusters_info()

std::vector< ClusterInfo > Similarity_Table::get_clusters_info ( bool sorted)

Returns detailed information about all clusters found in the similarity table.

Returns
std::vector<ClusterInfo>

Definition at line 205 of file similarity_table.cpp.

◆ get_number_lines_in_pair()

int Similarity_Table::get_number_lines_in_pair ( const Path & path1,
const Path & path2 )

Definition at line 160 of file similarity_table.cpp.

◆ get_path_list()

const std::vector< Path > & Similarity_Table::get_path_list ( ) const

Gets list of all known paths.

Returns
vector<Path> All paths in table

Definition at line 90 of file similarity_table.cpp.

◆ get_similar_path_to_the_reference()

std::vector< Path > Similarity_Table::get_similar_path_to_the_reference ( const Path & reference)

Gets paths similar to reference path.

Parameters
reference Path to compare against
Returns
vector<Path> Similar paths

Definition at line 94 of file similarity_table.cpp.
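
One plausible way to answer this query with an adjacency list is to look up the node ID of the reference and keep the neighbors whose edge reaches the threshold. The sketch below assumes that approach; `FunctionPath` is a stand-in for `Path`, `NeighborIndex` and `similar_to` are hypothetical names, and returning an empty vector for an unknown reference is an assumption.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for Arkanjo's Path class (assumption).
using FunctionPath = std::string;

// Hypothetical subset of the internal state needed for a neighbor query.
struct NeighborIndex {
    std::vector<FunctionPath> paths;                                    // node ID -> path
    std::map<FunctionPath, int> path_id;                                // path -> node ID
    std::vector<std::vector<std::pair<int, double>>> similarity_graph;  // node -> [(neighbor_id, similarity)]
};

// Collects every neighbor of `reference` whose similarity is >= threshold.
// Returns an empty vector if the reference is unknown (assumption).
std::vector<FunctionPath> similar_to(const NeighborIndex& index,
                                     const FunctionPath& reference,
                                     double threshold) {
    std::vector<FunctionPath> result;
    auto it = index.path_id.find(reference);
    if (it == index.path_id.end()) return result;

    for (const auto& edge : index.similarity_graph[it->second]) {
        if (edge.second >= threshold) {
            result.push_back(index.paths[edge.first]);
        }
    }
    return result;
}
```

The cost of this lookup is proportional to the number of stored neighbors of the reference, not to the total number of functions.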

◆ get_similarity()

double Similarity_Table::get_similarity ( const Path & path1,
const Path & path2 )

Gets similarity between two paths.

Parameters
path1 First path to compare
path2 Second path to compare
Returns
double Similarity score

Definition at line 64 of file similarity_table.cpp.

◆ is_similar()

bool Similarity_Table::is_similar ( const Path & path1,
const Path & path2 )

Checks if two paths are similar.

Parameters
path1 First path to compare
path2 Second path to compare
Returns
bool True if paths are similar

Definition at line 85 of file similarity_table.cpp.

◆ load()

void Similarity_Table::load ( )

Definition at line 56 of file similarity_table.cpp.

◆ update_similarity()

void Similarity_Table::update_similarity ( double new_similarity_threshold)

Updates similarity threshold.

Parameters
new_similarity_threshold New threshold value

Definition at line 60 of file similarity_table.cpp.


The documentation for this class was generated from the following files: