javascript-clone-detection v0.6.0

License
MIT

JavaScript Clone Detection - (v0.6.0)

Academic study project on JavaScript code duplication using AST parsing with text similarity.

Usage

Run:

make init
clone-analisys <PATH> <SIMILARITY INDEX>
# Example: clone-analisys src/api-server 0.85

Current Process

A code snippet is first parsed into an Abstract Syntax Tree (AST). A cleaning and normalization phase then removes unwanted attributes and standardizes equivalent structures, for example by treating an arrow function and a regular function as the same construct.

// Both code snippets are classified as Type 2 clones

const arrowFunction = (value) => {
  const { type } = value
  return type
}

function regularFunction(value) {
  // this is a regular function
  const { type } = value
  return type
}
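
The cleaning and normalization step can be sketched on plain ESTree-shaped objects, without any parser dependency. This is a minimal illustration, not the project's actual implementation: `normalize`, the `IGNORED_KEYS` list, and the unified `Function` node type are hypothetical names chosen for the example, and the two hand-written ASTs are simplified (the destructuring line is omitted).

```javascript
// Hypothetical list of attributes that describe position or formatting, not structure.
const IGNORED_KEYS = new Set(['start', 'end', 'loc', 'range', 'raw']);

// ESTree node types that all represent a function.
const FUNCTION_TYPES = new Set([
  'ArrowFunctionExpression',
  'FunctionExpression',
  'FunctionDeclaration',
]);

function normalize(node) {
  if (Array.isArray(node)) return node.map(normalize);
  if (node === null || typeof node !== 'object') return node;

  // Treat `const f = (...) => { ... }` like `function f(...) { ... }`.
  if (node.type === 'VariableDeclaration' && node.declarations.length === 1) {
    const { init } = node.declarations[0];
    if (init && FUNCTION_TYPES.has(init.type)) return normalize(init);
  }

  const out = {};
  // Sorted keys give a canonical order, so JSON.stringify output is comparable.
  for (const key of Object.keys(node).sort()) {
    if (!IGNORED_KEYS.has(key)) out[key] = normalize(node[key]);
  }

  if (FUNCTION_TYPES.has(node.type)) {
    out.type = 'Function'; // hypothetical unified node type
    delete out.id;         // the function name does not matter for a Type 2 clone
    delete out.expression; // flag that only arrow functions carry
  }
  return out;
}

// Simplified, hand-written ASTs for the two snippets above.
const identifier = (name) => ({ type: 'Identifier', name });
const returnType = () => ({
  type: 'BlockStatement',
  body: [{ type: 'ReturnStatement', argument: identifier('type') }],
});

const arrowAst = {
  type: 'VariableDeclaration',
  kind: 'const',
  start: 0,
  end: 58,
  declarations: [{
    type: 'VariableDeclarator',
    id: identifier('arrowFunction'),
    init: {
      type: 'ArrowFunctionExpression',
      expression: false,
      params: [identifier('value')],
      body: returnType(),
    },
  }],
};

const regularAst = {
  type: 'FunctionDeclaration',
  start: 0,
  end: 80,
  id: identifier('regularFunction'),
  params: [identifier('value')],
  body: returnType(),
};

const sameStructure =
  JSON.stringify(normalize(arrowAst)) === JSON.stringify(normalize(regularAst));
console.log(sameStructure); // true
```

Sorting the keys before serializing is a design choice for this sketch: it keeps the comparison independent of the order in which a parser happens to emit node properties. Arrow functions with an expression body (`x => x`) would need one more rule and are not handled here.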

Several well-established libraries can parse code snippets into an AST:

| Library | Version |
| --- | --- |
| espree | 7.3.1 |
| @babel/parser | 7.14.7 |
| abstract-syntax-tree | 2.19.1 |

This project uses abstract-syntax-tree because it makes manipulating an AST easier.

Similarity between ASTs

In the current version, there are two options for comparing ASTs: (i) compare the ASTs directly, which only tells us whether they are identical or not; or (ii) convert the ASTs to text (strings) and use libraries that measure the textual similarity between the code snippets.

| Library | Version | Type |
| --- | --- | --- |
| ast-compare | 2.1.0 | Compare ASTs |
| string-similarity | 4.0.4 | Compare strings |
| string-comparison | 1.0.9 | Compare strings |

Comparing ASTs directly seems the most coherent approach, but so far the ast-compare library can only tell whether two snippets are identical. Even so, the AST remains useful: it is a uniform, easy-to-manipulate representation for pre-processing and normalization, and it can then be converted to text and compared as a string.
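
The text-based option can be sketched as follows: serialize both ASTs and score the two strings with a bigram-based Dice coefficient. The implementation below is a self-written illustration, not the code of string-similarity. `diceSimilarity`, `isClone`, and the use of `JSON.stringify` as the text form are assumptions, and the 0.85 default threshold is borrowed from the usage example.

```javascript
// Count every pair of adjacent characters in the text.
function bigrams(text) {
  const counts = new Map();
  for (let i = 0; i < text.length - 1; i++) {
    const gram = text.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Dice coefficient: 2 * shared bigrams / total bigrams in both strings.
function diceSimilarity(a, b) {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;

  const left = bigrams(a);
  const right = bigrams(b);
  let shared = 0;
  for (const [gram, count] of left) {
    shared += Math.min(count, right.get(gram) || 0);
  }
  return (2 * shared) / (a.length - 1 + (b.length - 1));
}

// Hypothetical helper: two ASTs count as clones when their serialized
// forms reach the similarity threshold (assumed to be inclusive).
function isClone(astA, astB, threshold = 0.85) {
  return diceSimilarity(JSON.stringify(astA), JSON.stringify(astB)) >= threshold;
}

console.log(diceSimilarity('night', 'nacht')); // 0.25
```

Unlike an identical-or-not check, this returns a graded score, which is what makes a configurable threshold possible.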

Results

Using the example code snippets above, we get:

No pre-processing and normalization

ast-compare:  false
string-similarity (Dice):  0.925351071692535
string-comparison (Cosine):  0.9672041516493517
string-comparison (Levenshtein):  0.9072164948453608
string-comparison (Longest Common Subsequence):  0.9357933579335793
string-comparison (Metric Longest Common Subsequence):  0.9337260677466863

With pre-processing and normalization (v0.3.1)

ast-compare:  true
string-similarity (Dice):  1
string-comparison (Cosine):  1
string-comparison (Levenshtein):  1
string-comparison (Longest Common Subsequence):  1
string-comparison (Metric Longest Common Subsequence):  1
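
All of the string scores above fall between 0 and 1. One common way to get such a score from an edit distance is to divide the distance by the length of the longer string, as in the Levenshtein sketch below. Whether string-comparison normalizes in exactly this way is an assumption, and `levenshteinDistance` and `levenshteinSimilarity` are hypothetical names.

```javascript
// Classic dynamic-programming edit distance, keeping only two rows in memory.
function levenshteinDistance(a, b) {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1,        // deletion
        current[j - 1] + 1,     // insertion
        previous[j - 1] + cost  // substitution
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Scale the distance to [0, 1], where 1 means the strings are identical.
function levenshteinSimilarity(a, b) {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1;
  return 1 - levenshteinDistance(a, b) / longest;
}

console.log(levenshteinDistance('kitten', 'sitting')); // 3
```

Under this scaling, a score of 1 means the two serialized ASTs are identical, which matches the results obtained after normalization.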

To learn more about the issues addressed, read: ESTUDO EMPÍRICO SOBRE DUPLICAÇÃO DE CÓDIGO EM APLICAÇÕES REACT.JS (Empirical Study on Code Duplication in React.js Applications).