Blast

Blocking with Loosely Schema-Aware Techniques

View the Project on GitHub Stravanni/blast

What does Blast do?

It helps to scale Entity Resolution: it efficiently extracts loose schema information and uses it to group together the records that are most likely to match.

Basically, instead of comparing all possible pairs of records, you only compare the pairs that fall within the same group.
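To make the idea of comparing only subsets concrete, here is a minimal sketch of plain token blocking, the simplest schema-agnostic form of blocking. It is not Blast's implementation and does not use the Blocking Framework API: the class name, the method names and the sample records are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: plain token blocking, NOT the Blast implementation.
class TokenBlockingSketch {

    // Builds one block per token: token -> ids of the records that contain it.
    static Map<String, Set<Integer>> buildBlocks(List<String> records) {
        Map<String, Set<Integer>> blocks = new HashMap<>();
        for (int id = 0; id < records.size(); id++) {
            for (String token : records.get(id).toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    blocks.computeIfAbsent(token, t -> new TreeSet<>()).add(id);
                }
            }
        }
        return blocks;
    }

    // Collects the distinct pairs "i-j" (i < j) of records sharing at least one block.
    static Set<String> candidatePairs(Map<String, Set<Integer>> blocks) {
        Set<String> pairs = new HashSet<>();
        for (Set<Integer> block : blocks.values()) {
            List<Integer> ids = new ArrayList<>(block); // sorted, since blocks are TreeSets
            for (int a = 0; a < ids.size(); a++) {
                for (int b = a + 1; b < ids.size(); b++) {
                    pairs.add(ids.get(a) + "-" + ids.get(b));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical sample records, with all attribute values flattened into one string.
        List<String> records = List.of(
            "John Smith Boston",
            "J. Smith Boston MA",
            "Maria Rossi Modena",
            "M. Rossi Modena Italy");
        int n = records.size();
        Set<String> pairs = candidatePairs(buildBlocks(records));
        System.out.println("All pairs: " + (n * (n - 1) / 2));
        System.out.println("Candidate pairs: " + pairs.size());
    }
}
```

On these four sample records, only 2 of the 6 possible pairs share a token and are kept for comparison. Blast goes further than this sketch by using the loose schema information it extracts to decide which groupings are worth keeping.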

P.S.: Blast employs only unsupervised techniques.

When to use Blast?

When you have semi-structured data to clean, but cannot perform the schema matching that traditional blocking techniques require.

Current Project Version

This repository contains the code for "BLAST: a loosely schema-aware meta-blocking approach for entity resolution". If you use it, please cite [1].

The approach is implemented on top of the Blocking Framework, a framework for blocking-based Entity Resolution [2].

Where to start

Take a look at Experiments -> Test_metablocking.java

References

[1] Simonini, Giovanni, Sonia Bergamaschi, and H. V. Jagadish. "BLAST: a loosely schema-aware meta-blocking approach for entity resolution." Proceedings of the VLDB Endowment 9.12 (2016): 1173-1184.

[2] sourceforge.net/projects/erframework/