Security researchers and antivirus companies routinely deal with corpora of malicious samples. Most of a corpus is usually unlabeled, given only generic labels, or even mislabeled! It is therefore useful to quickly cluster similar samples within the corpus as a pre-processing step before further analysis. Finding clusters within the corpus can have many
