Fast Forensic Analysis in Computer Inspection using Clustering Methods

Authors: L. JAYASREE, A. NARAYANA RAO

Abstract: Computer forensic analysis usually involves examining hundreds of thousands of files. The data in those files consists of unstructured text. Document clustering algorithms can facilitate the discovery of new and useful knowledge from the documents under analysis. This paper presents an approach that applies document clustering algorithms to the forensic analysis of computers seized in police investigations. The approach is illustrated through extensive experimentation with six well-known clustering algorithms (K-means, K-medoids, Single Link, Complete Link, Average Link, and CSPA) applied to five real-world datasets. Experiments were performed with different combinations of parameters, resulting in 16 different instantiations of the algorithms. The experiments show that the Average Link and Complete Link algorithms provide the best results for our application domain. Two relative validity indexes were used to automatically estimate the number of clusters.

Keywords: Clustering, Text Mining and Forensic Computing. 

INTRODUCTION 
It is estimated that the volume of data in the digital world has increased from 161 exabytes in 2006 to 988 exabytes, about 18 times the amount of information present in all the books ever written, and it continues to grow exponentially. This large amount of data has a direct impact on Computer Forensics, which can be broadly defined as the discipline that combines elements of law and computer science to collect and analyze data from computer systems.

Clustering algorithms are typically used for exploratory data analysis, where there is little or no prior knowledge about the data. This is the case here: our datasets consist of unlabeled objects, and the classes or categories of documents that can be found are unknown a priori. Even assuming that labeled datasets were available from previous analyses, there is almost no hope that the same classes (possibly learned earlier by a classifier in a supervised learning setting) would still be valid for the upcoming data, which is obtained from other computers and associated with different investigation processes. In other words, the new data sample would likely come from a different population. In this context, clustering algorithms, which are capable of finding latent patterns in the text documents found on seized computers, can enhance the analysis performed by the expert examiner. The rationale behind these algorithms is that objects within a valid cluster are more similar to each other than they are to objects belonging to a different cluster.
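The idea of grouping documents by similarity and letting a relative validity index pick the number of clusters can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the bag-of-words term-frequency representation, the cosine distance, the use of the silhouette width as the validity index (the text above does not name the two indexes used), the toy documents, and all function names are hypothetical choices made for the example.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def average_link(dist, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(dist[p][q] for p in clusters[i] for q in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def silhouette(dist, clusters):
    """Mean silhouette width over all objects; singletons contribute 0.
    Assumes at least two clusters."""
    total, n = 0.0, 0
    for ci, members in enumerate(clusters):
        for p in members:
            n += 1
            if len(members) == 1:
                continue
            # a: mean distance to the object's own cluster
            a = sum(dist[p][q] for q in members if q != p) / (len(members) - 1)
            # b: mean distance to the nearest other cluster
            b = min(
                sum(dist[p][q] for q in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n


def estimate_clusters(docs, k_max):
    """Try k = 2..k_max and keep the partition with the best silhouette."""
    vectors = [tf_vector(d) for d in docs]
    dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]
    best = None
    for k in range(2, k_max + 1):
        clusters = average_link(dist, k)
        score = silhouette(dist, clusters)
        if best is None or score > best[0]:
            best = (score, clusters)
    return best[1]


# Toy documents standing in for text extracted from seized files.
docs = [
    "bank transfer invoice payment account",
    "invoice payment bank account statement",
    "meeting schedule agenda monday",
    "agenda meeting schedule friday room",
]
clusters = estimate_clusters(docs, k_max=3)
print(sorted(sorted(c) for c in clusters))
```

On these four toy documents, the two financial texts and the two meeting texts share no terms with each other, so the partition with two clusters scores highest. The pairwise merge search is quadratic per step and is only suitable for small illustrations; a real collection would need a library implementation and proper text preprocessing.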
