Data sources
Different tools collected the dataset presented on this page. The screenshots' data source is a subset of onion domain websites scraped by AIL.
Processing on datasets
The content of each picture in each dataset was hashed to a "humanly readable" name, which gives a unified and readable reference system for image naming.
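The content-based naming described in this section can be illustrated with a minimal sketch. The page does not document the hashing scheme actually used, so everything below is an assumption made for illustration: the use of SHA-256, the two word lists, the name layout, and the function name `readable_name` are all hypothetical.

```python
import hashlib

# Hypothetical word lists: the vocabulary actually used for the dataset
# is not documented on this page.
ADJECTIVES = ["amber", "brisk", "calm", "dusty", "eager", "faint", "gentle", "hollow"]
NOUNS = ["falcon", "harbor", "lantern", "meadow", "nickel", "orchard", "pebble", "quartz"]


def readable_name(content: bytes, extension: str = ".png") -> str:
    """Derive a deterministic, human-readable file name from image bytes.

    Assumption: SHA-256 over the raw file content. The first two digest
    bytes select the words and the next four bytes form a short hex suffix
    that lowers the chance of two pictures sharing a name.
    """
    digest = hashlib.sha256(content).digest()
    adjective = ADJECTIVES[digest[0] % len(ADJECTIVES)]
    noun = NOUNS[digest[1] % len(NOUNS)]
    suffix = digest[2:6].hex()
    return f"{adjective}-{noun}-{suffix}{extension}"


if __name__ == "__main__":
    # Placeholder bytes standing in for the content of a screenshot file.
    print(readable_name(b"example image bytes"))
```

Because the name depends only on the file content, the same picture receives the same name whichever tool collected it, which is what makes a unified reference system possible. A short suffix like this one can still collide on very large collections, so a real scheme would need a longer suffix or an explicit collision check.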
This dataset is named circl-ail-dataset-01 and is composed of onion websites scraped by AIL. Around 37500 pictures are in this dataset to date. Only one label classification (the direct DataTurks output) is provided along with the dataset. This classification is partial and will be improved and updated as classification operations are completed. Be aware that the second frequency chart uses a log scale.
AIL can be used, for example, for data leak prevention. The dataset presented on this page is strongly associated with two other projects: an evaluation framework, provided as Carl-Hauser, and an open-source library, provided as Douglas-Quaid. Image correlation for security event correlation purposes is nowadays mainly manual. No open-source tool provides easy correlation on pictures, without regard to the technology used. Ideally, the extraction of links or correlations between these images could be fully automated. Even partial automation would reduce the burden of this task on security teams. Datasets are part of the foundation needed to construct such a tool. Our contribution to this problem is the provision of datasets to support research efforts in this direction.
Image-matching algorithm benchmarks already exist and are highly informative, but none is delivered turnkey. Our long-term objective is to build a generic library and services that can, at the least, be easily integrated into threat intelligence tools such as AIL and MISP (Malware Information Sharing Platform). A quick-lookup mechanism for correlation would be necessary and would be part of this library. This paper includes the release of two datasets to support research efforts in this direction. MISP is an open-source software tool developed at CIRCL for collecting, storing, distributing and sharing cyber security indicators and threats related to the analysis of cyber security incidents. AIL is also an open-source modular framework developed at CIRCL to analyze potential information leaks from unstructured data sources or streams.
CERTs such as CIRCL and security teams collect and process content such as images (broadly, photos, screenshots of websites or screenshots of sandboxes). Datasets are becoming larger: for example, on average 10000 screenshots of onion domain websites are scraped each day in AIL (Analysis Information Leak framework, a tool for analyzing information leaks), and analysts need to classify, search and correlate through all the images. Automatic tools can help them in this task. Less research on image matching and image classification seems to have been conducted specifically on website screenshots. However, the classification of this kind of picture needs to be addressed.