{"id":242,"date":"2021-06-18T22:50:35","date_gmt":"2021-06-18T22:50:35","guid":{"rendered":"https:\/\/www.plutohash.com\/?p=242"},"modified":"2021-06-25T18:07:55","modified_gmt":"2021-06-25T18:07:55","slug":"clustering-bitcoin-ransomware-python","status":"publish","type":"post","link":"http:\/\/www.plutohash.com\/2021\/06\/18\/clustering-bitcoin-ransomware-python\/","title":{"rendered":"Clustering Bitcoin Addresses Used in Ransomware with Python"},"content":{"rendered":"\n

In this article we will see how you can track bitcoin addresses<\/strong> used in ransomware attacks<\/strong> using Python and the PlutoHash<\/strong><\/a> platform, which provides up-to-date data extracted from the blockchain.<\/p>\n\n\n\n

Ransomware is a type of malware<\/strong> that encrypts the files on the victim\u2019s device and demands a ransom to unlock them. Businesses, universities, financial institutions and health organizations are the preferred targets of criminals because they are often willing to pay to recover sensitive data. However, bitcoin payments are not as anonymous as often believed: each transaction leaves a trace in the blockchain.<\/p>\n\n\n\n

The Bitcoin protocol can be described as pseudo-anonymous<\/strong>. Sending and receiving payments in bitcoin is like writing books under a pseudonym: if the identity of the writer is revealed, all the books are linked to that writer. The same logic applies to bitcoin transactions, where discovering the identity of the owner of an address links to them every transaction sent from or received at that address. To better preserve privacy, bitcoin wallets can also generate a different address for each transaction. As described in the Bitcoin white paper:<\/p>\n\n\n\n

As an additional firewall, a new (address) should be used for each transaction to keep them from being linked to a common owner. The risk is that if the owner of a (address) is revealed, linking could reveal other transactions that belonged to the same owner.<\/em><\/p>Satoshi Nakamoto<\/em><\/cite><\/blockquote>\n\n\n\n

In this notebook we will see how it is possible to link multiple bitcoin addresses to the same person<\/strong> by analyzing multi-input transactions. We\u2019ll start from an address (the seed address) that we know for sure has been used in a ransomware attack. By comparing the input and output addresses of its transactions, we will see how to link new addresses to the owner of the seed address.<\/p>\n\n\n\n
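The intuition behind linking addresses through multi-input transactions can be tried on toy data before touching the real blockchain. The snippet below is only a minimal sketch, not the PlutoHash or BlockSci implementation: the function name, the transaction lists and all address labels are hypothetical, and it assumes that every input of a transaction is controlled by the same owner.

```python
# Toy illustration: addresses that appear together as inputs of one
# transaction are assumed to share an owner. All labels below are made up.

def cluster_by_common_inputs(transactions):
    '''Group addresses that co-occur as inputs, using a small union-find.'''
    parent = {}

    def find(addr):
        # register unseen addresses as their own root, then walk up to the root
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for input_addresses in transactions:
        if not input_addresses:
            continue
        first = input_addresses[0]
        find(first)
        for other in input_addresses[1:]:
            union(first, other)

    # collect the addresses under each root
    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())


# each inner list holds the input addresses of one hypothetical transaction
toy_transactions = [
    ['seed', 'addr_a'],
    ['addr_a', 'addr_b'],
    ['addr_x', 'addr_y'],
    ['addr_z'],
]
clusters = cluster_by_common_inputs(toy_transactions)
seed_cluster = next(c for c in clusters if 'seed' in c)
print(sorted(seed_cluster))
```

In this toy run addr_b never appears in a transaction together with the seed, yet it ends up in the same cluster because both share a transaction with addr_a. This transitive linking is what lets a single seed address expand into a larger set of addresses.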

Using the PlutoHash<\/strong><\/a> platform, we have at our disposal all the data contained in the bitcoin blockchain and a dataset containing the seed addresses of several ransomware families. This data, along with the BlockSci library, is all we need for our analysis. The dataset containing the seed addresses was created from an academic study of ransomware payments [1].<\/p>\n\n\n\n

import blocksci\nimport pandas as pd\nimport warnings\n\n# suppress library warnings to keep the notebook output readable\nwarnings.filterwarnings('ignore')\n\n# instantiate the chain object\nchain = blocksci.Blockchain('\/BlockSci\/config_file')\n\n# load the dataset containing the ransomware seed addresses\nseed_addresses = pd.read_csv('\/data\/datasets\/ransomware_addresses_list\/blockchain\/seed_addresses.csv')<\/code><\/pre>\n\n\n\n

As a reminder, the seed address dataset and the data on bitcoin transactions and addresses are available in the PlutoHash platform. Simply register for our Beta Test Program<\/strong><\/a> to get started. We have imported the libraries and instantiated the chain object. Let\u2019s take a look at the dataset containing the seed addresses and see how many seed addresses have been collected for each ransomware family.<\/p>\n\n\n\n

# count the seed addresses collected for each ransomware family\nseed_addresses['family'].value_counts()\n<\/code><\/pre>\n\n\n\n

We can see that for some ransomware families several addresses have been collected (for example, there are over 7000 for the Locky ransomware). We will conduct our analysis starting from a single address, although the clustering logic would be the same for any number of seed addresses.<\/p>\n\n\n\n

In this case, let\u2019s take the seed address that belongs to the CryptXXX<\/strong> ransomware.<\/p>\n\n\n\n

# keep only the rows that belong to the CryptXXX family\nCryptXXX_seed_address = seed_addresses.loc[seed_addresses['family'] == 'CryptXXX']\nCryptXXX_seed_address<\/code><\/pre>\n\n\n\n
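The two pandas steps used so far, counting the addresses per family and filtering one family, can be tried on a small toy table without access to the platform. This is a minimal sketch: the column names address and family follow the dataset used above, while the rows are placeholders and not real addresses or real ransomware families.

```python
import pandas as pd

# Toy stand-in for the seed address dataset; the rows are placeholders,
# not real bitcoin addresses.
toy_seeds = pd.DataFrame({
    'address': ['placeholder_addr_1', 'placeholder_addr_2', 'placeholder_addr_3'],
    'family': ['FamilyA', 'FamilyA', 'FamilyB'],
})

# number of seed addresses per family
counts = toy_seeds['family'].value_counts()

# boolean mask: keep only the rows whose family matches
family_rows = toy_seeds.loc[toy_seeds['family'] == 'FamilyB']

print(counts.to_dict())
print(family_rows)
```

The filter returns a DataFrame even when only one row matches, which is why a single value still has to be pulled out of it before it can be passed to another library.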

Now we extract only the bitcoin address from the dataset and save it as a string. With the address in string format we can use BlockSci to create a so-called address object and, again with BlockSci, display all the transactions received and sent by this address.<\/p>\n\n\n\n

# extract the bitcoin address and convert the value to a string\nCryptXXX_seed_address = str(CryptXXX_seed_address.iloc[0]['address'])\n# create the address object from the string\naddress_obj = chain.address_from_string(address_string=CryptXXX_seed_address)\naddress_obj<\/code><\/pre>\n\n\n\n

Before continuing, it is important to understand the methodology that allows us to associate different addresses with the same person. These are the conditions we will check to decide whether an address is linked to the ransomware attack.<\/p>\n\n\n\n

Disclaimer: I am not an expert on blockchain analytics. If you have any doubts or believe there is an error please feel free to leave a comment! \u00a0\ud83d\ude09\u00a0<\/p>\n\n\n\n

Methodology for Linking Addresses<\/h2>\n\n\n\n

In this section we will look at two blockchain-based heuristics (Common Spending<\/strong> and One-Time Change<\/strong>) that allow us to connect different addresses to the same actor. These heuristics, which we will use to identify ransomware wallets, have already been used in several academic studies on clustering bitcoin addresses. To start, let\u2019s define a bitcoin transaction as a triplet of elements:<\/p>\n\n\n\n

t = (A, B, c)<\/em><\/p>\n\n\n\n