A Crypto-ransomware Detection Model For The Pre-encryption Stage Using Random Forest Algorithm
Abstract
Cryptographic ransomware is a challenging cybersecurity threat that encrypts the victim's files and demands a ransom in exchange for the decryption key. Traditional signature-based protection methods, such as antivirus and anti-malware software, have proven ineffective at preventing crypto-ransomware attacks, and the production of ransomware is consequently on the rise. Additionally, crypto-ransomware incorporates advanced encryption algorithms, causing irreversible effects even if the victim chooses to pay the ransom. Given the magnitude and variety of threats we face today, it is critical to have solutions in place that effectively analyse and detect crypto-ransomware attacks during the pre-encryption stage, before any files are encrypted. Only if these threats are identified during the pre-encryption stage can they be adequately mitigated. Existing methods for early detection of crypto-ransomware rely on a fixed time threshold to delimit the pre-encryption stage. However, a fixed time threshold assumes that all samples begin encryption at the same moment. This is not always the case, since timing varies between crypto-ransomware families as a result of the obfuscation techniques used to evade detection. Furthermore, the scarcity of data during an attack's initial stages reduces the ability of the feature extraction algorithms in early detection solutions to discover attack features, lowering detection accuracy. This research therefore proposes a Dynamic Crypto-Ransomware Detection Model (DCRDM). DCRDM determines the pre-encryption stage for each sample separately, using the first appearance of any cryptography-related API to establish the boundary of that stage. Features extracted from this stage are then used to train a prediction model with the Random Forest machine learning algorithm. The sample data was obtained from widely used ransomware repositories.
The model achieved a detection accuracy of 98.6% with a false positive rate of 1.9%.
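
The dynamic boundary described above can be illustrated with a minimal sketch. It assumes that each sample is available as an ordered list of API call names, and it is not the authors' implementation. The set CRYPTO_APIS, the helper names pre_encryption_calls and api_count_features, the count-based features, and the example trace are all hypothetical choices made for illustration; the abstract does not state which APIs or features DCRDM uses.

```python
from collections import Counter

# Assumed, illustrative set of cryptography-related Windows API names;
# the abstract does not list which APIs the authors track.
CRYPTO_APIS = frozenset({
    "CryptAcquireContextW",
    "CryptGenKey",
    "CryptImportKey",
    "CryptEncrypt",
    "BCryptEncrypt",
})


def pre_encryption_calls(trace, crypto_apis=CRYPTO_APIS):
    """Return the calls made before the first cryptography-related API."""
    for index, api_name in enumerate(trace):
        if api_name in crypto_apis:
            # The boundary is set per sample, not by a fixed time threshold.
            return list(trace[:index])
    # No crypto API observed: the whole trace counts as pre-encryption.
    return list(trace)


def api_count_features(calls, vocabulary):
    """Map a list of calls to a fixed-length vector of API call counts."""
    counts = Counter(calls)
    return [counts[name] for name in vocabulary]


# Hypothetical trace and feature vocabulary, for illustration only.
example_trace = [
    "NtCreateFile",
    "RegOpenKeyExW",
    "NtCreateFile",
    "CryptAcquireContextW",
    "CryptEncrypt",
    "NtWriteFile",
]
example_vocabulary = ["NtCreateFile", "RegOpenKeyExW", "NtWriteFile"]

example_prefix = pre_encryption_calls(example_trace)
example_vector = api_count_features(example_prefix, example_vocabulary)
```

Feature vectors of this kind, one per labelled sample, could then be passed to any Random Forest implementation, for example scikit-learn's RandomForestClassifier, to train the prediction model.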