I have been meaning to write a post about malware analysis in the context of OSINT and threat intelligence for a long time. Malware is one of the most widely used sources of information and a common subject of analyst research, but it is also a technically complex one. Advanced static analysis (of the file itself) and dynamic analysis (observing the behavior of the file after it is launched) are topics not for a series of posts but for several books. For those interested in exploring the topic, recommended items can be found in the Reading Room section. Here, however, I would like to approach the problem differently, so that OSINTers can collect a lot of information about a malware sample even if the assembly view in IDA Pro or Ghidra makes them feel like this:
Fortunately, in the field of malware analysis there are many services that both show us information about the file itself and help us understand its behavior. Using third-party providers has advantages and disadvantages. The main disadvantage is that we can only work with samples that have already been found and shared within the analyst community. So if we are working on an incident and find new, unknown implants, uploading them anywhere is not a good idea, because we would be informing both the attackers and the rest of the world that we are dealing with the case. An incident response team, however, will usually work with malware analysts and reverse engineers who thoroughly investigate the properties of the malware to support the response.

Here we will look at a different scenario. Let's say we are threat intelligence analysts tracking the campaigns of specific groups. Since we want to stay ahead of our adversaries, we naturally focus our attention on attacks against other organizations, so that we can give the detection and monitoring teams as much information as possible and prepare for an attack before it reaches our environment. We want to take advantage of the power of OSINT, go beyond our own environment and gather information about malware that has already revealed itself. In this case, external services help us collect data quickly, and because we do not have to handle the sample ourselves, we reduce the risk of accidental infection. So let's go 🙂
As an example, we will use the HermeticWiper campaign, which was deployed in Ukraine shortly before the Russian invasion. It may seem funny, but if a malware family or campaign has already been given a name by the infosec community, I usually start with a Twitter search. This way I can quickly find reports, analyst comments and, above all, information that allows me to locate the sample, usually hashes.
Once we have the sample hash, in this case from the GitHub repository linked in the tweet above, we can go to any of the websites that have analyzed the file for us and collect information there. The most popular of them is VirusTotal. The service, initially used mainly to compare the effectiveness of antivirus programs by scanning a suspicious file with many engines at once, has grown significantly over time. It now shows a lot of information about the file itself, and even reports from sandboxes in which the sample was detonated. The downside of VirusTotal is that the free version does not allow you to download files. So let's enter the "com.exe" hash in the search engine and see what we get. The first page shows the results of the file analysis by individual antivirus engines:
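The same lookup can also be scripted. The sketch below is a minimal illustration that uses only the Python standard library against the VirusTotal API v3 file endpoint (`/api/v3/files/{hash}`, authenticated with an `x-apikey` header). It assumes you have registered and have an API key (free keys come with usage limits); the helper function names are my own.

```python
import json
import urllib.request

VT_API_BASE = "https://www.virustotal.com/api/v3"


def build_file_report_request(file_hash: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the VirusTotal API v3 file report endpoint.

    The hash may be an MD5, SHA-1 or SHA-256 digest of the sample.
    """
    url = f"{VT_API_BASE}/files/{file_hash.strip().lower()}"
    return urllib.request.Request(url, headers={"x-apikey": api_key})


def fetch_file_report(file_hash: str, api_key: str) -> dict:
    """Send the request and decode the JSON response (requires network access)."""
    request = build_file_report_request(file_hash, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def detection_stats(report: dict) -> dict:
    """Extract the per-verdict engine counts from a file report."""
    return report["data"]["attributes"]["last_analysis_stats"]
```

Calling `detection_stats(fetch_file_report(sha256, api_key))` should return counts per verdict (for example, how many engines flagged the file as malicious), which correspond to the summary on the first page. Building the request separately from sending it keeps the network call isolated and easy to test.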
Sometimes this provides clues about the file's classification, for example that it belongs to a specific malware family. However, given the limitations of detection systems and the frequent mixing of signature names, I would be very careful about relying on this classification. Often the descriptions will not be particularly useful:
Much more interesting information can be found in the Details, Relations and Behavior tabs. In the first one, we find information about the file itself, such as hashes in various formats, the dates of analyses on VT and information related to the file structure. We can also find compilation information, the digital signature (if there is one) and the import table. The last two items in particular take us into the world of malware analysis and detection. Information about the digital signature allows us to identify potentially related samples signed with the same certificate, and to determine whether the attackers managed to obtain a genuine certificate of a given organization and whether it is still valid. In this case, we can see that the certificate has been revoked by the issuer:
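If a file comes from your own environment, you can compute the common hashes locally and search VT for them instead of uploading the file, which avoids the disclosure problem described earlier. A minimal Python sketch with the standard `hashlib` module follows; it covers only MD5, SHA-1 and SHA-256, since formats such as ssdeep, TLSH or imphash need third-party libraries. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def file_hashes(data: bytes) -> dict:
    """Return MD5, SHA-1 and SHA-256 hex digests of an in-memory buffer."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def hash_file(path: Path, chunk_size: int = 1 << 20) -> dict:
    """Hash a file on disk in chunks so large samples are never loaded at once."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

Any of the resulting digests can be pasted into the VT search box; hashing in chunks gives the same result as hashing the whole file at once.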
The import table gives insight into the functionality of the sample. To function effectively, malware most often uses the Windows API to interact with the system and perform operations, so seeing which functions are imported lets you roughly determine what the software is meant to do. Here you will find, among others, functions related to Windows services and registry manipulation:
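To make the idea concrete, here is a small Python sketch that groups imported function names into rough capability buckets. The mapping is a hypothetical, deliberately short example of my own, not a standard list; the import names themselves could be copied from the Details tab or extracted with a PE parser such as the third-party `pefile` library.

```python
from collections import defaultdict

# Hypothetical, illustrative mapping of Windows API name prefixes to
# capability labels; a real triage list would be much longer.
CAPABILITY_HINTS = {
    "registry": ("RegOpenKey", "RegCreateKey", "RegSetValue", "RegQueryValue", "RegDeleteKey"),
    "services": ("OpenSCManager", "CreateService", "OpenService", "StartService",
                 "ControlService", "DeleteService"),
    "privileges": ("OpenProcessToken", "LookupPrivilegeValue", "AdjustTokenPrivileges"),
    "file and device I/O": ("CreateFile", "DeviceIoControl", "WriteFile"),
}


def categorize_imports(imports):
    """Group imported function names by the capability prefixes they start with.

    Prefix matching lets ANSI/Unicode/Ex variants (e.g. RegOpenKeyExW)
    fall under the same base function.
    """
    found = defaultdict(list)
    for name in imports:
        for label, prefixes in CAPABILITY_HINTS.items():
            if name.startswith(prefixes):  # str.startswith accepts a tuple
                found[label].append(name)
    return dict(found)
```

A hit is only a hint: legitimate software imports the same functions, so the buckets tell you where to look in the documentation and behavior reports, not what the sample actually does.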
Sometimes function names are self-explanatory, like RegOpenKey, but it's always worth checking exactly what a function does in Microsoft's official API documentation.
In the Relations tab, we see artifacts derived from the sample's activity: domains and IP addresses it connected to, or files it dropped to disk. It is therefore an ideal resource for threat intelligence analysts looking for C2 infrastructure and links to other malware, potentially leading to attribution to a specific activity group. Information about the files that launched the malware (execution parents) and the files it dropped to disk, in turn, lets you reconstruct the infection chain and extend the analysis to files that precede the execution of the actual sample and to additional files used during the infection. We can also use the Graph feature which, similarly to Maltego, shows the relationships between artifacts graphically.
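The Relations data is also exposed through the VirusTotal API v3 as relationship endpoints of the form `/api/v3/files/{hash}/{relationship}`. The sketch below builds requests for the relationships discussed above (`contacted_domains`, `contacted_ips`, `dropped_files`, `execution_parents`) and extracts the object identifiers from a response. It is a minimal illustration assuming an API key; depending on your account type, some relationships may be unavailable or return limited data, and the helper names are my own.

```python
import urllib.request

VT_API_BASE = "https://www.virustotal.com/api/v3"

# File relationships in the VirusTotal API v3 that mirror the Relations tab.
FILE_RELATIONSHIPS = (
    "contacted_domains",
    "contacted_ips",
    "dropped_files",
    "execution_parents",
)


def relationship_requests(file_hash: str, api_key: str, limit: int = 10) -> dict:
    """Build one GET request per relationship of interest for a file hash."""
    built = {}
    for relationship in FILE_RELATIONSHIPS:
        url = f"{VT_API_BASE}/files/{file_hash}/{relationship}?limit={limit}"
        built[relationship] = urllib.request.Request(url, headers={"x-apikey": api_key})
    return built


def related_ids(response: dict) -> list:
    """Pull object identifiers (domain names, IP addresses, file hashes) from a response."""
    return [item["id"] for item in response.get("data", [])]
```

Each request can be sent with `urllib.request.urlopen` and decoded with `json.load`; the returned domains, IPs and hashes then become new pivot points for the same kind of lookup.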
For people who would like to understand how malware works but do not feel confident analyzing it themselves, the most interesting tab is Behavior. Here we find reports from running the sample in various sandboxes, which observe and record its behavior for us. A particularly valuable feature of VT is that we often find results from several different sandboxes, which together give a more detailed picture. In the case of our HermeticWiper, we can even find traces of the list of disks whose data was to be wiped.
Lists of registry changes, executed commands, process trees and network activity are an ideal source of information both for creating behavioral detections and for clustering activity. When analyzing and cataloging the behavior of samples, we no longer have to rely on antivirus signatures; we can judge for ourselves whether the behavior and characteristics of the file fit a given group.
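One simple way to use behavior for grouping, among many, is to reduce each sandbox report to a set of normalized indicators (for example an executed command line or a registry key that was written) and compare the sets with the Jaccard index. The sketch below assumes such sets already exist; normalizing indicators is the hard part and is left out, and the function names are illustrative.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_behavior(reference: set, candidates: dict) -> list:
    """Rank candidate samples by how much their observed behavior overlaps the reference.

    `candidates` maps a sample name to its set of behavioral indicators.
    Returns (name, score) pairs sorted from most to least similar.
    """
    scores = [(name, jaccard(reference, observed)) for name, observed in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For instance, `rank_by_behavior(reference_set, {"sample_a": set_a, "sample_b": set_b})` (hypothetical names) returns the samples ordered by similarity; a score of 1.0 means identical indicator sets. A high score is a reason to look closer, not proof that two samples belong to the same group.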
The last tab on VT is Comments. Their value varies, but they are always worth a look, especially since we can sometimes find information from the party that uploaded the file, as in the case of US Cyber Command. Here is its comment on an X-Agent implant sample:
VirusTotal provided us with a lot of information, but unfortunately two important features are only available to paid users: downloading files and searching for samples using YARA rules. However, both are available free of charge to registered users of Hybrid-Analysis. It is another service offering a database of malware information, based mainly on its own Falcon Sandbox (Hybrid-Analysis is owned by CrowdStrike). So let's search for our hash again, this time in HA:
We searched for one hash but got three results. As I mentioned, HA is built around its own sandbox, so the entries correspond to different sandbox configurations, as shown in the Environment column. Let's click the middle record and see what the report looks like. The first thing that catches the eye is the extra functionality mentioned above: the option to download the sample, and even memory dumps and packets captured during network communication. This makes HA a very valuable source if we want to continue with more in-depth analysis:
As Hybrid-Analysis is a sandbox rather than a tool for "dry" listing of file information, the data we obtain here is closely tied to potentially malicious and suspicious activity. We see which techniques commonly used by malware the sample employs, what information it collects and how it tries to modify the system. Further down, there is also a list of the text strings contained in the file; extracting strings is usually one of a malware analyst's first steps. Files often contain many variables and other information in unencrypted form, and looking at them can provide clues to the file's functionality and purpose. Earlier, in the report of one of the VirusTotal sandboxes, we saw traces of the list of disks that were to be wiped; here, in the file itself, we can find the format string into which a number is substituted to enumerate successive disks:
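Extracting strings is easy to reproduce locally. The Python sketch below is a simplified take on what the classic Unix `strings` utility does: it finds runs of printable ASCII characters and runs of printable characters stored in UTF-16LE ("wide") form, which is how Windows programs commonly store Unicode text. The minimum length of 4 is a common default rather than a rule, and real tools handle more encodings and edge cases.

```python
import re

# One printable ASCII byte (space through tilde).
PRINTABLE = rb"[\x20-\x7e]"


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return ASCII strings followed by UTF-16LE ("wide") strings found in `data`."""
    # ASCII: at least `min_len` consecutive printable bytes.
    ascii_re = re.compile(PRINTABLE + rb"{%d,}" % min_len)
    # Wide: at least `min_len` printable bytes, each followed by a 0x00 byte.
    wide_re = re.compile(rb"(?:" + PRINTABLE + rb"\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found
```

Running it over a sample and filtering the output (for example for "PhysicalDrive") is a quick way to find the kind of fragment that later becomes the basis of a detection rule.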
Another feature that Hybrid-Analysis offers for free, unlike VirusTotal, is searching for samples using YARA rules. YARA is a tool that lets you classify and search for samples based on file characteristics, such as size, and the data they contain. I will devote a separate post to writing YARA rules because it is one of my favorite tools, but for now let's see how it works in practice with a rule built around the "PhysicalDrive%u" string found above:
A rule starts with its name, followed by its body in curly braces. The first section, meta, contains data that helps you understand the rule but does not affect the search itself. It is not mandatory, but it provides context and is highly recommended. Next, in the strings section, we specify the strings we want to search for. YARA does not limit us to plain text; we can also specify hexadecimal strings (byte patterns) and regular expressions. Here, however, I have limited myself to the fragment we found earlier. The modifiers "nocase", "ascii" and "wide" make the search case-insensitive and cover text encoded both as ASCII (one byte per character) and as wide, UTF-16-style text (two bytes per character). Finally, the condition section determines how the data from the previous section is used. The expression "any of them" matches if any of the strings defined in the strings section is found. In addition, I added two more conditions:
- uint16(0) == 0x5a4d may look puzzling, but it simply means that the first two bytes of the file, read as a little-endian 16-bit integer, are the characters "MZ", the header that identifies Windows executable files.
- filesize > 100KB means that we are looking for files larger than 100 KB.
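Put together, a rule matching this description might look like the one stored in `RULE` below; the rule name and meta text are placeholders of my own. With the `yara-python` package, the rule text can be compiled with `yara.compile(source=RULE)`. For readers without YARA installed, the `matches_rule` function approximates the same condition in plain Python to show what each part checks; it is an illustration, not a replacement for the YARA engine.

```python
import struct

# Reconstruction of the rule described above; name and meta are placeholders.
RULE = """
rule PhysicalDrive_FormatString
{
    meta:
        description = "PE files containing the PhysicalDrive%u format string"
    strings:
        $drive = "PhysicalDrive%u" nocase ascii wide
    condition:
        uint16(0) == 0x5a4d and filesize > 100KB and any of them
}
"""

NEEDLE = "PhysicalDrive%u"


def matches_rule(data: bytes) -> bool:
    """Pure-Python approximation of RULE's condition, for illustration only."""
    # uint16(0) == 0x5a4d: the first two bytes, read little-endian, are "MZ".
    is_pe = len(data) >= 2 and struct.unpack_from("<H", data, 0)[0] == 0x5A4D
    # filesize > 100KB: YARA's KB suffix multiplies by 1024.
    big_enough = len(data) > 100 * 1024
    # nocase ascii wide: case-insensitive search in 1-byte and 2-byte (UTF-16LE) form.
    # bytes.lower() only changes ASCII letters, so the 0x00 bytes of wide text survive.
    haystack = data.lower()
    has_string = (
        NEEDLE.lower().encode("ascii") in haystack
        or NEEDLE.lower().encode("utf-16-le") in haystack
    )
    return is_pe and big_enough and has_string
```

Keeping the cheap header and size checks in the condition, as the rule does, narrows the search to executables of a plausible size before any string is considered relevant.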
We paste our rule into the Advanced Search field:
It is worth noting that HA lets you add parameters such as file size from the search menu, but I prefer to include them in the rules themselves so that they do not depend on the features of a particular search engine. YARA rules are used in many places (we can also scan files on disk or memory images with them), so it's worth keeping every search criterion in one place. So let's look at the results:
In the public HA database we found 38 files; in addition, HA, encouraging the purchase of the commercial version, informs us that it would show another 1050 files. For free tools and a simple YARA rule based on text data, that is pretty good 🙂 We can now start browsing the files for similarities in functionality or connections to network infrastructure and thus expand our database of related implants.
As we can see, using open sources and sample databases, we managed to obtain quite a lot of information and, most importantly, we can start looking for links to other files that are part of the same campaign or simply belong to the arsenal of the same group. Since we can get so much data online, are malware analysis skills losing their raison d'être? Of course not. First, as I mentioned at the beginning, these techniques apply to samples that are already known and analyzed. For threat intelligence analysts, the most valuable samples are those detected during incident analysis but not yet widely known, and therefore more likely to still be in active use; creating ways to detect them (such as YARA rules) gives defenders a significant advantage. Second, we depend on the technical capabilities of sandboxes and other automated methods of extracting file properties. Such automation may not be sufficient if the malware uses more advanced anti-analysis techniques that require manual intervention, for example patching the file so that it executes in the analysis environment. However, if we are looking for new leads on already known activity, or expanding our knowledge about malware found in our own environment, using open sources is highly recommended.