One of the most common tasks in OSINT and threat intelligence is the analysis of Internet domains: the infrastructure behind them and the entities responsible for creating them. Domains are an important element of cyber operations, where they can be used for C2 communication, malware delivery, and information operations that deliver crafted content. Their analysis is therefore an important part of threat intelligence work on security incidents: it expands what you know about the attacker and helps you map the actions taken onto, for example, the Diamond Model. In a more general context, information about domain owners, the services available on the server, or where the domains were registered helps in finding links between the activities and intentions of their authors. For example, seemingly unrelated websites promoting certain content may be the work of the same organization, and tracing common points in the characteristics of their domains may help reveal the true scale of the activity.
Because domain analysis is so important, we have many tools at our disposal, including many free ones. On the other hand, it is hard not to notice that the scope of available information has changed in recent years, which forces changes in the search methodology. This is partly due to the introduction of GDPR, which forced registrars to hide the personal data of registrants, and partly due to the ordinary arms race between analysts and those registering domains. Regardless of legal regulations, it has become commonplace to use private registration that hides the registrant's details, hosting providers that take little interest in being used by malware operators, and so on. As a result, many domains used for hostile activity share similar general characteristics, and analysts have to consider more elements, such as DNS servers, TLS certificates, or the syntax of the domain name itself, in order to find associations. Getting down to business, we'll take a look at the domain information we can collect and the tools that help with that.
Starting from the basics, a domain is a string that identifies resources via DNS. When we type an address into the web browser, it queries DNS servers for the records that make the connection possible, such as the IP address. Using the collected data, the browser lets us interact with resources through names that are easy to remember, such as wikipedia.org. To obtain a domain and be able to edit its properties so that it points to the resources we choose, we must register it through a domain name registrar accredited for a specific top-level domain (e.g. .pl). After registration, we can edit its records, such as A (IPv4 address), AAAA (IPv6 address) or MX (e-mail server), directing visitors to the appropriate resources. This is, in a very simplified form, how domains work. Now let's see how much information about a domain can be obtained from the outside.
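The lookup described above can be reproduced with nothing more than Python's standard library. The sketch below is only an illustration: the function name resolve_addresses is my own, and it relies on the system resolver through socket.getaddrinfo, so it returns A and AAAA answers only. Records such as MX, NS or TXT would need a dedicated DNS library or an online tool.

```python
import socket


def resolve_addresses(hostname):
    """Return the IPv4 (A) and IPv6 (AAAA) addresses found by the system resolver."""
    found = {"A": set(), "AAAA": set()}
    try:
        # Restricting to TCP avoids one duplicate entry per socket type.
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name does not resolve (or the resolver is unreachable).
        return {"A": [], "AAAA": []}
    for family, _socktype, _proto, _canonname, sockaddr in infos:
        if family == socket.AF_INET:
            found["A"].add(sockaddr[0])
        elif family == socket.AF_INET6:
            found["AAAA"].add(sockaddr[0])
    return {record: sorted(addresses) for record, addresses in found.items()}
```

Calling resolve_addresses("wikipedia.org") on a machine with network access returns the addresses that a browser would connect to at that moment. The answer can differ between locations and over time, which is exactly why historical resolution data is valuable later on.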
The most basic tool is WHOIS, a system for querying domain registration data. In the simplest case, we can run the whois tool in a Windows or Linux terminal and look at the data for the domain we are interested in:
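Under the hood, WHOIS is a very simple text protocol (RFC 3912): the client opens a TCP connection to port 43, sends the domain name followed by a line break, and reads the answer until the server closes the connection. The sketch below shows this with the standard library. The function names are my own, the server has to be supplied by the caller (whois.iana.org can refer you to the right one for a TLD), and the parser assumes simple "key: value" lines, which does not hold for every registry's output format.

```python
import socket


def whois_query(domain, server, port=43, timeout=10.0):
    """Send a raw WHOIS query (RFC 3912) and return the response as text."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(domain.encode("idna") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text):
    """Collect 'key: value' lines into a dict of lists, skipping comment lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("%", "#")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            fields.setdefault(key.strip().lower(), []).append(value)
    return fields
```

A typical use would be parse_whois(whois_query("wp.pl", server)) with server set to the WHOIS host of the relevant registry. Keeping the values as lists matters because fields such as name servers usually appear more than once.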
Using wp.pl as an example, we can see basic information such as the name servers, when the domain was created, when it is due for renewal, and the address of the registrar. We also see that the domain does not use DNSSEC, an extension that enables cryptographic verification of the correctness and origin of DNS data. If we do not want to run the tool locally, we can use one of the many services that will query the information for us, for example Domain Tools. In addition to the dry record, Domain Tools will also show a domain profile enriched with additional information:
Here we also see the IP address together with the number of sites hosted on it, the physical location of the server, the ASN and the server type. Domain Tools provides this data because, besides the usual WHOIS query tool, it also offers a more advanced analysis tool, Iris, and the profile gives a foretaste of its capabilities. For now, however, let's stick to generally available information. As I mentioned earlier, the configured DNS records determine where the domain will direct us, so we can look at the information they contain using, for example, mxtoolbox.com. One of the more interesting record types is TXT, which lets the owner add arbitrary text values. It was originally intended for human-readable comments, but nowadays it often holds data used by external services. Let's take my own domain as an example: if you follow me on Twitter, you know I use the nickname lawsecnet and the mailbox kamil.bojarski@lawsec.net. A look at the contents of the TXT record shows that the provider of my mailbox is ProtonMail:
There we find the key that I received from ProtonMail and put in the DNS record to confirm that I really am the owner of the domain. If we go back to our previous example, wp.pl, we will find similar entries there:
This time, however, they relate to the services of Facebook and Google. In both examples we also find an SPF record, which identifies the servers authorized to send messages on behalf of the domain and so protects against spoofing of the e-mail sender.
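Reading TXT records by eye works for one domain, but for a longer list it helps to label the values automatically. The sketch below is a minimal illustration: the prefix table covers only the three services mentioned above and would need extending in practice, the function names are my own, and the values used with it should come from your own lookups.

```python
# Prefixes that some services ask domain owners to publish in TXT records.
# Deliberately incomplete: only the services discussed in the text.
VERIFICATION_PREFIXES = {
    "google-site-verification=": "Google",
    "facebook-domain-verification=": "Facebook",
    "protonmail-verification=": "ProtonMail",
}


def classify_txt(record):
    """Label a TXT value as ('spf', value), ('verification', service) or ('other', value)."""
    value = record.strip().strip('"')
    lowered = value.lower()
    if lowered.startswith("v=spf1"):
        return ("spf", value)
    for prefix, service in VERIFICATION_PREFIXES.items():
        if lowered.startswith(prefix):
            return ("verification", service)
    return ("other", value)


def spf_includes(record):
    """List the domains an SPF record delegates to through include: mechanisms."""
    terms = record.strip().strip('"').split()
    return [
        term.split(":", 1)[1]
        for term in terms
        # A mechanism may carry a qualifier (+, -, ~ or ?) in front of it.
        if term.lower().lstrip("+-~?").startswith("include:")
    ]
```

The include: targets are worth a second look, because they often name the e-mail provider even when no verification token is present.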
Other noteworthy values are the MX and NS entries, which identify the e-mail servers and the name servers for the domain, respectively. In threat intelligence practice, when tracking the infrastructure used in attacks, the values in these records can often serve as a starting point for finding potentially related domains. For wp.pl, the records look like this:
For a long time my favorite tool for analyzing network infrastructure was PassiveTotal, which is also available in a free Community Edition. Unfortunately, the free version has been severely limited in terms of how far back the data goes and what data is available, which has significantly lowered the value of this source. Still, it may be worth taking a look at what it can offer us (for free, after all 😛). One of the most characteristic elements of the interface is a calendar showing when the domain was active and which addresses it resolved to:
Below you will find tabs that lead to views that show detailed information. Here, however, we can experience a slight disappointment, because, as I mentioned, the possibilities of the free version are very limited. However, the following views are still available:
- Resolutions - IP addresses to which the domain directed.
- WHOIS - WHOIS data preview. In the paid version, we can also track how and when the data contained changed.
- Certificates - SHA-1 hashes and TLS certificate data for the domain and subdomains. A very useful tool in the work of a threat intelligence analyst, because the reuse of the certificate allows you to track the creation of C2 infrastructure - here is an example from ThreatConnect analysts.
- Subdomains - subdomains, also very useful when we confirm that a given domain is interesting for us and we want to take a closer look at what it hides.
- OSINT - information whether the domain was found in public sources, e.g. reports describing malware campaigns.
- Hashes - hashes of files related to a domain, a very interesting source in the field of finding connections with campaigns monitored by us.
- DNS - as the name suggests, a set of DNS records for a given domain.
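The certificate hashes listed in the Certificates view can also be computed locally, which is useful when you want to compare a live server against a value from a report. The sketch below assumes the common convention that a certificate's SHA-1 fingerprint is the SHA-1 hash of its DER encoding. The function names are my own, and fetch_certificate_sha1 needs network access to the host being checked.

```python
import hashlib
import ssl


def sha1_fingerprint(der_bytes):
    """Return the SHA-1 hash of a DER-encoded certificate as lowercase hex."""
    return hashlib.sha1(der_bytes).hexdigest()


def fetch_certificate_sha1(host, port=443):
    """Download the certificate a server presents and return its SHA-1 fingerprint."""
    # get_server_certificate returns the certificate in PEM form without validating it,
    # which is what we want when looking at attacker-controlled infrastructure.
    pem = ssl.get_server_certificate((host, port))
    return sha1_fingerprint(ssl.PEM_cert_to_DER_cert(pem))
```

Two domains that return the same fingerprint are presenting the same certificate, which is the kind of reuse that makes certificates a useful pivot.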
A tool such as PassiveTotal or MxToolbox works well when we need to check a specific domain, but if we are comparing many domains and searching for new ones, for example by linking name servers or TLS certificates, it is much more convenient to use a tool that presents the results graphically. Maltego, which I have already described in one of my posts, can be such a tool. The great advantage of Maltego is that we can combine various sources and use several services at once, which saves us from manually copying data into an external store such as an Excel spreadsheet. So we add a domain object and right-click to list the available transformations:
Starting with the basic ones, supplied from Maltego itself, we can, for example: search for name servers for a given domain, discover other domains using the same server, and then find e-mail servers and names for the newly found domain:
and using the PassiveTotal module we add a list of subdomains:
Maltego works here as a notebook, constantly adding new information to the chart, which is certainly very convenient, but the real power of such an analysis lies in querying several domains at once and capturing their common points. As an example, let's take three domains listed as being used by APT35 in the Check Point report. After adding them to the chart and running the "To entities from WHOIS [IBM Watson]" transformation, we immediately see common points:
Then we can compare the data with another source. This time we will use the '[PT] Get Whois Details' transformation from PassiveTotal. For clarity, only part of the chart is shown:
The black globe icons with a question mark represent the objects added by PassiveTotal. We can see that we found an additional connection here through the name servers "ns7.dynu.com" and "ns1.dynu.com". Now that we have common points, we can search for other domains not mentioned in the report. Using the reverse transformation, we find domains that use the "ns7.dynu.com" server:
Then we perform searches on the domains thus found, collecting the data again by "[PT] Get Whois Details". Now the graph looks quite intricate, let's see its simplified view:
Despite the large number of objects, the points where groups of objects meet are clearly visible, and they indicate the common elements. In our case, as we can see, these are the name servers and the country of registration:
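What the graph makes visible is, in essence, an inverted index: for every attribute value we ask which domains carry it. If the data has already been exported from a tool, the same common points can be found in a few lines of code. The sketch below is only an illustration: the function name and the input layout (a dict mapping each domain to its attribute lists) are my own assumptions, and the domains in the example are placeholders, not the ones from the report.

```python
from collections import defaultdict


def shared_attributes(records, min_domains=2):
    """Map each (field, value) pair to the domains sharing it, keeping only real overlaps."""
    index = defaultdict(set)
    for domain, attributes in records.items():
        for field, values in attributes.items():
            for value in values:
                # Lowercase so that NS1.DYNU.COM and ns1.dynu.com count as the same value.
                index[(field, value.lower())].add(domain)
    return {
        key: sorted(domains)
        for key, domains in index.items()
        if len(domains) >= min_domains
    }


# Placeholder data in the assumed layout; real input would come from WHOIS and DNS lookups.
example_records = {
    "first.example": {"ns": ["ns1.dynu.com", "ns7.dynu.com"], "country": ["US"]},
    "second.example": {"ns": ["NS1.DYNU.COM"], "country": ["US"]},
    "third.example": {"ns": ["ns1.other.example"], "country": ["DE"]},
}
overlaps = shared_attributes(example_records)
```

Each key in overlaps is one "meeting point" from the chart, and the list behind it shows which domains meet there.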
Since domains are identifiers of network infrastructure elements, we can also analyze the resources that sit behind them, in a similar way to what I have already described on counterintelligence.pl. So the next step might be to use the transformation from Shodan and get an IP address for each of them. As we can see, in this case the addresses are not common points (the objects marked with a bold outline):
In this way, we have seen how easy it is to find a lot of information about domains, and links between them, using free tools. At this point, however, I should mention the methodological aspects, which in my opinion are no less important than the technical ones. Maltego greatly speeds up finding common points, but an attractive graphical presentation of the results can easily lead to overinterpreting the data and seeing a connection that does not exist in reality. Domain analysis is particularly susceptible to this trap, because by running one transformation after another we can keep extending the scope of the search until we find a connection with any element of infrastructure in the world. Even in a smaller investigation, we should check carefully whether we can really conclude that the items are related. The graphs above are a good example of this phenomenon. We managed to find a number of other domains using the same name servers, but if we are following APT35 campaigns, can we really consider them related to the group? Probably not: these are ordinary domains that are registered with the same provider and therefore use the same servers. Similarly, the fact that domains resolve to the same IP address most often means only that it is the address of a hosting provider's server. To establish true connections we need to find more common points, preferably a repeating pattern. It is best to start with domains whose nature we are sure of (e.g. all of them were found while analyzing the C2 communication of malware samples) and build a profile made up of individual elements that, in our opinion, resulted from a specific decision of the domain registrant. It can be a naming convention, place of registration, registrant, and so on. For the purposes of the analysis, the domain thus ceases to be a single object and becomes a set of characteristics that make up a "fingerprint".
These concepts were presented very well by Joe Slowik in the post "Analyzing Network Infrastructure as Composite Objects". This approach helps you avoid chasing apparent relationships and falling into a rabbit hole of fanciful connections. So let's go back to our example domains and see how we could build such a profile:
- Naming convention - all domains begin and end with "0", consist of two English words, and are registered under the "xyz" TLD.
- In the WHOIS data in the registrar and organization fields, we will only find the word "text".
- US states are given as the city of registration, even though for one of the domains the country is given as Germany.
- Common name servers ns1.dynu.com and ns7.dynu.com.
By combining these four elements, we obtain a much more detailed description that we could successfully use for further research.
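The four elements above can be written down as a set of checks, so that every newly found domain is compared against the whole profile rather than a single attribute. The sketch below is a simplified illustration under several assumptions: the function name and the WHOIS field names are my own, the naming check only tests for letters between two zeros (it does not verify that they form two English words), and the list of US states is deliberately partial.

```python
import re

# "0" + letters + "0" under .xyz; a simplification of the naming convention.
NAME_PATTERN = re.compile(r"^0[a-z]+0\.xyz$")
SHARED_NAME_SERVERS = {"ns1.dynu.com", "ns7.dynu.com"}
# Partial list for illustration only; a real check would cover all states.
US_STATES = {"california", "florida", "new york", "texas", "washington"}


def fingerprint_matches(domain, whois, name_servers):
    """Return the names of the profile elements that a domain satisfies."""
    servers = {ns.lower().rstrip(".") for ns in name_servers}
    checks = {
        "naming": bool(NAME_PATTERN.match(domain.lower())),
        "whois_placeholder": (
            whois.get("registrar", "").lower() == "text"
            and whois.get("organization", "").lower() == "text"
        ),
        "state_as_city": whois.get("city", "").lower() in US_STATES,
        "name_servers": bool(SHARED_NAME_SERVERS & servers),
    }
    return [name for name, passed in checks.items() if passed]
```

A domain that matches only "name_servers" is most likely just another customer of the same provider, while one that matches three or four elements deserves a closer look. Where exactly to draw that line is an analytical judgment, not something the code can decide.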