Improving Technical Interviews

I have conducted easily over 100 interviews during my career, almost all of which have been technical, and I have seen some great and terrible actions from both sides of the table. In this post, I want to cover off a few hints, tips and tricks that will help candidates and employers to get more out of their interview experience.

There are many sites out there offering questions and answers at varying levels of depth. I am hoping that by reading this you don’t just get the answer, but an understanding of why it is being asked in the first place. This knowledge benefits both the interviewer and the candidate.

We will focus on technical interviews for this post, with some fun horror stories added to make sure you keep reading!

Let’s get some personal annoyances out of the way early:

Memory Tests

I have a shocking memory, so I hate asking people to recall data, as I think it adds very little value. For example, asking for specific port numbers for applications (which is silly anyway, but I digress), or asking for Event IDs.

Ask yourself ‘What understanding does the candidate need to demonstrate?’

Using one of my examples above, let’s try to make a better question by looking at what we need the candidate to demonstrate:

  • List the ports associated with <insert network application here>

Chances are you are trying to see if the candidate has experience with monitoring networks and understands that certain common network applications may be vulnerable, or mis-used for attacks. So how about instead asking:

  • Can you talk me through some common network applications, specifically any that can be mis-used by an attacker

This is an open-ended question that passes the power to the candidate. They can now answer to their strengths and may even have some good examples of a time when they caught a Zone Transfer going to a workstation, or how they spotted lateral movement over SMB using PsExec with legitimate credentials. It can also be built upon by the interviewer to dig out more details and to push the limits of the candidate’s technical knowledge.

That is much better than answering “Port 443 for SSL assuming no proxy in use”.

Gatekeeping

No one likes gatekeeping, stopping someone based on silly requirements. What am I talking about? These questions:

  • Tell me about your home lab?
  • What Cyber stuff do you do in your spare time?
  • Do you have a blog?
  • What language do you code in?

If you are asking these questions, there is a good chance you are (unintentionally) gatekeeping. I have asked the first two questions in interviews previously, until it was pointed out to me that the security industry has changed, and we need to change with it.

We no longer purely need the uber geeks who live, eat, sleep and breathe ‘Cyberz’. You don’t need to know a coding language to provide value to this industry.

It is OK to employ someone who sees this as a job and not a lifestyle.

The Technical Interview

I will skip the standard interview etiquette, as that has been done many times and isn’t specific to the security industry. All I will say is be on time and be professional.

A quick caveat to these questions is that my background is blue team. I will try to keep the questions as neutral as possible, but it wouldn’t be very professional of me if I didn’t recognise my own bias.

And now for the part you all scrolled down to (damn you!)

Questions!

I will put down notes to some of these questions as I go, they are not meant to be definitive answers, but should be research prompts and talking points depending on which side of the table you are. Some questions may have answers, but it’s my blog, my rules.

I have also added “Other discussion points” as ideas for what else can be talked about. These aren’t required and are more for if the candidate is stuck, or absolutely annihilating your questions.

These answers/talking points assume the interview is for a highly technical role, and not a management, or less technical position.

Where do you go to get information on vulnerabilities/exploits or to keep up to date with security?

This one is not gatekeeping; it is making sure the candidate knows where and how to get information. For example, if a new vulnerability is found, a SOC analyst would need to know what software is affected, how to detect any potential attack, and how to tell if something is a false positive.

A pentester would need to know more in-depth information on how the vulnerability works to build an exploit, or look for proof-of-concept exploits to use as a baseline.

A ‘bad’ answer for a technical person would be to simply state a news site. If someone is applying for a technical role and answers “The Register”, or worse yet “Instagram” (yes someone said Instagram to me once) without going into context as to why then these are bad. Not because the source is poor or disreputable, but because they are designed to be easily consumed at a less technical level.

Answering with “I get high level information from The Register, then follow the source links, or further reading links to get in-depth technical data” is much better.

  • Other discussion points:
    • CVE (CVSS) scores.
    • POC code use.
    • Reliability of blog posts/news articles

What is the difference between an ‘exploit’ and a ‘vulnerability’

This is a base level question in my mind. We talk about these on a daily basis and I would expect a technical candidate to understand the difference between them. Explaining it can be tricky, so don’t mistake someone struggling to find the right words for someone who doesn’t understand.

If the person is really struggling try asking pointed questions like “If I left a window open on my house, would that be considered a vulnerability or an exploit?”

You can change this around to something more like “If my window lock was found to be defective….” as that’s technically closer to being a ‘vulnerability’.

  • Other discussion points:
    • CVE (CVSS) Scores.
    • POC Code use.
    • Is the misconfiguration of a system a vulnerability?

Encoding vs Encryption. What are they and in which order would you apply them to a file

This can be separated into two questions, as the second half can push people into thinking they are for files only.

Encoding: The candidate could talk about sending unencoded data across the network and the restrictions that would need to be put in place to stop the data accidentally being interpreted as a command by the application. SMTP being a prime example, and its use of Base64 for attachments.

Encryption: The candidate should talk about how encryption protects data so that it can only be decrypted by the intended owner, or key holder. Talk about the different types of ‘key’, whether that be a password or a public/private key pair. Try not to get too bogged down in specific methods of encryption unless it is part of the role.

  • Other discussion points:
    • SSL vs HTTPS.
    • SSH Certificate handling.
    • Tools to decode (CyberChef).
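To make the encoding vs encryption distinction concrete, here is a minimal Python sketch. The XOR routine is a toy stand-in for a real cipher (an assumption for illustration only, not something to use in production), and the message contents are made up. It shows that Base64 can be reversed by anyone, that decryption needs the key, and the usual order when both are needed for transport: encrypt first, then encode the binary ciphertext.

```python
import base64
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy one-time-pad style XOR, for illustration only (not production crypto)."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"quarterly-report contents"  # hypothetical file contents

# Encoding: no secret involved, anyone can reverse it.
encoded = base64.b64encode(message)
decoded = base64.b64decode(encoded)

# Encryption: reversing it requires the key.
key = secrets.token_bytes(len(message))
ciphertext = xor_bytes(message, key)
recovered = xor_bytes(ciphertext, key)

# Typical order for transport: encrypt first, then encode the binary result
# so it survives text-only channels such as SMTP.
wire_safe = base64.b64encode(ciphertext).decode("ascii")
```

A good follow-up for the candidate is why the reverse order (encode, then encrypt) leaves you with binary output that still needs encoding before it can travel over a text-only protocol.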

What is the difference between an IDS and an IPS and which is better?

IDS stands for Intrusion Detection System, IPS is Intrusion Prevention System. The acronyms do give a big clue to the answer, but if this is a junior role you can talk instead about if prevention is better than detection. The easiest answer is to say “yes of course”, but what happens if the system is not finely tuned and you are flooded with False Positive preventions?

This can lead into business risk acceptance conversations, or stay at a technical level.

Often the difference between IDS mode and IPS mode is a tick box on the device itself. If you are talking Network based IDS/IPS then you can also discuss if the device is in-line or mirrored. Most systems are in-line now, but historically the risk was deemed too great and security products were given a copy of the traffic rather than being able to manipulate the traffic directly.

  • Other discussion points:
    • True/False Positive/Negative.
    • Network architecture and sensor placement.
    • Business risk and risk acceptance.

Which is better, network based security or host based?

There are a few “it depends” style answers to this question, my typical go-to on this is that all network traffic needs to originate from somewhere and is being sent somewhere. Meaning host based is superior. However, not all host based technologies work on all devices (think printers, or routers). Meaning if you purely chose endpoint you may be blind to someone attacking a device that can’t have host based monitoring.

A good candidate will explain that this shouldn’t be one or the other, but how to implement BOTH network and host based security. This naturally leads into ‘Defence-in-Depth’ models where no single security product should be relied upon. Even users are considered part of the model; with the right training and support, users can report when something isn’t right.

  • Other discussion points:
    • How to monitor for lateral movement.
    • How to secure ‘non-standard’ devices SCADA/IOT etc.
    • Defence in depth.

What is the difference between a Vulnerability Assessment, a Penetration Test and a Red Team Engagement?

This is less about the dictionary definition of each of these, but instead the understanding of depth. A VA will typically be very superficial and some security products will ignore it as noise.

A penetration test (pen test) is often done from a position of strength, meaning the tester may have been given an account with a high level of privilege and is testing a specific target. The scope of a pen test varies massively, but it is important to understand that these are done in a controlled and deliberate manner.

A Red Team Engagement, or a Black Box test, is more akin to what a real attack would look like and has the highest potential risk for a business. However, it also provides the most accurate picture of how an attacker would get access to the environment. These are also the types of engagement where physical access to the customer site may be attempted.

A non-technical customer may use these terms interchangeably, but as a security professional, we need to understand there is a difference. Even if they do overlap on a Venn diagram.

  • Other discussion points:
    • Value provided by each of these engagements
    • Detection options for someone with physical access
    • Scope creep and out of scope detections

What are the different ‘families’ of HTTP response codes and give some common examples? Also explain which could indicate an attack or preparations for an attack?

Informational responses (100–199)
Successful responses (200–299)
Redirection messages (300–399)
Client error responses (400–499)
Server error responses (500–599)

The above is from the Mozilla Developer blog, and is a memory test, which we have already said we don’t like. So instead, make this list a bonus answer, but provide it to the candidate if needed and get them to explain which could indicate an attack, or recon, and why.

There are a couple of options on this one, and it is mostly based off context. For example seeing a ‘404 Not Found’ response to a request is nothing of real interest, but if there are thousands of them from the same IP address, then something weird might be going on.

Same if you are seeing a ‘500 internal server error’ on your database server, this may be an IT related problem, or someone may be attempting to compromise that server.

You would need to be looking at the log files and looking into more detail around what was actually being requested.

  • Other discussion points:
    • Webshells
    • ‘Low and slow’ attacks vs ‘smash and grab’
    • Weblogs
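The “context matters” point above can be sketched in a few lines of Python. This is only an illustration: the event format (pairs of client IP and status code), the sample IP addresses, and the threshold are all assumptions, and real web logs would need parsing first.

```python
from collections import Counter

# The five families listed above, keyed by the first digit of the status code.
FAMILIES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_family(code: int) -> str:
    """Map a status code such as 404 to its family name."""
    return FAMILIES.get(code // 100, "Unknown")


def noisy_404_sources(events, threshold):
    """Return {ip: count} for clients with at least `threshold` 404 responses.

    `events` is an iterable of (client_ip, status_code) pairs (hypothetical format).
    """
    counts = Counter(ip for ip, status in events if status == 404)
    return {ip: n for ip, n in counts.items() if n >= threshold}


# Made-up sample: one client hammering missing pages, one behaving normally.
sample = [("203.0.113.7", 404)] * 5 + [("198.51.100.2", 404), ("198.51.100.2", 200)]
suspects = noisy_404_sources(sample, threshold=3)
```

A single 404 from `198.51.100.2` is ignored, while the repeated 404s from `203.0.113.7` are surfaced for a closer look at what was actually being requested.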

Can you list some artefact locations, or methods, that an attacker or malware would go to maintain persistence on a host?

This question is a borderline memory test, however it is important for both Red and Blue teams to understand in order to do their job well.

Registry, Services, Scheduled Tasks, LNK Files, Startup Folder, and WMI are the most common ones, and each can be broken down further. You can also talk about user initiated persistence like rogue accounts, or infected removable media devices. There are also Kernel level attacks like Rootkits or BIOS tampering/replacement.

This is less about getting a whole definitive list, and more about understanding why persistence is used. For a defender, seeing that malware re-infect, or the attacker popping back up means that something has been missed. For an attacker if the victim reboots the host, or applies patches, how will they get back onto that host?

  • Other discussion points:
    • Persistence via the network (WMI/Active Directory/Remote Accounts)
    • Windows Registry in general
    • Asking the ‘first place’ the candidate would look for persistence

Explain, in your own words, how a DoS attack would work. Specifically how the attacker would hide their identity and how they could ensure a larger payload was received than was sent?

Most people are aware that a Denial of Service (DoS) attack is generally about sending too much data to the victim in order to overload the system. But fewer people understand how this is achieved. For example, there was a vulnerability in a web server (I forget which one, sorry) where if data was put into the initial GET request it would cause excessive load on the webserver, meaning this required very small amounts of data to be sent and didn’t need a huge botnet or other forms of Distributed Denial of Service methods.

The less subtle type of DoS is a Distributed Denial of Service (DDoS) attack. This typically means overwhelming the target with ‘legitimate’ data.

The attacker may also use a reflected and amplified attack. For example, you could send a DNS request to a server with a spoofed source IP, meaning the response would go to your victim (whose address you spoofed). This is a reflective attack. The amplification comes from the type of DNS request you send: for example, a query of type ‘ANY’ asks the server for every record it holds for that name, meaning the response will be much larger than the request.

Any service that uses UDP and provides a response larger than the request can be used in a reflective amplification DDoS attack.

  • Other discussion points:
    • Ways to detect if your systems are being used as amplifiers
    • Mitigations against DoS attacks
    • Which services should be publicly and anonymously accessible
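The amplification idea is just arithmetic, so a short Python sketch may help when talking it through. The request and response sizes and the attacker bandwidth below are made-up numbers for illustration, not measurements of any real service.

```python
def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """How many bytes reach the victim for each byte the attacker sends."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    return response_bytes / request_bytes


def traffic_at_victim(attacker_mbps: float, factor: float) -> float:
    """Traffic arriving at the spoofed address, given the attacker's own output."""
    return attacker_mbps * factor


# Hypothetical sizes: a 60 byte query that triggers a 3000 byte response.
factor = amplification_factor(60, 3000)

# An attacker sending 10 Mbps of spoofed requests at that factor.
victim_mbps = traffic_at_victim(10, factor)
```

The same calculation works for any UDP service: if the factor is above 1, the service can be abused as an amplifier.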

What are the limitations to using ‘atomic’ IOCs (IP addresses, host names etc) and what can be done to overcome these limitations?

IOCs like IP addresses, domain names, file hashes are by their very nature historical, or backwards looking. Meaning that we are only looking at what has been seen before. They are also trivial for the attacker to change.

Instead identifying ways to look for the Tactics, Techniques and Procedures, or TTPs of the attacker allows us to look forward and proactively detect a new attack.

If asked this in an interview, respond by talking about what the attacker does that is less likely to change. What tools do they use, and what artefacts are left behind when they use them. Does the attacker favour a particular persistence method? Or do they put their tools into a specific folder? These types of indicators are still subject to change, but are less likely. If the attacker needs to use a different tool each time they attack, they will run out of tools very quickly. Instead they may iterate their current tooling, which will give an indication of timeline.

Everything an attacker does can and should be noted. If it’s relevant enough it can then be turned into an indicator and used against them.

For more advanced teams there is also the ‘Indicator of Attack’ which is looking at commands or actions taken to see if they constitute a malicious act. For example ‘whoami’ alone is a weak IOA, however ‘whoami’, ‘locate nc’, ‘ip addr’ and ‘netstat’ all being run in short order certainly raises the suspicion level.

  • Other discussion points:
    • Retroactive hunting for IOCs
    • Threat Intelligence methodologies
    • Sharing of TTPs responsibly
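The ‘Indicator of Attack’ example above (several recon commands run in short order) can be sketched as a simple sliding-window check. This is a minimal illustration, not a detection rule: the event format, the command list, the two minute window and the threshold of three are all assumptions.

```python
from datetime import datetime, timedelta

# Command names taken from the example above; extend to suit your environment.
RECON_COMMANDS = {"whoami", "locate", "ip", "netstat"}


def first_word(command_line: str) -> str:
    """Return the program name from a command line, or '' if it is empty."""
    parts = command_line.split()
    return parts[0] if parts else ""


def recon_burst(events, window=timedelta(minutes=2), min_distinct=3) -> bool:
    """True if `min_distinct` different recon commands ran within `window`.

    `events` is a list of (timestamp, command_line) pairs for one host or user
    (a hypothetical format).
    """
    hits = sorted(
        (ts, first_word(cmd)) for ts, cmd in events
        if first_word(cmd) in RECON_COMMANDS
    )
    for i, (start, _) in enumerate(hits):
        seen = {name for ts, name in hits[i:] if ts - start <= window}
        if len(seen) >= min_distinct:
            return True
    return False


t0 = datetime(2022, 1, 1, 9, 0, 0)
burst = [
    (t0, "whoami"),
    (t0 + timedelta(seconds=20), "locate nc"),
    (t0 + timedelta(seconds=40), "ip addr"),
    (t0 + timedelta(seconds=60), "netstat -an"),
]
lone = [(t0, "whoami")]
```

Here `recon_burst(burst)` flags the sequence, while `recon_burst(lone)` does not, which mirrors the point that ‘whoami’ alone is a weak indicator.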

If you were to ping Google and watch the response on your host in a network capture tool, such as Wireshark, whose MAC address would be seen as the ‘sender’?

This one is a bit of a mind twister, but basically we are seeing if the candidate understands networking. The sender’s MAC on the return trip would be the last ‘hop’, or last routing device.

A basic switch does not have a MAC address for each port, but a router would have.

The answer that a lot of people typically give is “It would be Google’s MAC address”. Which is incorrect, it would be Google’s IP address, but not their MAC address.

We aren’t going to talk about load balancers or any other type of device here, as this is a thought experiment question rather than a network architecture question. We use ‘Google’ as it requires no explanation, but feel free to swap this with something more unique to the role if needed.

  • Other discussion points:
    • Other information in the packet
    • How the TTL field can help identify the OS of the attacker
    • Other network tools such as Tracert
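For the TTL discussion point, a rough Python sketch of the usual reasoning is below. It assumes the commonly seen default starting TTLs (64 for Linux and other Unix-like systems, 128 for Windows, 255 for many network devices). These defaults can be changed by an administrator or an attacker, so treat the result as a hint, never as proof.

```python
# Commonly seen default starting TTL values (an assumption, they are configurable).
COMMON_INITIAL_TTLS = {
    64: "Linux/macOS/Unix-like",
    128: "Windows",
    255: "network device (e.g. router)",
}


def guess_from_ttl(observed_ttl: int):
    """Return (likely OS family, estimated hop count) for an observed TTL."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    # Each router decrements the TTL by one, so round up to the nearest default.
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return COMMON_INITIAL_TTLS[initial], initial - observed_ttl


guess = guess_from_ttl(117)
```

An observed TTL of 117 is most simply explained by a Windows host roughly 11 hops away.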

If you were to hack into a company, how would you do it?

Depending on the role this can be made simpler or more difficult. For a junior role, you would say that no patching is done and every user is a Domain Admin. This is checking to see if the candidate has a base level understanding of what tools could be useful.

For more senior levels, throw in a fully patched environment and escalate difficulty as needed if the role is for a pen-tester.

Dig into each step only where appropriate for the role you are interviewing for. For example, a junior SOC Analyst most likely doesn’t need to know which command line arguments are needed to make Mimikatz work, especially if the SIEM only collects Windows Event Logs.

Typical answer at a basic level would be to look for publicly facing, exploitable, systems such as websites or exposed RDP servers. You could also start with a phishing attack or other social engineering attacks to get that initial foothold.

From here following something like the ‘Cyber Kill Chain’ would give you the process, and looking at pen-test tooling such as the Metasploit Framework, Nessus or Mimikatz to name just a few.

  • Other discussion points:
    • Objectives of an attacker
    • What if this was an inside threat?
    • Exfiltration of data and how to detect it

If malware is seen attempting to execute on a host, talk about what actions you would take?

This one is a little ‘woolly’ by design. It is open ended in the sense that the obvious route to take is to remediate the host. However, the question says the malware was only attempting to execute, which implies it was blocked.

Questions should be asked at this point: what details do you have? For example, if this was blocked from the user’s Downloads folder, the chances are this is a quick chat with the user and a deleted file. Did it run from a temp directory? If so, which one? Was it associated with a Zip file, Outlook, or web browsing?

Understanding how the malware got on the host in the first place is more important in this scenario than removing it. This gives the opportunity to talk about forensic artefacts like Alternate Data Streams to see where a file was downloaded from, or LNK files if this was masquerading as a document.
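As an example of the Alternate Data Stream artefact, Windows records download origin in a stream named Zone.Identifier, whose content is a small INI-style block. The Python sketch below parses that text. The sample content and the example.com URLs are made up, and the `ReferrerUrl` and `HostUrl` fields are not always present.

```python
import configparser

# Standard Windows URL security zone numbers.
ZONES = {
    0: "Local machine",
    1: "Local intranet",
    2: "Trusted sites",
    3: "Internet",
    4: "Restricted sites",
}


def parse_zone_identifier(text: str) -> dict:
    """Parse the text of a Zone.Identifier stream into a small dictionary."""
    # interpolation=None so that '%' characters in URLs are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["ZoneTransfer"]
    zone_id = section.getint("ZoneId")
    return {
        "zone": ZONES.get(zone_id, "Unknown"),
        "referrer_url": section.get("ReferrerUrl"),
        "host_url": section.get("HostUrl"),
    }


# Hypothetical stream content for a file downloaded through a browser.
sample_stream = """[ZoneTransfer]
ZoneId=3
ReferrerUrl=https://example.com/downloads
HostUrl=https://example.com/files/invoice.zip
"""

origin = parse_zone_identifier(sample_stream)
```

On an NTFS volume the stream itself can usually be read by opening the file path with `:Zone.Identifier` appended, which only works on Windows.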

There are very few questions that give you insight into the analytical mind, so when a question like this is presented it is important for both sides to understand why it is being asked and probe into the possibilities of the answer.

  • Other discussion points:
    • What if it wasn’t blocked?
    • When should a host be re-imaged vs remediated
    • Is there ever a time when malware should be allowed to run?

Question ideas

We have gone through a number of questions (I was aiming for 10, then I stopped counting). We could continue ad infinitum with these questions, and we could even take each question and spin it a different way, but let’s face it, no one wants to read that much!

So instead I will drop some questions below that you can use in interviews, or lookup answers for. All I will say is try to keep in mind the point of this post; to understand WHY you are asking or being asked these questions. Rather than just simply dropping “answer = this”

  • If you were to steal data from a company without being caught, how would you do it?
    • How would you mitigate this if you wanted to catch yourself?
  • Talk about why we use hashes for files
  • Talk about password hashing and how it helps prevent password cracking
    • What is a rainbow table and how can it be mitigated?
    • Are rainbow tables still a viable option for an attacker?
  • Give some examples of how you could prove a user (opened a file || executed an application)
  • What is the NTDS.dit file?
    • Which registry hive would you need if you wanted to steal this and crack it?
    • How would you mitigate the theft of this file?
    • How would you steal this file as it is always locked?
  • Talk about methods of lateral movement and how to detect them from a host and network perspective
  • If a system cannot have security patches applied, how would you mitigate the risk to this system?
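For the password hashing and rainbow table questions in the list above, a short Python sketch can anchor the discussion. It uses PBKDF2 from the standard library with a random per-password salt. The iteration count is an illustrative value I picked to keep the example quick, not a recommendation.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 50_000  # illustrative value only; tune for your environment


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated when none is given."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# The same password hashed twice produces different digests because the salts differ.
salt_a, hash_a = hash_password("correct horse")
salt_b, hash_b = hash_password("correct horse")
```

Because every password gets its own salt, a precomputed rainbow table would have to be rebuilt for each salt, which is the mitigation the question is fishing for.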

Conclusion

Interviews are a way to ensure the right person is being put into the right job. This works for both the candidate and the employer. This is why the correct questions are critical: they need to be relevant to the role in order to tick both of those boxes.

Look at what you ask your candidates and make sure you are asking the right question.

Just to make you feel old, “i, Robot” was released in 2004. 18 years ago…. if you got this reference, you are old like me. Congratulations 🙂


AnyDesk Forensic Analysis and Artefacts

Click here to view this research on my YouTube channel

What is AnyDesk?

AnyDesk is a legitimate, non-malicious piece of software that is used by companies world wide to manage their IT systems, and can be used for free to help family members with IT issues.

It is the scammer’s tool of choice because it is very easy to download and set up. There are also several options within AnyDesk that will allow the attacker to lock the victim out of their own machine, giving complete control to the attacker.

AnyDesk have taken steps to present the victim with alerts, and even disallow certain actions by the remote device, but the scammers utilise social engineering techniques to bypass these easily.

AnyDesk can be downloaded from the AnyDesk website here

Technical Aspects

AnyDesk can be run in two ways: installed like a traditional piece of software, or as a portable executable (PE). The way the software is run will dictate the artefact locations on the disk. However, the artefacts discussed here will be the same in both locations.

PE

C:\Users\<username>\Appdata\Roaming\AnyDesk\

Installed

C:\ProgramData\AnyDesk\

To check which way this application is being run, look at the file properties of the AnyDesk application. If this is from a scammer, it is most likely going to be in the Downloads folder, or the Desktop.
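If you are triaging a lot of collected files, the two artefact locations above can be turned into a quick check. This Python sketch classifies a Windows path as coming from the portable or installed location. The username ‘alice’ is hypothetical, and the function only knows about the two default directories listed in this post.

```python
from pathlib import PureWindowsPath


def anydesk_run_mode(artefact_path: str) -> str:
    """Classify an artefact path as 'portable', 'installed' or 'unknown'.

    Based only on the two default locations described above:
      portable  -> C:\\Users\\<username>\\AppData\\Roaming\\AnyDesk\\
      installed -> C:\\ProgramData\\AnyDesk\\
    """
    # PureWindowsPath parses Windows paths on any operating system.
    parts = [p.lower() for p in PureWindowsPath(artefact_path).parts]
    if "anydesk" not in parts:
        return "unknown"
    parent = parts[: parts.index("anydesk")]
    if parent[-1:] == ["programdata"]:
        return "installed"
    if parent[-2:] == ["appdata", "roaming"]:
        return "portable"
    return "unknown"


portable = anydesk_run_mode(r"C:\Users\alice\AppData\Roaming\AnyDesk\ad.trace")
installed = anydesk_run_mode(r"C:\ProgramData\AnyDesk\ad_svc.trace")
```

Anything that does not sit under one of those two directories comes back as ‘unknown’ and needs a manual look.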

GCAPI.DLL

GCAPI.DLL is required for AnyDesk to run. For the PE version it may be seen in the same location as the executable, or in the Temp folder.

For user temp (PE) you will be looking by default in

c:\users\<username>\appdata\local\temp

For system temp (installed) files you will be looking in

c:\windows\temp

AnyDesk will search for this file and download it if it is not already present. Later versions use a hash check because of DLL search order vulnerabilities, which will be covered later.

Connection_trace.txt

The Connection_trace.txt file shows incoming requests with a status of

  • User – for the local user accepting the request
  • Passwd – a password being entered by the remote system
  • Token – a ‘remember password’ check box being checked by the remote system
  • REJECTED – if the local user has rejected the request

The ‘Passwd’ and ‘Token’ settings are only relevant if the local system has set a password for unattended access. It is important to note that this is for incoming requests only, and does not display any outgoing connection data.

This provides you with time stamps and IDs to help focus your investigation and give you that initial hook that is the baseline of a DFIR investigation. Below is an example that I have redacted to stop you all trying to connect to my lab!

Connection_trace.txt

User.conf

The user.conf file is useful if an attacker has attempted to transfer files. You will get to see a directory path which, by default, will give away the username.

The connection from this host was ‘totally not a hacker’ which we can see is an alias for the local admin account.

This tells us a couple of things. For one, the account ‘local-admin’ is not created by Windows, which means this account was intentionally created with that name and, based on the name, is most likely a local account and not part of a domain.

It also shows that the attacker is running (most likely) as an administrator on that host. This type of information helps to build a bigger picture, and when law enforcement are involved these small pieces of evidence help them to prove that this system was the one the attacker used.

Ad.Trace

The ad.trace file is a verbose log and will be where the majority of effort will be spent. Use the timestamps and IDs from the Connection_Trace file to focus your investigation.

If this host was a scamming victim that has never used AnyDesk before, this file will still be heavily populated, but at least the investigation will be simpler.

So what can we see in this file?

Well, straight off the bat we can see the version of AnyDesk that is installed. This can be important if the attacker is intentionally using an old version. We can also see the version of Windows used by the victim, which is more useful if the host is not available to the forensicator. Finally, on this screenshot we can also confirm that AnyDesk was not installed when it was run.

When looking at the version of AnyDesk the victim was running, we need to ask; why use that version?

Maybe it has features that don’t exist later? Maybe there are security vulnerabilities they can exploit? AnyDesk has worked to make the attacker’s life more difficult by adding features to highlight scam connections and disabling certain features. If an older version is being used, maybe it pre-dates these features.

Next up we look for the ID we picked up in the Connection trace file that was active at the time we need to investigate.

First up 40410xxxx, which has been redacted for privacy (I don’t want you connecting to me… sorry.)

As previously seen, the alias of this is ‘totally not a hacker’. This alias field is typically the same as the local username, so a custom alias means this person has knowledge of the tool and what information is presented to the victim. While this alias is obviously done in jest, it could easily be ‘Microsoft Support’ or whichever agency the attacker is attempting to pass themselves off as.

This also shows us the remote version of AnyDesk and the Operating System in use.

When setting this up I created an external AnyDesk system, rather than simply being on my own LAN. This was because I could see references to my own public IP address, but couldn’t prove if this was an artefact of the victim or the attacker.

This screenshot shows the external IP address of the attacker, meaning we are able to show where they connected from.

Obviously the attacker could take steps to use a remote proxy device or a VPN, but this is still another artefact to add to the collection that we are building around this user. If this is not their IP, could the IP owner be contacted? Are they aware that their systems are being used in this way? Do they keep logs?

Ad_Svc.Trace

As the attacker in this scenario forced a reboot of the remote machine, we are presented with a new artefact, the ad_svc.trace file. Typically you would only expect to see this on an installed version of AnyDesk, because it is the log file for the AnyDesk service.

We are able to see that the restart was initiated remotely and the volatile service was created on the victim machine.

What does this mean? It simply means that the attacker set up unattended access on the victim machine and then restarted the host. This unattended access means the attacker can connect at any time by using a password they set, or tricked the user into setting, on the host.

Next we see that the attacker reconnects with the same ID and same IP address. This could prove useful if the attackers are using a ‘speak to my manager/support/customer retentions’ method. It will show if that person is using the same host, or is using the same IP address as the initial connector. This may also prove useful if the victim reports unusual activity involving the mouse moving on its own, or other indicators someone has control of the host.

Chat

AnyDesk has a built-in chat feature which allows the remote and local user to communicate over text messages. The chat log is held in a subfolder of the AnyDesk directory.

This will have a text file named as per the client ID of the connecting AnyDesk. In here will be both sides of any chat conversation that was had. This could be very useful if the attacker used that to communicate with the victim. However most scammers will use the telephone.

These chat logs may not be used too often by Scammers, but they may prove useful in other forensic investigations.

Thumbnails

Finally there is the Thumbnails folder, which keeps a small image of any machine connected to from this AnyDesk session. Typically this will show the wallpaper in use by that machine; however, it is only available when an outgoing session has been established.

This would prove most useful to the law enforcement agencies whose job it is to prove the machine they are investigating is the one responsible for the connections.

Vulnerabilities

Earlier in the post we talked about using a specific version of AnyDesk and what that may mean.

There have been several vulnerabilities relating to the product, but mostly these would not be of interest to a scammer. Instead these are more interesting if you see old versions of AnyDesk downloaded as part of a malware campaign, or an attack. Especially if the GCAPI.dll file is provided as well.

Below is a list of CVEs relating to older versions of AnyDesk that could be exploited. The local privilege escalation in CVE-2021-40854 is trivial, but requires GUI access to the chat logs. The tl;dr is that you open the chat logs in a SYSTEM level notepad by default, then use File > Open to run cmd.exe.

CVE-2020-13160

AnyDesk before 5.5.3 on Linux and FreeBSD has a format string vulnerability that can be exploited for remote code execution.

CVE-2020-27614

AnyDesk for macOS versions 6.0.2 and older have a vulnerability in the XPC interface that does not properly validate client requests and allows local privilege escalation.

CVE-2020-35483

AnyDesk for Windows is vulnerable to a DLL hijacking attack, where an attacker places a malicious “gcapi.dll” file in the application directory and then sets the read-only attribute to prevent the file from being overwritten. AnyDesk will then attempt to load this DLL file when executed.

If an attacker can place a “gcapi.dll” file in the application directory, AnyDesk will run any malicious code contained in that file. The code will run with normal user privileges, unless the user specifically runs AnyDesk as administrator.

Note: this is especially critical for the portable version of AnyDesk, which is likely to be started from an unprotected non-system directory like the browser’s download directory.

The vulnerability affects AnyDesk for Windows from versions 5.4.2 to 6.0.8.

CVE-2021-40854

AnyDesk before 6.2.6 and 6.3.x before 6.3.3 allows a local user to obtain administrator privileges by using the Open Chat Log feature to launch a privileged Notepad process that can launch other applications.

CVE-2018-13102

AnyDesk before “12.06.2018 – 4.1.3” on Windows 7 SP1 has a DLL preloading vulnerability.

CVE-2017-14397

AnyDesk prior to 3.6.1 on Windows has a DLL injection vulnerability.
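If you recover an AnyDesk version number from an image, the ranges listed above can be turned into a quick triage check. The following is a minimal sketch only: the function names are made up for illustration, the version boundaries are copied from the CVE summaries above, and CVE-2021-40854 is assumed to be Windows-only because the exploit relies on Notepad. Treat a hit as a prompt to read the advisory, not as proof of exploitation.

```python
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '6.0.8' into (6, 0, 8), padding short versions such as '6.3' with zeros."""
    parts = [int(part) for part in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def candidate_cves(version: str, platform: str) -> List[str]:
    """Return the CVEs from the list above whose stated version range includes `version`."""
    v = parse_version(version)
    p = platform.lower()
    hits = []
    if p in ("linux", "freebsd") and v < (5, 5, 3):
        hits.append("CVE-2020-13160")
    if p == "macos" and v <= (6, 0, 2):
        hits.append("CVE-2020-27614")
    if p == "windows":
        if (5, 4, 2) <= v <= (6, 0, 8):
            hits.append("CVE-2020-35483")
        # Assumption: treated as Windows-only because it abuses a privileged Notepad.
        if v < (6, 2, 6) or (6, 3, 0) <= v < (6, 3, 3):
            hits.append("CVE-2021-40854")
        # The summary for this one mentions Windows 7 SP1 specifically.
        if v < (4, 1, 3):
            hits.append("CVE-2018-13102")
        if v < (3, 6, 1):
            hits.append("CVE-2017-14397")
    return hits


if __name__ == "__main__":
    print(candidate_cves("6.0.8", "Windows"))
```

Tuple comparison keeps the version logic readable, but it only works for purely numeric version strings; anything else raises a ValueError, which is probably what you want during triage.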

AnyDesk Advice

https://support.anydesk.com/knowledge/abuse-management

AnyDesk offer advice to users, however the sad truth is that most home users who would be scammed have probably never even heard of AnyDesk before they are scammed out of their money. As such it is the job of people like you, reading this blog post, to help ensure that when this does happen, we can provide as much detail to law enforcement as possible to help them catch the scumbags behind this.

Posted in anydesk, Windows Forensics

Log4J/Log4Shell Video Glossary

The video has been created to explain Log4J and a little about the Log4Shell vulnerability.

Click here to view the video

Term | Definition
Apache License | The Apache License is a permissive free software license written by the Apache Software Foundation
API | Application Programming Interface – A way to allow software to communicate using pre-agreed standards and language
Burp Suite | A Web Application attack tool/suite of tools
Defence In Depth* | The idea that you have a layered security approach, rather than relying on a single technology or system for defence
HTTP Header | Server/Client communications portion of web browsing traffic
Java | Java is a high-level, class-based, object-oriented programming language
JNDI | Java Naming and Directory Interface
LDAP* | Lightweight Directory Access Protocol – The protocol the attacker is using or simulating in order to deliver the malicious code
Lookup* | Utilising built-in commands to allow variables to be retrieved
Open Source | The original source code is made freely available and may be redistributed and modified
RMI | Remote Method Invocation – Java API
Sanitised Inputs | Validating the input is what is expected for the field, further reading – https://cwe.mitre.org/data/definitions/20.html
Security Assessment | This point could be its own video. However, in short: this can range from a vulnerability assessment, through simulating an attacker, to working with your internal teams to help with threat hunting and defending
Vulnerability* | A weakness in software code that can be used by an attacker to exploit that system

*In the context of the Log4J video. With a different context the definition may change.

Do I have software that is vulnerable?

Check out this page from the Dutch NCSC around known software and its vulnerability status, and a post from Bleeping Computer that shows a list of known vulnerable applications.

Video and Glossary References & thanks:

Posted in Uncategorized

HatsOffSecurity on YouTube

We are re-engaging the Hats Off Security blog content onto YouTube. This means that instead of simply reading my insanity, you get to hear me say it instead.

Please head over to YouTube and take a look at the videos, and subscribe if you like the content.

Wait… Why aren’t there many videos?

This is a new project for me, and it takes significantly longer to create and edit videos than it does to write a blog post. I basically have to write the blog post, then record it, and edit it.

Over time more videos will be uploaded, but I am going to focus on making each video a higher quality than the previous one. The first ones may be a little rough, and there will 100% be some duds in there too (like the 2nd password short video)

Does that mean this blog is dead?

No.. although it hasn’t exactly been a hotbed of activity from me for some time. I will continue to use this platform for long form content that will enhance the YouTube experience.

Translated into English, that means I will be putting the technical (or boring) parts on here, as further reading for anyone who needs or wants it.

So what will this look like now?

Blog posts will be shorter and some may seem out of place without the video context. For example, I may put up a glossary of terms that I use in the video on here.

But I love reading your stuff!

Don’t worry you can still read my amazingly articulated anecdotes, while also listening to the soothing sound of my voice.

Posted in Uncategorized

How to Create a Good Security CTF

I have been creating network and computer security ‘Capture the Flag’, or ‘CTF’, challenges for a number of years now. My latest job had me doing this full-time for events that would attract several thousand players. During this time my team and I have learnt many important lessons on what to do and more importantly what not to do.

I want to share some of these lessons here as I see more and more CTFs being created. Note: my specialty is in forensics and the blue team. I have working knowledge of most disciplines, but you will notice most of my examples will be around the blue side of the house. They should all translate quite happily to the red side of the fence.

Why do you want to make a CTF?

Answering this will help guide your decision making. Bear in mind that these are not mutually exclusive and may bleed into one another. Some of the most common reasons include:

  • Business reason – This may be a trade show, or a way to get your company name out there. This reason obviously needs the most care and attention as your reputation is on the line
  • Teaching – You may want to show off some new tools or techniques, or maybe you feel there is a skills gap that needs to be addressed. This can be separate or joined with the Business element
  • Conference – There are many security events around the world, and whether your CTF is on-site only or for people who aren’t able to physically make it, this type of event allows everyone to feel that they are part of the action
  • Fun – You just want to make something because you love the topic and you want to share that with the community. This might be a solo undertaking or may be something that you get a peer group together for

I have tried to put these in order of ‘required structure’. What I mean by that is: if you are doing this as a business it should follow the same pattern as any project. There need to be requirements, deliverables and all of the usual planning that would be put into a large piece of work. If you are doing this for fun, you should still have these, but they may simply be in your head. You will be given a lot more leeway by the people playing your CTF if they know it was a labour of love and is not trying to be a representation of what your company is offering.

Target Audience

Think about who you are going to be presenting this CTF to. If you are teaching people, then you will need to have entry challenges that are easier to solve. Whereas if you are looking at professionals at a conference, you will most likely want to pull out the big guns and have some really difficult challenges. Look at the demographic in terms of discipline too. Are you presenting this to pen-testers, forensicators, developers? We will discuss discipline of challenges later in this post, but it is important to know who will be playing your CTF, or at least who you are targeting it at.

Difficulty

Most CTFs have a progressive difficulty. For example, with forensics your first question might be “what is the hash value of the forensics image”. This is typically generated by the capture tool and stored as a text file with the image. It is a simple ice breaker that allows people to ensure they understand the flag format for submission, and proves the file they are using is intact and correct. Bear in mind that if your target audience is entry level with regard to technical skill, your entry challenges may need to be even simpler.
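For the hash ice breaker, it helps to give players (and your own testers) a way to confirm the download matches the value in the capture tool's text file. A minimal sketch, assuming a SHA-256 or MD5 value was recorded; the function name and default chunk size are illustrative choices, not anything mandated by a particular capture tool:

```python
import hashlib


def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so that multi-gigabyte images do not need to fit in RAM."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded, algorithm="sha256"):
    """Compare against the recorded value, ignoring case and stray whitespace."""
    return hash_file(path, algorithm) == recorded.strip().lower()
```

Comparing in lower case with whitespace stripped avoids the classic support ticket where the recorded hash was pasted in upper case or with a trailing newline.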

I have previously made trivia questions that prompt the user to think about their environment. For example, “Which command allows you to list files on a Linux workstation?”. Or “In Volatility v2 which plugin will show you the operating system the memory file was taken from?” Difficulty is best measured in 4 categories; easy, medium, hard and extreme.

Easy will be the ice-breaker and trivia type questions. Extreme will be something that only someone with a very in-depth knowledge of the subject would be able to do. An example might be having to carry out several different techniques in sequence in order to find the answer. Hard and Medium will sit in between. Think about the spread of difficulty in terms of weighting. If you are teaching you will probably want 35% easy 35% medium 25% hard 5% extreme (or 35/35/25/5). If you are looking at a conference or highly technical you may change to 10/25/40/25. Be honest about your own ability too, making an extreme challenge that is fun, engaging and realistic is not always easy. This is your CTF, you are making the rules!
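To turn a weighting such as 35/35/25/5 into an actual number of challenges per category, something like the following works. It is only a sketch: the function name is hypothetical, and handing leftover challenges to the categories with the largest fractional share is one reasonable rounding choice among several.

```python
def split_by_difficulty(total, weights):
    """Split `total` challenges across easy/medium/hard/extreme by percentage weights."""
    labels = ("easy", "medium", "hard", "extreme")
    if len(weights) != len(labels) or sum(weights) != 100:
        raise ValueError("expected four percentage weights that sum to 100")
    exact = [total * weight / 100 for weight in weights]
    counts = [int(value) for value in exact]
    # Rounding down can leave a few challenges unassigned; give them to the
    # categories with the largest fractional remainder.
    leftover = total - sum(counts)
    by_remainder = sorted(range(len(labels)), key=lambda i: exact[i] - counts[i], reverse=True)
    for index in by_remainder[:leftover]:
        counts[index] += 1
    return dict(zip(labels, counts))


if __name__ == "__main__":
    print(split_by_difficulty(40, (35, 35, 25, 5)))   # teaching-style spread
    print(split_by_difficulty(40, (10, 25, 40, 25)))  # conference-style spread
```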

Expertise/Discipline

As you can tell by now, my expertise means I am a lot more comfortable creating forensic challenges (network/host/memory) than I am making something like cryptography or malware reversing. You need to be honest about what you are able to do and play to those strengths. If you have a multi-skilled team like I had, then you will be able to have a diverse discipline set.

Think about why you are making the CTF. If you are at a conference that is focused on pen-testing, or coding, then forensics challenges will most likely not go down too well. Conversely, having a lot of challenges based on a single language also may not be ideal. Target audience is important here. You can have a mix of disciplines, but seek validation of any challenges you make that are not in your discipline. For example, if someone solves a challenge you wanted to be ‘extreme’ using strace, then you will look a little bit silly.

Learning Objectives

This is often overlooked, or lost, when creating challenges. Even if you are creating the CTF for fun, you are still teaching, or reinforcing, a skill. Ask yourself what it is you are teaching, and what the real-world application would be.

Something my team and I discovered is that we were pushing out too many challenges that had very limited real-world application. Examples of this were steganography challenges, mostly using the same tool but with different ways of hiding the password. Another was putting challenges on Twitter which were either simple XOR, rotation cipher or Base64/32/85 encoded. While these are interesting for an ice breaker, they were being over used and detracting from the overall experience.
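To illustrate why a simple XOR challenge works as an ice breaker but not much more, here is a rough sketch of how little effort a single-byte XOR takes to undo once a player knows (or guesses) a marker string in the plaintext. The function name and the `flag` marker are assumptions for the example only.

```python
def xor_bruteforce(ciphertext: bytes, marker: bytes = b"flag"):
    """Try all 256 single-byte keys and keep the outputs containing `marker`."""
    hits = []
    for key in range(256):
        plain = bytes(byte ^ key for byte in ciphertext)
        if marker in plain:
            hits.append((key, plain))
    return hits


if __name__ == "__main__":
    # Hypothetical challenge data: a flag XORed with the single byte 0x5A.
    secret = bytes(byte ^ 0x5A for byte in b"flag:not_very_hidden")
    for key, plain in xor_bruteforce(secret):
        print(hex(key), plain)
```

Ten lines of code and 256 attempts is a fine warm-up, but a dozen variations of it teach the player nothing new.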

Even at the high end, I got sick of hearing “CBC bit flip” whenever it came to difficult challenges. Looking again at relevance, do your challenges represent the real world? A perfect example from forensics of getting this wrong would be to have everything based on Linux. While Linux forensics is an important skill and should be in a CTF, it should be put into the correct context. Perhaps the Linux image is from a web server that was compromised. Linux desktops are quite rare when held against Windows and Mac.

Narrative & Easter Eggs

A narrative isn’t strictly necessary, but it can be the difference between an OK CTF and a great CTF. If you have a general underlying story then it allows the player to play along in their heads. Quite often forensic challenges will be around stealing company data, you can enrich this with Easter eggs; have some emails, documents, web-browsing etc that plays into the narrative.

SANS DFIR team do an excellent job of this. When playing their ‘capstone’ events you can see the huge amount of time and effort they put into generating the evidence. While this is above and beyond what the average CTF will contain, remember that they are able to re-use this data for years with new artefacts being found each presentation. Putting a little extra effort into the challenges early will make for a more enjoyable experience later.

I would often post amusing (I thought they were funny at least) messages to Pastebin which were never referenced in the challenge questions. Afterwards I would have people telling me they had found them, and they would become a conversation piece over beers. Having extra data in a forensics challenge also raises the difficulty. If I say “which docx file was opened on this date” and there are only 3 files, why bother looking up the data in the intended way when you can simply brute force it?

Flag Format

Traditionally a flag would be formatted as “flag:text_here” or some derivative of this. Using the word “flag” is not the best idea, as people can search, grep or otherwise look for that string instead of actually completing the challenge. You can have the player manually append the word “flag” in the submission field if need be.

Challenges that I have made recently now include dates from the evidence. For example, “What time/date did xx happen?”.

If you use this method don’t be afraid to over explain the format expected. In my previous example I would need to say “format is yyyy/mm/dd HH:MM:SS and in UTC”. Often new CTF creators make the assumption people will know what the flag should look like. This simply isn’t the case!
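If your platform lets you run a check on submissions, the expected date format can be enforced rather than just described. A minimal sketch, assuming the “yyyy/mm/dd HH:MM:SS” format from the example above; the function name is hypothetical, and whether the value is really in UTC cannot be checked from the string alone, so that part still has to be spelled out in the question text.

```python
from datetime import datetime

FLAG_TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def normalise_timestamp_flag(submission: str):
    """Return the submission in canonical form, or None if it does not parse."""
    try:
        parsed = datetime.strptime(submission.strip(), FLAG_TIME_FORMAT)
    except ValueError:
        return None
    # Re-formatting gives one canonical string to compare against the stored answer.
    return parsed.strftime(FLAG_TIME_FORMAT)


if __name__ == "__main__":
    print(normalise_timestamp_flag(" 2020/01/05 13:04:05 "))
    print(normalise_timestamp_flag("05/01/2020 13:04"))
```

Returning None for a badly formatted answer lets you show a “wrong format” message instead of silently burning one of the player's attempts.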

If you do use the traditional flag format, I would also recommend adding some fake flags to stop people finding alternative ways to find them. I have put 25,000 lines of random flag strings to stop people using forensic tools to search for the word “flag”. The truly evil part of that is that I didn’t use the word flag in any of the answers. Just be careful not to troll your players too hard with fake flags, only use them to discourage trivial challenge bypass!
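Generating decoy flags in bulk is easy to script. This is a sketch under assumptions rather than the method used for the 25,000 lines mentioned above: the `flag:` prefix, the token length and the function name are all illustrative, and the only real requirement is that a decoy can never collide with a genuine answer.

```python
import secrets


def make_decoy_flags(count, real_flags=(), prefix="flag:", token_bytes=8):
    """Build `count` unique random flag-like strings, none matching a real flag."""
    real = set(real_flags)
    decoys = set()
    while len(decoys) < count:
        candidate = prefix + secrets.token_hex(token_bytes)
        if candidate not in real:
            decoys.add(candidate)
    return list(decoys)


if __name__ == "__main__":
    # Write the decoys one per line, ready to be scattered through the evidence.
    with open("decoy_flags.txt", "w", encoding="utf-8") as handle:
        handle.write("\n".join(make_decoy_flags(25000)))
```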

Hard != Esoteric

Esoteric is a word I have come to use a lot when planning CTF challenges.

esoteric

/ˌɛsəˈtɛrɪk,ˌiːsəˈtɛrɪk/

adjective

1. intended for or likely to be understood by only a small number of people with a specialized knowledge or interest.

I have played many CTFs which had challenges involving a simple concept with an esoteric element added to it, to then claim it as a difficulty level. While this is a possible way of making something harder, it should not be relied upon.

A silly example would be “Guess the password on this zip file. Hint: The password is my date of birth, followed by my parents’ anniversary”. There is no way I would expect any of you to know that. I don’t even know what my parents’ anniversary was! (they divorced <redacted> years ago).

Other examples could be using a vulnerable PHP version, then tweaking the vulnerable piece of code manually. This then means you have changed it from a simple out of the box challenge that could be solved with metasploit/burp, to a challenge that needs a custom exploit based on a patch that was implemented and never published. Or a Vigenère cipher where you expect them to brute force the key. This only works if the key is easily guessed after getting the word ‘flag’ at the start.

This point deserves far more discussion than what I have written here. A general rule of thumb is to remember something I have said to my team many times: “The challenge should be hard because the subject is hard, not because you’re being a dick”.

Evidence & Scope

Do you have a really cool forensic challenge, something that’s really exciting, revolutionary? All you need to do is download this 500GB file.

It’s probably not going to get the attention it deserves!

A trick I started to use on forensic challenges was to use a tool like KAPE to copy all of the important data to a USB stick, then copy extra ‘fluff’ data across. This was typically the contents of Program Files and the user’s AppData. I would then capture the USB stick, which was 8GB in size. Since the unallocated space was empty, this compressed down to a couple of hundred MB.

With memory images reduce the RAM on the VM to something more manageable. You may have a bad day due to slow responsiveness, but it’s better than 8GB of memory just to capture a PID or two.

With regards to scope, make sure you have permission from the service provider to carry out the CTF, especially if you are hosting web applications. Azure and AWS have allowed CTFs to be carried out using their infrastructure. But I would recommend contacting them, or you could find your CTF being cut short.

Documentation

We all love writing documentation, right?… right?

You may think you don’t need to document your challenges because you can remember them, and even if you don’t you can re-solve them. No. This is a bad mindset to have. When you are getting 50 questions and complaints that a challenge isn’t working, you don’t want to be solving your own challenge, trying to remember the arguments to a tool, or trying to remember which offset the important thing was on. Write. It. Down!

By having good, simple walkthroughs you can test your challenges easily. You can get them validated easily! And you can re-use aspects of them in the future.

Testing & Validation

This proves to be one of the most difficult parts of creating a CTF; getting someone to test it. As we all know, testing your own work is never a good idea. You need external validation.

This is typically harder for a business to do than an individual (unless the business is a security consultancy or similar), as they would need to employ a small QA team to go over the challenges. With an individual, you can ask friends, or peers to help out. But in my experience, the uptake on actual testing is very low, especially if the discipline or difficulty is outside what your peer group is comfortable with.

Shouting out on social media, like Twitter, and asking for volunteers to test your challenges can really help. Validation is more around confirming the difficulty. Does someone with the target demographic skill set agree with the difficulty? Do your peers agree? Who is right? No one and everyone! The stats at the end of a CTF will often be the real truth-teller. Did everyone solve your hard challenge? This happened to my team with a crypto challenge; it took ages to build and minutes to solve because there was a tool that had been developed for a very similar challenge used on a different CTF.

Do all of your challenges work? If you have documented the challenges then you can get people to carry out functional tests. These are, as the name implies, simple tests to make sure the challenge works and can be solved in the way you intended. The testers will have your walkthrough and will be following it step by step. They can then feed back to you if they thought your route to solve made sense and if they think it would be reasonable to expect a player to take that route.

Platform

Where and how are you going to present your CTF?

There are engineering considerations to take into account when choosing a platform. Having a really popular CTF may seem like a great problem to have, but when people aren’t able to play, they may get frustrated quickly. Think load balancers and how to host these systems. Should you be looking at hosting on a cloud service provider rather than the old trusty 386DX you have lying around in the cupboard?

The current leader in open source CTF platforms seems to be CTFd. This platform can be cloned from GitHub, or installed via Docker allowing for quick set ups. It also has the ability to work alongside external platforms such as Major League Cyber.

There are other open source platforms out there, and companies are now starting to pop up with CTF Platform-as-a-Service. These are still few and far between, but as this is a growth (sub) industry, I expect to see a lot more managed CTF platforms going forward.

When choosing a platform look at if they have case sensitivity in the submission field. This type of detail is often overlooked when creating flags. Does the service offer any sort of post-CTF stats? Can you look at how many people participated? How many challenges were solved? etc etc.

Write-Ups

Players will write about your CTF, they will spoil, or burn your challenges. Accept this as part of the process and encourage it. I have seen companies claim that doing this was an infringement of intellectual property. I am no legal expert, but I doubt that would hold up in court!

By having people write up your challenges you are getting free and unfiltered feedback and your CTF will get even more publicity. Meaning if you run a second CTF you will have more players. Also, these write-ups will create more footfall to your website (if you have one!).

Retrospectives

Regardless of whether you run the CTF for profit generation or for fun, you should run a retrospective analysis on how the CTF went. As this can be a blog post in itself I will advise you to read up on it here, or any other project management blog.

Conclusion

There is a lot to take in from the post, and not as many pictures as normal. I hope that you find this helpful when creating your own CTF. Keep creating and keep pushing what we know.

Posted in Competition, CTF, General, Introduction

NoScript Plugin Forensic Investigation – Firefox/ToR Browser

In this blog post I plan to show that using the NoScript plugin it is possible to glean information about what sites, or files, a user accessed while in a private browsing session and also whilst using the TOR browser. There are a number of assumptions that need to be made, most importantly that the user is actively using the NoScript plugin as it was intended, or with regard to TOR, that the user has “safer” or “safest” set as the security setting.

Other assumptions will be covered throughout the post, but these are not outside the realms of what would be considered expected user behavior.

What is NoScript?

NoScript is a browser plugin that enables the blocking of scripts/plugins (JavaScript, Flash, Java etc) from running on pages. These scripts are often used to correctly display a page to a user, but can also be used for more nefarious, or intrusive, purposes. NoScript looks to protect users by allowing them to decide which sites are allowed to run these scripts.

But surely there aren’t that many scripts?

This image is from the Reddit home page. I would say it is safe to assume that Reddit would never knowingly deliver malicious content to visitors (and there is nothing in this image that would make me think that’s the case!). That is a very quick way to kill off a business. Like many sites they use tools to track site usage, and to deliver adverts.

For the purposes of this post, I won’t be going into what each entry means or why there are a number of different domains. The point here is that for a well-known site there are several entries, not just one.

Plugins & Private Browsing

By default, plugins installed on both Firefox and Chrome do not persist to private browsing mode. Note that setting the browser to never remember your history is different to private browsing when it comes to installing plugins.

This means that the user will need to have enabled this feature for private browsing if we are to get artefacts from there.

What is Private Browsing?

Private browsing, or ‘incognito mode’ on Chrome, is designed to allow a user to browse the internet without leaving session data behind. This can be very useful if using a shared computer, for example in a library, or if you need to log onto your bank from a friend’s house. It is also useful for developers to fault find as it ignores existing cookies and deletes any newly created cookies at the end of the session.

It can also be used to hide evidence of malicious or criminal activity, with some users believing that anything done in private browsing is untraceable.

Private browsing can cause issues for a forensicator, and unless you capture the memory from a running machine, it can be very difficult to show what was done during that browsing session.

Private browsing does not protect against what the destination server is tracking. This is where plugins that protect a user’s privacy come in.

Browser Fingerprinting

‘Browser fingerprinting’ is a way of identifying your browser on the internet. Sites like AmIUnique.org can check your device to see how unique you are. JavaScript is often used to glean more information about your browser that will increase the likelihood of your browser being unique. By using NoScript you disable this additional layer of potentially invasive behaviour.

Browser fingerprinting is worthy of its own blog post, so instead I will add screenshots from AmIUnique with NoScript enabled and disabled. Both will be done in private browsing on Firefox to avoid cookie contamination.

Firstly with JavaScript enabled (NoScript disabled)

The bottom line tells me that my fingerprint is unique across almost 2 million browser fingerprints. This means that beyond reasonable doubt this site can uniquely identify me.

Now with JavaScript disabled (NoScript enabled)

With NoScript doing its job we can see that I am now one of 17 browser fingerprints out of almost 2 million. While this is still a crazy small amount, it adds an element of doubt that was not there previously. In a criminal case, this element of doubt could be the difference in this evidence being critical to the case.

The main use of browser fingerprinting, and the main reason NoScript exists, is not because of criminal investigations, but to help prevent aggressive and invasive advertising. If you are looking to purchase a gift for a loved one, you do not want every advert you see to be attempting to sell you that gift. There are obviously many other examples but I will leave that to your imagination.

Recording of Settings

NoScript very helpfully remembers what you set, this helps if you visit a page on a regular basis and don’t want to have to keep adding it to the whitelist. This data persists across private browsing.

I visited cnn.com using Firefox private browsing with NoScript installed. I then manually allowed all scripts and allowed the page reload. Next I closed the private browsing screen and browsed to cnn.com. The settings from NoScript had been preserved. This means this data is being stored somewhere and can be extracted for our use.

Allowing Embedded Media

Certain investigations concentrate heavily on proving if certain media was viewed. If the user downloads the media then traditional forensics methods can be used to identify them. If the user views embedded media in a webpage whilst in private browsing, this becomes more of a challenge.

NoScript is able to allow a specific file embedded in a page without allowing scripts to be enabled on the whole site. This can often give a sense of security to the user as they only want to view a specific file and not have to worry about hidden scripts spying on them.

When a user visits a page where the scripts used to load the media have been disabled they will be presented with the above image in place of the media. When they click on the hyperlink in the center of the video the following pop-up is presented.

Whichever of the top two buttons the user presses, we will be able to see evidence in both live and captured systems.

From a Live System

In Firefox type about:addons into the address bar, this will take you to the built-in extension management page.

You can see from the image above that the private browsing icon is displayed, which indicates this plugin has been given permission to run in private browsing mode.

By clicking on ‘options’ from the ellipses menu we are presented with a page with 4 tabs

This can also be accessed from the ‘options’ button from within the plugin.

The ‘Per-site Permissions’ and ‘Advanced’ tabs are what we are most interested in here.

Per-Site Tab

From this page we can see which sites have been allowed access to play, as well as any individually allowed files.

We can see here that a file named “150106Doomsday_desk.mp4” was allowed from “theguardian.tv”. I will go over how this works from the user’s perspective shortly.

Additionally on this screenshot we can see that “sans.org” has been temporarily trusted. Temporary trusts are per session only. So we know that sans.org was visited since the browser was launched, and that the user interacted with the plugin on that site.

The site “test.com” was added manually from this screen. The padlock icon is “match HTTPS only” and the only time I could make it appear this way was by manually adding it via this web interface. When the padlock is red this means that the permissions do not apply when HTTPS is in use. I would recommend reading the NoScript forums for more details on this feature if it is critical to your investigation.

Advanced Tab

The advanced tab places all of the sites into a structured format and places the sites into order of last changed. You can see under “hello.com” where I was re-testing Reddit in order to find a link to an embedded video. “test.com” and “1234n6.com” (site of a friend, worth a visit!) were to test the ‘HTTPS Only’ feature.

From the hierarchy of the file, “untrusted” and “custom” are siblings, so do not look at the MP4 as being a child of “untrusted”. “gfycat.com” was also used to test embedded media, except this time I pressed the second radio button on the pop-up and enabled media across the site.

Finally, the section symbol “§” appears to be used to denote the padlock symbol from the previous page.

Account Sync (live)

You can check if account sync is active in the browser. The top right of the page has the three lines that open the menu; when the menu is displayed you will either be asked to sign in:

or you will see the email address of the currently logged in account:

If you browse to

about:preferences#sync

you will see which items have been set to sync

From a Captured Image (Dead Box)

Before we can acquire this data from a forensic image, we need to know where it is stored. This is not documented anywhere that is easy to find, and it took some dynamic analysis to see which files were being accessed by this plugin. Eventually this led me to the “storage-sync.sqlite” database held at the following location

%appdata%\Mozilla\Firefox\Profiles\<profile>\storage-sync.sqlite

If you have a system with multiple profiles, you can find the default profile from the profiles.ini file which sits under the “Firefox” folder in this structure.
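Reading profiles.ini can be scripted when you have a lot of images to get through. The sketch below assumes the usual layout of that file as I understand it: `[ProfileN]` sections carrying `Path` and an optional `Default=1`, plus, on newer Firefox versions, `[Install…]` sections whose `Default` value is itself a profile path. The function name is hypothetical and the result should be checked against the file by eye.

```python
import configparser


def default_profile_paths(ini_text: str):
    """Return candidate default profile paths found in the text of a profiles.ini file."""
    # interpolation=None stops configparser choking on '%' characters in paths.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    paths = []
    for name in parser.sections():
        section = parser[name]
        if name.startswith("Install") and section.get("Default"):
            # Assumption: per-install default, stored directly as a path.
            paths.append(section["Default"])
        elif name.startswith("Profile") and section.get("Default") == "1":
            paths.append(section["Path"])
    # Remove duplicates while keeping the order they were found in.
    return list(dict.fromkeys(paths))


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8") as handle:
        for path in default_profile_paths(handle.read()):
            print(path)
```

More than one path can come back, because the legacy `Default=1` marker and the per-install default do not always agree. In that case it is worth examining both profiles.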

Account Sync

Account sync is the synchronisation of data between multiple devices. By default this will include data from plugins.

The data still appears in this location regardless of whether account sync is enabled or not. Be aware though, that if account sync is turned on this activity may not have occurred on this device, or even this user account.

A quick way to check if account sync is enabled is to see if key data is included in the same storage-sync database.

Below we can see what the data looks like without any sync information:

And now a different test user account with Sync enabled

As you can see there is an additional entry called “storage-sync-crypto” which shows that synchronisation is in play.

Interrogating SQLite databases

There are several tools that will interrogate a SQLite database, with varying degrees of success. I opted to use a Powershell module called PSSQLite.

Instructions on how to install it can be found on RamblingCookieMonster’s GitHub page.

Once installed I created the following script.
Notes:
– The export CSV is going to drop the file into the current directory. Bear this in mind when executing.
– You will need to complete the $database variable path yourself as I don’t know where you store your data!

Import-Module PSSQLite

#set variable with path to SQLITE database
$database = "<path-to-extracted-file>\storage-sync.sqlite"

#show columns
Invoke-SqliteQuery -DataSource $database -query "SELECT 1,sql,2,3,4,5 FROM sqlite_master WHERE type!='meta' AND sql NOT NULL AND name NOT LIKE 'sqlite_%' AND name ='collection_metadata'"
#show data
Invoke-SqliteQuery -DataSource $database -query "SELECT record FROM collection_data WHERE record LIKE '%key-policy%'" |
    Export-Csv -Path .\output.csv

This allows you to manipulate the data from the CSV how you see fit. With a little more time I am sure this data can be formatted and prettified. If anyone wants to make a tool from this, please do so, but add a credit line linking back here.
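For anyone who prefers not to install a PowerShell module, the same data query can be run with Python's built-in sqlite3 module. This is a sketch, not a finished tool: it reuses the table name and `key-policy` filter from the script above, assumes the `record` column holds JSON text (falling back to the raw value if it does not parse), and uses a hypothetical function name. Run it against a copy of the extracted file.

```python
import json
import sqlite3
from pathlib import Path


def read_policy_records(db_path):
    """Return the NoScript policy records from an extracted storage-sync.sqlite."""
    # mode=ro opens the database read-only so the evidence copy is not modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT record FROM collection_data WHERE record LIKE ?",
            ("%key-policy%",),
        ).fetchall()
    finally:
        conn.close()
    records = []
    for (raw,) in rows:
        try:
            records.append(json.loads(raw))
        except (TypeError, ValueError):
            records.append(raw)  # keep whatever was stored if it is not JSON
    return records


if __name__ == "__main__":
    import sys

    for record in read_policy_records(sys.argv[1]):
        print(json.dumps(record, indent=2) if not isinstance(record, str) else record)
```

Pretty-printing the JSON makes the per-site entries far easier to read than a single CSV cell.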

TOR Browser

The TOR browser also works off a more secured version of Firefox. When the settings are changed to disable JavaScript it loads the NoScript plugin in the background. So does this mean we can discover browsing activity from the TOR browser? Yes…. sort of.

Obviously the user would need to have set their security settings to either “Safer” or “Safest”, as this is what will load the plugin.

Another key point here is it appears the browser clears the profile data each time it loads. So the best we appear to be able to retrieve is anything that was accessed on the last browsing session.

The profile data is saved here:

<installed location>\Tor Browser\Browser\TorBrowser\Data\Browser\profile.default\storage-sync.sqlite

When interrogated using the same script as earlier, with an updated file location, it looks like this:

Here we can see evidence that I visited, and allowed scripts on, two sites. The first is DuckDuckGo’s onion site, the second is the onion version of Facebook.com. I didn’t want to risk attempting to find embedded videos on TOR for obvious reasons, but I am confident if I had viewed embedded videos, they would also be allowed here.

 

Posted in Browser Forensics, Firefox, TOR, Windows Forensics

Keybase.io Forensics Investigation

What is Keybase.io?

I was first introduced to Keybase a few years ago. It was explained to me as a place to validate your identity with regards to sharing public keys for email encryption, for example by showing that a Twitter account is controlled by you, or that a GitHub repo is truly yours.

It is a good way to view the ‘web of trust’ around a person, especially if this is a person you would not expect to meet face to face.

I personally never paid much attention to it, not because of anything bad around Keybase, just because it didn’t solve any problems that I was facing.

Forensics on a Website?

At first glance, or if you used Keybase historically, you may think that this is simply a website offering a web of trust style service. However, Keybase was brought to my attention by a friend. They asked if I had ever carried out an investigation on it or researched it.

I was very confused, so I went back to see what had changed. I saw that there is now a downloadable app which offers encrypted chat, file transfer and groups. This piqued my interest, as these are the criteria a bad guy would look for if they wanted to discuss bad things!

Is Keybase Bad?

Just to get this out of the way first, at no point in this post am I attempting to say Keybase is anything other than a legitimate company/app offering a service. Anyone abusing this service is the same as those who abuse other communication mediums. I am carrying out this investigation purely as I feel it could benefit a forensicator that hasn’t come across this before.

Lab, Set-up and Basic Scope

I will be using virtual machines for this investigation. One is Windows 10 (1703), the other is Ubuntu 16.04. I will be creating 3 accounts on Keybase:

  1. Windows 10 App
  2. Linux App
  3. Chrome Browser

My primary focus will be on the Windows machine, but I will be cross checking with Linux to see if similar artefacts exist.

I will create a ‘team’ and also ‘follow’ my accounts. I will look to see if I can find proof that the accounts are connected from host based artefacts alone. I will also look to see if I can capture files that were transferred privately or via the team and any other information that might be available.

The images I will use will be from unsplash.com or something I create in Paint (I am a terrible artist!!)

I intend to look at browser extensions, but that may wait for another blog post.

I won’t be looking at mobile apps, as I do not presently have the set up to do so. I hope that someone reading this will pick up that research. I would be very interested to see how we could tie a user to a device.

Signing Up

Initial sign up is very easy from the app. You only need to provide a username, password and a name for your device. All of these are public. You do not need a valid email address and testing has shown you can use throw away services such as Mailinator.

Artefacts

Now the part you are really here for. Let’s start going through the artefacts and what they mean.

Firstly the application itself is in the Local Appdata folder (I will refer to this from the ‘Local’ root folder in future file paths)

<drive>\Users\<username>\AppData\Local\Keybase\Gui

There is also a Roaming AppData folder created (I will refer to this from the ‘Roaming’ root folder in future file paths)

<drive>\Users\<username>\AppData\Roaming\Keybase

There is no sign of this program in the Program Files directory.

Avatars

On both Linux and Windows there is an ‘Avatars’ folder which contains the profile picture of anyone who has appeared in the app. Unfortunately the app auto-suggests people to you, so this folder alone would make it difficult to prove a connection. It does however indicate how active a person has been. 10-20 avatars were recorded from a couple of clicks after creating the account.

Windows:

Local\Keybase\avatars

Linux:

~/.cache/keybase/avatars

The files are named ‘avatar<9-digits>.avatar’ on both operating systems, and after testing known profile images I can say that the 9 digits do not match between copies of the same image. I compared the MD5 hashes of two profile images. While the hashes matched, as did the images, the filenames did not.

Within a few clicks this directory held over 1,500 avatars.

Viewed Images & Avatars

The following folder contains a cache of image files that were viewed within Keybase. This includes Avatars and any images sent or received by the machine being investigated. It does not store all files, for example documents, that are transferred. Instead it is my belief this is simply caching any image that was viewable in the app.

\roaming\keybase\cache

This location is probably the most important to criminal investigations as the files in here show what images were shared. These files do not have a file extension, but are in their native format. In order to filter out avatars from images I would suggest sorting by size, as the avatars are typically no larger than 60 KB, whereas images are more in the region of 100+ KB.
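The size-based triage suggested above could be sketched as follows. This is only a sketch under stated assumptions: the 60 KB cut-off is the rough figure from my testing rather than a hard rule, the function names are hypothetical, and the format check only looks at the leading bytes of PNG, JPEG and GIF files because the cached files have no extension.

```python
from pathlib import Path

AVATAR_MAX_BYTES = 60 * 1024  # rough threshold taken from the observation above

# Leading bytes ("magic numbers") of a few common image formats.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_format(path):
    """Guess the image format from the first bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return "unknown"


def triage_cache(cache_dir):
    """Return (name, size, format, guess) tuples, largest files first."""
    results = []
    for entry in Path(cache_dir).iterdir():
        if not entry.is_file():
            continue
        size = entry.stat().st_size
        guess = "avatar?" if size <= AVATAR_MAX_BYTES else "shared image?"
        results.append((entry.name, size, sniff_format(entry), guess))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

The guess column is only a starting point for review. A small shared image will still be labelled as a possible avatar, so anything of interest should be opened and checked.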

Location of last sent file

It is possible to see the location of the last file that was sent over Keybase chat. There is no obvious difference from this artefact alone as to whether it was sent in a team chat or a 1:1 chat.

The following file does not have a file extension, but can be read using a normal text editor.

\AppData\Roaming\Keybase\Preferences

This could be useful if you suspect external media, encrypted media, or network locations may be in use. The artefact should read

{"selectfile":{"last_directory":"<path\\to\\file\\location>"}}
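Since the Preferences file is plain JSON, pulling the value out is straightforward. A minimal sketch, assuming the file has the structure shown above; the function name is hypothetical and the file may well contain other keys that are not covered here.

```python
import json


def last_sent_directory(preferences_path):
    """Return selectfile.last_directory from the Preferences file, or None."""
    with open(preferences_path, "r", encoding="utf-8") as handle:
        prefs = json.load(handle)
    return prefs.get("selectfile", {}).get("last_directory")
```

Note that the doubled backslashes in the file are JSON escaping, so the returned path contains single backslashes.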

Team Creation

If a team is created on the machine being investigated it will be recorded in the keybase.service.log file.

\AppData\Local\Keybase\keybase.service.log

and will be easily identified by the log entry:

YYYY-MM-DDTHH:MM:SS.mmmmmmZ - [DEBU keybase teams.go:63] 1053 + TeamCreate(TeamName) [tags:TM=hXqL-F0Xwsfw]

The ‘1053’ appears to be an iterative Hex value for the log entry. The ‘go’ value and the makeup of the ‘tags’ field require further investigation.
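To make entries like this easier to filter, the line can be split into its parts. The sketch below is based only on the sample entries shown in this post, so the field names (timestamp, level, component, source, line, counter, message, tags) are my own labels rather than anything Keybase documents, and treating the counter as hexadecimal follows the observation above rather than a confirmed fact.

```python
import re

# Pattern derived from the sample log entries in this post (an assumption,
# not a documented format).
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+) - "
    r"\[(?P<level>\w+) (?P<component>\w+) (?P<source>[\w.]+):(?P<line>\d+)\] "
    r"(?P<counter>[0-9a-f]+) "
    r"(?P<message>.*?)"
    r"(?: \[tags:(?P<tags>[^\]]*)\])?$"
)


def parse_service_log_line(line):
    """Split one keybase.service.log entry into a dict, or return None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["line"] = int(entry["line"])
    # The counter appears to be hexadecimal, so it is converted on that basis.
    entry["counter"] = int(entry["counter"], 16)
    tags = {}
    for pair in (entry["tags"] or "").split(","):
        key, sep, value = pair.partition("=")
        if sep:
            tags[key] = value
    entry["tags"] = tags
    return entry
```

With the entries parsed, a team creation can be found by looking for messages that start with “+ TeamCreate(”.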

Added to a Team

I believe it is important to know that a person can be added to a team without their explicit consent. I simply clicked on the individual I had previously spoken to (in my case my test account) and added them. I then got a notification to tell me I had been added.

When a person is added to a team the same keybase.service.log file will record the following entry

YYYY-MM-DDTHH:MM:SS.mmmmmmZ - [DEBU keybase team_handler.go:248] 3e64 teamHandler.newlyAddedToTeam: team.newly_added_to_team unmarshaled: [{Id:107282d36081bcbb018874d93e097824 Name:TeamName}] [tags:GRGIBM=WT3UeztTfELk]

Incorrect Team Name

When trying to join a new team, the name must be accurate. You do not appear to be able to search for Teams from a directory, but instead must know the Team name. There are several log entries relating to a Team not existing, such as this one

<date/time> - [DEBU keybase teams.go:4180] 5b34 - TeamAcceptInviteOrRequestAccess -> Root team does not exist (error 2614) libkb.AppStatusError [time=238.3055ms] [tags:TM=IL6Xi8j8UVec]

Unfortunately the team name they searched for does not appear in this log if they were not successful.

Requesting to Join a Team

When a user requests to join a team the following log entry will appear in keybase.service.log

<date/time> - [DEBU keybase teams.go:4180] 5b48 - TeamAcceptInviteOrRequestAccess -> ok [time=258.2293ms] [tags:TM=YW_eGpfUPiAX]

There does not appear to be any correlation, in this log file, between the invitation and acceptance to a group. The name of the group is only disclosed once an invitation has been accepted.
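The failed and successful join attempts share the same TeamAcceptInviteOrRequestAccess marker, so they can be pulled out together and separated by outcome. A minimal sketch based only on the two sample entries in this post; the function name is hypothetical and other outcome strings may exist that I have not seen.

```python
import re

# Derived from the two sample entries above; other outcomes may exist.
JOIN_RE = re.compile(
    r"^(?P<timestamp>\S+) - .*?TeamAcceptInviteOrRequestAccess -> "
    r"(?P<result>.*?) \[time="
)


def join_attempts(lines):
    """Return (timestamp, succeeded, detail) for each join attempt found."""
    attempts = []
    for line in lines:
        match = JOIN_RE.match(line.strip())
        if match:
            result = match.group("result")
            attempts.append((match.group("timestamp"), result == "ok", result))
    return attempts
```

A run of failures followed by a success may indicate someone guessing at a team name, although as noted the name itself is not recorded for the failures.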

Team Chat Members to Team Name

I found that “Chat-Trace” was very useful in linking team members to team names. I have not done extensive research on this artefact to guarantee fidelity, however I was able to use the term to identify the users of the public team chat I joined.

<date/time> - [DEBU keybase user.go:333] 7d64 + Store user UserName [tags:chat-trace=M9L0hrvapdnb,platform=windows,LU=SwGnJf0KDRR7,CHTUNBOX=5ObRJV-ynRzY,CHTLOC=Sk6db9l6bidX,apptype=desktop,user-agent=windows:Keybase CLI (go1.13):5.1.1,CHTLOCS=KKNAAQHUSra6]
<date/time> - [DEBU keybase user.go:355] 7d67 - Store user UserName -> OK [tags:CHTUNBOX=5ObRJV-ynRzY,CHTLOC=Sk6db9l6bidX,apptype=desktop,user-agent=windows:Keybase CLI (go1.13):5.1.1,CHTLOCS=KKNAAQHUSra6,chat-trace=M9L0hrvapdnb,platform=windows,LU=SwGnJf0KDRR7]

Once you have a chat-trace you can use that to pivot either to or from the team name

<date/time> - [DEBU keybase teams.go:432] 7445 ++Chat: + TeamsNameInfoSource: DecryptionKey(TeamName,67c99659bdc24920b56ccec3a42dd424,false,1,false,<nil>) [tags:chat-trace=M9L0hrvapdnb,platform=windows,CHTUNBOX=5ZTfEkNGaDqv,CHTLOC=Sk6db9l6bidX,apptype=desktop,user-agent=windows:Keybase CLI (go1.13):5.1.1,CHTLOCS=KKNAAQHUSra6]

Once you have filtered the log by chat-trace you can then filter again by “keybase user.go” to get a list of users. This list appears to be online users only, but further testing is required to confirm this. In my test the channel info reported 11k users, but the log shows 84 unique usernames.
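The pivot described above can be sketched like this. It is a sketch under the same caveat as the artefact itself: the patterns come from the sample entries in this post, the function names are hypothetical, and I have not confirmed how reliable the chat-trace value is across sessions.

```python
import re

STORE_USER_RE = re.compile(r"Store user (\S+)")
TEAM_NAME_RE = re.compile(r"TeamsNameInfoSource: DecryptionKey\(([^,]+),")


def has_trace(line, trace):
    """True if the line's tags contain exactly this chat-trace value."""
    pattern = r"chat-trace=" + re.escape(trace) + r"(?:,|\])"
    return re.search(pattern, line) is not None


def users_for_trace(lines, trace):
    """Unique usernames stored under a given chat-trace."""
    users = set()
    for line in lines:
        if "keybase user.go" in line and has_trace(line, trace):
            match = STORE_USER_RE.search(line)
            if match:
                users.add(match.group(1))
    return sorted(users)


def teams_for_trace(lines, trace):
    """Unique team names seen under a given chat-trace."""
    teams = set()
    for line in lines:
        if has_trace(line, trace):
            match = TEAM_NAME_RE.search(line)
            if match:
                teams.add(match.group(1))
    return sorted(teams)
```

Running both functions with the same chat-trace gives the team name and the usernames that were logged alongside it.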

Leaving a Team

When a user leaves a team the following log entry will be seen

<date/time> - [DEBU keybase teams.go:420] 1e235 + TeamLeave(TeamName) [tags:TM=aByOmRdtM1XJ]

Conclusion

The Keybase app leaves a lot of useful information behind for forensicators to use in their investigations. While currently it is not possible to capture the chat history, we can see Usernames, Team Names and importantly what images were shared in these groups.

Posted in Keybase, Linux Forensics, Windows Forensics

When is Index.dat not Evidence of Browsing

It is easy to fall into familiar habits as a human being; we see patterns in what we do and expect those patterns to persist. However, when these patterns can be the difference between a person keeping or losing their job, we need to make sure we are being as vigilant as possible.

During the course of creating a forensics CTF which would be made available to 28,000 14-18 year olds, an image was taken of a Windows 7 machine. The learning objective of this challenge was to show program execution. In order to make the image more authentic a controlled amount of user activity was scripted and carried out by the author.

The challenge was created, tested and then made available to the target audience. During the event we received a message from one of the players telling us they had found “evidence of inappropriate browsing”. While the player reporting this was doing so in a ‘tongue-in-cheek’ way, we took this very seriously.

While I was confident the author had not acted inappropriately, proof was needed to show she hadn’t.

The following screenshot was all of the evidence that was provided:

We know that the various Index.dat files track user browsing activity, so looking at the screenshot, we can see why this was the player’s first assumption.

The player reported that they had run Autopsy against the image and then looked through the URLs that Autopsy reported. This URL was listed alongside legitimate browsing activity as the search function used was a regular expression looking for anything matching that structure.

My initial observation was ‘ietld’; this seemed an odd thing to be at the beginning of a URL and was the first focus of the investigation. Additionally we could see ‘Internet Explorer’ was being reported as being related to this artefact by Autopsy.

IETLD

The first step was to see if there was a quick win for this. Had it been seen before? Did Microsoft have a knowledge base article on this?

As many people in the forensics world will be familiar with, the results were not as helpful as we had hoped, with Yahoo Answers providing the following:

While this is factually accurate, it is not overly helpful.

This Forensics site shed the most light on the situation. However, as you can see from the screenshot below, it was still not a complete answer. At the very least we could see that this had been seen before, and it was the first confirmation that this file was expected behaviour and not indicative of user browsing.

This find was a relief, but was not a good enough answer for a company that prides itself on going that extra mile when it comes to all things security. From here I was determined to find out why this file exists and what it is used for.

Domain

Before following the rabbit hole, I decided to prove that browsing was not possible to this domain, as I had a pretty good idea that the TLD portion of IETLD was a thing we like to refer to as a clue.

The next logical step was to see if the domain had an IP address active on it. It did not.

Looking back through historical DNS records I could see no evidence of an IP address ever being associated with this domain. I already knew that the domain hadn’t been visited by the content author, but this was the final nail in that coffin.

Index.dat

When you begin to research index.dat using the evidence provided above, you can see why someone not well versed in the nuances of Windows would jump to the conclusion this is browsing activity. There are many articles, forum posts and Q&A sites that indicate anything in index.dat is evidence of browsing history. Fortunately we know this is not the case, and the location of this index.dat makes it different to its namesakes.

This file was located in %APPDATA%\Microsoft\Windows\IETldCache (%APPDATA% already resolves to the user’s AppData\Roaming folder). Again, notice the final folder name, which indicates that this is not the browser history that we are looking at.

Additionally this file was filled with a list of domains, none of which would have been visited by the author.

Further Research

Carrying out more research, including looking up the various domain names we found in this file, we began to notice that these domains appeared on a list called the Public Suffix List. This list was originally started by Mozilla in an attempt to stop TLD level cookies.

Public Suffix List

From – https://publicsuffix.org/

A “public suffix” is one under which Internet users can (or historically could) directly register names. Some examples of public suffixes are .com, .co.uk and pvt.k12.ma.us. The Public Suffix List is a list of all known public suffixes.

The Public Suffix List is an initiative of Mozilla, but is maintained as a community resource. It is available for use in any software, but was originally created to meet the needs of browser manufacturers. It allows browsers to, for example:

  • Avoid privacy-damaging “supercookies” being set for high-level domain name suffixes
  • Highlight the most important part of a domain name in the user interface
  • Accurately sort history entries by site

We maintain a fuller (although not exhaustive) list of what people are using it for. If you are using it for something else, you are encouraged to tell us, because it helps us to assess the potential impact of changes. For that, you can use the psl-discuss mailing list, where we consider issues related to the maintenance, format and semantics of the list. Note: please do not use this mailing list to request amendments to the PSL’s data.

It is in the interest of Internet registries to see that their section of the list is up to date. If it is not, their customers may have trouble setting cookies, or data about their sites may display sub-optimally. So we encourage them to maintain their section of the list by submitting amendments.

History of Public Suffix List

The Public Suffix List was originally a Mozilla project before being open/crowd sourced. As such we can also find relevant references on the Mozilla Wiki page

Mozilla Wiki

From – https://wiki.mozilla.org/Public_Suffix_List

Purpose(s)

Previously, browsers used an algorithm which basically only denied setting wide-ranging cookies for top-level domains with no dots (e.g. com or org). However, this did not work for top-level domains where only third-level registrations are allowed (e.g. co.uk). In these cases, websites could set a cookie for co.uk which will be passed onto every website registered under co.uk.

Clearly, this was a security risk as it allowed websites other than the one setting the cookie to read it, and therefore potentially extract sensitive information.

Since there is no algorithmic method of finding the highest level at which a domain may be registered for a particular top-level domain (the policies differ with each registry), the only method is to create a list of all top-level domains and the level at which domains can be registered. This is the aim of the effective TLD list.

As well as being used to prevent cookies from being set where they shouldn’t be, the list can also potentially be used for other applications where the registry controlled and privately controlled parts of a domain name need to be known, for example when grouping by top-level domains.
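The idea in the quoted passages can be illustrated with a very small sketch. This is not how any browser implements it: the suffix set below is a tiny hand-picked subset for illustration, the function names are hypothetical, and the real Public Suffix List also has wildcard and exception rules that are ignored here.

```python
# A tiny, hand-picked subset of suffixes, for illustration only.
PUBLIC_SUFFIXES = {"com", "org", "uk", "co.uk", "us", "pvt.k12.ma.us"}


def public_suffix(hostname):
    """Return the longest listed suffix of hostname, or None."""
    labels = hostname.lower().rstrip(".").split(".")
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in PUBLIC_SUFFIXES:
            return candidate
    return None


def registrable_domain(hostname):
    """Return the public suffix plus one label, or None if there is none."""
    suffix = public_suffix(hostname)
    if suffix is None:
        return None
    labels = hostname.lower().rstrip(".").split(".")
    suffix_len = len(suffix.split("."))
    if len(labels) <= suffix_len:
        return None  # the hostname is itself a public suffix
    return ".".join(labels[-(suffix_len + 1):])


def cookie_domain_allowed(cookie_domain):
    """A cookie scoped to a bare public suffix should be refused."""
    return cookie_domain.lower().lstrip(".") not in PUBLIC_SUFFIXES
```

With this subset, a cookie for “co.uk” is refused while one for “example.co.uk” is allowed, which a simple “count the dots” rule cannot decide without a list.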

TLD vs ccTLD

There is some confusion about how to name the various parts of a URL or domain name. This is relevant when looking at ‘country code top level domains’ or ccTLD. Traditionally the “letters after the last dot” were considered to be the ‘top level domain’, with the word before that dot being called the ‘root domain’ and finally anything before the root domain being called a ‘child domain’ or more commonly with internet based systems the ‘sub domain’.

This logic begins to get confusing when we look at domain names ending ‘.co.uk’ for example. While ‘.uk’ is technically the ccTLD and ‘.co’ is the ‘second level domain’, it is generally accepted that ‘co.uk’ is the ccTLD.

Browsers are able to natively detect ccTLDs as these are heavily documented and are regulated under ISO standards. Non-standard TLDs are not regulated in the same way.

Conclusion

The domains contained within this particular index.dat are not evidence of browsing; they are simply an artefact of Internet Explorer itself. The contents are part of a publicly available list that is designed to allow domain owners to declare that they are controlling a TLD.

Posted in Browser Forensics, Internet Explorer

HTTP Methods

In this post we are going to look at different types of HTTP/1.1 methods. We will leave HTTP/2 methods for another day.

This will be a summary of each method, it is possible to go into great detail with some of these points, but that would get tiresome to read (and write).

What is an HTTP ‘Method’?

With HTTP communication there is often a lot of information being sent backwards and forwards. Some of this is completing previous requests, some is new and occasionally some is erroneous. The client and server both need to quickly be able to see what is happening and how to deal with this communication.

The HTTP method is at the very start of the HTTP communication

As we can see on the above screenshot the very first word on the top line is GET. GET is an HTTP method. This is the same for any new communication as it allows the server to process the request in an appropriate way as efficiently as possible.
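As a small illustration of where the method sits, the first line of an HTTP/1.1 request can be split into its three parts. A minimal sketch; the function name is hypothetical and it assumes a well-formed request line.

```python
def parse_request_line(raw_request):
    """Split the first line of an HTTP/1.1 request into its three parts."""
    first_line = raw_request.split("\r\n", 1)[0]
    method, target, version = first_line.split(" ", 2)
    return method, target, version


# Example request for the main page of this site.
example = "GET / HTTP/1.1\r\nHost: hatsoffsecurity.com\r\n\r\n"
method, target, version = parse_request_line(example)
```

Here method holds the HTTP method, target the resource being requested and version the protocol version.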

GET

This is the most common request from a user’s perspective. When you loaded this page your browser issued a GET request to the host “hatsoffsecurity.com” with a request for the specific resource. If you are viewing this from the main page the request would look similar to the one above, requesting a forward slash. If you clicked a link you would be requesting ‘/2019/04/09/http-methods’ as the resource.

Separate GET requests are generated for additional resources on that page, even down to the favicon which shows up in the tab on most common browsers.

Results of a GET request can be held in the browser cache and will appear in the browser history. The GET request itself should not be transmitting any data, only the header.

Side note – the favicon used to display next to the URL, but browser creators removed it to stop people using padlock icons to trick users into thinking the site was secure when it wasn’t.

POST

A POST request is used if you are sending data to the server. An example of this could be that you complete a simple form. This data is then sent from the client to the server where it is dealt with depending on the configuration of the server.

This process is slightly more secure than a GET as the data can be sent inside the body of the request. With a GET this data would be sent in the URL, which would mean it was recorded in the server logs and in the browser history.
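The difference is easiest to see by building the same form submission both ways. A minimal sketch that only constructs the raw request text; the host, path and field names are made up for illustration.

```python
from urllib.parse import urlencode


def build_get(host, path, fields):
    """Form data travels in the URL, so it ends up in logs and history."""
    return (
        f"GET {path}?{urlencode(fields)} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    )


def build_post(host, path, fields):
    """Form data travels in the body, after the blank line."""
    body = urlencode(fields)
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )


# Hypothetical form fields, for illustration only.
fields = {"user": "alice", "pin": "1234"}
get_request = build_get("example.com", "/login", fields)
post_request = build_post("example.com", "/login", fields)
```

In the GET the fields are part of the first line, while in the POST they only appear after the headers. Neither is hidden from anyone who can read the traffic itself unless the connection is encrypted.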

POSTs can also be used to exfiltrate data by an attacker. While this is a very noisy method, it could be used in a ‘smash and grab’ attack, where the attacker has no interest in being stealthy, but instead just wants to be fast.

HEAD

A HEAD request works the same basic way as a GET request; the difference is with the response. A HEAD request only requires the response headers to be returned, with no body.

This is seen quite often with bots. Consider that a search engine may index your entire page on an hourly basis; this requires a lot of time and energy on the part of the bot. If they instead send a HEAD request and check for details like Content-Length, Content-MD5, ETag or Last-Modified, then there would be far less work required for sites that haven’t been updated.
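The decision a bot makes after a HEAD request could look something like the sketch below. This is an illustration of the reasoning rather than how any particular crawler works; the function name is hypothetical and it simply compares the headers named above against values saved from an earlier visit.

```python
CHANGE_HEADERS = ("ETag", "Last-Modified", "Content-MD5", "Content-Length")


def needs_refetch(previous, current):
    """Compare headers saved from an earlier visit with a fresh HEAD response."""
    prev = {key.lower(): value for key, value in previous.items()}
    curr = {key.lower(): value for key, value in current.items()}
    compared = False
    for name in CHANGE_HEADERS:
        key = name.lower()
        if key in prev and key in curr:
            compared = True
            if prev[key] != curr[key]:
                return True
    # With nothing to compare we cannot tell, so fetch to be safe.
    return not compared
```

Only when this returns True does the bot need to follow up with a full GET.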

PUT

PUT creates or replaces the resource at the requested URL with the data that is sent, and it does so without causing successive changes. For example, if you wanted to create a family tree on a website you could PUT a family member’s date of birth on the server. Sending the same PUT a second time leaves that birth-date exactly as it was after the first.

This idea, that repeating a request leaves the server in the same state as making it once, is referred to as ‘idempotent’ and Mozilla have a good Glossary entry on it here

DELETE

More commonly expected to be seen with an API than with normal user activity, this HTTP method simply deletes the resource that has been identified in the request. Like PUT this is idempotent: after the first time the delete request is actioned by the server, subsequent requests will typically just return a 404 not found response and change nothing further.
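Idempotency is easier to see with a toy example. The class below is an in-memory stand-in for a server, entirely hypothetical, and the status codes are typical choices rather than the only valid ones. The point is that repeating a PUT or DELETE leaves the stored data unchanged, whereas repeating a POST does not.

```python
class TinyStore:
    """In-memory stand-in for a server's resources (hypothetical)."""

    def __init__(self):
        self.resources = {}
        self.next_id = 1

    def put(self, path, body):
        """Create or replace the resource at exactly this path."""
        created = path not in self.resources
        self.resources[path] = body
        return 201 if created else 200

    def post(self, collection, body):
        """Create a new resource under the collection on every call."""
        path = f"{collection}/{self.next_id}"
        self.next_id += 1
        self.resources[path] = body
        return 201

    def delete(self, path):
        """Remove the resource; repeating the call changes nothing more."""
        if path in self.resources:
            del self.resources[path]
            return 204
        return 404


store = TinyStore()
store.put("/people/ada/birth-date", "1815-12-10")
store.put("/people/ada/birth-date", "1815-12-10")  # same state as after one PUT
store.post("/notes", "first visit")
store.post("/notes", "first visit")  # a second, separate note is created
```

Note that idempotent describes the effect on the server, not the response: the second DELETE returns a different status code, but the stored data is the same as after the first.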

CONNECT

CONNECT is most likely to be seen when a client asks a proxy server to open a tunnel to a destination (and, where required, authenticates to the proxy) before the user’s encrypted browsing session begins. The request names the destination host and port in the clear, which allows non-SSL breakout proxy servers to monitor web activity. This can also be used for non-encrypted traffic in the same way.

OPTIONS

The OPTIONS method is a way of asking which HTTP methods are allowed by that site. This is now mostly seen in pre-flight CORS checks, which I won’t be covering here.

It is important to realise that an OPTIONS request would not normally contain any data and the response should be quite short, simply stating which HTTP methods are accepted. This makes it a good candidate for an attacker to abuse for command and control traffic.

TRACE

This is the HTTP version of Ping or Traceroute in the sense that it creates an ‘application layer loopback’ to the recipient server. It has a ‘Max-Forwards’ header that can be decremented at each forward. The final recipient simply reflects the request it received back in the body of a 200 response.

This would be unusual to see in a normal user environment, this type of activity should really only be seen in a dev or web-dev environment, or for testing proxy chains for infinite loops.

PATCH

Another API heavy HTTP method, this could be used after a PUT has created a resource on the server. You don’t want to replace the resource, but you do need to update it. As such you can send the amendments using the PATCH method. This is very unlikely to be seen in a user browsing context.

Why do I care?

This is a case of ‘knowing normal’. If you know why a POST might appear unusual vs a GET, or you suddenly see a spike in OPTIONS, or a domain controller suddenly starts sending out lots of encrypted traffic surrounded by CONNECTs, then you can begin to investigate with the knowledge that it is unexpected behaviour.
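One simple way to put ‘knowing normal’ into practice is to count methods over a known-good period and compare later traffic against it. The sketch below is illustrative only: the function names are hypothetical, and the factor and floor values are arbitrary numbers I have picked for the example, not recommended thresholds.

```python
from collections import Counter


def method_counts(request_lines):
    """Count HTTP methods from request lines such as 'GET / HTTP/1.1'."""
    return Counter(
        line.split(" ", 1)[0].upper()
        for line in request_lines
        if line.strip()
    )


def unusual_methods(baseline, observed, factor=5, floor=10):
    """Flag methods seen far more often than in the baseline period.

    factor and floor are arbitrary illustration values, not tuned thresholds.
    """
    flagged = {}
    for method, count in observed.items():
        expected = baseline.get(method, 0)
        if count >= floor and count > expected * factor:
            flagged[method] = (expected, count)
    return flagged
```

A method that is flagged is not proof of anything malicious; it is simply a prompt to go and look at that traffic more closely.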

It is important to realise that, in the immortal words of Phil Hagen, there is no RFC police. These methods are generally agreed upon principles, and there is nothing stopping a developer from using these methods in an unexpected way, whether that be via an API or a new browser.

Understanding the different methods also allows us to build our knowledge of HTTP and how the internet works from a browsing perspective.

Posted in Network Analytics

Wireshark – More Basics

I have been approached recently about explaining some of the fundamentals of how Wireshark can be used.

Let’s have a look at some traffic that I captured for a challenge I created recently.

Here we can see an example of HTTP traffic that has already been captured. There are some things we can immediately pick up on from this view alone. We can see that we are not looking at HTTPS traffic, either this is a non-encrypted site or it has been decrypted by some other method.

The IP addresses in use are RFC1918 (not routable on the internet), meaning this was either internal to internal traffic, maybe a company intranet server on a small network, or we are looking at traffic from behind a NAT device. All of this information becomes important when you are doing this in anger, but for now it is simply for consideration.
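If you need to check a lot of addresses, the three RFC1918 ranges can be tested with Python’s standard ipaddress module. A minimal sketch; the function name is hypothetical. The ranges are listed explicitly because the module’s own is_private property also covers other reserved ranges such as loopback and link-local.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Return True if the address falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)
```

An address just outside a range, such as 172.32.0.1, is easy to misread by eye, which is where a check like this earns its keep.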

Finally, we can also see 3 ‘GET requests’ in this traffic. This shows that the client requested something from the server (quick note, server simply means the machine dealing with the request, don’t get confused with Microsoft terminology). We can also see a couple of the responses with ‘200 OK’ meaning something that was requested was also served back to the client.

The problem we have here is that there are multiple streams of information, there are multiple requests all with their own responses. What if we want to single one of these out?

Follow Stream

 

In older versions of Wireshark you could only follow the TCP stream. This meant that if the traffic was encoded in any way you would not be able to see what the user would see in their browser after it was decoded.

Above you can see the ‘right click context menu’ that lets you see how to follow a stream. If you were to follow the TCP stream on an encoded stream, this is what you would see:

The top part is the header, which ends with the ‘Date:’ field. Everything after the new line is encoded text that you aren’t able to decode from here (one, it’s just plain hard; two, there are non-ASCII characters which are represented by dots).

If we follow the HTTP stream however

We will see the decoded stream:

While this particular case may not be easy to read, it is typical of what you might encounter. As a network analyst you may need to work with the malware reversing team to fully understand some of the data you are looking at. Javascript is often obfuscated, which means the developer either doesn’t want someone looking at their code or they are trying to reduce the size of the data being transferred.
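To show what the difference between the two views amounts to, here is a minimal sketch of decoding a response by hand. It assumes the body was encoded with gzip and announced with a Content-Encoding header, which is a common case but not something I have confirmed for this capture. The function names are hypothetical and chunked transfer encoding is not handled.

```python
import gzip


def split_http_message(raw):
    """Split raw bytes into (headers dict, body) at the first blank line."""
    head, _, body = raw.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        headers[name.strip().lower().decode("ascii")] = value.strip().decode("latin-1")
    return headers, body


def decoded_body(raw):
    """Return the body, decompressed if the headers say it is gzip."""
    headers, body = split_http_message(raw)
    if headers.get("content-encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body
```

Following the TCP stream is the equivalent of looking at the raw body, while following the HTTP stream is the equivalent of calling decoded_body.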

Why do I care?

If you are lucky enough to be working with full packet capture you need to know how to use one of the most commonly used analyst tools. Wireshark has its limitations, but for looking at a small sub-set of traffic and wanting to know exactly what happened, it is excellent.

As an example, we can see in this traffic stream above the contents of the web page without having to visit it. We can see that the page that was served did not have any malicious scripts or file downloads on it. It simply had a header with a flag in it (this is a real flag, so I hid it 🙂 )

Conclusion

We have covered the basics around following a HTTP stream and a TCP stream and why each is different in the context of HTTP traffic.

This may seem simple to a lot of people, but to new people entering the industry, not knowing this could be the thing that stops you from winning that competition, or causes you to fail a technical test. Network analysis is important, there is a lot of cool information you can see on the wire!

Posted in Network Analytics, Network Forensics, Wireshark