Steganographic Data File Searches

  • Thread starter TimeSkip
In summary: When you read the file, you're reading each character and number one by one from the file, as 8-bit bytes. But... when you save the file, you're not writing down the characters and numbers in 8-bit bytes. You're writing down the characters and numbers as Unicode (or ANSI, or UTF-8) characters. Each character in the text file has a unique Unicode (or ANSI, or UTF-8) code point. And... when you read the file again... each character and number is read in as a Unicode (or ANSI, or UTF-8)
  • #1
TimeSkip
<moved to General Discussion, posts that ask for thoughts are not hard science>

Summary: Soon?

I've been wondering whether steganographic data file searches are already common among intelligence agencies, or will become so in the near future.

Just as the term suggests, it would consist of looking at the data-file type, seeing which other files on the internet possess a similar format, and then looking at their steganographic data signature. I'm pretty sure one would have to rootkit the memory controller to do a hard drive scan and be able to perform a search for child pornography or terrorism-related content.

As an example, I think one could write some kind of statement in a .txt file, and instead of scanning the file signature, one would look at the actual content of the file based on its steganographic data file content in 1's and 0's as saved on some cloud or hard drive or SSD.

Is this something that goes on nowadays or will be possible in the near future to thwart known databases or online outlets of child pornography with a comparison to known content in law enforcement agencies?

The only way I can imagine how to circumvent this would be encryption or compression (to a lesser extent).

Thoughts?
 
  • #2
There are too many ways to hide covert information in a computer file, or in a transfer protocol. Every type of file has a statistical fingerprint, and most can be tested for integrity. Files that fail an integrity test identify the covert operators.
Start here; https://en.wikipedia.org/wiki/Steganography
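One simple example of such a statistical fingerprint is the Shannon entropy of a file's bytes. The sketch below is only an illustration, not a forensic tool: the function name `byte_entropy` and the sample data are assumptions made for this example. Plain English text tends to score well below 8 bits per byte, while encrypted or random-looking data sits close to 8.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample data: repeated English text versus random bytes.
english = b"Every type of file has a statistical fingerprint. " * 200
random_like = os.urandom(len(english))

print("text:  ", round(byte_entropy(english), 2))
print("random:", round(byte_entropy(random_like), 2))
```

A high score alone does not prove anything is hidden, since ordinary compressed files also look nearly random.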

Traffic analysis will lead you to covert communication links faster than the examination of file contents.

TimeSkip said:
The only way I can imagine how to circumvent this would be encryption or compression (to a lesser extent).
Do you want to circumvent the use of steganography by others, or do you want to use steganography without being detected?
Who are you investigating? Are they the suspect?
What are you trying to hide? Are you the criminal?
 
  • #3
Anything hidden using well known publicly available software is likely to be found by detailed forensic analysis by well informed security teams.
 
  • #4
Baluncore said:
There are too many ways to hide covert information in a computer file, or in a transfer protocol. Every type of file has a statistical fingerprint, and most can be tested for integrity. Files that fail an integrity test identify the covert operators.
I'm not sure we're on the same page; but, can a sound file, such as a phone conversation in widely known format have a steganographic signature?

Let me give an example.

Person 01 is talking about some terrorism. If we scan the data file of the cell conversation for data file signatures containing "terrorism" being mentioned, then that could lead to a hit. Is it possible for the data file to inform the investigator of what is being talked about?

Similarly, assuming that we can scan .wav or even compression algorithms for a mention of "CP", then could this lead to a hit?

I'm not talking about hiding information steganographically, but rather content found through scanning a data file for information (itself) with a signature of the terms leading to a hit?
 
  • #5
This is all too hypothetical, and there are too many possibilities.
Almost anything is possible in some unlikely situation.
 
  • #6
TimeSkip said:
steganographic

You keep using that word. I do not think it means what you think it means.
 
  • #7
Baluncore said:
This is all too hypothetical, and there are too many possibilities.
Almost anything is possible in some unlikely situation.
Well, let me provide an example:

You have a .txt file with the word "terrorism" written in it, and then you save it on your hard drive or SSD or whatever storage system you may have. Does the information saved on your storage drive have a unique information signature?

Vanadium 50 said:
You keep using that word. I do not think it means what you think it means.
The above is all I mean by a steganographic signature based on the unique signature of the information saved on a storage drive.

Thanks.
 
  • #8
First let me state that what I'm about to describe has nothing to do with steganography. Steganography is something altogether different.

TimeSkip said:
Well, let me provide an example:

You have a .txt file with the word "terrorism" written in it, and then you save it on your hard drive or SSD or whatever storage system you may have. Does the information saved on your storage drive have a unique information signature?

I'm not sure what you mean. But every unique block of information has a unique signature, if you define the "signature" as being the block of information itself. If that sounds trivial, it may be because I'm not sure what you mean by "signature."

But I'll try to help.

Let's assume that your text file is stored in some sort of simple, 8-bit, ASCII-compatible encoding (examples include "ANSI" code pages and UTF-8, which store plain English letters as the same single bytes as ASCII).

Each letter corresponds to a pattern of binary 1s and 0s.

't' = 0x74 = 0111 0100
'e' = 0x65 = 0110 0101
'r' = 0x72 = 0111 0010
'r' = 0x72 = 0111 0010
'o' = 0x6f = 0110 1111
'r' = 0x72 = 0111 0010
'i' = 0x69 = 0110 1001
's' = 0x73 = 0111 0011
'm' = 0x6d = 0110 1101

So, if you happen to know a priori that the file stores text data in simple, 8-bit ASCII based format, and you want to know if the file contains the word "terrorism," just look for the bit pattern,

0111 0100 0110 0101 0111 0010 0111 0010 0110 1111 0111 0010 0110 1001 0111 0011 0110 1101
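As a rough sketch of that search in Python (the helper names `to_bit_pattern` and `contains_word` are made up for this illustration, and the data is assumed to be 8-bit ASCII): the first function rebuilds the bit pattern above from the word, and the second does the actual byte-level match.

```python
def to_bit_pattern(text: str) -> str:
    """Render ASCII text as space-separated 4-bit groups."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("ascii"))
    return " ".join(bits[i:i + 4] for i in range(0, len(bits), 4))


def contains_word(data: bytes, word: str) -> bool:
    """Byte-level search; assumes the data stores text as 8-bit ASCII."""
    return word.encode("ascii") in data


# Hypothetical file contents used only for the demonstration.
sample = b"a file that mentions terrorism somewhere"

print(to_bit_pattern("terrorism"))
print(contains_word(sample, "terrorism"))
```

In practice nobody searches bit by bit; comparing bytes is the same match done eight bits at a time.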

Okay, so now suppose that the file format is not a simple, 8-bit ASCII based format, but maybe some 16-bit format (where each character is represented by 16 bits), and the format doesn't resemble ASCII at all. Well, if you know what the file format is, just create the bit pattern that corresponds to "terrorism" in that format (whatever that happens to be) and search for that.
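A minimal sketch of the same idea for other encodings, assuming Python's built-in codecs: UTF-16 (little-endian) stands in for a 16-bit format, and the EBCDIC code page `cp037` stands in for a format that does not resemble ASCII. The function name `find_word` is an assumption for this example.

```python
def find_word(data: bytes, word: str, encoding: str) -> int:
    """Return the byte offset of `word` encoded in `encoding`, or -1 if absent."""
    return data.find(word.encode(encoding))


# Show what the search pattern looks like in each encoding.
for encoding in ("ascii", "utf-16-le", "cp037"):
    print(encoding, "terrorism".encode(encoding).hex(" "))

# Hypothetical file contents stored as UTF-16.
utf16_file = "xx terrorism".encode("utf-16-le")
print(find_word(utf16_file, "terrorism", "utf-16-le"))  # match found
print(find_word(utf16_file, "terrorism", "ascii"))      # wrong encoding: no match
```

The search only works if you guess the encoding correctly, which is the "if you know what the file format is" condition above.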

None of this has anything to do with steganography though. It's just simple pattern matching.
 
  • #9
Steganography in computing terms normally means hiding one file inside another, usually a picture file but it can be any file. If you want to keep data secure then you're better off going down the encryption route.
 
  • #10
TimeSkip said:
You have a .txt file with the word "terrorism" written in it, and then you save it on your hard drive or SSD or whatever storage system you may have. Does the information saved on your storage drive have a unique information signature?
Yes. Statistical analysis will indicate strongly that it contains ASCII text.

Steganography is something quite different.
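One crude version of that statistical check, as a sketch only: count what fraction of the bytes are printable ASCII or common whitespace. The function name `ascii_text_fraction` is an assumption for this illustration, and any threshold for "looks like text" would also be an assumption, so none is chosen here.

```python
def ascii_text_fraction(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII or tab/newline/carriage return."""
    if not data:
        return 0.0
    printable = sum(1 for b in data if 32 <= b <= 126 or b in (9, 10, 13))
    return printable / len(data)


print(ascii_text_fraction(b"I am plain ASCII text.\n"))
print(ascii_text_fraction(bytes(range(128, 256))))
```

A value near 1.0 suggests plain text; binary formats such as images or compressed archives usually score much lower.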
 
  • #12
collinsmark said:
First let me state that what I'm about to describe has nothing to do with steganography. Steganography is something altogether different.
None of this has anything to do with steganography though. It's just simple pattern matching.
This is precisely what I mean; but, down to the very way the ASCII information is stored as information on a storage device. I will concede that steganographic searches were meant to assume that every instance of, for example, utilizing the words such as "I'm going to bomb, (such) country." in a cell phone conversation saved in something like .mp3 or .wav or .flac all have unique signatures for each data file format.

All that then one ought to do is run a search query for the unique signatures in the device storage through the memory controller. One doesn't necessarily have to have the file available to open or inspect necessarily.
 
  • #13
I can't edit the OP, otherwise I would substitute something instead of "steganographic", and haven't come across this type of method of analysis yet.
 
  • #14
TimeSkip said:
This is precisely what I mean; but, down to the very way the ASCII information is stored as information on a storage device. I will concede that steganographic searches were meant to assume that every instance of, for example, utilizing the words such as "I'm going to bomb, (such) country." in a cell phone conversation saved in something like .mp3 or .wav or .flac all have unique signatures for each data file format.

All that then one ought to do is run a search query for the unique signatures in the device storage through the memory controller. One doesn't necessarily have to have the file available to open or inspect necessarily.
Of course the way one might say something in a phone conversation would have unique characteristics due to different speech patterns. Yet this pertains more to a stable input medium like .txt or computerized speech with hard phonetics and proper grammatical form.
 
  • #15
I feel like you are still misunderstanding how these things work. The "signature" that @collinsmark mentioned is nothing else but the text itself. Yes, it has a specific representation (which is what Collin tried to explain to you), but in no way is this representation a "signature". When looking for the word "terrorism" you just search the text for that word, not for any specific property/signature/whatever that is something other than the plain content of the file.

So no, even assuming you didn't mean steganography there is still nothing of the kind you are looking for, there is just the content of the file, the message itself.
 
  • #16
TimeSkip said:
All that then one ought to do is run a search query for the unique signatures in the device storage through the memory controller. One doesn't necessarily have to have the file available to open or inspect necessarily.
This doesn't really make sense. The memory controller is not what is used for reading files.
You should be thinking harder about how you would go about getting access to the data.
The usual scenario is that you run a program on the device. There are programs that look for child pornography by computing hashes of all the files and comparing them to hashes of known child pornography. A collector is almost certain to have some files that have been seen before.
Audio conversations will rarely be saved on a phone or computer. You will have to listen in on some network. If you are law enforcement, you'd get a warrant for a wiretap to serve on the internet provider. You'd have to use speech recognition software to look for words.
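The hash comparison described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not how any particular forensic product works: SHA-256 is chosen arbitrarily, and the names `sha256_of_file` and `find_known_files` and the idea of passing the known hashes as a plain set are all made up for the example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_files(root: Path, known_hashes: set[str]) -> list[Path]:
    """Return every file under `root` whose SHA-256 is in `known_hashes`."""
    return [
        path
        for path in sorted(Path(root).rglob("*"))
        if path.is_file() and sha256_of_file(path) in known_hashes
    ]
```

A cryptographic hash only matches bit-for-bit identical copies, so a re-encoded or resized file would not be flagged by this sketch.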
 

FAQ: Steganographic Data File Searches

What is steganography?

Steganography is the practice of hiding information or data within another file, such as an image or audio file, in order to conceal its existence. This can be done by altering the file's data or by using special software to embed the hidden information.
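As a toy illustration of "altering the file's data", the sketch below hides a message in the least significant bit of each cover byte. It is an assumption-heavy example: the cover is treated as a raw byte buffer (for instance uncompressed pixel or audio samples), the function names are invented here, and real tools handle file formats, capacity and encryption far more carefully.

```python
def embed_lsb(cover: bytes, secret: bytes) -> bytes:
    """Hide `secret` in the lowest bit of successive cover bytes."""
    bits = [(byte >> shift) & 1 for byte in secret for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for the secret")
    stego = bytearray(cover)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(stego)


def extract_lsb(stego: bytes, length: int) -> bytes:
    """Recover `length` hidden bytes from the lowest bits of `stego`."""
    out = bytearray()
    for i in range(length):
        value = 0
        for byte in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (byte & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(256))          # stand-in for raw sample data
stego = embed_lsb(cover, b"hi")
print(extract_lsb(stego, 2))
```

Each cover byte changes by at most one, which is why this kind of change is hard to notice in an image or audio sample.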

How do steganographic data file searches work?

Steganographic data file searches involve using specialized software or algorithms to scan files for hidden data. This can include analyzing the file's metadata, examining the file's binary code, or looking for patterns or discrepancies within the file's data that may indicate the presence of hidden information.

What types of files can be used for steganography?

Almost any type of digital file can be used for steganography, including images, audio files, videos, and even text documents. However, certain file formats may be more commonly used due to their ability to conceal data more effectively.

Why would someone use steganography?

Steganography can be used for a variety of reasons, such as hiding sensitive information from prying eyes, embedding secret messages for covert communication, or bypassing censorship or surveillance. It can also be used for data protection and digital watermarking.

How can steganography be detected?

Detecting steganography can be a difficult task, as it often requires specialized tools and techniques. However, some common methods include analyzing file sizes and comparing them to expected sizes, using steganalysis software to scan for hidden data, or using visual inspection to look for anomalies in the file's appearance.
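One concrete form of the size check is to look for bytes appended after the point where a file format says the file ends. The sketch below walks the chunk structure of a PNG file and returns anything that follows the final IEND chunk. The function name `bytes_after_png_end` is an assumption for this example, and the code does not verify chunk CRCs or catch data hidden inside the image itself.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def bytes_after_png_end(data: bytes) -> bytes:
    """Return any bytes that follow the IEND chunk of a PNG file."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 8 + length + 4
        if chunk_type == b"IEND":
            return data[pos:]
    raise ValueError("no IEND chunk found")
```

An empty result means nothing was appended; a non-empty result is only a hint worth a closer look, not proof of hidden content.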
