[Bash] Check if file is binary or text?

Postby thamarok » 2007-03-16 17:17

Hello!

Is it possible to write a Bash script that checks whether a file is binary (ELF executable, PE executable, linked library, etc.) or plain text (any file that contains only text, like /var/log/dmesg)?

Thanks in advance!
thamarok
 

Postby Scotti » 2007-03-16 18:28

Interesting. I'm curious myself.

I found this, not sure if it's what you're looking for: http://tldp.org/LDP/abs/html/fto.html
:?
Scotti
Moderator Team Member
 
Posts: 312
Joined: 2005-11-08 01:13

Postby thamarok » 2007-03-16 18:37

Scotti wrote:Interesting. I'm curious myself.

I found this, not sure if it's what you're looking for: http://tldp.org/LDP/abs/html/fto.html
:?
Nothing found on that site either :?
Also note that not every application has execution permissions, so watching the permissions of the file won't help much.
thamarok
 

Postby lacek » 2007-03-16 19:26

Use the 'file' command. It is contained in the package named 'file'.
Its output looks like this:
Code:
~# file *
atisysteminfo-report.txt: ASCII English text
c.txt:                    ASCII text
Mail:                     directory
MyTest.class:             compiled Java class data, version 46.0
MyTest.java:              ISO-8859 Java program text
portdef.props:            ASCII text
vpd.properties:           ASCII text, with very long lines
xiclotl-wake:             Bourne-Again shell script text executable

You can use this information to decide what kind of file you are looking at.
lacek
Moderator Team Member
 
Posts: 769
Joined: 2004-03-11 18:49
Location: Budapest, Hungary

Postby thamarok » 2007-03-16 19:40

Ever heard of "programmatically"? I don't want to be rude, but I wouldn't be asking for a Bash script if I weren't developing something. Also note that there are millions of binary file formats, so writing an extremely slow interpreter with millions of checks wouldn't be very professional.
Last edited by thamarok on 2007-03-16 22:05, edited 2 times in total.
thamarok
 

Postby lacek » 2007-03-16 19:50

OK, this depends on what counts as "programmatic". I thought that this:

Code:
# -b drops the file name from the output, so a name containing "text" cannot match by accident
if file -b -- "$1" | grep -q text; then
    echo "$1 is a text file"
else
    echo "$1 isn't a text file"
fi

is a programmatic approach. After all, it is a program which makes the guess. If you are against using external programs in bash scripts, keep in mind that even 'ls' is an external program... :-)
lacek
Moderator Team Member
 
Posts: 769
Joined: 2004-03-11 18:49
Location: Budapest, Hungary

Postby thamarok » 2007-03-16 19:54

"file" doesn't understand every file format. In Windows there is a simple API that checks if the file is binary or not, so I am sure it can be done on Linux too. Also, some file formats which contain only plain ASCII text don't have "text" in their "file" description.
Last edited by thamarok on 2007-03-16 22:05, edited 1 time in total.
thamarok
 

Postby lacek » 2007-03-16 20:22

thamarok wrote:Also, some file formats which contain only plain ASCII text don't have "text" in their "file" description.

The -i switch can help here: it makes 'file' print a MIME type (such as text/plain) instead of a free-form description. It makes 'file' slightly faster as well.
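
A minimal sketch of that idea, assuming a version of 'file' that supports the --mime-encoding long option (the function name is_binary_by_encoding is made up for illustration). Instead of grepping the description for "text", it asks 'file' for the character encoding only, which is reported as "binary" when the content does not look like text:

```shell
#!/usr/bin/env bash
# is_binary_by_encoding FILE (hypothetical helper)
# Exit status: 0 = looks binary, 1 = looks like text, 2 = not a regular file.
is_binary_by_encoding() {
    local enc
    [ -f "$1" ] || return 2
    # -b: omit the file name; --mime-encoding: print only the charset,
    # e.g. "us-ascii", "utf-8" or "binary".
    enc=$(file -b --mime-encoding -- "$1")
    [ "$enc" = "binary" ]
}
```

Usage would look like `is_binary_by_encoding /bin/ls && echo "binary"`. Empty files are typically reported as binary as well, so a caller may want to handle them separately.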

'file' indeed doesn't recognize every binary file format, as this sounds like an impossible mission... However, for binary formats that aren't recognized, file outputs 'data'. So you can still guess that such a file is a binary one.
I agree that calling 'file' on every file isn't a fast thing to do. However, if you want to gain speed, you should consider not using a bash script anyway. Using a non-interpreted language would be much faster.

Just out of curiosity: why do you need to decide if a file is binary in the first place? Scanning through the filesystem, recording the changes and removing the new files upon user request is all you want to do. Am I wrong?
lacek
Moderator Team Member
 
Posts: 769
Joined: 2004-03-11 18:49
Location: Budapest, Hungary

Postby thamarok » 2007-03-16 20:56

Yup that's what I want to do.
thamarok
 

Postby hcgtv » 2007-03-16 21:13

thamarok wrote:The program works like this: You click on "scan system" and it will make a snapshot of the current state of the filesystem. Then the user tweaks and installs whatever (s)he likes and then clicks on "scan system" again, then the program will compare both snapshots and give the user an easy to understand summary of all the new, deleted and updated files.

The snapshot could be made with rsync; then, for the comparison, use its dry-run option:

-n, --dry-run    show what would have been transferred
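
The suggestion above relies on rsync. As a rough sketch of the same snapshot-and-compare idea, the block below swaps in find, cksum and awk so that it runs with standard tools only. The function names are hypothetical, and it assumes file names without newlines and a snapshot file stored outside the scanned directory:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes file names contain no newlines and that the
# snapshot file is stored outside the scanned directory.

# take_snapshot DIR OUT: write one "CRC SIZE PATH" line per regular file.
take_snapshot() {
    local dir=$1 out=$2
    ( cd "$dir" && find . -type f -exec cksum {} + ) > "$out"
}

# compare_snapshots BEFORE AFTER: list new, updated and deleted paths.
compare_snapshots() {
    awk -v before="$1" '
        # Split "CRC SIZE PATH" into the globals sum ("CRC SIZE ") and path.
        function parse(line) {
            path = line
            sub(/^[0-9]+ [0-9]+ /, "", path)
            sum = substr(line, 1, length(line) - length(path))
        }
        BEGIN {
            # Load the earlier snapshot into a path -> checksum table.
            while ((getline line < before) > 0) { parse(line); old[path] = sum }
        }
        {
            parse($0)
            seen[path] = 1
            if (!(path in old))        print "new:     " path
            else if (old[path] != sum) print "updated: " path
        }
        END {
            for (p in old) if (!(p in seen)) print "deleted: " p
        }
    ' "$2"
}
```

A run would be `take_snapshot /some/dir before.txt`, then the user's changes, then `take_snapshot /some/dir after.txt` and `compare_snapshots before.txt after.txt`. Checksumming every file is slow on a large filesystem; comparing size and modification time instead would be a cheaper but weaker variant.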
Bert Garcia - When all you have is a keyboard
hcgtv
 
Posts: 518
Joined: 2006-11-17 23:03
Location: Charlotte, NC

Postby thamarok » 2007-03-16 21:35

REMOVED
Last edited by thamarok on 2007-03-16 22:05, edited 2 times in total.
thamarok
 

Postby Fluenza » 2007-03-16 21:39

thamarok wrote:Thanks, but:
AND NOTE TO EVERYONE: I DON'T CARE IF THERE IS ALREADY SUCH AN APPLICATION SO DON'T RECOMMEND ME ANYTHING.

Are you just wanting to write this app so that you can learn to code bash scripts? Hmmm, I should spend some time learning to write bash scripts myself. :idea:
Visualize, Describe, Direct (VDD)
Common Operational Picture (COP) --> Common Operational Response (COR) --> Common Operational Effect (COE)
Fluenza
 
Posts: 245
Joined: 2006-11-22 18:44
Location: Fog of War

Postby thamarok » 2007-03-16 21:45

A client asked for this so I wanted to help him, but after looking up all the resources I know, I ended up with no solution, so I asked here.
thamarok
 

Postby Grifter » 2007-03-17 02:55

first of all file isn't slow, it's incredibly fast

second, WHAT!? Ok granted it's been a while since I used windows, but when you renamed a .exe file to .txt in my day, it would happily open the binary file in a text editor

third, using a script to determine if a file is binary would take up far more resources than using the file command. while file doesn't have the specs on every conceivable data file ever created, it has the ones that matter (and it's a long list). and if a file is binary but unknown to file, it will report it as "data"
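
For comparison, a content-based check in a script can stay fairly small. The sketch below is a heuristic and not what 'file' does: it treats a file as binary if a NUL byte appears in its first 8000 bytes. The function name and the 8000-byte sample size are assumptions chosen for illustration, and text stored as UTF-16 would be misclassified as binary:

```shell
#!/usr/bin/env bash
# looks_binary FILE (hypothetical helper)
# Exit status: 0 = a NUL byte was found in the first 8000 bytes,
#              1 = none found, or the file is unreadable.
looks_binary() {
    local total stripped
    [ -r "$1" ] || return 1
    # Count the sampled bytes with and without NUL bytes; any difference
    # means at least one NUL was present.
    total=$(head -c 8000 < "$1" | wc -c)
    stripped=$(head -c 8000 < "$1" | tr -d '\000' | wc -c)
    (( total != stripped ))
}
```

It still starts a few external processes per file (head, tr, wc), so over a whole filesystem it is unlikely to beat a single 'file' invocation given many file names.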
Eagles may soar, but weasels don't get sucked into jet engines...
Grifter
 
Posts: 1572
Joined: 2006-05-04 07:53
Location: Svea Rike

Postby thamarok » 2007-03-17 07:38

I think I will start with HEX values.. thanks anyway..
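
If "HEX values" means looking at the leading magic bytes, a minimal sketch could read the first four bytes with od and match them against a short table. Only two signatures are included here (ELF: 7f 45 4c 46, DOS/PE: 4d 5a), the function names are hypothetical, and anything missing from the table is reported as unknown:

```shell
#!/usr/bin/env bash
# magic_hex FILE [COUNT]: print the first COUNT bytes (default 4) as
# lowercase hex without separators.
magic_hex() {
    local file=$1 count=${2:-4}
    od -An -tx1 -N "$count" < "$file" | tr -d ' \n'
}

# guess_format FILE: name a few well-known binary formats by their
# leading bytes. The table is deliberately tiny.
guess_format() {
    local hex
    hex=$(magic_hex "$1" 4)
    case $hex in
        7f454c46) echo "ELF" ;;
        4d5a*)    echo "DOS/PE (MZ header)" ;;
        *)        echo "unknown" ;;
    esac
}
```

Plain text files have no magic bytes, so they end up as "unknown" too. A signature table like this can identify specific binary formats, but it cannot by itself tell text from binary.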
thamarok
 
