sorting filenames
Posted by Leslie Ballentine on January 15th, 2004


I have a list of filenames, of the form

/dir1/dir2/.../dirN/file
../dir1/dir2/dir3/file
etc.

I wish to sort these alphabetically according to the last part of the
name, "file".
The trouble is that the number of subdirectories in the
path is variable. If the number of "/" characters were constant (say 3),
then I could define "/" as the field-separator, and do

sort -t '/' -k 4

to sort on the last (rightmost) field. But I cannot do this if the number
of fields is variable.

Does anyone have a readymade solution to this problem, or must I write a
program to parse the filenames?

--
Leslie Ballentine

Posted by Chris F.A. Johnson on January 15th, 2004


On Thu, 15 Jan 2004 at 22:13 GMT, Leslie Ballentine wrote:
awk -F "/" '{printf "%s\t%s\n", $NF, $0}' | sort | cut -f2-


--
Chris F.A. Johnson http://cfaj.freeshell.org
===================================================================
My code (if any) in this post is copyright 2004, Chris F.A. Johnson
and may be copied under the terms of the GNU General Public License

Posted by John Hunter on January 16th, 2004


Leslie> Does anyone have a readymade solution to this problem, or
Leslie> must I write a program to parse the filenames?

It's an easy program in python, which has the added benefit of using
dir names for the sort in the case of filenames that compare the same

#!/usr/bin/env python

filenames = [
'/dir1/dir2/.../dirN/file1',
'/dir1/dir2/.../dirN/file2',
'./dir1/dir2/dir3/file0'
]

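# Split each path into components and reverse them, so that the
# file name is compared first and the directories break ties.
components = [fname.split('/') for fname in filenames]
for seq in components:
    seq.reverse()
components.sort()

# Restore the original component order and print the sorted paths.
for path in components:
    path.reverse()
    print('/'.join(path))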
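
The same idea can be sketched more compactly with a sort key instead of reversing the lists in place. This is only an illustration in Python 3; the helper name reversed_components is made up for the example and is not from the original post.

```python
#!/usr/bin/env python3
# Sketch: sort paths by their components read right to left, so the
# file name is compared first and the directory names break ties.

def reversed_components(path):
    """Return the '/'-separated parts of path, last part first."""
    return path.split('/')[::-1]

filenames = [
    '/dir1/dir2/.../dirN/file1',
    '/dir1/dir2/.../dirN/file2',
    './dir1/dir2/dir3/file0',
]

# sorted() leaves the input list untouched and returns a new list.
ordered = sorted(filenames, key=reversed_components)
for path in ordered:
    print(path)
```

Because the key is the whole reversed component list, two paths ending in the same file name are ordered by their innermost directory, then the next one out, and so on.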


Posted by William Park on January 16th, 2004


John Hunter <jdhunter@ace.bsd.uchicago.edu> wrote:
For Heaven's sake, everything becomes a nail if a hammer is all you have.
Shell is perfectly adequate for this kind of thing.

Rewrite the list with "file" at the front, ie.
file1 /dir1/dir2/.../file1
file2 /dir1/dir2/.../file2
...
Then, sort, and recover the original list.
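
For comparison, the three steps above (put "file" at the front, sort, recover the original list) can be sketched as follows. This is a rough illustration only: the function name sort_by_last_field is hypothetical, a tab is assumed as the separator, and the sketch assumes that no file name contains a tab.

```python
def sort_by_last_field(paths):
    """Sort paths on the text after the last '/' (decorate, sort, undecorate)."""
    # Decorate: prefix each path with its last field and a tab.
    decorated = [p.rsplit('/', 1)[-1] + '\t' + p for p in paths]
    # Sort the decorated lines as plain strings.
    decorated.sort()
    # Undecorate: drop everything up to and including the first tab.
    return [line.split('\t', 1)[1] for line in decorated]


paths = [
    'dir1/dir2/.../dirN/fileB',
    '../dir1/dir2/dir3/fileA',
    '../dir1/dir2/fileD',
    '../dir1/dir2/dir3/fileC',
]
for p in sort_by_last_field(paths):
    print(p)
```

Note that Python compares strings by code point, whereas sort(1) follows the current locale, so the two can order mixed-case or accented names differently.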

--
William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>
Linux solution for data management and processing.

Posted by Alan Connor on January 16th, 2004


On 16 Jan 2004 05:23:02 GMT, William Park <opengeometry@yahoo.ca> wrote:
#!/bin/sh

while IFS= read -r line; do

echo "$(basename "$line") $line" >> outputfile

done < inputfile

---------------------

then sort and run the sorted list through

sed 's/^.* //' sortlist > tempfile; mv tempfile finalfile


or something like that....

AC


Posted by Kirk Strauser on January 16th, 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 2004-01-15T22:13:38Z, Leslie Ballentine <ballenti@sfu.ca> writes:

Got Perl? Sure you do!

find . | perl -e 'print sort { $c=$a;$d=$b;$c =~ s/^.*\///;$d =~s/^.*\///;$c cmp $d; } <>'

This *should* execute much faster than the equivalent set of shell commands.
- --
Kirk Strauser
The Strauser Group
Open. Solutions. Simple.
http://www.strausergroup.com/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAB/2i5sRg+Y0CpvERAjh9AJ49Ablv5lonXTIncyYmhoteBe/zFACeK8XS
MLZ4Pm0Zk/wT0nFXmoThrDc=
=7nI6
-----END PGP SIGNATURE-----

Posted by Anonymous on January 16th, 2004


"LB" == Leslie Ballentine <ballenti@sfu.ca>:
LB> /dir1/dir2/.../dirN/file
LB> ../dir1/dir2/dir3/file
LB> I wish to sort these alphabetically according to the last part of the
LB> name, "file".
LB> The trouble is that the number of subdirectories in the
LB> path is variable. If the number of "/" characters were constant (say 3),
LB> then I could define "/" as the field-separator, and do

Insert a <TAB> character after the last "/", sort on second field
and then remove the extra <TAB> character:


$ cat <<EOF |
dir1/dir2/.../dirN/fileB
../dir1/dir2/dir3/fileA
../dir1/dir2/fileD
../dir1/dir2/dir3/fileC
EOF
sed 's@\(.*/\)@\1	@'| # there is a <TAB> between "1" and "@"
sort -k2|tr -d '\t'

gives:

../dir1/dir2/dir3/fileA
dir1/dir2/.../dirN/fileB
../dir1/dir2/dir3/fileC
../dir1/dir2/fileD
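
The pipeline above uses a tab as a marker, so a file name that itself contains a tab would lose that tab when tr deletes it. A possible way around any marker character is to compare on the base name directly. The following is a minimal sketch in Python, not part of the original post; the function name sort_by_basename and the sample paths are invented for the illustration.

```python
import posixpath


def sort_by_basename(paths):
    """Return paths ordered by the text after the last '/'."""
    # No marker character is inserted, so tabs and spaces in names
    # are compared like any other character and are left intact.
    return sorted(paths, key=posixpath.basename)


sample = ['a/b/zeta', 'c/al\tpha', 'd/e/f/beta']
for p in sort_by_basename(sample):
    print(repr(p))
```

sorted() is stable, so paths with identical base names keep their input order.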

Posted by Kirk Strauser on January 16th, 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 2004-01-16T15:28:24Z, Anonymous <nobody@nox.lemuria.org> writes:

Note that <TAB> is a legal filename character.
- --
Kirk Strauser
The Strauser Group
Open. Solutions. Simple.
http://www.strausergroup.com/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFACAlL5sRg+Y0CpvERAiZvAJ9hIKStsGrIiNoX5j6OXovOP+cu0wCdEN9y
YkSotiP174AHsyvpdWCmAq0=
=uJo1
-----END PGP SIGNATURE-----

Posted by Alan Connor on January 16th, 2004


On Fri, 16 Jan 2004 15:10:05 GMT, Kirk Strauser <kirk@strauser.com> wrote:

I use ed(1) as my browser and read one line at time unless the poster has
been tagged in my index menu.

PGP sigs of this sort, that are not tucked away in the headers, are a clear
violation of netiquette, and as long as you persist in using them on the
Usenet, I will not be reading your posts.

I am willing to forego whatever undoubtedly excellent advice or insight
you have to offer in defense of the health of the Usenet.

PGP sigs now, but what next? Advertisements for OTHER free software?

Maybe ads in general?

Wouldn't it be wonderful to have an HTML ad for Viagra at the bottom of
every post?

It stops NOW.


The rest deleted unseen.

AC


Posted by Peter Köhlmann on January 16th, 2004


Alan Connor wrote:

Your choice. Stupid, but your choice

You may now tell us what RFC you found supporting your claims

Idiot
--
Hanlon's Razor: Never attribute to malice which can be equally well
explained by stupidity


Posted by Chris F.A. Johnson on January 16th, 2004


On Fri, 16 Jan 2004 at 18:28 GMT, Alan Connor wrote:
You've almost convinced me to start using a PGP signature on my posts.

--
Chris F.A. Johnson http://cfaj.freeshell.org
===================================================================
My code (if any) in this post is copyright 2004, Chris F.A. Johnson
and may be copied under the terms of the GNU General Public License

Posted by Chris F.A. Johnson on January 16th, 2004


On Fri, 16 Jan 2004 at 15:10 GMT, Kirk Strauser wrote:
I found this much faster than perl:

echo "$flist" |
awk -F "/" '{printf "%s\t%s\n", $NF, $0}' | sort | cut -f2- >/dev/null

On my machine, perl took about 10 times as long as awk/sort/cut on
the first run, presumably because perl was not already in memory.

Subsequent runs typically took about twice as long with perl.

--
Chris F.A. Johnson http://cfaj.freeshell.org
===================================================================
My code (if any) in this post is copyright 2004, Chris F.A. Johnson
and may be copied under the terms of the GNU General Public License

Posted by Alan Connor on January 16th, 2004


On 16 Jan 2004 19:39:46 GMT, Chris F.A. Johnson <c.fa.johnson@rogers.com> wrote:

Sure. Go ahead. Prove to the world that you are a snot-nosed teenage
punk masquerading as an adult.

This post goes a long way towards establishing that fact all by itself.


AC


Posted by Jim Richardson on January 16th, 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 16 Jan 2004 19:39:46 GMT,
Chris F.A. Johnson <c.fa.johnson@rogers.com> wrote:
Al doesn't like *other* people advertising software in their sigs, just
himself

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFACEp+d90bcYOAWPYRAg3VAJoDy5rlTyXAc9tmhLR18aUzjbcP8QCfUiB9
4rKevr9vHxSHFdd33c/cfgU=
=TQ/x
-----END PGP SIGNATURE-----

--
Jim Richardson http://www.eskimo.com/~warlock
To err is human...to really foul up requires the root password.

Posted by LEE Sau Dan on January 16th, 2004


Alan> #!/bin/sh

Alan> while read line; do
Alan> echo "`basename $line` $line" >> outputfile
Alan> done < inputfile

That can be replaced easily with awk:

awk '{print $NF"\t"$0}' FS=/ inputfile


Alan> ---------------------
Alan> then sort and run the sorted list through

Alan> sed 's/^.* //' sortlist > tempfile; mv tempfile finalfile

Too complicated!

cut -f2

is all you need.



Alan> or something like that....

Only in one line:

awk '{print $NF"\t"$0}' FS=/ inputfile | sort | cut -f2


It is worth learning the tools 'cut', 'paste', 'uniq', 'join' and
also 'awk'. Even having a rough idea of what they do can be very
helpful when the need comes.


--
Lee Sau Dan +Z05biGVm-(Big5) ~{@nJX6X~}(HZ)

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

Posted by LEE Sau Dan on January 16th, 2004


Kirk> At 2004-01-15T22:13:38Z, Leslie Ballentine <ballenti@sfu.ca>
Kirk> writes:
Kirk> Got Perl? Sure you do!

Kirk> find . | perl -e 'print sort { $c=$a;$d=$b;$c =~
Kirk> s/^.*\///;$d =~ s/^.*\///;$c cmp $d; } <>'

Kirk> This *should* execute much faster than the equivalent set of
Kirk> shell commands.

Why is it faster than:
find . | awk '{print $NF"\t"$0}' FS=/ | sort | cut -f2
?



--
Lee Sau Dan +Z05biGVm-(Big5) ~{@nJX6X~}(HZ)

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

Posted by Noi on January 17th, 2004


On Thu, 15 Jan 2004 22:13:38 +0000, Leslie Ballentine thoughtfully wrote:

I agree with Chris F. about using gawk/awk as simple and repeatable.
# print name, size and directory name, sorting and formatting
# into 4 columns and view using less
# gawk -F\/ means use "/" as delimiter
# RS == "" was meant to strip blank lines (as written it is only a
# comparison and has no effect)

$ find ~/files/music -type f -printf %p"\t"%s"\t"%t"\n" | gawk -F\/ '{RS == ""; print $1, $6, $5, $4}' | sort -b | column -c4 -t -s^ | less

--
------------------------------------------------------
Linux registered user #302812
using Fedora Core 1 kernel 2.4.22-1.2115.nptl
------------------------------------------------------


Posted by Kirk Strauser on January 17th, 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 2004-01-17T22:08:23Z, Noi <noi@siam.com> writes:

This does not work. <TAB> is a legitimate (if unlikely) filename character.
- --
Kirk Strauser
The Strauser Group
Open. Solutions. Simple.
http://www.strausergroup.com/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFACcjz5sRg+Y0CpvERAvs7AKCUayYLzrPhTSpK7TkygB2/6av0GgCfSrE2
GeyY7xP+66E0/bOnDXiTgrU=
=WJYt
-----END PGP SIGNATURE-----

Posted by Alan Connor on January 18th, 2004


On Sat, 17 Jan 2004 23:45:06 GMT, Kirk Strauser <kirk@strauser.com> wrote:
I always wonder WHY people use these things on the Usenet.

What makes them think anyone is going to forge their name? 99.99% of the
people on the Usenet don't worry about such things.....What have they done
to piss someone off that badly?

Secondly, don't they know that almost no one cares what name they use as
long as their posts are worth reading?

Or that we don't know that Kirk Strauser may not be this person's name?
(you can get a public key in any name you want ----- you can get a dozen
of them and have each of them sign the other keys, making them look
quite valid)

Or that we don't know that they can just leave the PGP sig off and go
trolling at will?

Is it the same person here that posted with that PGP sig the last time?
Who KNOWS!!!?????

Almost NONE of us have the software to determine this, and we don't have
it because we don't CARE, and the dumb things don't really tell you
anything anyway....

Do they think that we really believe their name is <whatever> just because they
have a public key in that name?

And where did they get the fucking huge ego!!

I can tell you for certain that there are thousands of people on the Linux
newsgroups that have things to say that are just as worthwhile as what
this fellow has to say.

THEY don't need any PGP sigs cluttering up their posts...


AC

Posted by Peter Köhlmann on January 18th, 2004


Alan Connor wrote:

< snip typical AC PGP-rant >

Idiot
--
"Last I checked, it wasn't the power cord for the Clue Generator that
was sticking up your ass." - John Novak, rasfwrj


