- Problem: Syncing files between two computers in different time-zones
- Posted by Sebastian G. on May 10th, 2008
R.Wieser wrote:
So would touch -t $TIMESTRING -s for the timestamps. Your point being? If
you want to be sure, you have to do full comparisons (probably guided by
using cryptographically secure hashes).
- Posted by R.Wieser on May 10th, 2008
Hello Sebastian,
I'm sorry, but what are you trying to prove here (if anything)? That you
know of other methods to break any backup scheme?
Yeah, yeah. I considered several methods and have not quite decided yet.
Sebastian G. <seppi@seppig.de> wrote in news message
68ltl6F2s5doeU1@mid.dfncis.de...
- Posted by Jerry Coffin on May 11th, 2008
In article <4825ba0f$0$5403$e4fe514c@dreader29.news.xs4all.nl>,
address@not.available says...
I didn't really mean it as a suggestion as much as an explanation of how
(some) backup programs work, and why they have a much easier time making
things work than you do -- the job you've decided to take on is much
more difficult than theirs.
--
Later,
Jerry.
The universe is a figment of its own imagination.
- Posted by Jerry Coffin on May 11th, 2008
In article <4825f95e$0$3023$e4fe514c@dreader15.news.xs4all.nl>,
address@not.available says...
Even using this doesn't really relieve you of the problem. Hashing does
a fine job of telling you whether two files are the same or different --
but if they're different, it does nothing to tell you which one is
newer.
My own suggestion would be to keep the data in a completely separate
file where nothing else has any business messing with it at all. In that
file, I'd store all the times as something like GMT -- i.e. time since
some epoch, and never convert to anything resembling local time (if you
can help it). You'll probably need a separate conversion from the file's
timestamp to your internal timestamp for every file system (and possibly
even different implementations of some filesystems, especially FAT).
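A minimal sketch of that idea, assuming a JSON side file as the "completely separate file" and whole seconds since the Unix epoch as the internal timestamp. The function names and the JSON layout are hypothetical, and the per-filesystem conversion mentioned above (for example for FAT) is not handled here:

```python
import json
import os
from datetime import datetime, timezone


def utc_mtime_seconds(path):
    """Return the file's last-modified time as whole seconds since the Unix epoch (UTC)."""
    return int(os.stat(path).st_mtime)


def record_file(db_path, file_path):
    """Store the epoch timestamp of file_path in a separate JSON side file (assumed layout)."""
    try:
        with open(db_path, "r", encoding="utf-8") as fh:
            db = json.load(fh)
    except FileNotFoundError:
        db = {}
    db[file_path] = {"mtime_utc": utc_mtime_seconds(file_path)}
    with open(db_path, "w", encoding="utf-8") as fh:
        json.dump(db, fh, indent=2)
    return db


def format_utc(seconds):
    """Render a stored timestamp for display only; the stored value stays an integer."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%SZ")
```

The stored value is never converted to local time; `format_utc` exists only for showing it to a person.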
From there, the hashing comes into play primarily when/if you decide you
want to support incremental updates -- i.e. break a file into chunks,
and only send chunks that have changed, rather than copying the whole
file regardless of how small a change took place. The gain from this
varies widely, but if there's any chance of needing to support things
like large databases, it becomes _extremely_ useful in a hurry.
You might want to study rsync (and associated white papers) for some
ideas about how to do that sort of thing.
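As a rough illustration of the chunking idea, here is a sketch that hashes fixed-size blocks and reports which block indexes differ from a previous run. This is a simplification and not the rsync algorithm: rsync uses a rolling checksum so it can also cope with data that has shifted position, which fixed offsets cannot. SHA-256, the block size and the function names are assumptions made for the example:

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # assumed block size; tune for your data


def chunk_hashes(path, chunk_size=CHUNK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of the file."""
    hashes = []
    with open(path, "rb") as fh:
        while True:
            block = fh.read(chunk_size)
            if not block:
                break
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes


def changed_chunks(old_hashes, new_hashes):
    """Return indexes of blocks that differ from, or did not exist in, the old list."""
    changed = []
    for index, digest in enumerate(new_hashes):
        if index >= len(old_hashes) or old_hashes[index] != digest:
            changed.append(index)
    return changed
```

Only the blocks listed by `changed_chunks` would need to be sent; a file that shrank also needs its new length transferred, which this sketch leaves out.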
--
Later,
Jerry.
The universe is a figment of its own imagination.
- Posted by Sebastian G. on May 11th, 2008
Jerry Coffin wrote:
Indeed, it is an optimization for a scenario where the majority of files
don't change. And if you create the hashes while transferring the data to
the backup media and store them there, then at the next run you only have to
read the files from the disk; there is no need to read the files from the
backup for the comparison, unless you find that a file on the disk has
changed.
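A sketch of hashing during the transfer, so that the backup copy never has to be read back just for comparison. SHA-256 and the function names are assumptions; where the returned hash gets stored is left to the caller:

```python
import hashlib


def copy_with_hash(src_path, dst_path, block_size=1024 * 1024):
    """Copy src_path to dst_path and return the SHA-256 of the data, in one pass."""
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            digest.update(block)
            dst.write(block)
    return digest.hexdigest()


def file_hash(path, block_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_backup(src_path, stored_hash):
    """True if the file on disk no longer matches the hash stored at the last run."""
    return file_hash(src_path) != stored_hash
```

On the next run only the local file is read; the backup media is touched only when `needs_backup` returns True.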
- Posted by R.Wieser on May 13th, 2008
Hello Jerry,
That was/is one of my ponderings too. But I made it more difficult for
myself by thinking of the situation in which the stored file could get
changed *without* using the backup-program (like a quick update of the
stored data by simply copying the file) ...
Somehow I must be able to detect that. But the very reason why I should
then use such a separate database file also seems to be the reason why I
can't do that (I can't depend on any of the remote files' timestamps).
:-) *if* you could find a dependable way to calculate GMT from the local
file's UTC, that is.
Imagine changing the computer's timezone and what it will do to the UTC time
of the file :-(
Windows seems to muck around with a file's UTC(?) time depending on your
timezone and summer/wintertime; it is anyone's guess what you will see when
you look at a file that's not saved in your timezone and/or
summer/wintertime setting.
It gets especially funny as it seems to even "adjust" the Last-written
UTC-time read from the *remote* drive :-\
I'm looking this way and that, but can't seem to find a combination which
will cover all (imagined?) problems.
Regards,
Rudy Wieser
Jerry Coffin <jcoffin@taeus.com> wrote in news message
MPG.22901e5bb32b36f1989cb2@news.sunsite.dk...
- Posted by Jerry Coffin on May 15th, 2008
In article <48297847$0$15728$e4fe514c@dreader26.news.xs4all.nl>,
address@not.available says...
Store the hash of the file in your database, and only treat the file as
changed when the hash changes.
You only want to use the timestamps to indicate which version of the
file is newer after you've verified that one or the other has changed.
Yes, that's nontrivial but at least splitting it up into pieces makes it
a bit more manageable than trying to figure out the difference between
any two arbitrary file systems.
Nothing. The computer isn't going to go back and re-stamp all files when
the timezone changes. It may translate that differently for you when you
read it, but I'd be _very_ surprised if the timestamp itself was
changed.
That seems to me to make life easier. Use the hash to figure out whether
the file has changed. If it has, you compare timestamps only to figure
out which is newer -- and since adjustments seem to be done in sync, you
can still compare them directly.
This sort of thing is why there are lots of research papers written
about how to implement multi-master distributed file systems -- and why
a lot of early ones were single-master instead.
--
Later,
Jerry.
The universe is a figment of its own imagination.
- Posted by R.Wieser on May 15th, 2008
Hello Jerry,
As a replacement for the file-time check. Yep, makes sense, as the local
file-time of a file can't be retrieved dependably (the returned UTC
file-time is calculated from the stored local file-time, but only for FAT,
not NTFS, which makes it even more interesting :-\ )
Quite so. That is why I set the backup-storage system to zero GMT, with no
summer/winter-time switching.
Everything. As the UTC time-stamp itself is the *only* thing I seem to be
able to retrieve, and it's adjusted from the stored local file-time depending
on time-zone and summer/winter-time settings, I might as well say that those
file-times *are* changed.
Not really. All it does now is make it impossible for me to depend upon
the retrieved file-date *from the remote storage*. (Funny though: only the
last-written time is adjusted, not the other two. But alas, the remote
system does not allow me to change those two.)
It also makes it impossible for me to do a (simple) check whether the file I
wrote data for in that database is the same one as is currently in backup.
And that leads me back to my original problem ... (either that, or just
hope no one will change the file on backup)
:-) And I thought I could make a quick backup program. It looks like
Windows makes even the simple things difficult.
Regards,
Rudy Wieser
Jerry Coffin <jcoffin@taeus.com> wrote in news message
MPG.22953c9d5ea172f4989cc0@news.sunsite.dk...
- Posted by Jerry Coffin on May 16th, 2008
In article <482c90f0$0$15740$e4fe514c@dreader26.news.xs4all.nl>,
address@not.available says...
Hello Rudy,
Thinking about it a bit more, I think the hashing really deals with
almost all possibilities pretty well. You really only have three
possibilities:
1) Both copies match their hash: no modifications, no copying needed.
2) Only one copy matches: the one that doesn't match is newer.
3) Neither copy matches: you have conflicting changes.
The third case is the one that I was thinking of dealing with via time
stamps -- but in reality, time stamps don't tell you much. If both
copies of the file have been changed independently, the relative timing
of the two changes doesn't really mean anything. For example, if you and
I both edit a file, the fact that I saved it 3 minutes later (or
earlier) than you did doesn't make my change "better" (more worthy of
saving) than yours (or vice versa). Some cases like this can probably be
handled based on file type, but in the worst case, you probably just
need to keep both around and let a user decide what to do.
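The three cases can be written down as a small decision function. This is only a sketch: `recorded_hash` stands for the hash stored in the database at the last sync, and the enum and function names are made up for the example:

```python
from enum import Enum


class SyncAction(Enum):
    NOTHING = "nothing"
    COPY_LOCAL_TO_REMOTE = "copy local to remote"
    COPY_REMOTE_TO_LOCAL = "copy remote to local"
    CONFLICT = "conflict"


def decide(recorded_hash, local_hash, remote_hash):
    """Map the three cases onto an action, using only hashes (no timestamps)."""
    local_changed = local_hash != recorded_hash
    remote_changed = remote_hash != recorded_hash
    if not local_changed and not remote_changed:
        return SyncAction.NOTHING               # case 1: both match
    if local_changed and not remote_changed:
        return SyncAction.COPY_LOCAL_TO_REMOTE  # case 2: local copy is newer
    if remote_changed and not local_changed:
        return SyncAction.COPY_REMOTE_TO_LOCAL  # case 2: remote copy is newer
    # Case 3: both changed. Following the list literally, this is a conflict
    # even when both sides happen to have ended up with identical content.
    return SyncAction.CONFLICT
```

A caller would typically keep both versions on `CONFLICT` and let the user choose.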
--
Later,
Jerry.
The universe is a figment of its own imagination.
- Posted by R.Wieser on May 16th, 2008
Hello Jerry,
This is something I did not think of. Although comparing the hash to the
stored copy is not quite what I would want, as I would then need to retrieve
all that (remote!) data first (which could be quite slow).
Especially not if you can't depend on them staying the same :-)
Thanks for the explanation, I'll have to think about it some more.
Regards,
Rudy Wieser
Jerry Coffin <jcoffin@taeus.com> wrote in news message
MPG.2296c2bb313f467a989cc5@news.sunsite.dk...
- Posted by Jerry Coffin on May 18th, 2008
In article <482dab87$0$29829$e4fe514c@dreader22.news.xs4all.nl>,
address@not.available says...
[ ... ]
Yup -- if you decide to do this, it'll probably be _well_ worthwhile to
create a small server program to run on the backup machine that just
lets you pass in a path to a file, and it just returns the hash of the
file.
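A rough sketch of such a server, assuming a made-up one-line protocol: the client sends a path followed by a newline, and the server answers `OK <sha256>` or `ERR <message>`. The protocol, SHA-256 and all names are assumptions for illustration. There is no authentication and no restriction on which paths may be read, so it is not something to expose on an untrusted network as-is:

```python
import hashlib
import socket
import socketserver


def file_hash(path, block_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


class HashRequestHandler(socketserver.StreamRequestHandler):
    """Read one UTF-8 path line and reply with a single status line."""

    def handle(self):
        path = self.rfile.readline().decode("utf-8").rstrip("\r\n")
        try:
            reply = "OK " + file_hash(path)
        except OSError as exc:
            reply = "ERR " + str(exc)
        self.wfile.write((reply + "\n").encode("utf-8"))


def make_server(host="127.0.0.1", port=0):
    """Create the server; port 0 lets the OS choose a free port."""
    return socketserver.TCPServer((host, port), HashRequestHandler)


def request_hash(host, port, path):
    """Client side: ask the server for the hash of one path."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall((path + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return reader.readline().rstrip("\n")
```

On the backup machine one would call `make_server(host, port).serve_forever()`; the syncing machine then calls `request_hash` instead of pulling the whole file across the network.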
--
Later,
Jerry.
The universe is a figment of its own imagination.
- Posted by R.Wieser on May 19th, 2008
Hello Jerry,
That could be a problem: the remote backup-storage machine is actually a
NAS drive running embedded software. But yes, that would be my idea too.
Regards,
Rudy Wieser
Jerry Coffin <jcoffin@taeus.com> wrote in news message
MPG.229a1e8eef7fcedc989cc9@news.sunsite.dk...