[Dirvish] Repair failed image.. is it possible?

hanj mailing at astarna.com
Tue Jan 9 16:30:21 UTC 2007


On Tue, 9 Jan 2007 15:55:51 +0000
Keith Lofstrom <keithl at kl-ic.com> wrote:

> On Tue, Jan 09, 2007 at 08:19:35AM -0700, hanj wrote:
> > Hello All
> > 
> > I have an interesting and annoying situation. On one remote server, I'm
> > having issues with corrupted MAC via SSH and my session disconnects.
> > This appears to be a hardware problem somewhere on my route.. and I'm
> > working with my ISP's network admins on the problems. Now.. the
> > question to you. When this happens, my dirvish image fails since I'm
> > disconnected in the middle of the backup.
> > 
> > Is it possible to repair the image? Currently, I have to delete the
> > dated folder and try again, and cross my fingers it doesn't fail on
> > this try. I would really like to just repair the image from the point
> > it failed.
> > 
> > I tried copying files, etc from 'good' images, but it doesn't see them
> > for the next pass the following day.
> 
> This is more a network question than a dirvish question - dirvish needs
> rsync to be working, and rsync needs the underlying network transport to
> be working.  I don't think you should be trying to run dirvish (or any
> other backup tool) over a network until you can get the network operating
> properly.  This can be due to many things, very likely a configuration
> problem since typical IP transport protocols are tolerant of lost packets,
> but intolerant of configuration errors that continuously misdirect them.
> 
> Sometimes the "configuration error" is a zombied machine somewhere on
> the path.  Do not rule out enemy action.  While ssh can tunnel through
> hostile networks, it will get confused and have to restart a lot if
> another machine is pretending to be one of two legitimate endpoints. 
> However, it is more likely to be something like an iptables and NAT
> misconfiguration - this has happened to me, and I fixed it mostly by
> careful reading of the iptables docs and proper configuration rather
> than by observing packets.
> 
> You will need a network guru, not an rsync guru, for now.  If you need
> to build test cases that stress a probably-working network, rsync can
> be good for that, but avoid the complexities of dirvish and build some
> simplified test cases.  For example, use rsync alone to copy directories
> between two machines, identical process each time (same initial source
> and target data).  If you get varying results from identical rsync copy
> processes and you cannot figure out what is happening from the tcpdump
> logs, then pick an easier-to-understand application than rsync.  


Hello

Thanks for writing. I understand your point. The network issue is being
addressed. It's most likely a faulty interface on a router on one of
the hops to my colo server. I verified that my home office route causes
this problem to any server in their facility, but I'm fine from other
networks.

The problem is that they're still working on it, and have been for a
while. In the meantime, some of my daily backups fail. I understand your
point about not running backups during this phase, but I would feel much
more comfortable having backups while they work on the problem on their end.

The SSH corrupted MAC error is basically a corrupted packet: the data
got past the TCP checksum but failed SSH's integrity check. SSH can't
'reset' and retransmit like normal tcp/ip traffic.. so it drops the
connection. This is intermittent, so backups can sometimes work. I can
continue where it left off by issuing the rsync command (listed in
summary) pointing to the image it just failed on. It completes the
backup, but the following day it still thinks it's missing those files.
There must be a link or some other magic that dirvish does after the
rsync phase.
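
A minimal sketch of the "continue where it left off" step, in Python, for
anyone wanting to automate the re-run while the link is flaky. It simply
re-issues the same rsync command until it exits cleanly. The function name
run_with_retries, the RETRYABLE set, and the retry count and delay are all
assumptions for illustration, not anything dirvish provides. The command
shown in the trailing comment is a placeholder; the real one would be the
rsync command recorded in the failed image's summary.

```python
import subprocess
import time

# rsync exit codes that typically indicate the transport died mid-transfer:
# 10 = socket I/O error, 12 = protocol data stream error,
# 30/35 = timeouts, 255 = the underlying ssh process failed.
# Treating exactly these as retryable is an assumption of this sketch.
RETRYABLE = {10, 12, 30, 35, 255}


def run_with_retries(cmd, attempts=5, delay=60,
                     runner=subprocess.call, sleep=time.sleep):
    """Re-run cmd until it exits 0, fails with a non-retryable code,
    or the attempt budget is used up. Returns the last exit code."""
    code = None
    for attempt in range(1, attempts + 1):
        code = runner(cmd)
        if code == 0 or code not in RETRYABLE:
            return code
        if attempt < attempts:
            sleep(delay)  # give the flaky hop a moment before retrying
    return code


# Hypothetical usage (hosts and paths are placeholders):
# run_with_retries(["rsync", "-a", "--partial",
#                   "client:/src/", "/backup/vault/image/tree/"])
```

This only covers the transfer itself. It does not touch whatever
bookkeeping dirvish does after the rsync phase, so on its own it would not
be expected to change how the next day's run treats the image.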

As I said, my ISP is working on the problem, but the suspected
interface/device is outside of their direct network and convincing the
upstream provider that they have a problem is time-consuming.

Thanks again!
hanji


