[Dirvish] USB drives causing server lockups. (somewhat on-topic)

Keith Lofstrom keithl at kl-ic.com
Thu Aug 21 17:22:37 UTC 2008


On Thu, Aug 21, 2008 at 08:04:00AM +0100, Noel Kelly wrote:
> Keith's advice notwithstanding, I found a work around for crappy USB 
> hardware can be to renice the process and limit the bandwidth:
> 
> nice --adjustment=10 rsync -rltDvv --progress --no-whole-file --delete 
> --exclude=Temp --bwlimit=1000 /home/shared/Backups/ 
> /media/usb-storage-10000E000CC31A5D\:0\:0\:0p1/Backups/

It is good that this works for you. 

The problem that I saw was not related to average bandwidth, but
instead to total bytes written.  It may have been caused by the
instantaneous byte rate, that is, the time spacing between bytes
sent down the cable, but the --bwlimit option on rsync only limits
the average transfer rate, from user space, with the intention of
leaving some bandwidth available for other tasks.  Rsync otherwise
fills the pipe as much as possible.
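
To illustrate why an average limit does not change the spacing
between bytes on the cable, here is a minimal Python sketch of
user-space throttling.  It is not rsync's code; the function name
throttled_copy and its parameters are made up for this example.  Each
chunk is still handed to the kernel at full speed, and the sleeps only
pull the long-run average down to the limit.

```python
import time


def throttled_copy(src, dst, limit_bytes_per_sec, chunk_size=64 * 1024,
                   sleep=time.sleep, clock=time.monotonic):
    """Copy src to dst, holding only the *average* rate under the limit.

    Hypothetical illustration, not rsync's implementation.
    """
    start = clock()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # The chunk itself is written as fast as the device accepts it.
        dst.write(chunk)
        total += len(chunk)
        # Time that *should* have passed at the requested average rate.
        expected = total / limit_bytes_per_sec
        elapsed = clock() - start
        if expected > elapsed:
            # Pause between bursts; the bursts themselves are unchanged.
            sleep(expected - elapsed)
    return total
```

If the hardware chokes on something inside a burst, a throttle of
this kind would be expected to delay the failure rather than prevent
it.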

My tests are on the wiki page; they are simple high speed writes
of many large files, which are easier to characterize than rsync's
mix of random-sized reads and writes.  Most of the time, with 2.4
kernels and problematic hardware, the process would work for some
random period of time, writing hundreds of gigabytes before locking
up.  The average time was very roughly an hour, but sometimes it
would go for five minutes, and sometimes three hours.  This was
independent of machine and USB2 controller.
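
A test of this general shape can be sketched in a few lines of
Python.  This is not the script from the wiki page; the function name
write_stress, the file names, and the default sizes are assumptions
chosen for illustration.

```python
import os
import time


def write_stress(target_dir, file_count=4, file_size=1 << 30,
                 block_size=1 << 20):
    """Write file_count large files into target_dir, back to back.

    Returns the total number of bytes written.  Hypothetical sketch.
    """
    block = os.urandom(block_size)  # one random block, reused for every write
    total = 0
    start = time.monotonic()
    for i in range(file_count):
        path = os.path.join(target_dir, f"stress_{i:04d}.bin")
        with open(path, "wb") as f:
            remaining = file_size
            while remaining > 0:
                n = f.write(block[:min(block_size, remaining)])
                remaining -= n
                total += n
            f.flush()
            os.fsync(f.fileno())  # push the data out to the device
        elapsed = time.monotonic() - start
        # Progress line, so a lockup shows roughly how far the run got.
        print(f"{path}: {total} bytes total after {elapsed:.1f} s")
    return total
```

Pointing it at the mount point of the drive under test, for example
write_stress("/media/usb-test") with a path of your own, and noting
the last progress line before a hang gives a rough bytes-to-failure
figure.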

Not all USB2 hardware exhibited the same problems.  The problem
tracked chipsets, with the Cypress chipset being problematic, the
Genesys and Myson chipsets succeeding, and the JMicron chipset not
working at all.  Looking at the data sheets, the Cypress chipset
has double buffering (2x128 bytes IIRC) to make it faster, and for
short bursts it is faster.  But hardware design for double buffering
is tricky, and Cypress may have goofed, relying on some quirk of the
Windows driver to keep them out of trouble.  Even that may be too
generous to Cypress.  Usually, the chip designers and system
integrators do not do extensive testing; they just hook a prototype
up to a Windows machine for a few hours and watch it work, then mass
produce and ship.  Yuck.

I vaguely recall trying my big reads and writes from a VMware guest
instance running on a Linux host, and not seeing lockups.  That
was hard to explain, but it did not solve my problem, so I did
not explore it deeply.

After banging my head on the problem for a week or two, which
included contacting Linux USB driver writers and the hardware
manufacturers (and getting brushed off), plus a failed project with
a kernel driver wannabe, I gave up and set up the wiki page to share
information on which brands and models of external USB2 hardware do
and don't work.  Again, I suggest using eSATA instead.

Keith

-- 
Keith Lofstrom          keithl at keithl.com         Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs


