Backups!

Aug. 6th, 2008 02:13 pm
[personal profile] ciphergoth
Further to my last post about backups, it looks like someone has written the backup tool that I wanted to exist:

http://duplicity.nongnu.org/

  • All cleverness is on the client - the server can be a dumb store like Amazon S3
  • Backups are therefore initiated on the client - good for sometimes-on machines
  • Backups can be encrypted and signed with GPG
  • It supports incremental backups of large files, using rdiff "signature files"
  • All in Python, appears quite new
Support for non-Linux systems seems to be pretty weak at the moment, but I don't see anything that would make it inherently hard except possibly the use of "tar" as a container.
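To make the "rdiff signature files" bullet concrete, here is a much-simplified sketch of the signature/delta idea in Python (duplicity actually builds on librsync; all names here are mine). It hashes the old file in fixed blocks, then describes the new file as copies of known blocks plus literal bytes. Real rdiff adds a rolling weak checksum so it can find matches at arbitrary offsets cheaply, rather than strong-hashing at every offset as this toy does:

```python
import hashlib

BLOCK = 4  # tiny block size for illustration; real tools use kilobyte-scale blocks

def signature(data):
    """Map each block's strong hash to its offset in the old file."""
    return {
        hashlib.sha256(data[i:i + BLOCK]).hexdigest(): i
        for i in range(0, len(data), BLOCK)
    }

def delta(sig, new):
    """Describe the new file as copies from the old file plus literal bytes."""
    ops, i = [], 0
    while i < len(new):
        block = new[i:i + BLOCK]
        h = hashlib.sha256(block).hexdigest()
        if len(block) == BLOCK and h in sig:
            ops.append(("copy", sig[h], BLOCK))   # block already on the server
            i += BLOCK
        else:
            ops.append(("literal", new[i:i + 1])) # genuinely new data
            i += 1
    return ops

def apply_delta(old, ops):
    out = b""
    for op in ops:
        if op[0] == "copy":
            _, off, n = op
            out += old[off:off + n]
        else:
            out += op[1]
    return out

old = b"abcdefgh"
new = b"abcdXXefgh"
ops = delta(signature(old), new)
assert apply_delta(old, ops) == new
```

The point of the split is that only `signature(old)` and the delta ever cross the wire, so an incremental backup of a large file costs roughly its changed blocks, not its full size.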

Thoughts?

Date: 2008-08-06 01:30 pm (UTC)
From: [identity profile] marnanel.livejournal.com
Things I like about the modern world:

I will often say “Something like this should exist” and someone will say “Aha, it exists deep within the bowels of SourceForge”.

Date: 2008-08-06 01:33 pm (UTC)
From: [identity profile] ciphergoth.livejournal.com
The thing that gets me is, surely this is what practically everyone should be using? Have you seen the string-and-sealing-wax answers I got to my last post?

Date: 2008-08-06 01:46 pm (UTC)
From: [identity profile] marnanel.livejournal.com
I think everyone thinks this is a trivial problem which can be solved with the standard tools (and to *some* extent it is, but that's like saying all text processing problems can be handled by awk and sed and there was no need for Larry Wall to worry his head about Perl), so everyone tries to scrape their own half-arsed solution together and nobody does it very well.

I think duplicity will be awfully useful in our house, since almost every paycheque we end up getting some new kind of backup thing and none of them work very well.

Date: 2008-08-06 02:30 pm (UTC)
From: [identity profile] skx.livejournal.com
Agreed.

I've seen it documented, but so far it's about the only backup system I've not used.

Right now I'm running backuppc, and rsnapshot on my systems.

The attraction of rsnapshot is that it uses hardlinks + rsync so space usage is minimal.
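The hardlink trick mentioned here can be sketched in a few lines of Python (an illustration of the idea, not rsnapshot itself — rsnapshot drives rsync with `--link-dest` to the same effect): files unchanged since the previous snapshot become hard links to the existing inode, so a second snapshot of an unchanged tree costs almost no space.

```python
import os, shutil, tempfile

def snapshot(src, prev, dest):
    """Link-then-copy: files unchanged since the previous snapshot become
    hard links (no extra space); new or changed files are copied in full."""
    os.makedirs(dest)
    for name in os.listdir(src):
        s = os.path.join(src, name)
        d = os.path.join(dest, name)
        p = os.path.join(prev, name) if prev else None
        if p and os.path.exists(p) and \
                os.path.getmtime(p) >= os.path.getmtime(s):
            os.link(p, d)          # unchanged: share the previous inode
        else:
            shutil.copy2(s, d)     # new or changed: store a fresh copy

root = tempfile.mkdtemp()
src = os.path.join(root, "data")
os.makedirs(src)
with open(os.path.join(src, "big.log"), "w") as f:
    f.write("x" * 100000)

snapshot(src, None, os.path.join(root, "snap.0"))
snapshot(src, os.path.join(root, "snap.0"), os.path.join(root, "snap.1"))

# Both snapshots point at one inode, so the file is stored only once.
a = os.stat(os.path.join(root, "snap.0", "big.log"))
b = os.stat(os.path.join(root, "snap.1", "big.log"))
assert a.st_ino == b.st_ino and a.st_nlink == 2
```

Deleting an old snapshot is then just `rm -r`: the filesystem frees a file's blocks only when its last link goes away.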

The attraction of backuppc is that it merges identical files on different hosts - so it requires even less disk space. (e.g. I'm backing up 10 Debian etch systems, so I have ten identical copies of /bin/ls - backuppc will notice that and only store one copy on disk :)

Duplicity looks very nice because of the encryption, but once I've seen the space saving that backuppc achieves, it's hard to move away. (It uses hashes of file contents - so adding encryption would break it.)
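The cross-host pooling described above is content addressing: store each file under a hash of its contents, and identical files from any number of hosts collapse into one blob. A minimal sketch of the idea (not backuppc's actual pool layout):

```python
import hashlib, os, tempfile

class Pool:
    """Content-addressed store: identical files from any host share one blob."""
    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def store(self, data):
        digest = hashlib.sha256(data).hexdigest()
        path = os.path.join(self.root, digest)
        if not os.path.exists(path):    # only the first upload costs disk
            with open(path, "wb") as f:
                f.write(data)
        return digest                   # each host's index records the hash

pool = Pool(tempfile.mkdtemp())
bin_ls = b"\x7fELF...pretend this is /bin/ls"
# Ten identical hosts back up the same file; only one copy lands on disk.
refs = [pool.store(bin_ls) for _ in range(10)]
assert len(set(refs)) == 1
assert len(os.listdir(pool.root)) == 1
```

This also shows why client-side encryption breaks it: encrypt the same file on two hosts (with random IVs, or different keys) and the ciphertexts - and therefore the hashes - no longer match, so nothing pools.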

Date: 2008-08-06 02:40 pm (UTC)
From: [identity profile] marnanel.livejournal.com
I've never tried duplicity, but I note the encryption is optional, and much of the lack of space-saving techniques is an artefact of the choice of tar as a target format (there's a paper on this subject on the site).

Date: 2008-08-07 08:43 pm (UTC)
From: [identity profile] http://users.livejournal.com/_lj_sucks_/
Practically everyone uses local disks to do backup. It's the "over the network" part that made your requirements hard to meet.

Date: 2008-08-06 01:35 pm (UTC)
From: [identity profile] hughe.livejournal.com
it is also included as part of three major vendors' distributions, according to its webpage...

Date: 2008-08-06 01:41 pm (UTC)
From: [identity profile] ciphergoth.livejournal.com
Sure, but we're talking about distributions with tens of thousands of packages - I don't think it's exactly fame at last.

"watch".

Date: 2008-09-08 08:25 pm (UTC)
From: [identity profile] grendelkhan.livejournal.com
I had that happen to me once.

Me: If only there were some tiny script or something which would automate checking on the results of a frequently-run command. Maybe something that would re-execute the same thing at a set interval and update the display with the results. Too bad I'm stuck either using some weird bashism or hitting up-enter every so often.
Coworker: "watch".

Memorable in that it was exactly the program I'd imagined as being perfectly useful for what I was doing at the moment, and it's already on the system if you have 'top' installed, which, come on, everyone does.
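For anyone else who missed it, the two approaches from the anecdote side by side (the loop below is bounded to three rounds so the example terminates; `watch` itself redraws until interrupted):

```shell
# procps `watch` re-runs a command at an interval and redraws the screen:
#
#     watch -n 2 df -h
#
# The hand-rolled "weird bashism" equivalent, portable to any POSIX shell:
for i in 1 2 3; do
    date +%s
    sleep 1
done
```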

Re: "watch".

Date: 2008-09-08 08:27 pm (UTC)
From: [identity profile] marnanel.livejournal.com
Well. I somehow didn't know about watch until just now! Thank you.

Date: 2008-08-06 02:15 pm (UTC)
From: [personal profile] simont
I like rdiff-backup myself. It too is written in Python, it too supports incremental backups via rdiff technology, and oddly enough it too is hosted on nongnu.org :-).

It doesn't support encryption (at some point I must get round to solving that via encrypted losetup), but on the plus side its main backup directory looks basically like a mirror of the filesystem it was backing up in the first place – which makes it trivially easy to recover one file at a time. Since my backups rescue me from misaimed rm much more often than from whole-disk destruction, this seems to me like a useful property.
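The mirror-plus-increments layout being praised here can be sketched as follows (an illustration of the layout only: rdiff-backup stores compact reverse deltas under `rdiff-backup-data`, whereas this toy stashes full old versions under a hypothetical `.increments` directory):

```python
import os, shutil, tempfile, time

def backup(src, repo):
    """Keep `repo` as a plain mirror of `src`; before overwriting a changed
    file, stash the old version under repo/.increments with a timestamp."""
    inc = os.path.join(repo, ".increments")
    os.makedirs(inc, exist_ok=True)
    for name in os.listdir(src):
        s, d = os.path.join(src, name), os.path.join(repo, name)
        if os.path.exists(d):
            if open(d, "rb").read() == open(s, "rb").read():
                continue                         # unchanged: nothing to do
            stamp = time.strftime("%Y%m%dT%H%M%S")
            shutil.move(d, os.path.join(inc, f"{name}.{stamp}"))
        shutil.copy2(s, d)

root = tempfile.mkdtemp()
src = os.path.join(root, "home")
repo = os.path.join(root, "repo")
os.makedirs(src)
os.makedirs(repo)
with open(os.path.join(src, "notes.txt"), "w") as f:
    f.write("v1")
backup(src, repo)
with open(os.path.join(src, "notes.txt"), "w") as f:
    f.write("v2")
backup(src, repo)

# The repo is a browsable mirror: recovering the latest copy of one file
# after a misaimed rm is just an ordinary copy, no restore tool needed.
assert open(os.path.join(repo, "notes.txt")).read() == "v2"
```

That last property is the point of the comment: the common failure mode is losing one file, and a mirror makes that recovery trivial, while the increments still let you reach older versions.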

Date: 2008-08-06 04:49 pm (UTC)
From: [identity profile] phantas.livejournal.com
I was pointed to that when I made a lightning talk at the Portuguese Perl Workshop regarding my (long time on hiatus) project for remote secure backups.

Haven't had a look at it yet.

But daily secure remote sync through that and occasional Time Machine is my technique for backups.

Me thinks remote secure client-based backup syncs and Time Machine + Time Capsule For All are the way to go in the future.

Date: 2008-08-07 08:57 am (UTC)
From: [identity profile] pengshui-master.livejournal.com
I've noted in the past that the rsync algorithm doesn't actually need to be able to compute the signatures at the remote end - it just needs to know them.

So if the signatures are sent alongside an encrypted block, you could easily do encrypted backups with all the advantages of rsync.

And if you included a strong enough hash in your signature you could share blocks too, giving you the space advantage of backuppc.

The only downside is that the block boundaries are fixed on the first backup, which makes intelligent choice of blocksize critical.
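A toy sketch of the scheme described above (every name here is a placeholder, and the XOR keystream is a stand-in for real encryption, not a proposal): block signatures travel in the clear beside encrypted blocks, so the store can match and share blocks without ever seeing plaintext. Encrypting deterministically, keyed per block, is what makes identical blocks produce identical ciphertexts and therefore dedupe:

```python
import hashlib

KEY = b"not-a-real-key"    # placeholder; a real tool would use real crypto
BLOCK = 8                  # boundaries are fixed at the first backup

def keystream(nonce, n):
    """Toy SHA-256 counter keystream; a stand-in for proper encryption."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(KEY + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def encrypt(block, nonce):
    return bytes(a ^ b for a, b in zip(block, keystream(nonce, len(block))))

store = {}   # the server's view: signature -> encrypted block it cannot read

def backup(data):
    refs = []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        sig = hashlib.sha256(block).hexdigest()       # sent in the clear
        if sig not in store:                          # shared across backups
            store[sig] = encrypt(block, sig.encode()) # deterministic: same
        refs.append(sig)                              # block, same ciphertext
    return refs

def restore(refs):
    # XOR with the same keystream decrypts.
    return b"".join(encrypt(store[s], s.encode()) for s in refs)

r1 = backup(b"AAAAAAAA" + b"BBBBBBBB")
r2 = backup(b"AAAAAAAA" + b"CCCCCCCC")
assert len(store) == 3               # the shared block is stored only once
assert restore(r1) == b"AAAAAAAABBBBBBBB"
```

Note the trade-off this makes explicit: plaintext signatures and deterministic encryption leak block equality, so anyone holding the store can confirm guesses about your data even without the key.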

Date: 2008-09-04 02:58 am (UTC)
From: [identity profile] plexq.livejournal.com
marnanel very helpfully pointed me to this post. The only backup solution that most people care about is one that runs on Windows. Most people living in the real world need to use tools like Word and Visio and Photoshop, so a Linux-only solution is pointless for us, even if we love Linux and use it every day on our servers. All the backup software I've used to date is lousy: Mirra didn't back up files that began with a ".", Memeo can't back up to network shares on OS X and doesn't reconnect to transient shares on Windows, and Amanda is _way_ too complicated. Time Capsule only works on OS X and has major problems when you try to do a system restore. Am I just supposed to buy Veritas and be done with it? Maybe I'll just sit down and write some software myself. This isn't a hard problem.

Date: 2008-09-04 07:33 am (UTC)
From: [identity profile] ciphergoth.livejournal.com
Before writing your own, consider improving the Windows support in duplicity? I don't think there's anything inherently hard about that, it just isn't the main interest of the authors.

Date: 2008-11-07 04:39 pm (UTC)
From: [personal profile] lovingboth
For data on a Windows system, I am still happy with Carbonite. Install, tell it that, yes, you do want media files backed up, and forget, with unlimited storage for $50/year. I currently have about 150GB with it.

For the system itself, it's disk image on a USB drive time.

Date: 2008-11-07 05:16 pm (UTC)
From: [identity profile] plexq.livejournal.com
Oh cool - I'll definitely have to check that out!

Date: 2008-11-07 04:36 pm (UTC)
From: [personal profile] lovingboth
The downside for me would be the need to have significant spare space locally.

Date: 2008-11-07 04:39 pm (UTC)
From: [identity profile] ciphergoth.livejournal.com
duplicity can use remote as well as local targets, including support for Amazon S3. It's possible I'm misunderstanding you though?

Date: 2008-11-07 05:05 pm (UTC)
From: [personal profile] lovingboth
From the FAQ:

"Duplicity may require lots of temp space sometimes, depending on the size of the volumes created..."

I'm guessing that you could arrange to have that remotely, albeit at a cost in time and possibly money.

But I know how every OS I have ever run behaves when its file systems approach or reach fullness. (Ubuntu has improved greatly, for example.) Having a reasonable speed net connection combined with no quota doesn't help. It is, in part, an addiction: there is so much good stuff out there, and I wants it my precious...

This has been one of the reasons I like what I have for Windows - it uses a delightfully small amount of local disk space.

Profile

Paul Crowley
