Suggestion: Differential software updates

User discussion about Debian Development, Debian Project News and Announcements. Not for support questions.
Message
Author
User avatar
edbarx
Posts: 5401
Joined: 2007-07-18 06:19
Location: 35° 50 N, 14° 35 E
Been thanked: 2 times

Suggestion: Differential software updates

#1 Post by edbarx »

I think software updates could be made faster by including only the affected files. The unchanged files need not be included in updates, which would reduce download time considerably.
Debian == { > 30,000 packages }; Debian != systemd
The worst infection of all, is a false sense of security!
It is hard to get away from CLI tools.

plugwash
Posts: 2507
Joined: 2006-09-17 01:10
Contact:

#2 Post by plugwash »

This has been suggested before, but actually implementing it is easier said than done, for both technical and bureaucratic reasons.

User avatar
rickh
Posts: 3434
Joined: 2006-06-29 02:13
Location: Albuquerque, NM USA

#3 Post by rickh »

I thought that's what "pdiff" was doing. I don't know if it's in Etch, but it seems to be working fine on my Lenny and Sid systems.
Debian-Lenny/Sid 32/64
Desktop: Generic Core 2 Duo, EVGA 680i, Nvidia
Laptop: Generic Intel SIS/AC97

User avatar
edbarx
Posts: 5401
Joined: 2007-07-18 06:19
Location: 35° 50 N, 14° 35 E
Been thanked: 2 times

#4 Post by edbarx »

Thanks for answering me.
but actually implementing it is easier said than done...
I understand your point, because I practice programming as a hobby, when I feel like it.

plugwash
Posts: 2507
Joined: 2006-09-17 01:10
Contact:

#5 Post by plugwash »

rickh wrote:I thought that's what "pdiff" was doing. I don't know if it's in Etch, but it seems to be working fine on my Lenny and Sid systems.
pdiff only diffs the package lists, not the packages themselves.

User avatar
edbarx
Posts: 5401
Joined: 2007-07-18 06:19
Location: 35° 50 N, 14° 35 E
Been thanked: 2 times

#6 Post by edbarx »

So, I will try to imagine how this [see the title of the topic] could be done:
  1) download the package's modified files
  2) rebuild the package using the downloaded files and the files on the client computer
  3) when all the required packages are rebuilt, install them using Debian's package management system
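The three steps above could be sketched roughly as follows, modelling a package's contents as a plain filename-to-bytes mapping. Everything here is illustrative; none of these names is a real dpkg or apt interface.

```python
# Sketch of the client-side rebuild step: merge the files already on
# the client with the modified files fetched from the server.
# Illustrative only -- not a real dpkg/apt interface.

def rebuild_package(local_files, downloaded_files):
    """Combine unchanged local files with freshly downloaded modified
    files to obtain the new package contents."""
    new_contents = dict(local_files)       # start from what is already here
    new_contents.update(downloaded_files)  # overwrite only the changed files
    return new_contents

old = {"/usr/bin/foo": b"v1", "/usr/share/doc/foo/README": b"docs"}
patch = {"/usr/bin/foo": b"v2"}            # only the modified file travels
print(rebuild_package(old, patch))
```

The rebuilt mapping would then be handed to the normal installation machinery, i.e. step 3 above.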

User avatar
mzilikazi
Forum Account
Posts: 3282
Joined: 2004-09-16 02:14
Location: Colorado Springs, CO

#7 Post by mzilikazi »

edbarx wrote:So, I will try to imagine how this [see the title of the topic] could be done:
  1) download the package's modified files
  2) rebuild the package using the downloaded files and the files on the client computer
  3) when all the required packages are rebuilt, install them using Debian's package management system
I don't see how rebuilding packages would be faster than simply installing the new ones.
Debian Sid Laptops:
AMD Athlon(tm) 64 X2 Dual-Core Processor TK-55 / 1.5G
Intel(R) Pentium(R) Dual CPU T2390 @ 1.86GHz / 3G

User avatar
edbarx
Posts: 5401
Joined: 2007-07-18 06:19
Location: 35° 50 N, 14° 35 E
Been thanked: 2 times

#8 Post by edbarx »

OK. By "rebuild the packages" I don't mean using compilers, but something like the dpkg-repack command.

Alternatively [i.e. ignoring what I said earlier], the update could be done as follows: once the modified (and, if applicable, compiled) files are downloaded to the client computer, the outdated files can be replaced by their updated counterparts. Finally, if necessary, a "dpkg-reconfigure" command can be run for those packages that need it.

plugwash
Posts: 2507
Joined: 2006-09-17 01:10
Contact:

#9 Post by plugwash »

The devil is in the details.

You would have to:

1: create a system for comparing two debs and generating some kind of "patch deb" containing the modified files (not too hard)
2: integrate these patch debs into the Debian infrastructure tools, including some system to decide which patch debs to keep at any time (harder).
3: integrate these patch debs into the Debian package management tools (about as hard as 2), remembering to include a fallback system to fetch the whole deb when, say, a file that should have been there from the previous version is missing or corrupted.
4: get the Debian powers that be to accept your system (even harder).
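Step 1 (spotting which files differ between two package versions) might be sketched by comparing content hashes. The dict-based package model is a toy assumption; a real patch-deb generator would operate on the actual .deb members.

```python
# Toy version of "compare two debs": report the files whose content
# changed (or which are new), by comparing SHA-256 digests.
import hashlib

def changed_files(old_pkg, new_pkg):
    """Return sorted names whose content differs or which are new."""
    def digest(data):
        return hashlib.sha256(data).hexdigest()
    return sorted(
        name for name, data in new_pkg.items()
        if name not in old_pkg or digest(old_pkg[name]) != digest(data)
    )

old = {"bin/foo": b"v1", "doc/README": b"same"}
new = {"bin/foo": b"v2", "doc/README": b"same", "bin/bar": b"new"}
print(changed_files(old, new))  # -> ['bin/bar', 'bin/foo']
```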

ajdlinux
Posts: 2452
Joined: 2006-04-23 09:37
Location: Port Macquarie, NSW, Australia

#10 Post by ajdlinux »

Debdelta is a system that does something like this, using xdelta to binary diff the two packages. It's not the most bandwidth-efficient system, but it's better than nothing. Get it through APT, then look through its docs directory.
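The principle behind xdelta-style binary diffing can be shown with a toy block-level delta. This is NOT xdelta's or debdelta's actual format, just a sketch of the idea: ship only the blocks that changed, plus their offsets.

```python
# Toy block-level delta in the spirit of xdelta (NOT the real format):
# transmit only the fixed-size blocks that changed, keyed by offset.

BLOCK = 4  # tiny block size for the demo; real tools use larger windows

def make_delta(old, new):
    """Return {offset: new_bytes} for each block that differs."""
    return {
        i: new[i:i + BLOCK]
        for i in range(0, len(new), BLOCK)
        if new[i:i + BLOCK] != old[i:i + BLOCK]
    }

def apply_delta(old, delta, new_len):
    """Rebuild the new file from the old file plus the delta."""
    out = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for i, chunk in delta.items():
        out[i:i + len(chunk)] = chunk
    return bytes(out)

old = b"aaaabbbbccccdddd"
new = b"aaaaBBBBccccDDDD"
delta = make_delta(old, new)
print(len(delta), "changed blocks instead of", len(new), "bytes")  # 2 ... 16
assert apply_delta(old, delta, len(new)) == new
```

Real tools work harder than this (rolling hashes, insertions that shift offsets, recompression), which is part of why debdelta is not maximally bandwidth-efficient.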
Jabber: xmpp:ajdlinux@jabber.org.au
Spammers, email this: ajdspambucket@exemail.com.au

User avatar
edbarx
Posts: 5401
Joined: 2007-07-18 06:19
Location: 35° 50 N, 14° 35 E
Been thanked: 2 times

#11 Post by edbarx »

Thanks for suggesting debdelta. Now I can understand what the difficulties are. You (plugwash) are right that the most difficult part is convincing people, rather than developing a package that does what I described. It is never easy to convince people. I cannot blame the Debian people for holding on to their beliefs.

ajdlinux
Posts: 2452
Joined: 2006-04-23 09:37
Location: Port Macquarie, NSW, Australia

#12 Post by ajdlinux »

It's also about managing a transition to the new system - how do you justify a few extra gigs of files to hundreds of mirror operators? How do you find the CPU power to prepare the diffs? etc. etc.

It's been debated a few times on -devel IIRC - fairly interesting discussions.

User avatar
edbarx
Posts: 5401
Joined: 2007-07-18 06:19
Location: 35° 50 N, 14° 35 E
Been thanked: 2 times

#13 Post by edbarx »

I don't think a new system is needed... When one issues the command "aptitude upgrade", the program doing the upgrade could have a subroutine handling the differential updates. So I am speaking of updating an existing package, not of a complete upheaval of the system currently in use.

Regarding the extra space required on mirrors, I think no extra space is needed. The system can remain as it is. Only the server program, which delivers the Debian packages to the client computer, would need an update. The server program has to be able to deliver only the modified files to the client computer and, at the end, a file describing what to do with the unchanged files. With this file, the client computer can recreate the packages from the unmodified files and the downloaded ones.

plugwash
Posts: 2507
Joined: 2006-09-17 01:10
Contact:

#14 Post by plugwash »

Asking mirror operators to install a program that unpacks packages and delivers individual parts of them on the fly is going to be even less popular than asking them to host more files.

I also think a few extra gigs is a massive underestimate of the disk space involved.

User avatar
edbarx
Posts: 5401
Joined: 2007-07-18 06:19
Location: 35° 50 N, 14 º 35 E
Been thanked: 2 times

#15 Post by edbarx »

What have I learnt from this discussion?
  1. Server maintainers do not like custom programs running on their servers.
  2. Differential upgrades are not yet supported by Debian.
  3. Long, bandwidth-consuming updates cannot yet be avoided by Debian users.
What I suggested was not even acknowledged as an idea that might work. So I am somewhat disappointed, because from my experience in programming I can distinguish which ideas can work and which cannot. I am still convinced that what I suggested would not cause so many problems, because:
  1. data compression is sequential in nature, i.e. the data, although compressed, is not scrambled: it is unreadable but still in order. I think this is true, because files can be extracted from archives without having to decompress everything.
  2. the list of which files must be delivered need not be a long file
  3. the server can be asked to send only parts of a package to the client computer
  4. the server need not decompress the package to get the requested files
  5. contrary to what some are saying, the burden on the servers will not be increased; on the contrary, it can be reduced
  6. there is no need to install a "server program" on the server
  7. partial downloads are already used in practice, e.g. by download managers
So, there are no extraordinary requirements.
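Point 7 is essentially what an HTTP Range request already provides: the server returns only the requested byte span of a file. A sketch of that idea, where serve_range is a hypothetical stand-in for a mirror's HTTP server, not a real API:

```python
# "Partial package download" is what an HTTP Range request gives you.
# serve_range stands in for the mirror's HTTP server (illustrative only).

def serve_range(blob, start, end):
    """Bytes a server would send for 'Range: bytes=start-end'
    (end is inclusive, as in HTTP)."""
    return blob[start:end + 1]

package = b"HEADER|unchanged-part|CHANGED|trailer"
# a client that somehow knows the changed span asks only for those bytes:
offset = package.index(b"CHANGED")
chunk = serve_range(package, offset, offset + len(b"CHANGED") - 1)
print("fetched", len(chunk), "of", len(package), "bytes")
```

The hard part, as the rest of the thread points out, is knowing which byte ranges are useful when the package as a whole is a compressed archive.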

At the moment, I am still a beginner in Debian GNU/Linux. Before I started to use Debian, I used to program for Windows as a hobby. Had I the same experience with Debian GNU/Linux, I would have tried to implement what I am proposing myself, because I believe it is a valid idea. However, I do not have the required expertise.

Please do not misinterpret me: I am only presenting my idea, not trying to oblige you (the reader) to implement it for me. I strongly believe that sharing ideas is one of the most important aspects of an advanced society.

User avatar
edbarx
Posts: 5401
Joined: 2007-07-18 06:19
Location: 35° 50 N, 14° 35 E
Been thanked: 2 times

Yet another idea which might help improve updates.

#16 Post by edbarx »

For updates, I suggest skipping all the mechanisms that install and uninstall packages: include only the modified binaries and replace them on the client's computer. This should always result in faster updates. The problem with slow updates has to do with using apt, aptitude, apt-get and dpkg; these programs should not be doing the updates, because they require a .deb file. In updates, it makes more sense to replace ONLY the updated binaries and settings on the client's computer.

plugwash
Posts: 2507
Joined: 2006-09-17 01:10
Contact:

#17 Post by plugwash »

edbarx wrote:The problem with slow updates has to do with using apt, aptitude, apt-get and dpkg. The latter programs should not be allowed to do the updates, because they require a .deb file. In updates, it makes more sense to replace ONLY the updated binaries and settings on the client's computer.
I disagree. apt has all the infrastructure for working out which packages need to be updated and downloading them. dpkg has all the infrastructure for carrying out the maintainer's configuration-related changes and tracking which versions are installed, as well as updating the main files of the package. Throwing away that infrastructure would be stupid. For a system to work it needs to be built within that existing infrastructure.
edbarx wrote:data compression is sequential in its nature ie the data, although compressed, is not scrambled ie the data, although unreadable, it is still in order. I think this is true, because files can be extracted from file cabinets without having to decompress everything.
Most compression algorithms require you to start reading from the beginning. You can stop before you reach the end, but if your file is in the middle of a solid archive you are going to have to decompress everything that comes before it. Tarballs are solid archives, zips are not, and rars can be either depending on the options. A deb is an ar archive containing two tarballs.
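The container layout described here is easy to inspect: a .deb is an ar(1) archive whose members are a version marker and two tarballs. A minimal ar reader, sketched against an in-memory archive (the helper names are ours, and the member contents are placeholder bytes, not real tarballs):

```python
# Minimal ar(1) reader sketch. A .deb is an ar archive; we build a tiny
# one in memory since a real .deb may not be at hand. Illustrative only.

def ar_member(name, data):
    """One ar member: 60-byte header, data, 2-byte alignment padding."""
    hdr = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n".format(
        name, 0, 0, 0, "100644", len(data)).encode()
    return hdr + data + (b"\n" if len(data) % 2 else b"")

def ar_names(blob):
    """List member names of an ar archive."""
    assert blob[:8] == b"!<arch>\n"
    pos, names = 8, []
    while pos < len(blob):
        names.append(blob[pos:pos + 16].decode().rstrip())
        size = int(blob[pos + 48:pos + 58].decode())
        pos += 60 + size + (size % 2)   # members are 2-byte aligned
    return names

# a .deb has exactly this shape: version marker plus two tarballs
deb = (b"!<arch>\n"
       + ar_member("debian-binary", b"2.0\n")
       + ar_member("control.tar.gz", b"\x1f\x8b...")      # placeholder bytes
       + ar_member("data.tar.gz", b"\x1f\x8b......"))     # placeholder bytes
print(ar_names(deb))  # ['debian-binary', 'control.tar.gz', 'data.tar.gz']
```

Since data.tar.gz is one solid gzip stream, getting at a single file inside it means decompressing everything before that file, which is exactly the obstacle to serving individual files from a deb.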

User avatar
edbarx
Posts: 5401
Joined: 2007-07-18 06:19
Location: 35° 50 N, 14 º 35 E
Been thanked: 2 times

#18 Post by edbarx »

plugwash wrote:For a system to work it needs to be built within that existing infrastructure.
Are you sure? :shock: Have you ever heard of progress?

If a system imposes a limit on itself, it should be reexamined, revised and, if necessary, replaced.

The current package management system is what is causing unnecessarily slow updates, because it insists on downloading ALL the files, modified or not! Downloading unmodified files simply does not make sense.

plugwash
Posts: 2507
Joined: 2006-09-17 01:10
Contact:

#19 Post by plugwash »

edbarx wrote:
plugwash wrote:For a system to work it needs to be built within that existing infrastructure.
Are you sure? :shock: Have you ever heard about progress?
Sometimes an existing system is so broken or outdated that replacement is the only way to make progress. However, very often people are all too eager to rip up established, working systems and start from scratch without a sufficiently good reason.
edbarx wrote:The current package management system is what is causing unnecessarily slow updates, because it assumes to download All the files modified or not! Downloading unmodified files simply does not make sense.
Right, but that is a relatively small part of what the package management system does.

A system for doing differential updates is a sensible idea, but to pull it off requires several things:
* Someone with enough knowledge of compression, mirror operation, existing file formats, operational characteristics etc. to do a sensible design.
* Someone prepared to get to know the existing codebases and implement that sensible design within those codebases.
* Someone with the resources to set up and host a test/demonstration setup that people can use.
* Someone with the political skill to get it accepted into the official system.

User avatar
MeanDean
Posts: 3866
Joined: 2007-09-01 01:14

#20 Post by MeanDean »

edbarx wrote: The current package management system is what is causing unnecessarily slow updates, because it assumes to download All the files, modified or not! Downloading unmodified files simply does not make sense.
Can you provide some specific examples and show us how much difference this would make?
