
Suggestion: Differential software updates

Posted: 2007-12-20 09:56
by edbarx
I think software updates can be made faster by including only the affected files. The unchanged files need not be included in updates. I think this would reduce the download time considerably.

Posted: 2007-12-20 18:56
by plugwash
Been suggested before, but actually implementing it is easier said than done, for both technical and bureaucratic reasons.

Posted: 2007-12-20 19:16
by rickh
I thought that's what "pdiff" was doing. I don't know if it's in Etch, but it seems to be working fine on my Lenny and Sid systems.

Posted: 2007-12-20 20:13
by edbarx
Thanks for answering me.
plugwash wrote: but actually implementing it is easier said than done...
I understand your point, because I practice programming as a hobby, obviously when I feel like it.

Posted: 2007-12-20 20:50
by plugwash
rickh wrote: I thought that's what "pdiff" was doing. I don't know if it's in Etch, but it seems to be working fine on my Lenny and Sid systems.
pdiff only diffs the package lists, not the packages themselves.
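
Roughly speaking, a pdiff is nothing more than an ed-style diff of the Packages index, something like this (the file names are only illustrative):

Code:
  # generate an ed-style patch between two versions of the Packages index
  diff --ed Packages.old Packages.new > Packages.patch

apt fetches small patches like that, plus an Index file listing them, from the mirror's Packages.diff/ directory and applies them to its cached copy instead of downloading the whole Packages file again.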

Posted: 2007-12-22 17:31
by edbarx
So, I will try to imagine a way this [see the title of the topic] could be done (roughly sketched in commands below):
  1) download the package's modified files
  2) rebuild the package using the downloaded files and the files already on the client computer
  3) when all the required packages are rebuilt, install them using Debian's package management system
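
Very roughly, in commands, I imagine something like this (a sketch only: the package name and the delivery format of the changed files are invented, and in reality the md5sums and maintainer scripts would need updating too):

Code:
  # assume the old .deb is still in the apt cache and the changed files arrive as a small tarball
  dpkg-deb -R /var/cache/apt/archives/foo_1.0_amd64.deb work/   # unpack the old version plus its control data
  tar -xzf foo_1.0-to-1.1_changed-files.tar.gz -C work/         # step 1: overlay the modified files
  sed -i 's/^Version: .*/Version: 1.1/' work/DEBIAN/control
  dpkg-deb -b work/ foo_1.1_amd64.deb                           # step 2: rebuild the complete package
  dpkg -i foo_1.1_amd64.deb                                     # step 3: install it with the usual tools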

Posted: 2007-12-31 19:50
by mzilikazi
edbarx wrote: So, I will try to imagine a way this [see the title of the topic] could be done:
  1) download the package's modified files
  2) rebuild the package using the downloaded files and the files already on the client computer
  3) when all the required packages are rebuilt, install them using Debian's package management system
I don't see how rebuilding packages would be faster than simply installing the new ones.

Posted: 2007-12-31 21:03
by edbarx
OK. By "rebuild the packages" I don't mean using the compilers, but rather something like the command "dpkg-repack".

Alternatively [i.e. ignoring what I said earlier], the update could be done as follows: once the modified (and, if applicable, compiled) files are downloaded to the client computer, the outdated files can be replaced by the respective updated files. Finally, if necessary, a "dpkg-reconfigure" command can be run for those packages that need it.
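
For the second variant the sketch would be even shorter (file names invented again, and ignoring for the moment that dpkg's own database would then no longer know the true state of the installed files):

Code:
  tar -xzf foo_1.1_modified-files.tar.gz -C /   # drop the updated files straight into place
  dpkg-reconfigure foo                          # rerun the package's configuration if it needs it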

Posted: 2008-01-01 04:48
by plugwash
the devil is in the details.

you would have to

1: create a system for comparing two debs and generating some kind of "patch deb" containing the modified files (not too hard; see the sketch after this list)
2: integrate these patch debs into the Debian infrastructure tools, including some system to decide which patch debs to keep at any time (harder).
3: integrate these patch debs into the Debian package management tools (about as hard as 2), remembering to include a fallback system to fetch the whole deb when, say, a file that should have been there from the previous version is missing or corrupted.
4: get the Debian powers that be to accept your system (even harder).
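
For point 1, a starting point could look roughly like this (a sketch only, with made-up file names; added and removed files, the control data and the maintainer scripts are ignored):

Code:
  # unpack both versions of the package
  dpkg-deb -x foo_1.0_amd64.deb old/
  dpkg-deb -x foo_1.1_amd64.deb new/
  # copy only the files that differ into a staging area
  mkdir -p patch/DEBIAN
  diff -qr old/ new/ | sed -n 's|^Files old/\(.*\) and new/.* differ$|\1|p' |
  while read f; do
      install -D "new/$f" "patch/$f"
  done
  # write a minimal control file describing the patch, then build the "patch deb"
  dpkg-deb -b patch/ foo-patch_1.0-to-1.1_amd64.deb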

Posted: 2008-01-01 07:05
by ajdlinux
Debdelta is a system that does something like this, using xdelta to binary diff the two packages. It's not the most bandwidth-efficient system, but it's better than nothing. Get it through APT, then look through its docs directory.
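
From memory, the basic usage looks roughly like this (the argument order is from memory, so check the documentation under /usr/share/doc/debdelta):

Code:
  apt-get install debdelta
  debdelta foo_1.0_amd64.deb foo_1.1_amd64.deb foo.debdelta   # create a binary delta between the two debs
  debpatch foo.debdelta foo_1.0_amd64.deb foo_1.1_amd64.deb   # recreate the new deb from the old one plus the delta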

Posted: 2008-01-01 10:11
by edbarx
Thanks for suggesting "debdelta". Now I can understand what the difficulties are. You (plugwash) are right to say that the most difficult part is convincing people rather than developing a package that does what I said. It is never easy to convince people. I cannot blame the Debian people for holding on to their beliefs.

Posted: 2008-01-01 10:19
by ajdlinux
It's also about managing a transition to the new system - how do you justify a few extra gigs of files to hundreds of mirror operators? How do you find the CPU power to prepare the diffs? etc. etc.

It's been debated a few times on -devel IIRC - fairly interesting discussions.

Posted: 2008-01-01 11:50
by edbarx
I don't think there is any need for a new system... When one issues the command "aptitude upgrade", the program which does the upgrade can have a subroutine which handles the differential updates. So, I am speaking of an update to an existing package, not a complete upheaval of the system currently in use.

Regarding the extra space required on mirrors, I think there is no need for extra space. The system can remain as it is. Only the server program, which delivers the Debian packages to the client computer, will need an update. The server program has to be able to deliver only the modified files to the client computer and, at the end, deliver a file containing the details of what to do with the unchanged files. With this file, the client computer can recreate the packages from the unmodified files and the downloaded files.
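
Just to make the idea concrete, the extra file could be something as simple as this (a completely hypothetical format, invented only for the sake of the example):

Code:
  Package: foo
  Old-Version: 1.0
  New-Version: 1.1
  Modified-Files:
   usr/bin/foo
   usr/share/doc/foo/changelog.Debian.gz
  Unchanged-Files: reuse from the installed 1.0 package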

Posted: 2008-01-01 12:08
by plugwash
Asking mirror operators to install a program that unpacks packages and delivers individual parts of them on the fly is going to be even less popular than asking them to host more files.

I also think a few extra gigs is a massive underestimate of the disk space involved.

Posted: 2008-01-01 13:28
by edbarx
What have I learnt from this discussion?
  1. Server maintainers do not like custom programs running on their servers
  2. Differential upgrades are not yet supported by Debian
  3. Long and bandwidth-consuming updates cannot yet be avoided by Debian users.
What I suggested was not even acknowledged as an idea which might work. So, I am sort of disappointed because, from my experience in programming, I can distinguish which ideas can work and which cannot. I am still convinced that what I suggested would not cause so many problems, because:
  1. data compression is sequential in nature, i.e. the data, although compressed, is not scrambled; although unreadable, it is still in order. I think this is true, because files can be extracted from archives without having to decompress everything.
  2. the list of which files must be delivered cannot be a long file
  3. the server can be asked to deliver only parts of a package to the client computer
  4. the server need not decompress the package to get the requested files
  5. contrary to what some are saying, the burden on the servers will not be increased. On the contrary, it can be reduced.
  6. there is no need to install a "server program" on the server
  7. partial downloads are already used in practice, e.g. by download managers (see the example after this list)
So, there are no extraordinary requirements.
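
Regarding point 7, partial downloads need nothing special on the server side, because plain HTTP already supports range requests (the mirror URL below is only an example):

Code:
  # fetch only the first 64 KiB of a package from an ordinary HTTP mirror
  curl -r 0-65535 -o part.deb http://mirror.example.org/debian/pool/main/f/foo/foo_1.1_amd64.deb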

At the moment, I am still a beginner in Debian GNU/Linux. Before I started to use Debian, I used to program for Windows as a hobby. Had I the same experience with Debian GNU/Linux, I would have tried to program what I am proposing myself, because I believe that it is a valid idea. However, I do not have the required expertise.

Please do not misinterpret me. Here, I am only presenting my idea; I am not trying to oblige you (the reader) to do it for me. I strongly believe that sharing ideas is one of the most important aspects of an advanced society.

Yet another idea which might help improve updates.

Posted: 2008-04-08 08:10
by edbarx
In updates, I suggest skipping all the mechanisms that install and uninstall packages. I suggest including only the modified binaries and replacing them on the client's computer. This should always result in faster updates. The problem with slow updates has to do with using apt, aptitude, apt-get and dpkg. These programs should not be allowed to do the updates, because they require a .deb file. In updates, it makes more sense to replace ONLY the updated binaries and settings on the client's computer.

Posted: 2008-04-08 08:38
by plugwash
edbarx wrote: The problem with slow updates has to do with using apt, aptitude, apt-get and dpkg. These programs should not be allowed to do the updates, because they require a .deb file. In updates, it makes more sense to replace ONLY the updated binaries and settings on the client's computer.
I disagree. apt has all the infrastructure for working out which packages need to be updated and downloading them. dpkg has all the infrastructure for carrying out the maintainer's configuration-related changes and tracking which versions are installed, as well as updating the main files of the package. Throwing away that infrastructure would be stupid. For a system to work it needs to be built within that existing infrastructure.
edbarx wrote: data compression is sequential in nature, i.e. the data, although compressed, is not scrambled; although unreadable, it is still in order. I think this is true, because files can be extracted from archives without having to decompress everything.
Most compression algorithms require you to start reading from the beginning. You can stop before you reach the end, but if your file is in the middle of a solid archive you are going to have to uncompress everything that comes before that file. Tarballs are solid archives, zips are not, and rars can be either depending on the options. A deb is an ar archive containing two tarballs.
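
You can see this for yourself on any deb in the apt cache (the package name is just an example):

Code:
  ar t /var/cache/apt/archives/foo_1.1_amd64.deb     # lists debian-binary, control.tar.gz, data.tar.gz
  # getting at any one file still means decompressing the data tarball from the start:
  ar p /var/cache/apt/archives/foo_1.1_amd64.deb data.tar.gz | tar -tzf - | head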

Posted: 2008-04-08 08:59
by edbarx
plugwash wrote: For a system to work it needs to be built within that existing infrastructure.
Are you sure? :shock: Have you ever heard of progress?

If a system itself imposes a limit, it should be re-examined, revised and, if necessary, replaced.

The current package management system is what is causing unnecessarily slow updates, because it always downloads ALL the files, modified or not! Downloading unmodified files simply does not make sense.

Posted: 2008-04-08 09:26
by plugwash
edbarx wrote:
plugwash wrote: For a system to work it needs to be built within that existing infrastructure.
Are you sure? :shock: Have you ever heard of progress?
Sometimes an existing system is so broken or outdated that replacement is the only way to make progress. However, very often people are all too eager to rip up established working systems and start from scratch without a sufficiently good reason.
edbarx wrote: The current package management system is what is causing unnecessarily slow updates, because it always downloads ALL the files, modified or not! Downloading unmodified files simply does not make sense.
Right, but that is a relatively small part of what the package management system does.

A system for doing differential updates is a sensible idea, but to pull it off requires several things:
* Someone with enough knowledge of compression, mirror operation, existing file formats, operational characteristics etc. to do a sensible design.
* Someone prepared to get to know the existing codebases and implement that sensible design within those code bases.
* Someone with the resources to set up and host a test/demonstration setup that people can use.
* Someone with the political skill to get it accepted into the official system.

Posted: 2008-04-08 11:47
by MeanDean
edbarx wrote: The current package management system is what is causing unnecessarily slow updates, because it always downloads ALL the files, modified or not! Downloading unmodified files simply does not make sense.
Can you provide some specific examples and show us how much difference this would make?