
Yet another idea which might help improve updates.

Posted: 2008-04-08 08:10
by edbarx
For updates, I suggest skipping all the mechanisms that install and uninstall packages and shipping only the modified binaries, replacing them on the client's computer. This should always result in faster updates. The problem with slow updates has to do with using apt, aptitude, apt-get and dpkg. These programs should not be allowed to do the updates, because they require a .deb file. For updates, it makes more sense to replace ONLY the updated binaries and settings on the client's computer.

Posted: 2008-04-08 08:38
by plugwash
edbarx wrote: The problem with slow updates has to do with using apt, aptitude, apt-get and dpkg. These programs should not be allowed to do the updates, because they require a .deb file. For updates, it makes more sense to replace ONLY the updated binaries and settings on the client's computer.
I disagree: apt has all the infrastructure for working out what packages need to be updated and downloading them. dpkg has all the infrastructure for carrying out the maintainers' configuration-related changes and tracking what versions are installed, as well as updating the main files of the package. Throwing away that infrastructure would be stupid. For a system to work, it needs to be built within that existing infrastructure.
edbarx wrote: Data compression is sequential in its nature, i.e. the data, although compressed, is not scrambled; i.e. the data, although unreadable, is still in order. I think this is true, because files can be extracted from compressed archives without having to decompress everything.
Most compression algorithms require you to start reading from the beginning. You can stop before you reach the end, but if your file is in the middle of a solid archive you are going to have to uncompress everything that comes before that file. Tarballs are solid archives, zips are not, and rars can be either depending on the options. A deb is an ar archive containing two tarballs.
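
To illustrate the solid-archive point (just a sketch using Python's standard library, not a description of anything apt or dpkg does): pulling one member out of a gzip-compressed tarball forces the reader to walk through everything stored before it, because tar has no index, whereas a zip's central directory lets you jump straight to the member you want.

Code:
import io
import tarfile
import zipfile

# Build a gzip-compressed tarball (a "solid" archive) and a zip in memory,
# each holding the same 100 small files.
files = {"file%02d.txt" % i: b"x" * 1000 for i in range(100)}

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w:gz") as tar:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)

# Tar has no index: to reach the last member, tarfile must decompress and
# skip over the 99 members that come before it in the stream.
tar_buf.seek(0)
with tarfile.open(fileobj=tar_buf, mode="r:gz") as tar:
    from_tar = tar.extractfile("file99.txt").read()

# Zip keeps a central directory at the end of the file, so zipfile can seek
# directly to the compressed data of the requested member.
zip_buf.seek(0)
with zipfile.ZipFile(zip_buf) as zf:
    from_zip = zf.read("file99.txt")

assert from_tar == from_zip == files["file99.txt"]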

Posted: 2008-04-08 08:59
by edbarx
plugwash wrote: For a system to work, it needs to be built within that existing infrastructure.
Are you sure? :shock: Have you ever heard of progress?

If a system imposes a limit on itself, it should be re-examined, revised and, if necessary, replaced.

The current package management system is what is causing unnecessarily slow updates, because it downloads ALL the files, modified or not! Downloading unmodified files simply does not make sense.

Posted: 2008-04-08 09:26
by plugwash
edbarx wrote:
plugwash wrote: For a system to work, it needs to be built within that existing infrastructure.
Are you sure? :shock: Have you ever heard of progress?
Sometimes an existing system is so broken or outdated that replacement is the only way to make progress. However, very often people are all too eager to rip up established working systems and start from scratch without a sufficiently good reason.
edbarx wrote: The current package management system is what is causing unnecessarily slow updates, because it downloads ALL the files, modified or not! Downloading unmodified files simply does not make sense.
Right, but that is a relatively small part of what the package management system does.

A system for doing differential updates is a sensible idea, but to pull it off requires several things:
* Someone with enough knowledge of compression, mirror operation, existing file formats, operational characteristics etc. to do a sensible design.
* Someone prepared to get to know the existing codebases and implement that sensible design within those codebases.
* Someone with the resources to set up and host a test/demonstration setup that people can use.
* Someone with the political skill to get it accepted into the official system.

Posted: 2008-04-08 11:47
by MeanDean
edbarx wrote: The current package management system is what is causing unnecessarily slow updates, because it downloads ALL the files, modified or not! Downloading unmodified files simply does not make sense.
Can you provide some specific examples and show us how much difference this would make?

Posted: 2008-04-08 19:07
by edbarx
MeanDean wrote: Can you provide some specific examples and show us how much difference this would make?
It doesn't take much imagination to conclude that not all of a package's files are modified between consecutive updates. So far, I haven't done a statistical analysis. With the next update, I will try to compare the files of a random sample of ten packages and then post the results.

Posted: 2008-04-09 20:10
by GNU.Wasabi
Wait a second: what if, say, you run a pretty outdated Etch (for example from the times when Etch was testing and Sarge was stable) and you wanted to update package pkg-abc from your currently installed version 1.05 to the current upstream version 1.60; how would that work with differential updates? If you installed the update to pkg-abc 1.60, would it only apply the updated files and settings between versions 1.59 and 1.60, or would it use some kind of intelligent system to determine the changes between versions 1.05 and 1.60?
If it is the former, you would need to install the following updates of pkg-abc: 1.06, 1.07, 1.08, 1.09, 1.10, ... 1.58, 1.59, 1.60. If it is the latter, then it would be a little more useful, but what am I gaining from a differential update versus a normal update if the only file of pkg-abc that stayed the same between versions 1.05 and 1.60 is the README, while the rest changed?
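
To put rough numbers on the two possibilities, here is a toy Python sketch; the version list and the delta naming scheme are made up for illustration and are not anything apt or dpkg actually provides.

Code:
# Toy comparison of the two delta strategies for pkg-abc 1.05 -> 1.60.
installed, latest = "1.05", "1.60"
versions = ["1.%02d" % i for i in range(5, 61)]   # "1.05" .. "1.60"

# The former: chained deltas - the client fetches and applies every
# intermediate step between its installed version and the latest one.
chained = ["pkg-abc_%s_to_%s.delta" % (a, b)
           for a, b in zip(versions, versions[1:])]
print(len(chained), "downloads if deltas only go from one version to the next")

# The latter: one direct delta between the installed and latest versions.
# A single download, but the server must be able to produce or store a
# delta for every (old, new) pair its clients might still be running.
direct = ["pkg-abc_%s_to_%s.delta" % (installed, latest)]
print(len(direct), "download with a direct 1.05 -> 1.60 delta")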

To me, this "differential update" system seems extremely hard to make, and it also seems it would slow down how APT currently operates. Do we want our dependencies to be checked once more to see if there are any changes between the versions, making the update operation very slow? What about bigger packages of 20 MB and more? Sorting out the differences between versions would be very slow, especially on older machines.

By the way, how would it work anyway when it checks for changes in the files? The server would have to keep track of every file inside every package, and also the MD5 of each file, for the system to work.

Or maybe I'm just misunderstanding something.
My opinion: leave APT as it is; it works for me, at least.

Reply to GNU.Wasabi

Posted: 2008-04-12 08:21
by edbarx
In my opinion, differential updates make sense only when updating a package from one version to the next. Widening the gap invariably makes differential updates inefficient, if not worse than doing the updates as they are done at present.

By differential updates I mean this: download a file only if the same file in the newer version has been modified or created from scratch.

Posted: 2008-04-15 18:28
by GNU.Wasabi
edbarx, could you answer the question I asked above:
By the way, how would it work anyway when it checks for changes in the files? The server would have to keep track of every file inside every package, and also the MD5 of each file, for the system to work.
Thanks!

Posted: 2008-04-16 06:01
by edbarx
GNU.Wasabi wrote: edbarx, could you answer the question I asked above:
By the way, how would it work anyway when it checks for changes in the files? The server would have to keep track of every file inside every package, and also the MD5 of each file, for the system to work.
Thanks!
The .deb archives contain the file list of each package, complete with file sizes and modification dates. This information can be used to decide which files have been updated between versions of a package. Moreover, this list, together with the updated files, can be kept on the server, which avoids the need for the server to decompress the .deb archives. Packages undergoing huge updates should be treated as they are at present.

I suggest this approach only for packages undergoing small changes. In this way, their full .deb files can be excluded from the download procedure.
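
A rough sketch of that comparison in Python (the manifest layout, paths and numbers below are made up for illustration; on the client side dpkg already keeps per-package file lists and md5sums under /var/lib/dpkg/info/, which could serve a similar purpose):

Code:
# Hypothetical per-version manifests: path -> (size, modification date),
# as could be derived from the tar headers inside each .deb.
old_manifest = {
    "/usr/bin/pkg-abc":                 (102400, "2008-03-01"),
    "/usr/share/doc/pkg-abc/README":    (2048,   "2007-11-20"),
    "/etc/pkg-abc/pkg-abc.conf":        (512,    "2007-11-20"),
}
new_manifest = {
    "/usr/bin/pkg-abc":                 (103168, "2008-04-05"),  # changed
    "/usr/share/doc/pkg-abc/README":    (2048,   "2007-11-20"),  # unchanged
    "/etc/pkg-abc/pkg-abc.conf":        (512,    "2007-11-20"),  # unchanged
    "/usr/share/man/man1/pkg-abc.1.gz": (1300,   "2008-04-05"),  # new file
}

# Download only files that are new or whose size/date differ; delete files
# that disappeared in the new version. Everything else stays untouched.
to_download = [path for path, meta in new_manifest.items()
               if old_manifest.get(path) != meta]
to_delete = [path for path in old_manifest if path not in new_manifest]

print("download:", to_download)
print("delete:  ", to_delete)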