Reading through large open source projects


runfrodorun · #16 Post by **runfrodorun** » 2015-08-29 03:14

Just one more thing to help shed a little more light on this:

Linux has ~8 million lines of _implementation_ code (excluding header files, documentation, and anything else that goes along with it), that is to say, .c files only.
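For anyone who wants to check a figure like that against their own source tree, here's a minimal sketch of counting lines in .c files only. The function name `count_c_lines` is made up for illustration, and it counts raw lines (comments and blank lines included), so treat it as a rough measure and not necessarily the way the ~8 million number was obtained.

```python
from pathlib import Path


def count_c_lines(root):
    """Count raw lines (comments and blank lines included) in all .c files under root."""
    total = 0
    for path in Path(root).rglob("*.c"):
        if not path.is_file():
            continue
        # Read as bytes so unusual encodings in old source files cannot raise.
        with path.open("rb") as handle:
            total += sum(1 for _ in handle)
    return total
```

Calling `count_c_lines("linux")` on an unpacked kernel tree (the path is hypothetical) would give the total; headers and documentation are skipped simply because they don't end in `.c`.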

If you spent every day for ten years on it, no vacations, no weekends, you'd have to read about 2,200 lines of code a day and understand them crystal clear. It's possible for a prodigy with no life maybe, but that's ten years and those are some pretty unreasonable circumstances. Anyone who has a life might not be doing that.
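The arithmetic behind that figure, as a quick sketch. The 8 million lines and ten years come from the post; the 230 working days per year in the second call is only an assumption added for comparison.

```python
def lines_per_day(total_lines, years, days_per_year=365):
    """Lines to read each day to finish total_lines in the given number of years."""
    return total_lines / (years * days_per_year)


# Figures from the post: ~8 million lines, ten years, no days off.
rate = lines_per_day(8_000_000, 10)
print(f"{rate:.0f} lines per day")  # 2192 lines per day

# Assumption for comparison: roughly 230 working days a year instead of 365.
print(f"{lines_per_day(8_000_000, 10, days_per_year=230):.0f} lines per day")
```

So "about 2200" holds for the no-days-off case, and the daily load only goes up once weekends and vacations come back in.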

If you want to read supporting materials to help your understanding, forget it.

This also assumes you never have to re-read code when the code you're reading calls functions you've forgotten about. Hey, it's 8 million lines!

The GPL is like a black hole for Linux kernel code. More than half of the changes made today are made by companies, not volunteer contributors. Even Microsoft has had some stake in developing certain features in Linux. You can bet that Torvalds might not even know some things that are in there. Scary? Probably not... I like to think somebody's looking at it, because these days it ain't me.

-RJ

tomazzi · #17 Post by **tomazzi** » 2015-08-29 18:34

runfrodorun wrote:If you spent every day for ten years on it, no vacations, no weekends, you'd have to read about 2,200 lines of code a day and understand them crystal clear. It's possible for a prodigy with no life maybe, but that's ten years and those are some pretty unreasonable circumstances. Anyone who has a life might not be doing that.

If you want to read supporting materials to help your understanding, forget it.

This also assumes you never have to re-read code when the code you're reading calls functions you've forgotten about. Hey, it's 8 million lines!

That way of thinking is typical of people who don't actually write any code. Writing code is just some abstraction for them, and reading that huge number of lines of "strange" text appears to be an unimaginably hard task...

For a professional programmer, reading code is like looking at a picture - You need maybe a few seconds to realize what it shows. After writing hundreds of thousands of lines of code, reading code becomes easier than reading a book in Your native language.

runfrodorun wrote:The GPL is like a black hole for linux kernel code.

Without the GPL, the Linux kernel would have died as a project 25 years ago.

runfrodorun wrote:(...) Even Microsoft has had some stake in developing certain features in Linux. You can bet that Torvalds might not even know some things that are in there. Scary? Probably not... I like to think somebody's looking at it (...)

Microsoft added drivers for their Hyper-V machines to improve the performance of GNU/Linux systems running on top of Windows Server. They were simply forced to by their customers - that's the simple truth.
This is actually a good, non-intrusive piece of code. Some people were yelling that Microsoft had infected or taken over the Linux kernel in some way, while the situation is exactly the opposite.

Every patch is discussed and the code is reviewed before it goes into the kernel - no need to worry.

GarryRicketson · #18 Post by **GarryRicketson** » 2015-08-29 18:43

@by tomazzi
+100

Just a short comment (I couldn't think of a better way to say what tomazzi said; very well put): compilers and other programs make it possible to "scan" large amounts of code, so there is no need to "manually" read every single "bit".
In other words, the computer does most of the "work".
Essentially this is what many "virus scanners" do, and the same kind of program can be adapted to scan for other types of undesirable code as well.
Normal users would find it equally "impossible" to manually scan every single file in the system looking for a piece of code (a "kiddie script") that is a virus, malware, worm, etc.
In a nutshell, it is quite possible, and necessary, to go over millions of lines of code in order to locate "bugs", etc. That is also why any "big" program or OS requires "teams"; usually more than one person is doing this work.
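As a rough illustration of letting the computer do the "scanning", here's a minimal sketch that walks a source tree and reports every line matching a pattern. The name `scan_tree` and its parameters are made up for this example; real scanners and static analyzers do far more than a regular expression match.

```python
import re
from pathlib import Path


def scan_tree(root, pattern, suffixes=(".c", ".h")):
    """Return (path, line_number, line) for every line under root matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # errors="replace" keeps the scan going on files with unusual encodings.
        with path.open("r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    hits.append((str(path), number, line.rstrip("\n")))
    return hits
```

For instance, `scan_tree("linux/drivers", r"\bstrcpy\s*\(")` (hypothetical path) would list every call to `strcpy` as a candidate for a human to look at. The program narrows millions of lines down to a short list; deciding whether each hit is actually a problem is still up to a person.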

runfrodorun · #19 Post by **runfrodorun** » 2015-08-30 21:32

tomazzi wrote:That way of thinking is typical of people who don't actually write any code. Writing code is just some abstraction for them, and reading that huge number of lines of "strange" text appears to be an unimaginably hard task...

For a professional programmer, reading code is like looking at a picture - You need maybe a few seconds to realize what it shows. After writing hundreds of thousands of lines of code, reading code becomes easier than reading a book in Your native language.

You're oversimplifying this. Good code is treated like a black box, yes, but each of those lines is meaningful, and just being able to read them and know what they mean is not enough. My experience in graduate-level math taught me that. If you have good modularization, then the number of lines of code is actually more staggering, not less, because you don't have the redundancy.

TLDR: knowing what it means does not mean you know why it's there. Knowing how to use it doesn't tell you how it works.

Perhaps we're disagreeing on our definitions of understand, or reading through.

You don't have to believe me but you should

20 years of experience. Every piece of software is different, and I have an easier time reading some software than others, but looking at a picture only shows the big picture. And even then, sometimes looking at a picture can give you jack. Xorg would be an example of that. Understanding the big, big picture is about all you're going to get from reading the code, and that's not very helpful.

It also depends on how many quick hacks were put in to get things working. You won't understand details like this.

For programmers that think very differently, and for design patterns that you've never seen before, things can get murky pretty quickly. To cite another quick example to explain what I'm talking about, #define macros can make things very murky without running the preprocessor.

tomazzi wrote:
runfrodorun wrote:The GPL is like a black hole for linux kernel code.
Without the GPL, the Linux kernel would have died as a project 25 years ago.

This was actually the point I was trying to make, I didn't say it was a bad thing. GPL fan here. (black holes accumulate mass rapidly, that was what I was hinting at!)

tomazzi wrote:
runfrodorun wrote:(...) Even Microsoft has had some stake in developing certain features in Linux. You can bet that Torvalds might not even know some things that are in there. Scary? Probably not... I like to think somebody's looking at it (...)
Microsoft added drivers for their Hyper-V machines to improve the performance of GNU/Linux systems running on top of Windows Server. They were simply forced to by their customers - that's the simple truth.
This is actually a good, non-intrusive piece of code. Some people were yelling that Microsoft had infected or taken over the Linux kernel in some way, while the situation is exactly the opposite.

Every patch is discussed and the code is reviewed before it goes into the kernel - no need to worry.

I haven't forgotten. My point is that you'd be surprised who contributes, not "oh no, scary Microsoft". But thanks anyway.

GarryRicketson wrote:@by tomazzi
+100
Just a short comment (I couldn't think of a better way to say what tomazzi said; very well put): compilers and other programs make it possible to "scan" large amounts of code, so there is no need to "manually" read every single "bit".
In other words, the computer does most of the "work".
Essentially this is what many "virus scanners" do, and the same kind of program can be adapted to scan for other types of undesirable code as well.
Normal users would find it equally "impossible" to manually scan every single file in the system looking for a piece of code (a "kiddie script") that is a virus, malware, worm, etc.
In a nutshell, it is quite possible, and necessary, to go over millions of lines of code in order to locate "bugs", etc. That is also why any "big" program or OS requires "teams"; usually more than one person is doing this work.

Perhaps we are on different wavelengths here -- let me take a second to explain where I'm coming from.

Say you're on a massive development project. Say you're new to the team. (You're supposed to keep new hires on big projects to a minimum in the business world; here's why.) You're charged with creating a certain feature in the code. You take ownership of the backlog item, feature development log, whatever your company or partnership calls it. Now, you don't have a clue what the overarching design of the software is, and you don't know where to look or how many places to look. Then, once you understand those basics (for the largest projects that can take months of learning if you have no support from your coworkers), you find that each little piece of code has its own patterns, its own rules, and its own assumptions about its calls and callers. How are you to make an informed change that is consistent with the design? It is not as simple as looking at a 'big picture.' Understanding code is hard; reading it maybe not as much (but still d___ hard for some!)

Also, not all features are created equal. When I was working on a filesystem driver that I didn't design, I ended up putting in little hacks here and there that were probably pretty safe things to do: changing the defaults, bit patterns for file headers, etc. But that's not an understanding of the code. When I worked on Linux (back in 2.4 and early 2.6) I did NOT have a clear understanding of how everything in the kernel worked. I learned what I needed to know to work on my drivers, and only what I needed to know. NOT a clear understanding of the whole thing. Nobody's got time for that.

TLDR: You can understand what you need to know to do your job, but this is NOT a clear understanding of the source base. When you need to do a big refactor, you are dead in the water having just skimmed the code as you suggest.

Prove me wrong... find out what the developers were thinking when they created each function behind the Linux kernel system calls, and how you can be sure that changing something won't break their model.

I hope I'm making sense, but probably not as per usual.

-RJ

edit: clarification

tomazzi · #20 Post by **tomazzi** » 2016-01-10 02:46

runfrodorun wrote: Say you're on a massive development project. Say you're new to the team. (You're supposed to keep new hires on big projects to a minimum in the business world; here's why.) You're charged with creating a certain feature in the code. You take ownership of the backlog item, feature development log, whatever your company or partnership calls it. Now, you don't have a clue what the overarching design of the software is, and you don't know where to look or how many places to look. Then, once you understand those basics (for the largest projects that can take months of learning if you have no support from your coworkers), you find that each little piece of code has its own patterns, its own rules, and its own assumptions about its calls and callers. How are you to make an informed change that is consistent with the design? It is not as simple as looking at a 'big picture.' Understanding code is hard; reading it maybe not as much (but still d___ hard for some!)

I'm surprised, but somehow I missed Your reply...
I've marked in bold the fragments of Your post which are interesting to me.

1. "You're charged with creating a certain feature in the code" and You do what? - There's only one way to go: read and understand the source (it may be a hard task, but it's unavoidable)
2. If You can't "get a clue" *after* reading the sources, then there are 2 possible ways to go:
a) The sources are written so badly that the best way to go is to not waste Your time, at least not at the given salary... (raise it!)
b) Make an agreement in which You state that the sources are shitty, but that they can be turned into (or replaced with) another solution which will work in the particular case.

3. "...now each little piece of code has its own patterns, its own rules, its own assumptions" - if this is the case, leave this shitty company as soon as possible...

Regards.

runfrodorun · #21 Post by **runfrodorun** » 2016-01-10 17:33

I can see I'm not going to change your mind, and that's too bad. It's a pity, because I think if we were looking at this the same way we would probably agree.

I wrote a Gröbner-basis-based theorem prover last year and it's not even that many lines of code... nobody I've found knows how it works yet, because it took me months to understand the math that went into it and to implement the F5 algorithm, which is a signature-based approach to computing the Gröbner basis, a problem that is computationally very hard in the worst case.

So yeah, I guess you're right: if you just want to use that API and be done with it, then sure, you understand the code just fine. But I'm saying that understanding the code is understanding how it works, and that will take you a few years of graduate math classes and a lot of time to figure out. So why are we arguing about this if we don't even agree on what understanding the code means? It's one thing when you're working for a company with a bunch of idiot developers and trying to deal with their bad assumptions. It's another thing when the lead developer quits, and now all of a sudden you're thrust into a position where you have to make choices that are more about the philosophy and principle of where your design is going, and that is not as simple as knowing which API calls to use. That's a position that I have been put in before, so I don't think you really know what the ramifications of understanding code on that level are.

Cheerio

-RJ

tomazzi · #22 Post by **tomazzi** » 2016-01-10 22:38

runfrodorun wrote:I wrote a Gröbner-basis-based theorem prover last year and it's not even that many lines of code... nobody I've found knows how it works yet, because it took me months to understand the math that went into it and to implement the F5 algorithm, which is a signature-based approach to computing the Gröbner basis, a problem that is computationally very hard in the worst case.

Well, although Your example is very good at showing the problem, it is a corner case. I mean that highly specialized mathematical algorithms are rare. Of course, in such a case, equally highly specialized programmers are needed.

But, for example, the aforementioned Linux kernel is very simple - there's no highly specialized code, and every professional programmer can read it without a problem, especially when there are excellent tools like http://lxr.free-electrons.com or IDEs.

Again: we are talking about *large* projects, where the quality of the tools used is very important, but a *large* project doesn't necessarily mean "extremely complicated" - it can be at most complex.

And complex projects are always standardised and usually modular - there are settled naming conventions and standard interfaces, among other things.
So, once You become familiar with the basic rules that have been settled, You won't have a problem reading the project's code.

Want to check how thread stacks are guarded or how they're expanded when the guard zone is hit? No problemo - it takes just a few seconds to find the source and maybe a few more minutes to read the code...
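To give an idea of why a cross-referencing tool makes that lookup take seconds, here's a toy sketch of an identifier index. It is not how LXR or any IDE is actually implemented; `build_index` is a made-up name, and the regular expression is crude (it also picks up words inside comments and strings, and fragments of hex literals).

```python
import re
from collections import defaultdict
from pathlib import Path

# Crude approximation of a C identifier; no real tokenizing is done.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(root, suffixes=(".c", ".h")):
    """Map each identifier to the (path, line_number) pairs where it appears."""
    index = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        with path.open("r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                # A set avoids listing the same line twice for one identifier.
                for name in set(IDENTIFIER.findall(line)):
                    index[name].append((str(path), number))
    return index
```

Build the index once with something like `index = build_index("linux/mm")` (hypothetical path), and every later lookup, e.g. `index["expand_stack"]` if that's the name You're after, is a dictionary access that returns the files and lines to open. The tool only gets You to the right place; reading and understanding what is there is still Your job.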

And this is the point - You don't have to remember those 8 million lines of code - You read only the part which You're working with.

Regards.

Debian User Forums
