Cross-compilation needed between 2 "amd64" systems ?


Postby Piglou » 2015-08-18 10:52

The question may seem too easy to ask here, but some people told me this step is unnecessary, while others said that is not always the case.
I'm using the same OS on both sides.

I would like to know whether CPU features available on the build machine (FX-9590) become required features of the built executables/libraries (I want to run them on a J1900). If so, how do I tell the compiler not to rely on them?



Tiny running machine (the target):

Code: Select all
processor       : 0 1 2 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 55
model name      : Intel(R) Celeron(R) CPU  J1900  @ 1.99GHz
stepping        : 8
microcode       : 0x829
cpu MHz         : 1332.904
cache size      : 1024 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 6
initial apicid  : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer rdrand lahf_lm 3dnowprefetch ida arat epb dtherm tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms
bugs            :
bogomips        : 3993.60
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:



Building factory (the build machine):

Code: Select all
processor       : 0 1 2 3 4 5 6 7
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD FX(tm)-9590 Eight-Core Processor
stepping        : 0
microcode       : 0x6000822
cpu MHz         : 1400.000
cache size      : 2048 KB
physical id     : 0
siblings        : 8
core id         : 7
cpu cores       : 4
apicid          : 23
initial apicid  : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold vmmcall bmi1
bugs            : fxsave_leak
bogomips        : 8036.43
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro


I run the executable on the same OS on both sides, but I get some weird problems on the target side (I suspect different CPU flags, but I don't know how to handle that).
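For what it's worth, the two "flags" lines above can be compared directly to see which features the build box has and the target lacks. A minimal sketch, using short hypothetical excerpts of those lines (the full lines from /proc/cpuinfo work the same way):

```shell
# Hypothetical excerpts of the two "flags" lines from /proc/cpuinfo above.
echo "sse sse2 ssse3 sse4_1 sse4_2 movbe rdrand" | tr ' ' '\n' | sort > /tmp/j1900
echo "sse sse2 ssse3 sse4_1 sse4_2 avx xop fma fma4 aes" | tr ' ' '\n' | sort > /tmp/fx9590
# comm -13 prints lines present only in the second file: features the
# FX-9590 has but the J1900 does not (here: aes, avx, fma, fma4, xop).
comm -13 /tmp/j1900 /tmp/fx9590
```

Any instruction drawn from that right-hand column (AVX, XOP, FMA4, ...) will trap as an illegal instruction on the J1900.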

Thank you in advance !
Sorry if it's too easy, but I didn't find any clear answer.
Piglou
 
Posts: 14
Joined: 2015-07-25 11:55

Re: Cross-compilation needed between 2 "amd64" systems ?

Postby Piglou » 2015-08-18 11:19

After some tests it's becoming clear that it doesn't currently work well because the CPUs are different.
How do I build a simple executable and control which CPU features should/should not be used?

Something like "gcc main.c -o Program".

EDIT : I saw that :
https://gcc.gnu.org/onlinedocs/gcc-4.5. ... tions.html
Is the option -mtune=generic enough, and is -march=generic necessary ?
Piglou
 
Posts: 14
Joined: 2015-07-25 11:55

Re: Cross-compilation needed between 2 "amd64" systems ?

Postby stevepusser » 2015-08-18 18:59

Debian is able to build amd64 packages that work across all amd64 platforms, so it must be possible. I think one optimization flag is -O2, but I don't know what machine target is used; your generic flags seem like a good idea, though.
The MX Linux repositories: Backports galore! If we don't have something, just ask and we'll try--we like challenges. New packages: Krita 3.3.2.1, Pale Moon 27.6.0, Audacity 2.2.0, mpv 0.27.0, Corebird 1.7.1, Firefox 57.0, SMPlayer 17.11.2
User avatar
stevepusser
 
Posts: 8905
Joined: 2009-10-06 05:53

Re: Cross-compilation needed between 2 "amd64" systems ?

Postby runfrodorun » 2015-08-24 22:42

The -O flags do not affect the portability of the generated code. I run a distributed compile server with distcc and DMUCS; if you are using the same version of gcc, that is enough. On the other hand, you need to generate code for at most the common subset of the two CPUs' feature sets, and any tuning for optimization will be pointless because each CPU will probably have different cache line sizes, etc. Compile with the same options on both machines, and don't enable any auto-tune flags.

As for your last question: -mtune is deprecated, I believe, and -march is the replacement. I would suggest using, as you said, -march=generic or -march=amd64; I think that is the correct option (don't quote me on that). You may even try it without any of those; I believe that by default gcc only generates code that depends on required amd64 features (don't quote me on that either, I haven't done compilers in a long time).

I ran a quick test compiling hello world on my VPS and my i7-2600K, which have dramatically different feature sets, and it checks out. Obviously not a very good test, but you may consider giving it a shot. I compiled with no flags: "gcc main.c -o main.out".

My Gentoo friends would know a little more about tweaking gcc, but long story short: if you want code to use extra features of your CPU, you need to actually ask for them; it will not use them otherwise.

Sorry if this is not enough help; if you post your odd outputs I might be able to tell you a little more.

-RJ

edit: I sort of hinted at this, but I should make it more explicit. You MUST have the same libraries available on your target system as on your compiling system, at least for the libraries you link against. It would probably also be smart to have the same versions if you're having problems.
Much opinionated.
Some abrasive.
No systemd.
Wow.
runfrodorun
 
Posts: 202
Joined: 2013-06-19 05:09

Re: Cross-compilation needed between 2 "amd64" systems ?

Postby tomazzi » 2015-08-29 21:41

I agree with everything you've said, except the following part:

runfrodorun wrote:the -O flags do not affect the portability of the generated code

<Of course, I'm talking here about portability of binary executables between Intel and AMD; otherwise portability of binaries is simply impossible>

Well, it does affect portability, usually starting from -O2, though it depends on what code is compiled. E.g., GCC can optimize out nested calls to a function, which means the result of an operation can depend on the "invisible" out-of-order execution algorithms used by a particular CPU. In such cases it also removes frame pointers, which makes such a function next to impossible to debug.
SSE/MMX unaligned operations are treated differently by different CPUs: unpredictable results, possible crashes.
Timers: the accuracy of the timers used by an application depends on the CPU; applications can crash or expose undefined behavior when running on a different CPU, especially when different drivers are used for HPET (and especially in multithreaded programs).
Etc., etc. - there are literally hundreds of potential problems...

So the only way is to disable optimisations, but even then there is no 100% guarantee. Writing programs that generate portable binary object code requires extreme attention and knowledge.

For that reason, I think autotools projects are far safer and easier to maintain, but of course they're much harder to distribute in binary form...

Regards.
Odi profanum vulgus
tomazzi
 
Posts: 730
Joined: 2013-08-02 21:33

Re: Cross-compilation needed between 2 "amd64" systems ?

Postby runfrodorun » 2015-08-30 21:45

You should never do an unaligned access with MMX, SSE, or AVX. The results are terrible even when they do work (_horribly_ slow, big overhead), and even then you often need the intrinsics from Intel and have to do those optimizations yourself. gcc has some provisions for SSE now (and by extension probably MMX).

The compiler will also reorder some code, and yes, at -O2 and -O3 it will remove frame pointers, so you can't see the values of variables and stepping through is nonsense, but that doesn't necessarily make the code less portable.

Citation: I run Gentoo and do build acceleration with native compilers across Intel and AMD systems. It works fine with -O2. Gentoo userspaces don't always run stably when compiled at -O3, so I can't really test that; it isn't perfect on my own system, let alone distributed. Obviously not an exhaustive test, but I think we can agree that it's pretty telling.

As far as timing goes: if your multithreaded timing and consistency depend on the CPU, I have found that it is often the result of a mistake in your code. It is possible to run multithreaded code where you forgot to put a lock where you really need one and it works just fine; that has happened to me on multiple occasions. You end up finding those bugs when you run your code on another platform (even if you recompile it for that platform): it blows up in your face.

I maintain that -O2 code is portable, assuming it only uses the baseline amd64 feature set (the worst case across the architectures involved).
runfrodorun
 
Posts: 202
Joined: 2013-06-19 05:09

Re: Cross-compilation needed between 2 "amd64" systems ?

Postby tomazzi » 2015-08-31 14:49

runfrodorun wrote:As far as timing goes: if your multithreaded timing and consistency depend on the CPU, I have found that it is often the result of a mistake in your code. It is possible to run multithreaded code where you forgot to put a lock where you really need one and it works just fine; that has happened to me on multiple occasions. You end up finding those bugs when you run your code on another platform (even if you recompile it for that platform): it blows up in your face.

Yes, improper locking can cause serious issues, but a serious programmer wouldn't even dare release a program without a clean helgrind report in hand ;)
I was thinking about priority-inversion effects, which can be uncatchable during debugging but may show up when you run the application on a different/newer CPU. That can happen due to differing accuracy of the timers used to generate threads' time slices, or of the timers used within a thread, which can lead to a reordering of the locking of mutexes and other synchronization objects.
...
Yes, optimizing compilers can generate code with reordered execution of particular statements in the source, but my point is that it's safer to have one level of reordering instead of two. Reordered execution of already-reordered code can lead to serious problems with memory-barrier effectiveness and can completely kill the performance of TSX-like extensions.

That's why, IMO, for highly portable binary code it's better to disable all or most optimizations and let the CPU decide how to optimize execution. Of course, this may not matter for very simple programs, so in each particular case the programmer needs to be sure about which optimizations can be safely used.

Regards.
tomazzi
 
Posts: 730
Joined: 2013-08-02 21:33

Re: Cross-compilation needed between 2 "amd64" systems ?

Postby runfrodorun » 2015-09-01 01:23

I did a brief and unscientific check with gcc 4.9:

-O1 produces more or less byte-for-byte equivalent binaries, meta tags excepted, plus a byte here and there that differs, most likely because of system-library or patch mismatches (a very unscientific test; I never disassembled anything and am not great at reading hex, forgive me).

-O2 does not produce byte-for-byte equivalent code, but all tests were compatible with both systems. I tested some pretty involved examples, though nothing multithreaded, let alone with major lock contention. All worked on both systems.

You were right about one thing: -O2 enables processor-specific optimizations. -mtune=generic does not change this; binaries will not be byte-for-byte identical.

It also seems that Debian is compiled with -O1.

That's about all I could find out. Keep in mind that gcc assumes correct implementations of the amd64 architecture, and any optimizations it makes are not built around faulty assumptions.

I don't see any documentation for it anywhere, but I remember seeing something like -O4 and -O5 a few years ago. All I know about them is that they are almost guaranteed not to generate working code, and debugging the optimizations is insanity (the underlying problems are NP-complete, so the compiler falls back on heuristics), so all bets are off. -O3 is supposed to be pretty aggressive and is not always guaranteed to generate correct code (you wouldn't want to compile your whole userspace with it; I've heard it can cause issues).

Most of the time, things like -O3 issues are actually bugs in gcc or in the code it's compiling, believe it or not...

I could be convinced that -O2 code is not portable, but I'm still very dubious, even though it has its risks. I can't find anyone online who says otherwise.

I'll cite this one more time: I do build acceleration across many architectures, sometimes without specifying CPUs. If you think running binaries compiled for different chips is insane, try linking a bunch of mismatched objects. It works! I once rebuilt a whole userspace that way!

-RJ

p.s. I really can't come to a definite conclusion until I have two identical systems running on opposite chips with the same gcc.
runfrodorun
 
Posts: 202
Joined: 2013-06-19 05:09

