2.3 CPU/JIT optimization - Vibrant General

Preface: I'm no expert and I've had a beer or two, but this is something that's been bugging me for a while. =P
Obviously there is a huge difference in Linpack scores between the Hummingbird and the Snapdragon, but 2.3 does not change that (2.3 does appear to slightly improve the Linpack score, but I've seen more variation between various 2.2 ROMs than between something like Bionix V 1.2 or EDT's 2.2.1 stable beta and CM7). Both chips use the ARMv7 ISA, so a performance difference would come from the hardware implementation of the instructions, meaning an optimization would amount to calling the instructions that perform best for a specific task on a specific architecture. But since the actual instructions are the same, an 'optimization' for one hardware implementation could hurt another implementation of that same instruction set.
This sounds unlikely.
First, it is unlikely that Google is trying to optimize which instructions are called in which order just to improve a specific hardware implementation.
Second, without really bloating Android, or leaving this up to the hardware manufacturer in kernel development, this is not practical. If it's left to the manufacturer developing the kernel, then they should be the ones trying to optimize; otherwise Google wants to cover as much hardware as possible with as little code as possible. (Thinking more about this, it's probably left up to the manufacturer: the current kernel for Ubuntu 10.10 is 139 MB, which is the size of some Android ROMs and covers a vast array of hardware, while most kernels for the Vibrant are in the 5-10 MB range.)
Now this leads to JIT, which falls into the same pitfalls. Android apps are based on Java, and Java is a hybrid language that is both compiled and interpreted. JIT basically compiles the parts of code that are commonly run in an application to reduce interpretation (because interpretation is slower). But whatever it compiles is still eventually going to be ARMv7 instructions. I'm not exactly sure how the Dalvik JIT is implemented, but my guess is that since APKs are already 'compiled' to bytecode, it takes the parts that would normally be interpreted and compiles them to machine code for faster execution. In which case this is still the ARMv7 ISA, and the difference between the Hummingbird and the Snapdragon is hardware implementation, not software.
So, I don't see how an optimization specifically for the Hummingbird is likely or plausible (and keep in mind that Moto uses the OMAP, which would require its own optimization as well).
There's my rambling, for now.

Related

[Q] [REQ] Galbraith patch worked into kernels?

http://www.pcworld.com/businesscent...x_kernel_patch_delivers_huge_speed_boost.html
http://forum.xda-developers.com/showthread.php?t=844458
could this be worked into Epic 4G kernels as well?
tyl3rdurden said:
http://www.pcworld.com/businesscent...x_kernel_patch_delivers_huge_speed_boost.html
http://forum.xda-developers.com/showthread.php?t=844458
could this be worked into Epic 4G kernels as well?
WOW. I am seriously impressed by your "keeping up with the times" mentality. Good job on noticing this!
So...
"n tests by Galbraith, the patch reportedly produced a drop in the maximum latency of more than 10 times and in the average latency of the desktop by about 60 times. Though the merge window is now closed for the Linux 2.6.37 kernel, the new patch should make it into version 2.6.38."
Along with an Overclocked Froyo kernel (once source is out) this should REALLY improve our experiences.
I mentioned in another thread that I am in talks with Paragon software
http://www.paragon-software.com/exp...ocs/technologies/Paragon_UFSD_for_Android.pdf
for NTFS and HFS+ access. I think it is POSSIBLE that this is actually a software patch, although it may need to be placed into the kernel itself as a driver. I promise to update as soon as they get back to me, as I just spoke to the devs there yesterday.
Looks like our experience is about to improve dramatically!
Already in IntersectRaven's latest kernel and wildmonk's latest beta kernels for the Nexus One. Check the threads.
From the other xda thread, someone mentioned that some kernels have already implemented it. I am sure some of them would be glad to share how it is implemented and how easily it can be done. I know it is different phones/kernels, but the idea behind it should be similar.
Dulanic said:
From the other xda thread, someone mentioned that some kernels have already implemented it. I am sure some of them would be glad to share how it is implemented and how easily it can be done. I know it is different phones/kernels, but the idea behind it should be similar.
We don't have a source kernel for Froyo yet to do this. Someone correct me if I am wrong please.
Edit: I can't find anything mentioning this patch. If anyone has a link post it. I don't believe this is implemented anywhere yet.
I found the below info here:
http://www.reseize.com/2010/11/linux-kernel-patch-that-does-wonders.html
Below is the video of the Linux desktop running the kernel with the patch in question applied but disabled:
As you can see, compiling the Linux kernel with so many jobs is rather troubling for the Linux desktop experience. At no point in the video was the 1080p sample video paused, but that is just where the current mainline Linux kernel is at with 2.6.37. There was also some stuttering with glxgears and some responsiveness issues elsewhere. This is even with all of the Linux 2.6.37 kernel improvements up to today. When recording a video of an older kernel release, the experience is even more horrific! Now let's see what happens when enabling the patch's new scheduler code:
It is truly a night and day difference. The 1080p Ogg video now played smoothly the majority of the time while still compiling the Linux kernel with 64 jobs. Glxgears was also better, and the window movements and desktop interactivity were far better. When compiling the Linux kernel with 128 jobs or other workloads that apply even greater strain, the results are even more dramatic, but it is not great for a video demonstration; the first video recorded under greater strain made the "before" look like a still photograph.
This could potentially be patched into our Eclair kernel if the changes aren't too intrusive, and by the sounds of it they're not.
The mainline patch was against the 2.6.39 kernel, however; our Froyo kernel will be 2.6.32 and Eclair is 2.6.29, so we're several revisions behind on Eclair.
It's definitely interesting, but it's geared toward desktops using the group scheduler. Absolutely worth a try if that scheduler works with Android easily (most of the community kernels are using the BFS scheduler, however).
cicada said:
This could potentially be patched into our Eclair kernel if the changes aren't too intrusive, and by the sounds of it they're not.
The mainline patch was against the 2.6.39 kernel, however; our Froyo kernel will be 2.6.32 and Eclair is 2.6.29, so we're several revisions behind on Eclair.
It's definitely interesting, but it's geared toward desktops using the group scheduler. Absolutely worth a try if that scheduler works with Android easily (most of the community kernels are using the BFS scheduler, however).
Sniff...
It did sound a little too good to be true. Well, eventually we will get 2.6.38, which has it built in, assuming the desktop group scheduler can even be used at all.
But since it's in other people's kernels, can't it be easily ported into ours?
tyl3rdurden said:
But since it's in other people's kernels, can't it be easily ported into ours?
It's very possible to patch in. If it's been done before, anyway.
But, because it is based on the .39 kernel, it might be a little buggy. Or a lot buggy. You wanna link me to a kernel that has it and I'll look into it? I probably will wait for Froyo source for at least the .32 kernel.
Here's what Linus himself had to say about the patch:
Yeah. And I have to say that I'm (very happily) surprised by just how small that patch really ends up being, and how it's not intrusive or ugly either.
I'm also very happy with just what it does to interactive performance. Admittedly, my "testcase" is really trivial (reading email in a web-browser, scrolling around a bit, while doing a "make -j64" on the kernel at the same time), but it's a test-case that is very relevant for me. And it is a _huge_ improvement.
It's an improvement for things like smooth scrolling around, but what I found more interesting was how it seems to really make web pages load a lot faster. Maybe it shouldn't have been surprising, but I always associated that with network performance. But there's clearly enough of a CPU load when loading a new web page that if you have a load average of 50+ at the same time, you _will_ be starved for CPU in the loading process, and probably won't get all the http requests out quickly enough.
So I think this is firmly one of those "real improvement" patches. Good job. Group scheduling goes from "useful for some specific server loads" to "that's a killer feature".
DevinXtreme said:
It's very possible to patch in. If it's been done before, anyway.
But, because it is based on the .39 kernel, it might be a little buggy. Or a lot buggy. You wanna link me to a kernel that has it and I'll look into it? I probably will wait for Froyo source for at least the .32 kernel.
Devin - I agree with waiting until the Froyo source is out before attempting to implement this. I'm not sure that group scheduling is even an option in the Android kernel. But I don't think anyone has done this, so I doubt any links are coming your way.
Edit: Found this here- http://groups.google.com/group/android-kernel/browse_thread/thread/f47d9d4f4e6a116a/ab1a8ab42bb0b84a
Android uses CFS, combined with RT scheduling. When you play an audio or video service, the platform changes the scheduling policy and the scheduling priority.
Search the platform code: Dalvik has the policy and priority setting code, and so does the framework for audio and video.
Check init.rc and the cutils folder.
You need to search the platform source from after the Eclair release (i.e. Froyo).
cicada said:
(most of the community kernels are using the BFS scheduler, however)
Actually, no Epic kernel uses BFS. It isn't stable on our hardware, and it's not worth porting. Android uses CFS by default, with the CFQ I/O scheduler I think, but most have switched from the CFS/CFQ to a CFS/BFQ combination. I know mine & Devin's kernels have.
Geniusdog254 said:
Actually, no Epic kernel uses BFS. It isn't stable on our hardware, and it's not worth porting. Android uses CFS by default, with the CFQ I/O scheduler I think, but most have switched from the CFS/CFQ to a CFS/BFQ combination. I know mine & Devin's kernels have.
OK then, so in your professional opinion, is this patch still a possibility?
Subject [RFC/RFT PATCH] sched: automated per tty task groups
From Mike Galbraith <>
Date Tue, 19 Oct 2010 11:16:04 +0200
Greetings,
Comments, suggestions etc highly welcome.
This patch implements an idea from Linus, to automatically create task groups per tty, to improve desktop interactivity under hefty load such as kbuild. The feature is enabled from boot by default; the default setting can be changed via the boot option ttysched=0, and it can be turned on or off on the fly via
echo [01] > /proc/sys/kernel/sched_tty_sched_enabled.
Link to code: http://forums.opensuse.org/english/...ernel-speed-up-patch-file-mike-galbraith.html
Thanks for the clarification, Geniusdog254.
ZenInsight, any chance you can prune down that post and just use a link? The patch is all over the web right now, and it's hard to scroll past on a phone.
ZenInsight said:
OK then, so in your professional opinion, is this patch still a possibility?
I'm sure it's possible, I just haven't looked at it yet. Like I stated before, until we get the 2.6.32 FroYo kernel source I'm not doing any devving besides app work (maybe).
EDIT: Devin said on the last page that he'll look into it. I know IntersectRaven's Nexus kernel has it, but I haven't looked into any reports of how much it helps.
Also found this:
Phoronix recently published an article regarding a ~200-line Linux kernel patch that improves responsiveness under system strain. Well, Lennart Poettering, a Red Hat developer, replied to Linus Torvalds on a mailing list with an alternative to this patch that does the same thing, yet all you have to do is run 2 commands and paste 4 lines into your ~/.bashrc file. I know it sounds unbelievable, but apparently someone even ran some tests which prove that Lennart's solution works. Read on!
Lennart explains you have to add this to your ~/.bashrc file (important: this won't work on Ubuntu. See instructions for Ubuntu further down the post!):
CODE:
if [ "$PS1" ] ; then
    mkdir -m 0700 /sys/fs/cgroup/cpu/user/$$
    echo $$ > /sys/fs/cgroup/cpu/user/$$/tasks
fi
Linux terminal (run these two commands as root first; they mount the cpu cgroup and create the "user" group that the .bashrc snippet above drops each new shell into, so they have to be done before the snippet can work, and again after every reboot unless you automate it):
mount -t cgroup cgroup /sys/fs/cgroup/cpu -o cpu
mkdir -m 0777 /sys/fs/cgroup/cpu/user
Furthermore, a reply to Lennart's email states that his approach is actually better than the actual kernel patch:
I've done some tests and the result is that Lennart's approach seems to work best. It also _feels_ better interactively compared to the vanilla kernel and in-kernel cgroups on my machine. Also, it's really nice to have an interface to actually see what is going on. With the kernel patch you're totally in the dark about what is going on right now.
-Markus Trippelsdorf
The reply also includes some benchmarks you can see @ http://lkml.org/lkml/2010/11/16/392
Found all this here (Ubuntu patch info too):
http://www.webupd8.org/2010/11/alternative-to-200-lines-kernel-patch.html

JIT on Galaxy S

Recently I finally finished modifying the Xperia X10 that I have, and one of the greatest improvements those guys achieved was getting a lot more processing power by enabling JIT.
So naturally, after seeing it work miracles on the X10, I went to look for it on my Captivate and so far have come up with nothing anywhere. I saw some discussion about it in the past, but nothing beyond that.
Hopefully somebody can enlighten me, and if such a thing does seriously want to be worked on... here's a thread!
Sent from my GT-I9000 using Tapatalk
FroYo (2.2) and later have Just-In-Time compilation (JIT) out of the box. There are probably a dozen FroYo ROMs on the front page alone, so pick one and have fun!
From what I've read, it's been a part of the ROMs since 2.1 but it needs to be enabled. Is this the case here or am I missing something?
Sent from my GT-I9000 using Tapatalk
kr3w1337 said:
From what I've read, it's been a part of the ROMs since 2.1 but it needs to be enabled. Is this the case here or am I missing something?
Sent from my GT-I9000 using Tapatalk
That is one of the biggest parts of Froyo: it enables JIT... so if you run a ROM that has Froyo, it's enabled, unless I'm mistaken...
Sent from my GT-I9000 using XDA App
JIT is enabled by default in all Froyo ROMs. You can check the build.prop for ****s and giggles though.
The Galaxy S won't show the kinds of scores that Qualcomm based devices will, even with JIT enabled. Qualcomm included 128-bit SIMD Floating Point extensions with Snapdragon, while the Hummingbird only has 64-bit extensions.
Yes, with JIT we get a 60-70% improvement, but a Qualcomm gets 300%+ in floating-point operations. In modern processors there are many other factors though, and Quadrant CPU scores are still very high for our chip (if you use the paid version you can see the score breakdown). So don't let the Linpack scores discourage you. I've gotten as high as 18.2 in Linpack with some overclocking, which isn't bad. It's not the 50+ of the Qualcomm phones with some mods, but not bad.
So that's what the story is, thanks guys! Wondering if things could be improved beyond overclocking...
I know it can be, but I don't know how. Some guys are getting 25+ in Linpack on their website. So there is something else holding us back a bit.
Sent from my SAMSUNG-SGH-I897 using Tapatalk
Dani897 said:
I know it can be, but I don't know how. Some guys are getting 25+ in Linpack on their website. So there is something else holding us back a bit.
Sent from my SAMSUNG-SGH-I897 using Tapatalk
You talking about the Xperia X10? Really, 25 on Linpack? I dunno if it's even possible to OC to the astronomical speeds that would take. They run on older hardware and I think most are still on 2.1...
All Android ROMs have a JIT compiler; it literally is what compiles all the Java on the fly. Newer versions are optimized for better performance.
So the better question is: do the 2.2 Froyo ROMs have the latest JIT compiler version available, or does the Nexus S have a newer, more performant version we can steal? More than likely it will not work with 2.2, since the Nexus S is using 2.3.
From everything I have heard, the newer JIT compiler versions are optimized for the Snapdragon chipset more than anything, which doesn't do us much good.
2.2+ have the JIT. Prior to 2.2, all programs ran entirely as interpreted bytecode on an isolated virtual machine. In 2.2+, the JIT translates the most cpu "heavy" bytecode down to native instructions during execution, stores it in cache, then runs it natively on the processor in a protected mode. Dig the video below; it's an hour long, but the functionality of the JIT is explained in the first 15 minutes.
^Edited the above to accurately describe the function of the JIT
http://www.youtube.com/watch?v=Ls0tM-c4Vfo There you go, dudes. JIT Demystified.
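To make the "hot bytecode" bit concrete, here's a tiny illustrative sketch (plain desktop-style Java, class and method names made up by me, nothing phone-specific): a daxpy-style loop like the ones Linpack hammers on, which is exactly the kind of trace the 2.2+ JIT compiles to native ARM code and caches, while on 2.1 every single iteration goes through the interpreter.
Code:
// Purely illustrative; class and method names are my own.
public class JitHotLoopDemo {

    // daxpy-style inner loop, the kind of floating-point work Linpack spends
    // its time in. On 2.1 and earlier every iteration is interpreted bytecode;
    // on 2.2+ the trace JIT notices the loop is hot, compiles its body to
    // native ARMv7 code once, caches it, and reuses the cached code afterwards.
    static void daxpy(double[] y, double[] x, double a) {
        for (int i = 0; i < y.length; i++) {
            y[i] += a * x[i];
        }
    }

    public static void main(String[] args) {
        double[] x = new double[100000];
        double[] y = new double[100000];
        java.util.Arrays.fill(x, 1.5);

        long start = System.nanoTime();
        for (int rep = 0; rep < 100; rep++) {
            daxpy(y, x, 0.3);
        }
        System.out.println("took " + (System.nanoTime() - start) / 1e6 + " ms");
    }
}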
modest_mandroid said:
2.2+ have the JIT. Prior to 2.2, all programs ran as bytecode on an isolated virtual machine. In 2.2+, the JIT translates the bytecode down to native instructions just before execution, then runs it natively on the processor in a protected mode. Could be wrong, but to my understanding, that is the major difference between 2.1- and 2.2+.
http://www.youtube.com/watch?v=Ls0tM-c4Vfo There you go, dudes. JIT Demystified.
This is what I saw and knew too. However, I believe that the compiler we are using is rather old and has some room for improvement. The Nexus S compiler working on the Captivate is a possibility that could become reality after the 2.3 port is finished, provided that the developers look into it.
Sent from my GT-I9000 using Tapatalk

[Q] How are apps using dual core in the new android phones?

I already posted this in general, but this might be a more suitable place.
The way I see it, Android (especially the newer versions) is capable of distributing several processes over the two cores. Individual apps, however, don't utilise both cores.
Another question: what kind of apps would benefit from using both cores? I could imagine that heavy games and home launchers could benefit from using both cores.
Are there tools available for Android that enable multithreaded apps? And what are the average prices of app development tools?
I'm currently working in Java, but how do most people write Android apps?
Just found some information that confirms my thoughts on how android uses parallel structures, but this still leaves open my other questions:
"Android 2.2 already takes advantage of multicore. Anything that multitasks and multithreads already takes advantage of multicore. But this exploitation is a matter of degrees.
Android 2.3 takes further advantage of the multicore, because unlike Android 2.2, 2.3's file system is multithreaded one, not single threaded. When it comes to file I/O or database searches. 2.3 will be a lot faster.
Android 2.4 or 3.1 as rumored to be, will take even greater advantage of multicores with further "architecting" parts of the OS to use more multithreads."
Android 2.3 has concurrent garbage collection which I imagine will take advantage of dual core phones. This should really help to reduce any lag or stuttering in apps and games.
First, if you develop something that requires much CPU power, then you should always try to do it using multiple threads. This is a general rule, not only related to Android.
Second, main thread of an app is UI thread and you should never run CPU-consuming tasks in it. So actually you are forced to use multi-threading in Android apps.
Third, there are many things that Android itself could run in parallel to your app: garbage collector, UI changes, animations, background apps, etc.
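As a concrete sketch of that advice (the class name, the prime-counting workload and the TextView are just stand-ins I made up, not from any particular app), this is the usual Froyo/Gingerbread-era AsyncTask pattern for keeping CPU-heavy work off the UI thread:
Code:
import android.os.AsyncTask;
import android.widget.TextView;

// Sketch only: names and workload are invented; the pattern is plain AsyncTask.
public class PrimeCountTask extends AsyncTask<Integer, Void, Integer> {

    private final TextView resultView;

    public PrimeCountTask(TextView resultView) {
        this.resultView = resultView;
    }

    // Runs on a background thread, so the UI thread never blocks
    // (and on a dual-core chip the scheduler is free to put it on the second core).
    @Override
    protected Integer doInBackground(Integer... params) {
        int limit = params[0];
        int count = 0;
        for (int n = 2; n <= limit; n++) {
            boolean prime = true;
            for (int d = 2; d * d <= n; d++) {
                if (n % d == 0) { prime = false; break; }
            }
            if (prime) count++;
        }
        return count;
    }

    // Runs back on the UI thread once the background work finishes.
    @Override
    protected void onPostExecute(Integer count) {
        resultView.setText("Primes found: " + count);
    }
}

// In an Activity: new PrimeCountTask(someTextView).execute(2000000);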
Brut.all said:
First, if you develop something that requires much CPU power, then you should always try to do it using multiple threads. This is a general rule, not only related to Android.
Second, main thread of an app is UI thread and you should never run CPU-consuming tasks in it. So actually you are forced to use multi-threading in Android apps.
Third, there are many things that Android itself could run in parallel to your app: garbage collector, UI changes, animations, background apps, etc.
Thanks for your quick answer! Could I use a Java solution that makes multithreading obsolete, to make this all easier? And then just pack it into an APK? Sorry, but I'm pretty new to this.
Stitch! said:
Thanks for your quick answer! Could I use a Java solution that makes multithreading obsolete, to make this all easier? And then just pack it into an APK? Sorry, but I'm pretty new to this.
I don't understand. How does Java make multithreading obsolete? Besides, MT isn't really that hard if you have good tools for asynchronous processing of tasks. Java/Android gives you such tools.
Brut.all said:
I don't understand. How does Java make multithreading obsolete? Besides, MT isn't really that hard if you have good tools for asynchronous processing of tasks. Java/Android gives you such tools.
Not Java itself, but a Java extension called Ateji Parallel Extensions. There is a demo here: http://www.youtube.com/watch?v=8MDbqTgCDIA
I was just wondering if it would be worth developing for Android. The video is a demo on a dual core, and the new quad-core dev kit just came in. Additions to the demo that I thought about are a timer and perhaps some other figures that can indicate the difference. Do you have any ideas on this?
Really appreciate your input, thanks!
Stitch! said:
Not Java itself, but a Java extension called Ateji Parallel Extensions. There is a demo here: http://www.youtube.com/watch?v=8MDbqTgCDIA
I was just wondering if it would be worth developing for Android. The video is a demo on a dual core, and the new quad-core dev kit just came in. Additions to the demo that I thought about are a timer and perhaps some other figures that can indicate the difference. Do you have any ideas on this?
Really appreciate your input, thanks!
Yes, it would be worth it to develop for Android. The newer Android phones' dual-core processors are utilized by games, but only once a new version of Android (the future Ice Cream Sandwich and later, so I have read) is able to support multiple cores fully. Also, Android really needs some 3D HD games like what Apple has for the iPhone. I hope you decide to develop for Android.
I still don't understand why it's so important to you. You don't need Ateji to utilize multiple cores; actually, their demo is just a few lines of pure Java code. Ateji could make things easier, but it doesn't do any magic.
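For what it's worth, here is roughly what "a few lines of pure Java" can look like. This is only a sketch I put together (all names invented): it splits a sum over however many cores the device reports, using nothing but the standard java.util.concurrent classes.
Code:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: sum a big array with one worker per available core, no Ateji required.
public class ParallelSum {
    public static void main(String[] args) throws Exception {
        final double[] data = new double[4000000];
        java.util.Arrays.fill(data, 0.5);

        int cores = Runtime.getRuntime().availableProcessors(); // 2 on a dual-core phone
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        int chunk = data.length / cores;
        List<Future<Double>> parts = new ArrayList<Future<Double>>();
        for (int i = 0; i < cores; i++) {
            final int from = i * chunk;
            final int to = (i == cores - 1) ? data.length : from + chunk;
            // Each worker sums its own slice; slices don't overlap, so no locking is needed.
            parts.add(pool.submit(new Callable<Double>() {
                public Double call() {
                    double s = 0;
                    for (int j = from; j < to; j++) s += data[j];
                    return s;
                }
            }));
        }

        double total = 0;
        for (Future<Double> f : parts) total += f.get();
        pool.shutdown();
        System.out.println("total = " + total);
    }
}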
Stitch! said:
Another question: what kind of apps would benefit from using both cores? I could imagine that heavy games and home launchers could benefit from using both cores.
Anything involving image processing is a good candidate. For example, if you want to sharpen a photo, you can have one core processing the top half and one core processing the bottom half. Saying that though, I've found the single threaded performance of newer processors is fast enough for typical image filters.
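A rough sketch of that top-half/bottom-half idea, in case it helps: two plain threads each filter half of an ARGB pixel array. This is purely illustrative and the simple brighten() filter and all names are mine; a real sharpen kernel reads neighbouring rows, so the two halves would need to overlap slightly at the seam.
Code:
// Illustrative only: the brighten() filter and all names are invented.
public class TwoCoreFilterDemo {

    static void brighten(int[] pixels, int from, int to) {
        for (int i = from; i < to; i++) {
            int p = pixels[i];
            int r = Math.min(255, ((p >> 16) & 0xFF) + 30);
            int g = Math.min(255, ((p >> 8) & 0xFF) + 30);
            int b = Math.min(255, (p & 0xFF) + 30);
            pixels[i] = (p & 0xFF000000) | (r << 16) | (g << 8) | b;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] pixels = new int[1920 * 1080]; // pretend these came from a Bitmap

        final int half = pixels.length / 2;
        Thread top = new Thread(new Runnable() {
            public void run() { brighten(pixels, 0, half); }              // top half
        });
        Thread bottom = new Thread(new Runnable() {
            public void run() { brighten(pixels, half, pixels.length); }  // bottom half
        });

        top.start();
        bottom.start();
        top.join();
        bottom.join(); // both halves done; on a dual-core they ran concurrently
    }
}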

[Q] Does Linaro, GCC version, -O3, etc really make a difference?

Looking at all the ROMs that are out there, I see that many of them advertise features like Linaro, GCC 4.7-4.9, -O3, strict aliasing, and ISO C++11 mode. To the end user on JB, do these really make a difference? If they do make a difference, are there any objective benchmarks to prove it? Or do these features only impact the speed at which Android compiles, with no impact on the end user?
PacifistRiot said:
Looking at all the ROMs that are out there, I see that many of them advertise features like Linaro, GCC 4.7-4.9, -O3, strict aliasing, and ISO C++11 mode. To the end user on JB, do these really make a difference? If they do make a difference, are there any objective benchmarks to prove it? Or do these features only impact the speed at which Android compiles, with no impact on the end user?
It's a difficult question to answer.
It requires that you build the same codebase with different compilers / different optflags and benchmark their performance.
Theoretically, a newer GCC should generate better binaries, but that's not always the case.
Palatis said:
It's a difficult question to answer.
It requires that you build the same codebase with different compilers / different optflags and benchmark their performance.
Theoretically, a newer GCC should generate better binaries, but that's not always the case.
It would require some effort, but advertising these as "features" of ROMs without further explanation implies that the end user benefits from them. And to actually confirm that benefit, we would need some objective evidence of the performance increase on JB.

On the 2 GB variant's multitasking issues

I think I may have found a workaround for 2 GB users to get more apps running on the device: run in 32-bit mode. See this study by Linaro for details on the RAM impact of ARM64, for some background:
https://wiki.linaro.org/Platform/Android/MemoryFootprintAnalysis
Starting the web browser to play a YouTube video on 768 MB Android (64-bit) triggered the lowmemorykiller many times and even produced a kernel dump_backtrace.
For 32-bit Android with 768 MB of memory, the lowmemorykiller does not activate when opening the web browser until the 2nd tab is opened.
As for how to do this, I recompiled an AOSP ROM with a flag set in BoardConfig.mk, TARGET_PREFER_32_BIT := true, then forced the ARM gapps to install on aarch64... but I think there might be an easier way. Any devs care to weigh in?
Ran some benchmarks... BrowserBench Speedometer gave 32.7, which is surprisingly high; IIRC I was getting 25ish before. [Edit: this is a normal score.] My Pixel just returned 23.
Here's my 32-bit Geekbench result: overall about 10% slower, but it's actually faster in a few areas (single-core HTML5 DOM by almost 50%!).
https://browser.primatelabs.com/v4/cpu/3386873
https://browser.primatelabs.com/v4/cpu/compare/3370913?baseline=3386873
Wow, I didn't know you could use a 32-bit ROM! Have you noticed any better multitasking? Maybe you could post a screenshot of Developer options -> Running services?
Also, any issues?
The most probable reason is that there may be apps running in the background... try using Greenify to hibernate all running apps.
iG0tB0lts said:
Wow, I didn't know you could use a 32-bit ROM! Have you noticed any better multitasking? Maybe you could post a screenshot of Developer options -> Running services?
Also, any issues?
I haven't used the phone extensively but first impression is yes, better multitasking. Although this might be due to just coming from an older LineageOS build. Taking a look at the running services screen right now, I'm sitting at 885/309/652 System/Apps/Free. This isn't a true 32-bit ROM, just running most things on the 32-bit zygote. It's still using a 64-bit kernel and userland, so it could be possible that pointers and such are still all 64-bit.
blackbodypie said:
I haven't used the phone extensively but first impression is yes, better multitasking. Although this might be due to just coming from an older LineageOS build. Taking a look at the running services screen right now, I'm sitting at 885/309/652 System/Apps/Free. This isn't a true 32-bit ROM, just running most things on the 32-bit zygote. It's still using a 64-bit kernel and userland, so it could be possible that pointers and such are still all 64-bit.
I see. In case you wanted my low-level solution for the multitasking issue: I use a custom kernel that has LZ4 zram compression, set it to 384 MB (I'm using EX Kernel + RR), and use Greenify and freeze system apps like Email, etc. I would say the zram setting makes a noticeable difference.
