I am trying to run glpsol (the standalone solver from the GNU Linear Programming Kit) on a very large model. I don't have enough physical memory to fit the entire model, so I configured a lot of swap. Unfortunately, glpsol uses more memory to parse and preprocess the model than it does to actually run the core solver, so my roughly 2-3GB model requires 11GB of memory just to get started. (Much of this access is sequential, however.)
What I am encountering is that my new machine, running Solaris 10 (11/06) on a dual-core Athlon (64-bit, naturally) with 2GB of memory, starts up much, much more slowly than my old desktop machine, running Linux (2.6.3) on a single-core Athlon 64 with 1GB of memory. Both machines use identical SATA drives for swap, though with different motherboard controllers. The Linux machine gets through startup in about three hours; Solaris takes 9 hours or more.
So, here's what I've found out so far, and tried.
On Solaris, swapping takes place one page (4KB) at a time. You can see from this sample iostat output that I'm getting about 6-7ms latency from the disk, but that each read is just 4KB (628.8KB/s divided by 157.2 reads/s = 4KB/read):
device r/s w/s kr/s kw/s wait actv svc_t %w %b
cmdk0 157.2 14.0 628.8 784.0 0.1 1.0 6.6 2 99
cmdk1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
Linux has a feature called page clustering which swaps in multiple 4KB pages at once --- controlled by /proc/sys/vm/page-cluster, currently at its default of 3, i.e. 2^3 = 8 pages (32KB) per swap-in:
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
hda 1270.06 2.99 184.23 6.39 11635.93 76.65 61.45 1.50 7.74 5.21 99.28
hdc 0.00 0.00 0.40 0.20 4.79 1.60 10.67 0.00 0.00 0.00 0.00
md0 0.00 0.00 1.00 0.00 11.18 0.00 11.20 0.00 0.00 0.00 0.00
hdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
(11636 512-byte sectors/sec = 5818KB/sec. Divided by 184 reads/sec, that gives just under 32KB per read.)
I didn't find anything I could tune in the Solaris kernel that would increase the granularity at which pages are swapped to disk.
I did find that Solaris supports large pages (2MB on x64, verified with "pagesize -a"), so I modified glpsol to use larger chunks (16MB) for its custom allocator and used memalign to allocate these chunks at 2MB boundaries. Then I rebooted the system and ran glpsol with
ppgsz -o heap=2MB glpsol ...
I verified with pmap -s that 2MB pages were being used, but only a very few of them.
8148: glpsol --cpxlp 3cljf-5.cplex --output solution-5 --log log-5
Address Bytes Pgsz Mode Mapped File
0000000000400000 116K - r-x-- /usr/local/bin/glpsol
000000000041D000 4K 4K r-x-- /usr/local/bin/glpsol
000000000041E000 432K - r-x-- /usr/local/bin/glpsol
0000000000499000 4K - rw--- /usr/local/bin/glpsol
0000000000800000 25556K - rw--- [ heap ]
00000000020F5000 944K 4K rw--- [ heap ]
00000000021E1000 4K - rw--- [ heap ]
00000000021E2000 68K 4K rw--- [ heap ]
00000000021F3000 4K - rw--- [ heap ]
....
00000000087C3000 4K 4K rw--- [ heap ]
00000000087C4000 2288K - rw--- [ heap ]
0000000008A00000 2048K 2M rw--- [ heap ]
0000000008C00000 2876K - rw--- [ heap ]
0000000008ECF000 480K 4K rw--- [ heap ]
0000000008F47000 4K - rw--- [ heap ]
...
000000003F4E8000 4K 4K rw--- [ heap ]
000000003F4E9000 5152K - rw--- [ heap ]
000000003F9F1000 60K 4K rw--- [ heap ]
000000003FA00000 2048K 2M rw--- [ heap ]
000000003FC00000 6360K - rw--- [ heap ]
0000000040236000 368K 4K rw--- [ heap ]
etc.
There are only 19 large pages listed (a total of 38MB of physical memory.)
I think my next step, if I don't receive any advice, is to try to preallocate the entire region of memory which stores (most of) the model as a single allocation. But I'd appreciate any insight into how to get better performance without a complete rewrite of the GLPK library.
1. When using large pages, is the entire 2MB page swapped out at once? Or is the 'large page' only used for mapping in the TLB? The documentation I read on swap/paging and on large pages didn't really explain the interaction. (I wrote a dtrace script which logs which pages get swapped into glpsol but I haven't tried using it to see if any 2MB pages are swapped in yet.)
2. If so, how can I increase the amount of memory that is mapped using large pages? Is there a command I can run that will tell me how many large pages are available? (Could I boot the kernel in a mode which uses 2MB pages only, and no 4KB pages?)
3. Is there anything I should do to increase the performance of swap? Can I give a hint to the kernel that it should assume sequential access? (Would "madvise" help in this case? The disk appears to be 100% active so I don't think adding more requests for 4KB pages is the answer--- I want to do more efficient disk access by loading bigger chunks of data.)