ZFS writes too slow

807559, Jul 24 2009 (edited Jul 27 2009)
Hello,

I'm using Solaris 10 10/08 with ZFS.
There are processes that flush about 10~100 MB of data to disk every 10 minutes, but sometimes a flush does not finish within 10 minutes!

Exactly 30 processes are running, and each one writes between 10 and 100 MB every 10 minutes.
That means up to about 1.5 GB could be written at one moment in the worst case.
However, each process reads and writes only files that belong exclusively to it, and the processes flush their data at different times,
so the probability of the worst case I mentioned is very low.

Still, these very slow writes happen quite often. That is the problem I want to solve.
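
(For reference, a simple way to watch the pool while one of these slow flushes is in progress would be something like the command below; "tank" is only a placeholder for the real pool name.)

# zpool iostat -v tank 5

That should show whether the write bandwidth actually reaches the disks during the 10-minute window.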

The system takes a snapshot every hour, so there are very many snapshots of the evolving data.
No data file is larger than 1.1 GB.
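
(The hourly snapshots and the space they hold can be listed with something like this sketch:)

# zfs list -t snapshot -o name,used,refer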


I want to know what is making the writes so slow.


The program uses Berkeley DB, and Berkeley DB writes the data to the file system.
Of course, there is some chance that Berkeley DB itself is the cause.
But if the problem were in Berkeley DB, I would expect CPU usage to be high, and it is not.
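
(If Berkeley DB commits synchronously, each commit waits on the ZFS intent log rather than on the CPU. A rough check, counting zil_commit calls per process for one minute, might look like this sketch:)

# dtrace -n 'fbt:zfs:zil_commit:entry { @[execname] = count(); } tick-60s { exit(0); }'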

No CPU is busy (no process uses more than 0.20% CPU), no disk is busy, and the system has 16 GB of physical memory!
Is 16 GB of memory too small for ZFS?
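
(To see how much of the 16 GB the ZFS ARC is actually using, the arcstats kstat can be read, for example:)

# kstat -n arcstats | egrep 'size|c_max'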


The system specification is as follows.
CPU : Intel Xeon 2.0 GHz x2 (8 cores total)
Memory : 16G real, 1116M free, 1174M swap in use, 7686M swap free

[root@host:/tmp] # /opt/HPQacucli/sbin/hpacucli controller slot=2 physicaldrive all show

Smart Array P800 in Slot 2

   array A

      physicaldrive 2E:1:1 (port 2E:box 1:bay 1, SATA, 500 GB, OK)
      physicaldrive 2E:1:2 (port 2E:box 1:bay 2, SATA, 500 GB, OK)
      physicaldrive 2E:1:3 (port 2E:box 1:bay 3, SATA, 500 GB, OK)
      physicaldrive 2E:1:4 (port 2E:box 1:bay 4, SATA, 500 GB, OK)
      physicaldrive 2E:1:5 (port 2E:box 1:bay 5, SATA, 500 GB, OK, spare)

   array B

      physicaldrive 2E:1:7 (port 2E:box 1:bay 7, SATA, 500 GB, OK)
      physicaldrive 2E:1:8 (port 2E:box 1:bay 8, SATA, 500 GB, OK)
      physicaldrive 2E:1:9 (port 2E:box 1:bay 9, SATA, 500 GB, OK)
      physicaldrive 2E:1:10 (port 2E:box 1:bay 10, SATA, 500 GB, OK)
      physicaldrive 2E:1:11 (port 2E:box 1:bay 11, SATA, 500 GB, OK, spare)


[root@host:/] # /opt/HPQacucli/sbin/hpacucli controller slot=2 physicaldrive 2e:1:1 show

Smart Array P800 in Slot 2

   array A

      physicaldrive 2E:1:1
         Port: 2E
         Box: 1
         Bay: 1
         Active Path:
            Port: 2E Box: 1 Bay: 1 
         Status: OK
         Drive Type: Data Drive
         Interface Type: SATA
         Size: 500 GB
         Firmware Revision:     HPG3
         Serial Number: 5QG26WQV            
         Model: ATA     GB0500C4413     
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         PHY Count: 1
         PHY Transfer Rate: 1.5GBPS
(*** Every physical drive has the same specification. ***)
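
(Since these are 1.5 Gbps SATA drives behind the Smart Array controller, it might also be worth watching per-device service times while a slow flush happens, e.g.:)

# iostat -xnz 5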

[doomvas@host user]$ top
load averages:  0.09,  0.12,  0.23
138 processes: 137 sleeping, 1 on cpu
CPU states: 98.9% idle,  0.3% user,  0.8% kernel,  0.0% iowait,  0.0% swap
Memory: 16G real, 995M free, 1177M swap in use, 7570M swap free

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
  2692 doomvas   13  59    0  118M  115M sleep   12:25  0.05% db
  2537 doomvas   13  59    0   57M   55M sleep    6:37  0.03% db
  2329 doomvas   13  59    0   52M   50M sleep    6:27  0.03% db
  2472 doomvas   13  59    0   51M   48M sleep    7:34  0.03% db
  2368 doomvas   13  59    0   54M   52M sleep    6:31  0.03% db
  2303 doomvas   13  59    0   66M   63M sleep    5:18  0.03% db
  2407 doomvas   13  59    0   42M   39M sleep    5:29  0.02% db
  2381 doomvas   13  59    0   43M   41M sleep    5:15  0.02% db
  2394 doomvas   13  59    0   45M   43M sleep    4:34  0.02% db
  2420 doomvas   13  59    0   42M   39M sleep    5:29  0.02% db
  2446 doomvas   13  59    0   38M   35M sleep    5:05  0.02% db
  2316 doomvas   13  59    0   36M   33M sleep    4:29  0.02% db
  2342 doomvas   13  59    0   38M   35M sleep    4:28  0.02% db
  2459 doomvas   13  59    0   29M   27M sleep    4:08  0.02% db
  2550 doomvas   13  59    0   37M   35M sleep    3:49  0.02% db
(*** No process uses more than 0.20% CPU. ***)
I wrote a DTrace script to monitor every fbt:zfs:: probe and profiled the system for about 30 minutes.

Most frequently fired probe: zio_wait_for_children
total count = 5855583
total time (nanoseconds) = 7941992752
average time = 1356

Longest-running probe: txg_thread_wait
total count = 270
total time = 3235383561647 ns (about 3235 seconds)
average time = 12072326722 ns (about 12 seconds)
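
(The long txg_thread_wait times seem to point at the transaction-group sync itself. A sketch like the following should show how long each spa_sync, i.e. each txg sync, takes:)

# dtrace -n '
  /* time each txg sync from spa_sync entry to return */
  fbt:zfs:spa_sync:entry  { self->ts = timestamp; }
  fbt:zfs:spa_sync:return /self->ts/ {
      @["spa_sync duration (ns)"] = quantize(timestamp - self->ts);
      self->ts = 0;
  }'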


If anyone has a solution, knows the reason, or has an idea about how to find the reason, please reply.

Thank you.

Edited by: shintaek on Jul 23, 2009 11:48 PM