Available drives are not being used. Jobs are waiting in the queue, or staying in the queue,
after writing has completed. New jobs are taking an extended time to appear in the queue.
Defunct bpsched processes. Exit status 96s and 54s.
Details:
To identify this problem, check the scheduler and tape manager logs:
/usr/openv/netbackup/logs/bpsched/log.date and
/usr/openv/netbackup/logs/bptm/log.date.
As the workload increases on the master server, the start_bptm -countmedia function call to the
volume database can take a long time to return. This delay is typically seen in environments with
volume databases containing over 25000 pieces of media and classes configured to use ANY
AVAILABLE STORAGE UNIT. The MAIN bpsched process typically falls behind when a large
number of user-directed backups are submitted at one time. This is common in environments with
clients running VERITAS NetBackup (tm) database extensions such as Oracle, Sybase, etc.
Configuring all the classes to use ANY AVAILABLE STORAGE UNIT dramatically increases the
load on bpsched, because the start_bptm -countmedia function must count media on all configured
storage units for each backup. This increases the probability of seeing these problems.
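As a rough illustration of why this matters, the sketch below multiplies out the waiting time. It assumes one countmedia call per configured storage unit per backup and that the calls are handled one after another; both are simplifications, and the backup and storage unit counts are hypothetical. The one-second and one-minute call times are the figures quoted in this document.

```python
def scheduler_seconds_blocked(backups, storage_units, seconds_per_call):
    """Total wait, assuming one serial countmedia call per storage unit per backup."""
    return backups * storage_units * seconds_per_call

# Hypothetical burst: 50 user-directed backups, 4 configured storage units.
normal = scheduler_seconds_blocked(50, 4, 1)    # about one second per call
delayed = scheduler_seconds_blocked(50, 4, 60)  # one minute per call
print(normal, delayed)
```

Under these assumptions the same burst costs a few minutes of waiting at normal speed and several hours when each call is delayed.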
With volume databases containing over 25000 pieces of media, many start_bptm -countmedia
requests in a short period of time cause the MAIN bpsched process to fall behind because of
delayed responses from vmd. If the MAIN bpsched process falls behind on its work, the child
processes of the waiting bpsched main_empty process show up as defunct processes in ps output.
Once the MAIN bpsched process catches up, however, it starts to clean up those defunct processes.
This delay keeps jobs from going active and can make jobs fail with the exit statuses listed in the
symptoms above (96 and 54).
In the bptm log excerpt below, notice the long time to finish (one minute in this example). A normal countmedia call takes about one
second. A delay blocks the scheduler from doing other processing, keeps jobs from going
active, and leaves drives unused. It also prevents completed jobs from being removed from the
queue.
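The defunct bpsched children described earlier can be counted from saved ps output. The following is a minimal sketch, assuming `ps -ef` style output in which the second and third columns are PID and PPID and a defunct entry shows `<defunct>` in place of its command; the function name and the sample rows are illustrative, not taken from a real system.

```python
def defunct_bpsched_children(ps_output):
    """Count <defunct> entries whose parent PID belongs to a live bpsched process."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or not fields[1].isdigit() or not fields[2].isdigit():
            continue  # not a process row
        rows.append((int(fields[1]), int(fields[2]), line))
    # PIDs of bpsched processes that are still running
    bpsched_pids = {pid for pid, _, line in rows
                    if "bpsched" in line and "<defunct>" not in line}
    return sum(1 for _, ppid, line in rows
               if "<defunct>" in line and ppid in bpsched_pids)

# Illustrative ps -ef text (hypothetical PIDs).
sample_ps = """\
     UID   PID  PPID  C    STIME TTY      TIME CMD
    root  1200     1  0 00:10:01 ?        0:05 bpsched
    root  1201  1200  0 00:10:02 ?        0:01 bpsched
    root  1305  1201  0                   0:00 <defunct>
    root  1306  1201  0                   0:00 <defunct>
    root  1400     1  0                   0:00 <defunct>
"""
print(defunct_bpsched_children(sample_ps))
```

A count that keeps growing between samples would be consistent with the MAIN bpsched process falling behind; a count that drops again suggests it has caught up.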
From /usr/openv/netbackup/logs/bptm/log.(date):
00:27:33 [29212] <2> bptm: INITIATING: -countmedia -cmd -rt 1 -rn 0 -stunit 9740-0 -den 14 -p RMAN_pool1 -rl 5
00:27:33 [29212] <2> add_to_vmhost_list: added <masterserver>.domain.com to vmhost list
00:27:33 [29212] <2> add_to_vmhost_list: added <mediaserver>.domain.com to vmhost list
00:27:33 [29212] <2> getsockconnected: host=<masterserver>.domain.com service=vmd address=192.x.x.1 protocol=tcp non-reserved port=13701
00:27:33 [29212] <2> vmdb_get_scratch_list: server returned: Scratch_pool
00:27:33 [29212] <2> vmdb_get_scratch_list: server returned: EXIT_STATUS 0
00:27:33 [29212] <2> getsockconnected: host=<masterserver>.domain.com service=vmd address=192.x.x.1 protocol=tcp non-reserved port=13701
00:28:33 [29212] <2> bptm: EXITING with status 0 <----------
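To find slow countmedia calls without reading the log by eye, the elapsed time between the INITIATING and EXITING entries of each bptm process can be computed. This is a minimal sketch that assumes every entry starts with the `HH:MM:SS [pid] <level>` layout shown in the excerpt above; the function name is illustrative, and PID reuse within one log file is not handled.

```python
import re
from datetime import datetime, timedelta

# Matches the "HH:MM:SS [pid] <level> message" layout seen in the excerpt.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) \[(\d+)\] <\d+> (.*)$")

def countmedia_durations(lines):
    """Return a list of (pid, seconds) for each completed countmedia call."""
    started = {}  # pid -> start time of an in-flight countmedia call
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # continuation of a wrapped entry, or unrelated text
        stamp = datetime.strptime(m.group(1), "%H:%M:%S")
        pid, msg = int(m.group(2)), m.group(3)
        if msg.startswith("bptm: INITIATING:") and "-countmedia" in msg:
            started[pid] = stamp
        elif msg.startswith("bptm: EXITING") and pid in started:
            elapsed = stamp - started.pop(pid)
            if elapsed < timedelta(0):
                elapsed += timedelta(days=1)  # call spanned midnight
            results.append((pid, int(elapsed.total_seconds())))
    return results

sample = [
    "00:27:33 [29212] <2> bptm: INITIATING: -countmedia -cmd -rt",
    "1 -rn 0 -stunit 9740-0 -den 14 -p RMAN_pool1 -rl 5",
    "00:28:33 [29212] <2> bptm: EXITING with status 0 <----------",
]
print(countmedia_durations(sample))
```

For the excerpt above this reports 60 seconds for process 29212, against the roughly one second that a normal call takes.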
Workaround:
- Reduce or eliminate other applications competing for system resources on the system where vmd
is running
- Ensure that the underlying system has enough cache, and is using it, to handle the volume database
- Ensure that the file system on which the volume database resides handles disk I/O quickly
- Ensure that the network is fast enough to deliver metadata between vmd and its requesters (bptm)
- Tune tcp_time_wait_interval to a shorter period so that socket resources become available
sooner for the countmedia processes
- Purchase higher-capacity tape technology to reduce the number of individual volumes required
- Use multiple, smaller robotic libraries so that storage unit queries do not need to return a large
number of volumes on each query
- In the upcoming VERITAS NetBackup (tm) 4.5 release, storage unit groups will help reduce the
number of media servers that must be contacted during the countmedia function.
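To judge whether the tcp_time_wait_interval item in the list above is relevant, it can help to count the sockets sitting in TIME_WAIT on the vmd port (13701 in the log excerpt). This is a minimal sketch that works on saved `netstat -an` text and assumes the connection state is the last column; the function name and the sample addresses are illustrative.

```python
def time_wait_count(netstat_output, port=13701):
    """Count TIME_WAIT sockets with either endpoint on the given port."""
    # Addresses may be written "host.port" or "host:port" depending on the platform.
    suffixes = (f".{port}", f":{port}")
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields or fields[-1] != "TIME_WAIT":
            continue
        if any(field.endswith(suffixes) for field in fields[:-1]):
            count += 1
    return count

# Illustrative netstat -an rows (documentation-range addresses).
sample_netstat = """\
192.0.2.1.13701   192.0.2.20.45012   24820  0 24820  0 TIME_WAIT
192.0.2.1.13701   192.0.2.20.45013   24820  0 24820  0 TIME_WAIT
192.0.2.1.13701   192.0.2.20.45014   24820  0 24820  0 ESTABLISHED
192.0.2.1.13782   192.0.2.20.45015   24820  0 24820  0 TIME_WAIT
"""
print(time_wait_count(sample_netstat))
```

A large and steadily growing count during a burst of countmedia requests would suggest that the socket resources mentioned above are under pressure.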
NOTE:
Disabling countmedia causes problems only if a storage unit is out of media. Backups could then
fail with a status 96 (no available media) instead of using another storage unit that has media
available. This is a problem only if there are multiple storage units and the classes and/or
schedules are set to use ANY AVAILABLE STORAGE UNIT. Even with ANY AVAILABLE
STORAGE UNIT configured, a site will not get into this situation if media is available in all of
its storage units and pools. To avoid this situation, use the scratch pool feature of NetBackup.
NOTE: The job-completion processing code re-enables counting if an EC_no_available_media (96)
error occurs; that is, if NetBackup runs out of media, it starts counting again. Once media is
added or becomes available for use, recycling (stopping and restarting) the NetBackup daemons
re-enables the DISABLE_COUNTMEDIA workaround.
NOTE: VERITAS NetBackup engineering is currently exploring ways to improve the performance
of the countmedia function.
Supplemental Material:
Products Applied:
NetBackup DataCenter 3.4, 3.4.1, 4.5 (Fixed)
Languages:
English (US)
Operating Systems:
AIX 4.3.3
HP-UX 11.0, 11.11
Solaris 2.6, 7.0 (32-bit), 8.0 (32-bit)
THE INFORMATION PROVIDED IN THE SYMANTEC SOFTWARE KNOWLEDGE BASE IS PROVIDED "AS IS" WITHOUT WARRANTY
OF ANY KIND. SYMANTEC SOFTWARE DISCLAIMS ALL WARRANTIES, EITHER EXPRESS OR IMPLIED, INCLUDING THE
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL SYMANTEC
SOFTWARE OR ITS SUPPLIERS BE LIABLE FOR ANY DAMAGES WHATSOEVER INCLUDING DIRECT, INDIRECT, INCIDENTAL,
CONSEQUENTIAL, LOSS OF BUSINESS PROFITS OR SPECIAL DAMAGES, EVEN IF SYMANTEC SOFTWARE OR ITS SUPPLIERS
HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. SOME STATES DO NOT ALLOW THE EXCLUSION OR
LIMITATION OF LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES SO THE FOREGOING LIMITATION MAY NOT
APPLY.
file://H:\study\netbackup\Upload_site_done\done\New Folder\Available drives are not being used_ Jobs are ... 7/6/2010