Available drives are not being used.

Document ID: 237534


http://support.veritas.com/docs/237534

Available drives are not being used. Jobs are waiting in the queue, or staying in the queue,
after writing has completed. New jobs are taking an extended time to appear in the queue.
Defunct bpsched processes. Exit status 96s and 54s.

Exact Error Message


Exit Status Code 54: timed out connecting to client;

Exit Status Code 96: unable to allocate new media for backup, storage unit has none available

Details:
This problem can be identified by examining
/usr/openv/netbackup/logs/bpsched/log.date and
/usr/openv/netbackup/logs/bptm/log.date.

NOTE: VERBOSE=11 must be present in the /usr/openv/netbackup/bp.conf file prior to the
failure in order to identify this issue.
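
A quick way to confirm the logging prerequisite is to scan the contents of bp.conf for the VERBOSE entry. The following is only a minimal sketch: the function names are hypothetical, and it assumes the entry is written as VERBOSE=11 or VERBOSE = 11 on a line of its own.

```python
import re

# Assumption: the entry appears as "VERBOSE=11" or "VERBOSE = 11" on its own line.
VERBOSE_ENTRY = re.compile(r"^\s*VERBOSE\s*=\s*(\d+)\s*$")


def verbose_level(bp_conf_text):
    """Return the VERBOSE level found in bp.conf text, or None if it is absent."""
    for line in bp_conf_text.splitlines():
        match = VERBOSE_ENTRY.match(line)
        if match:
            return int(match.group(1))
    return None


def has_required_verbose(bp_conf_text, required=11):
    """True when bp.conf text sets VERBOSE to the level named in this document."""
    return verbose_level(bp_conf_text) == required
```

On a master server the text would come from reading /usr/openv/netbackup/bp.conf, for example with open("/usr/openv/netbackup/bp.conf").read().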

As the workload on the master server increases, the start_bptm -countmedia function call to the
volume database can take a long time to return. This delay is typically seen in environments with
volume databases containing over 25000 pieces of media and classes configured to use ANY
AVAILABLE STORAGE UNIT. The MAIN bpsched process typically falls behind when a large number of
user-directed backups are submitted at one time. This is common in environments with clients
running VERITAS NetBackup (tm) database extensions such as Oracle, Sybase, etc. Configuring all
the classes to use ANY AVAILABLE STORAGE UNIT dramatically increases the load on bpsched, because
the start_bptm -countmedia function must count media on all configured storage units for each
backup. This increases the probability of seeing these problems.

With volume databases containing over 25000 pieces of media, many start_bptm -countmedia
requests in a short period of time cause the MAIN bpsched process to fall behind because of
delayed responses from vmd. While the MAIN bpsched process is behind on its work, the waiting
bpsched main_empty child processes show up as defunct processes in ps output. Once the MAIN
bpsched process catches up, however, it starts to clean up those defunct processes. This
performance delay causes problems with getting jobs active and can make jobs fail with:

Exit Status Code 54: timed out connecting to client;

and

Exit Status Code 96: unable to allocate new media for backup, storage unit has none available
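
The defunct children mentioned above can be tallied from captured ps output. The sketch below is illustrative only: the function name is hypothetical, and it assumes a "ps -ef" style listing in which the second and third columns are PID and PPID and zombie entries are shown as <defunct>. Because a defunct entry may not carry the command name, children are matched to bpsched through the parent PID.

```python
def count_defunct_bpsched_children(ps_output):
    """Count <defunct> entries whose parent is a live bpsched process.

    Assumes "ps -ef" style columns: UID PID PPID ... CMD.
    """
    rows = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header and any line without numeric PID and PPID columns.
        if len(fields) < 4 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        rows.append((int(fields[1]), int(fields[2]), line))

    # PIDs of bpsched processes that are still alive.
    bpsched_pids = {
        pid for pid, _, line in rows
        if "bpsched" in line and "<defunct>" not in line
    }
    # Defunct entries whose parent is one of those bpsched processes.
    return sum(
        1 for _, ppid, line in rows
        if "<defunct>" in line and ppid in bpsched_pids
    )
```

A count that keeps growing between samples suggests that the MAIN bpsched process has not yet caught up.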

The start of a media count appears in the bptm log as:

bptm: INITIATING: -countmedia

In the log excerpt below, notice the long time to finish (one minute in this example). A normal
countmedia takes about one second. A delay blocks the scheduler from doing other processing,
keeps jobs from going active, and keeps drives from being used. It also prevents completed jobs
from being removed from the queue.

From /usr/openv/netbackup/logs/bptm/log.(date)
00:27:33 [29212] <2> bptm: INITIATING: -countmedia -cmd -rt 1 -rn 0 -stunit 9740-0 -den 14 -p RMAN_pool1 -rl 5
00:27:33 [29212] <2> add_to_vmhost_list: added <masterserver>.domain.com to vmhost list
00:27:33 [29212] <2> add_to_vmhost_list: added <mediaserver>.domain.com to vmhost list
00:27:33 [29212] <2> getsockconnected: host=<masterserver>.domain.com service=vmd address=192.x.x.1 protocol=tcp non-reserved port=13701
00:27:33 [29212] <2> vmdb_get_scratch_list: server returned: Scratch_pool
00:27:33 [29212] <2> vmdb_get_scratch_list: server returned: EXIT_STATUS 0
00:27:33 [29212] <2> getsockconnected: host=<masterserver>.domain.com service=vmd address=192.x.x.1 protocol=tcp non-reserved port=13701
00:28:33 [29212] <2> bptm: EXITING with status 0 <----------
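
The elapsed time of each countmedia run can be measured from the bptm log by pairing the INITIATING and EXITING lines that share a process ID. The following is a minimal sketch, assuming log lines shaped like the excerpt above (HH:MM:SS, PID in square brackets, then the message). The function name is hypothetical, and because the timestamps carry no date, a run that crosses midnight is assumed to last less than a day.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape, taken from the excerpt: "HH:MM:SS [pid] <level> message"
LOG_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}) \[(\d+)\] <\d+> (.*)$")


def countmedia_durations(log_text):
    """Return a list of (pid, elapsed_seconds) for each completed -countmedia run."""
    started = {}   # pid -> time the countmedia request was initiated
    results = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # wrapped continuation line or unrelated text
        stamp, pid, message = match.groups()
        when = datetime.strptime(stamp, "%H:%M:%S")
        if "bptm: INITIATING:" in message and "-countmedia" in message:
            started[pid] = when
        elif "bptm: EXITING" in message and pid in started:
            elapsed = when - started.pop(pid)
            if elapsed < timedelta(0):
                elapsed += timedelta(days=1)  # run crossed midnight
            results.append((int(pid), int(elapsed.total_seconds())))
    return results
```

Applied to the excerpt above, the run for process 29212 comes out at 60 seconds, against the roughly one second this document describes as normal.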

Workaround:

Touch /usr/openv/netbackup/DISABLE_COUNTMEDIA on the master server.

This prevents start_bptm -countmedia from being started.

Other possible workarounds are:


- Configure all of your classes for specific storage units rather than ANY AVAILABLE STORAGE
UNIT

- Find a more powerful system for hosting vmd

- Reduce/eliminate other applications fighting for system resources on the system where vmd is
running

- Ensure that the underlying system is using and has enough cache to handle the volume database

- Ensure that the file system on which the volume database is resident handles disk I/O quickly

- Ensure that the network is fast enough to deliver metadata between vmd and its requesters (bptm)

- Tune tcp_time_wait_interval to a shorter period of time so that socket resources become
available sooner for the countmedia processes

- Purchase new tape technology with higher tape capacity to reduce the number of individual
volumes required

- Use multiple, smaller robotic libraries so that storage unit queries don't need to return a large
number of volumes on each query

- In the upcoming VERITAS NetBackup (tm) 4.5 release, storage unit groups will help reduce the
number of media servers that need to be contacted during the countmedia function.

NOTE:
Disabling countmedia causes problems only if a storage unit is out of media. Backups could then
fail with a status 96 (no available media) instead of using another storage unit that has media
available. This is a problem only if there are multiple storage units and the classes and/or
schedules are set to use ANY AVAILABLE STORAGE UNIT. Even with ANY AVAILABLE STORAGE UNIT
configured, this situation does not arise as long as media is available in all storage units and
pools. To avoid this situation, use the scratch pool feature of NetBackup.

NOTE: The code that processes job completion re-enables counting if a job returns the error
EC_no_available_media(96).

That is, if media runs out, NetBackup starts counting again. Once media is added or becomes
available for use, restarting the NetBackup daemons puts the DISABLE_COUNTMEDIA workaround back
into effect.

NOTE: VERITAS NetBackup engineering is currently exploring ways to improve the performance
of the countmedia function.

NOTE: This issue has been resolved in NetBackup 4.5.

Supplemental Material:

System    Ref.#        Description
DEFECT    RSVmn15294   Large number of volumes in volDB affects NBU scheduler

Products Applied:
NetBackup DataCenter 3.4, 3.4.1, 4.5 (Fixed)

Last Updated: November 15 2004 06:57 AM GMT
Expires on: 11-15-2005
Subjects:
NetBackup DataCenter
Application: Documentation, Notification, Usability

Database: Configuration, Documentation, Faq, Media Management

Languages:
English (US)

Operating Systems:
AIX 4.3.3
HP-UX 11.0, 11.11
Solaris 2.6, 7.0 (32-bit), 8.0 (32-bit)

