Oracle® Secure Enterprise Search Administrator's Guide 10g Release 1 (10.1.8) Part Number B32259-01
The crawler uses a set of status codes to indicate the result of crawling each URL. In addition to the standard HTTP status codes, it defines its own codes for situations unrelated to HTTP.
Only URLs with status 200 are indexed. If a record exists in EQ$URL but its status is something other than 200, then the crawler encountered an error trying to fetch the document. A status of less than 600 maps directly to the HTTP status code.
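The rules in the preceding paragraph can be sketched as a small classifier. This is only an illustration in Python: the function name `classify_status` and the three labels it returns are hypothetical and are not part of the product.

```python
def classify_status(status: int) -> str:
    """Classify a crawler URL status using the rules stated in the text.

    Hypothetical helper for illustration; the labels are assumptions.
    """
    if status == 200:
        # Only URLs with status 200 are indexed.
        return "indexed"
    if status < 600:
        # A status below 600 maps directly to the HTTP status code.
        return "http"
    # Anything else is a crawler-specific (non-HTTP) code.
    return "crawler"


if __name__ == "__main__":
    for code in (200, 404, 905):
        print(code, classify_status(code))
```

A script that post-processes exported status values could use a check like this to separate HTTP failures from crawler-specific ones before looking up the detailed meaning of each code.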
The following table lists the URL status codes, document container codes used by the crawler plug-in, and EQG codes.
Code | Description | Document Container Code | EQG Codes |
---|---|---|---|
200 | URL OK | STATUS_OK_FOR_INDEX | N/A |
400 | Bad request | STATUS_BAD_REQUEST | 30009 |
401 | Authorization required | STATUS_AUTH_REQUIRED | 30007 |
402 | Payment required | | 30011 |
403 | Access forbidden | STATUS_ACCESS_FORBIDDEN | 30010 |
404 | Not found | STATUS_NOTFOUND | 30008 |
405 | Method not allowed | | 30012 |
406 | Not acceptable | | 30013 |
407 | Proxy authentication required | STATUS_PROXY_REQUIRED | 30014 |
408 | Request timeout | STATUS_REQUEST_TIMEOUT | 30015 |
409 | Conflict | | 30016 |
410 | Gone | | 30017 |
414 | Request URI too large | | 30066 |
500 | Internal server error | STATUS_SERVER_ERROR | 10018 |
501 | Not implemented | | 10019 |
502 | Bad gateway | STATUS_BAD_GATEWAY | 10020 |
503 | Service unavailable | STATUS_FETCH_ERROR | 10021 |
504 | Gateway timeout | | 10022 |
505 | HTTP version not supported | | 10023 |
902 | Timeout reading document | STATUS_READ_TIMEOUT | 30057 |
903 | Filtering failed | STATUS_FILTER_ERROR | 30065 |
904 | Out of memory error | STATUS_OUT_OF_MEMORY | 30003 |
905 | IOEXCEPTION in processing URL | STATUS_IO_EXCEPTION | 30002 |
906 | Connection refused | STATUS_CONNECTION_REFUSED | 30025 |
907 | Socket bind exception | | 30079 |
908 | Filter not available | | 30081 |
909 | Duplicate document detected | | 30082 |
910 | Duplicate document ignored | STATUS_DUPLICATE_DOC | 30083 |
911 | Empty document | STATUS_EMPTY_DOC | 30106 |
951 | URL not indexed (this can happen if robots.txt specifies that a certain document should not be indexed) | STATUS_OK_BUT_NO_INDEX | N/A |
952 | URL crawled | STATUS_OK_CRAWLED | N/A |
953 | Metatag redirection | | N/A |
954 | HTTP redirection | | 30000 |
955 | Black list URL | | N/A |
956 | URL is not unique | | 31017 |
957 | Sentry URL (URL as a place holder) | | N/A |
958 | Document read error | STATUS_CANNOT_READ | 30173 |
959 | Form login failed | STATUS_LOGIN_FAILED | 30183 |
960 | Document size too big, ignored | STATUS_DOC_SIZE_TOO_BIG | 30209 |
962 | Document was excluded based on mime type | STATUS_DOC_MIME_TYPE_EXCLUDED | 30041 |
964 | Document was excluded based on boundary rules | STATUS_DOC_BOUNDARY_RULE_EXCLUDED | 30258 |
1001 | Datatype is not TEXT/HTML | | 30001 |
1002 | Broken network data stream | | 30004 |
1003 | HTTP redirect location does not exist | | 30005 |
1004 | Bad relative URL | | 30006 |
1005 | HTTP error | | 30024 |
1006 | Error parsing HTTP header | | 30058 |
1007 | Invalid URL table column name | | 30067 |
1009 | Binary document reported as text document | | 30126 |
1010 | Invalid display URL | | 30112 |
1011 | Invalid XML from OracleAS Portal | PORTAL_XMLURL_FAIL | 31011 |
1020-1024 | URL is not reachable. The status starts at 1020, and it increases by one with each try. After five tries (if it reaches 1025), the URL is deleted. | | N/A |
1111 | URL remained in the queue even after a successful crawl. This indicates that the crawler had a problem processing this document. You could investigate the URL by crawling it in a separate source to check for errors in the crawler log. | | N/A |
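
The progression described for codes 1020-1024 can be illustrated with a short Python sketch. It rests on one reading of the table entry: the status is set to 1020 when a URL is first found unreachable, each further failed try adds one, and the URL is deleted instead of being stored with status 1025. The names `after_failed_try` and `simulate` are hypothetical and do not correspond to any crawler API.

```python
from typing import List, Optional

FIRST_UNREACHABLE = 1020  # status assigned when a URL is first found unreachable
DELETE_AT = 1025          # reaching this value means the URL is deleted


def after_failed_try(status: int) -> Optional[int]:
    """Return the status after one more failed try, or None if the URL is deleted.

    Assumption: a status outside 1020-1024 is treated as a first failure.
    """
    if not FIRST_UNREACHABLE <= status < DELETE_AT:
        return FIRST_UNREACHABLE
    next_status = status + 1
    return None if next_status >= DELETE_AT else next_status


def simulate(status: int = FIRST_UNREACHABLE) -> List[int]:
    """List the statuses a permanently unreachable URL passes through."""
    seen: List[int] = []
    current: Optional[int] = status
    while current is not None:
        seen.append(current)
        current = after_failed_try(current)
    return seen


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions, a URL that never becomes reachable passes through each status from 1020 to 1024 once and is then removed.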