- Running command: /opt/funnelback/linbin/java/bin/java
- With arguments: -cp /opt/funnelback/lib/java/all/*:/opt/funnelback/lib/java/groovy:/opt/funnelback/bin/funnelback-crawler.jar -server -Xms256m -Xmx640m -Dfile.encoding=UTF-8 com.funnelback.crawler.FunnelBack /opt/funnelback/VERSION/funnelback.lic /opt/funnelback/conf/LECC_web/collection.cfg
- Logging STDOUT into /opt/funnelback/data/LECC_web/offline/log/crawl.log STDERR into /opt/funnelback/data/LECC_web/offline/log/crawl.log
- Command will not read from STDIN
- Environment: {TEMP=/tmp/1481708223539-0, LD_LIBRARY_PATH=/opt/funnelback/lib/java, TMP=/tmp/1481708223539-0, SEARCH_HOME=/opt/funnelback, java.home=/opt/funnelback/linbin/java, TMPDIR=/tmp/1481708223539-0}
- ####################################################################################################
- FunnelBack: Version: 15.6.0.0
- JVM: Java HotSpot(TM) 64-Bit Server VM 25.25-b02 (Oracle Corporation)
- Operating System: Linux 2.6.32-642.6.2.el6.x86_64 (amd64)
- Encoding: UTF-8
- FunnelBack: Started at: Wed Dec 14 20:37:04 EST 2016
- FunnelBack: License verified.
- FunnelBack: Overall Crawl Timeout: 86400000 (ms)
- Funnelback: Using pre-crawl authentication.
- FunnelBack: Processing forms based on: /opt/funnelback/conf/LECC_web/form_interaction.cfg
- FunnelBack: Loaded cookie(s) from form, forcing use of HTTPClient library for cookie support.
- FunnelBack: crawler.accept_cookies=true. Try setting to 'false' if authentication is not working.
- FunnelBack: Configured 1 authentication cookies
- HTTPClient Cookies:
- SQ_SYSTEM_SESSION=fqgjrabl1i07s4kcau1s9k0itj8ng3hftif9buhudbrpgg23p23pq63fu8kuhbh3s7s6ahvgtc4alu3oe6fbet87o7uqp03p7c353l1; path=/; domain=lecc.clients.squiz.net
- FunnelBack: Additional HTTP request header: [Cookie: SQ_SYSTEM_SESSION=fqgjrabl1i07s4kcau1s9k0itj8ng3hftif9buhudbrpgg23p23pq63fu8kuhbh3s7s6ahvgtc4alu3oe6fbet87o7uqp03p7c353l1]
- Funnelback: Warning: crawler.packages.httplib set to 'HTTPClient', which may override any explicit Cookie: HTTP header field.
- FunnelBack: File Store Limit: 5000
- MultipleRequestsFrontier: Using specified internal frontier type for deferred request queue: com.funnelback.common.frontier.DiskFIFOFrontier
- FunnelBack: Loaded: com.funnelback.crawler.NetCrawler
- FunnelBack: Loaded: com.funnelback.common.frontier.MultipleRequestsFrontier:com.funnelback.common.frontier.DiskFIFOFrontier:1000
- FunnelBack: Loaded: com.funnelback.crawler.scanner.RegExpHTMLScanner
- FunnelBack: Loaded: com.funnelback.common.store.WarcStore
- FunnelBack: Loaded: com.funnelback.crawler.StandardPolicy
- FunnelBack: Loaded: com.funnelback.common.revisit.AlwaysRevisitPolicy
- Cache: Table Initial Capacity: 10000
- Cache: LRUCache Max Size: 500000
- INFO: No portfolio information file: /opt/funnelback/conf/LECC_web/sites-by-portfolio.csv
- INFO: No seed servers information file
- CrawlStatistics: Loaded statistics classes.
- FunnelBack: Loaded caches.
- FunnelBack: Mime-types parsed [text/html,text/plain,text/xml,application/xhtml+xml,application/rss+xml,application/atom+xml,application/json,application/rdf+xml,application/xml]
- FunnelBack: Protocols accepted [http,https]
- FunnelBack: Robot agent matching [FunnelBack]
- FunnelBack: Max Size In-Memory URL Buffer Cache: 10000
- FunnelBack: Storing header information
- Funnelback: Added 2 URLs to frontier.
- FunnelBack: Control passed to coordinator.
- Coordinator: Added 2 URLs to URL Cache from start_urls_file
- Coordinator: Started 20 crawler thread(s) ...
- Coordinator: Using overall timeout.
- Monitor: Interval (secs): 30 Checkpoint Interval (secs): 1800
- Monitor: Checking Config File: /opt/funnelback/conf/LECC_web/collection.cfg
- Monitor: Printing statistics to monitor.log and crawl.log.1
- HTTPClientTimedRequest: Trust Everyone
- HTTPClientTimedRequest: Accept/send all cookies
- Coordinator: Crawler 1 signalled completion.
- Coordinator: Printing out final values to servers.log and domains.log
- Coordinator: Final Checkpoint and Totals ...
- DNSCache: Maximum cache size: 200000
- Coordinator: Finished final checkpoint.
- Timing_Avg: 2016:12:14:20:37:24 8141 1078 0 1 1 0
- Timing_Totals_(mins): 2016:12:14:20:37:24 0 0 0 0 0 0 0
- Timing: Crawler 1 Processed: 2 Stored: 2 Total Crawl Time (ms): 16781
- Timing: Local URL Processing (ms): 16283 Calls: 2 Avg: 8141
- Timing: Local HTTP GET (ms): 3234 Calls: 3 Avg: 1078
- Timing: Local Binary Storage (ms): 0 Calls: 1 Avg: 0
- Timing: Local Single Address Processing (ms): 91 Calls: 74 Avg: 1
- Timing: Local Accept Address [incl. robots.txt] (ms): 80 Calls: 42 Avg: 1
- Timing: Local Get Canonical Server (ms): 0 Calls: 1 Avg: 0
- Date: Wed Dec 14 20:37:24 EST 2016
- URLs Processed: 2
- Duplicates: 0
- HTTP Redirects: 0
- HTTP Bad Responses: 0
- Network (I/O) Errors: 0
- Robot NoFollow URLs: 0
- Threads Active: 1
- Frontiers Active: 0
- Bytes In (MB): 0
- Bytes Out (MB): 0
- Used Memory (MB): 115
- Total Memory (MB): 309
- Cache Size: 2
- Frontier Size: 0
- Total Data Stored (MB): 0
- Total Web Servers: 1
- Total URLs Downloaded: 2
- Total URLs Stored: 2
- Coordinator: Printing out crawl statistics to .stat files in log directory.
- Coordinator: Attempting to deactivate crawler threads ...
- Coordinator sleeping for 5 seconds before final shutdown ...
- Coordinator: Closing URLStore.
- Coordinator: Dumping frontier to log for analysis ...
- Coordinator: Finished dumping frontier.
- Coordinator: Finished at: Wed Dec 14 20:37:29 EST 2016
- Coordinator: Finished crawl. Deactivating threads and exiting ...
- Command finished with exit code: 0