Scripts/HTTP Get
Given a URL, HTTP Get downloads the resource from the Internet. It supports the basic conventions of the HTTP protocol, but does not follow redirects.
Features
- Based on Pump utility
- Parses URL to verify optional protocol, identify port and path
- Supports "simple request" of HTTP/0.9
- Configurable request headers for HTTP/1.0 and HTTP/1.1
- Trims HTTP/1.1 header
- Processes "Transfer-Encoding: chunked" of HTTP/1.1
- Detailed but compact file logging
- Junk code cleanup using aliases
HTTP/0.9 is generally faster, but some web servers require HTTP/1.0 or greater.
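The URL parsing listed among the features can be pictured with the following minimal sketch. It is not the parser used in httpget.ijs; the verb name parseurl and the defaults (port 80, path /) are assumptions made for illustration.

```j
NB. Hypothetical helper: split a URL into host ; port ; path.
NB. The protocol prefix is optional; port defaults to '80', path to '/'.
parseurl=: 3 : 0
  url=. y
  i=. 1 i.~ '://' E. url            NB. index of protocol separator, #url if absent
  if. i < #url do. url=. (i+3) }. url end.
  j=. url i. '/'                    NB. first slash separates host[:port] from path
  hostport=. j {. url
  path=. j }. url
  if. 0 = #path do. path=. ,'/' end.
  k=. hostport i. ':'
  host=. k {. hostport
  port=. (k+1) }. hostport
  if. 0 = #port do. port=. '80' end.
  host ; port ; path
)
```

For example, parseurl 'http://minutewar.gpsgames.org:80/Game032/board.htm' yields the three boxes minutewar.gpsgames.org, 80 and /Game032/board.htm, the same host, port and path triple that appears on the first line of the log.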
Examples
This gets a set of stock quotes from Yahoo! Financial services.
   load '~user/httpget.ijs'
   httpget 'ichart.finance.yahoo.com/table.csv?s=^DJI&a=07&b=7&c=2006&d=07&e=11&f=2006'
Date,Open,High,Low,Close,Volume,Adj. Close*
11-Aug-06,11103.55,11121.40,11042.88,11088.02,2004540032,11088.02
10-Aug-06,11073.14,11176.47,10998.06,11124.37,2402190080,11124.37
9-Aug-06,11168.47,11296.22,11044.64,11076.18,2555180032,11076.18
8-Aug-06,11218.18,11319.51,11117.80,11173.59,2457840128,11173.59
7-Aug-06,11239.47,11294.14,11143.02,11219.38,2045660032,11219.38
This gets a URL from a web server using the HTTP/1.0 protocol.
   Ver_phttpget_=: '1.0'
   httpget 'http://minutewar.gpsgames.org:80/Game032/board.htm'
<HTML><HEAD><META HTTP-EQUIV="Expires" CONTENT="Thu, 01 Dec 1990 12:00:00 GMT">...
,,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,...
0,,W,W,W,W,W,W,W,W,W,,E,E,E,E,E,E,E,E,E,E,E,E,E,E,N,N,"N1 tom arneson 2006/08/1...
1,,W,W,W,W,W,W,W,W,W,,E,E,E,E,E,E,E,E,E,E,E,E,E,E,N,N,N,N,N,W,"W1 harleydavidso...
...
Logging
Log output goes to the file named in Log, by default ~temp/httpget-log.txt; set Log to 2 to log directly to the session. Logging is useful both in development and in regular use, since it lets you observe the response and tune the configuration, especially Timeout and Attempts.
Here is a sample Log output with interpretation.
   Timeout_phttpget_=: 50    NB. to demo timeout attempts
   httpget 'www.jsoftware.com:80/cgi-bin/fortune.cgi'

www.jsoftware.com 80 /cgi-bin/fortune.cgi   -- request: host port path
26->260                                     -- sent 26 bytes to socket 260
260<-...                                    -- reading from socket 260
.                                           -- timeout attempts
. . . 18                                    -- 18 byte block received
62 21 read eof                              -- exit condition
-|260 101                                   -- end reading socket 260, total 101 bytes
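To relate the byte count in the sample log to the request itself, here is a hedged sketch of how an HTTP/0.9 "simple request" and a versioned request could be assembled. The verb names simplereq and fullreq and the single Host header are assumptions for illustration; the headers httpget.ijs actually sends are configurable and are not shown on this page.

```j
crlf=: 13 10 { a.                   NB. carriage return, line feed

NB. Hypothetical: HTTP/0.9 simple request, just the method and the path.
simplereq=: 3 : 0
  'GET ' , y , crlf
)

NB. Hypothetical: versioned request; x is the version, y is host ; path.
NB. A blank line (second crlf) ends the header section.
fullreq=: 4 : 0
  'host path'=. y
  'GET ' , path , ' HTTP/' , x , crlf , 'Host: ' , host , crlf , crlf
)
```

   # simplereq '/cgi-bin/fortune.cgi'
26

The length of 26 agrees with the "26->260" line of the log, which would be consistent with the sample having been sent as a simple request.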
Timeout
Timeout and Attempts control the overall wait for a response; the defaults are a 2 sec timeout and 10 attempts, for a total of 20 sec. Waiting is a blocking operation, so splitting the wait into several attempts allows a shorter timeout and therefore a quicker reaction to interrupts.
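The interplay of the two settings can be illustrated with a small polling loop. This is only a sketch under stated assumptions: poll and fakeread are hypothetical names, and fakeread stands in for a blocking socket read that returns an empty result when one attempt times out.

```j
NB. Hypothetical adverb: call u up to y times, stopping at the first
NB. non-empty result. Each call to u blocks for at most one timeout,
NB. so control returns to J between attempts.
poll=: 1 : 0
  r=. ''
  for_k. i. y do.
    r=. u k
    if. #r do. break. end.
  end.
  r
)

NB. Stand-in reader: "times out" on attempts 0, 1, 2, then delivers data.
fakeread=: 3 : 0
  if. y < 3 do. '' else. 'data' end.
)
```

Here fakeread poll 10 returns data on the fourth attempt, while fakeread poll 2 gives up and returns an empty result. With the defaults quoted above, the worst-case wait is 2 * 10, that is 20 sec.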
Normally, for HTTP/1.0 and 1.1, you would count the content length or the chunk sizes to decide when to close the connection. This is not implemented so far. For HTTP/1.0, omitting "Connection: Keep-Alive" seems to make the server close the connection. HTTP/1.1 connections are typically more persistent, so lowering the Attempts count helps minimize the wait at the end of a request.
In the log, the exit condition read eof means the server closed the connection, while read timeout means we timed out.
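For readers unfamiliar with "Transfer-Encoding: chunked", the sketch below decodes a body that has already been received in full. It is an illustration, not the code in httpget.ijs: the names dechunk and hex2num are hypothetical, and the body is assumed to be complete and well formed, with each chunk introduced by a hexadecimal size line and terminated by CRLF.

```j
crlf=: 13 10 { a.
hex2num=: 16 #. '0123456789abcdef' i. tolower   NB. '1A' -> 26

NB. Hypothetical: concatenate the data of all chunks in y.
dechunk=: 3 : 0
  out=. ''
  while. #y do.
    i=. 1 i.~ crlf E. y             NB. end of the chunk-size line
    size=. i {. y
    n=. hex2num (size i. ';') {. size   NB. ignore any chunk extension
    if. 0 = n do. break. end.       NB. a zero-size chunk ends the body
    y=. (i + 2) }. y                NB. drop the size line and its CRLF
    out=. out , n {. y              NB. keep n bytes of data
    y=. (n + 2) }. y                NB. drop the data and its trailing CRLF
  end.
  out
)
```

   dechunk '4',crlf,'Wiki',crlf,'5',crlf,'pedia',crlf,'0',crlf,crlf
Wikipedia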
Interrupting Request
A long-running or unwanted request can easily be interrupted with the J break utility. Because the timeout is short and the number of attempts is adequate, the request responds to an interrupt promptly.
POST requests
The request automatically uses the POST method when an x argument is provided; x is passed as the body of the request.
Note: the GET parameters p1=v1&p2=v2... simply become the body of POST.
Get a search from J Forums
   q=. 'all=httpget&exa=&one=&exc=&add=&sub=&fid=&tim=0&rng=0&dbgn=1&mbgn=1&ybgn=1998&dend=31&mend=12&yend=2007'
   q httpget 'http://www.jsoftware.com/cgi-bin/forumsearch.cgi'
Get an item from RecentChanges
'action=rss_rc&ddiffs=1&unique=1&items=1' httpget 'http://www.jsoftware.com/jwiki/RecentChanges'
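The bodies in the examples above are written out by hand. As a convenience, a body of the form p1=v1&p2=v2... could be assembled from name and value pairs, as in this sketch. The verb mkquery is hypothetical and not part of httpget.ijs, and it performs no URL escaping.

```j
NB. Hypothetical helper: y is an n-by-2 boxed table of names and values.
mkquery=: 3 : 0
  pairs=. (0 {"1 y) ,&.> '=' ,&.> 1 {"1 y   NB. name=value for each row
  }. ; '&' ,&.> pairs                       NB. join with &, drop the leading &
)
```

   mkquery _2 ]\ 'action';'rss_rc';'ddiffs';'1';'unique';'1';'items';'1'
action=rss_rc&ddiffs=1&unique=1&items=1

The result can be passed as the x argument of httpget in the same way as the literal string in the RecentChanges example.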
The POST method is also used to perform a SOAP Call from J.
See Also
- HTTP/1.0 - RFC 1945 for "simple request"
- HTTP/1.1 - RFC 2068 for discussion of "chunked"
- JForum:programming/2006-August/002953 reading a file from the web, improved webget from "Sockets and the Internet" lab by David Mitchell
- JForum:general/2005-August/024038 Socket trouble on slow connection, original geturl by John Randall
- JForum:programming/2005-December/000510 Web connectivity in J, improved geturl
- web/gethttp - JAL addon for retrieving URL contents from the Internet using Wget or cURL
Contributed by Oleg Kobchenko