Ruby HTTPClient教程 · ZetCode 中文系列教程

# Ruby `HTTPClient`教程 > 原文： [https://zetcode.com/web/rubyhttpclient/](https://zetcode.com/web/rubyhttpclient/) 在本教程中，我们展示了如何使用 Ruby `HTTPClient`模块。我们获取数据，发布数据，使用 cookie 并连接到安全的网页。 ZetCode 也有一个简洁的 [Ruby 教程](/lang/rubytutorial/)。超文本传输协议（HTTP）是用于分布式协作超媒体信息系统的应用协议。 HTTP 是万维网数据通信的基础。 Ruby `HTTPClient`提供了用于通过 HTTP 访问 Web 资源的方法。它提供了 Ruby 中的`libwww-perl`（LWP）的功能。（有关 Perl LWP 的信息，请参见 ZetCode 的[文章](/articles/perllwp/)。）该宝石是由中村博史创建的。 ```ruby $ sudo gem install httpclient ``` 该模块通过`sudo gem install httpclient`命令安装。 ```ruby $ service nginx status * nginx is running ``` 我们在本地主机上运行 nginx Web 服务器。我们的一些示例将连接到本地运行的 nginx 服务器上的 PHP 脚本。 ## 版本第一个程序输出库和 Ruby 语言的版本。 `version.rb` ```ruby #!/usr/bin/ruby require 'httpclient' puts HTTPClient::LIB_NAME puts HTTPClient::RUBY_VERSION_STRING puts HTTPClient::VERSION ``` 这三个常数提供了库和 Ruby 版本号。 ```ruby $ ./version.rb (2.8.0, ruby 1.9.3 (2013-11-22)) ruby 1.9.3 (2013-11-22) 2.8.0 ``` 这是示例的示例输出。 ## ·get_content·函数 `get_content`是一种用于获取由给定 URL 标识的文档的高级方法。 `get_content.rb` ```ruby #!/usr/bin/ruby require 'httpclient' client = HTTPClient.new cont = client.get_content 'http://www.something.com' puts cont ``` 该脚本获取`www.something.com`网页的内容。 ```ruby cont = client.get_content 'http://www.something.com' ``` `get_content`方法将结果作为一个字符串返回。 ```ruby $ ./get_content.rb <html><head><title>Something.</title></head> <body>Something.</body> </html> ``` 这是`get_content.rb`脚本的输出。以下程序获取一个小型网页，并剥离其 HTML 标签。 `strip_tags.rb` ```ruby #!/usr/bin/ruby require 'httpclient' client = HTTPClient.new client.get_content('http://www.something.com') do |chunk| puts chunk.gsub(%r{</?[^>]+?>}, '') end ``` 该脚本会剥离`www.something.com`网页的 HTML 标签。 ```ruby client.get_content('http://www.something.com') do |chunk| puts chunk.gsub(%r{</?[^>]+?>}, '') end ``` 一个简单的正则表达式用于剥离 HTML 标记。在这种情况下，`get_content`方法以字符串块的形式返回内容。 ```ruby $ ./strip_tags.rb Something. Something. ``` 该脚本将打印网页的标题和内容。 ## 请求 HTTP 请求是从客户端发送到浏览器的消息，以检索某些信息或采取某些措施。 `HTTPClient`的`request`方法创建一个新请求。请注意，`HTTPClient`类具有诸如`get`，`post`或`put`之类的方法，这些方法为我们节省了一些输入。 `create_request.rb` ```ruby #!/usr/bin/ruby require 'httpclient' client = HTTPClient.new method = 'GET' url = URI.parse 'http://www.something.com' res = client.request method, url puts res.body ``` 该示例创建一个 GET 请求并将其发送到`http://www.something.com`。 ```ruby method = 'GET' url = URI.parse 'http://www.something.com' ``` 我们创建一个请求方法和 URL。 ```ruby res = client.request method, url ``` 使用`request`方法发出请求。 ```ruby puts res.body ``` 消息响应的`body`属性包含消息的正文。 ```ruby $ ./create_request.rb <html><head><title>Something.</title></head> <body>Something.</body> </html> ``` 这是示例的输出。 ## 状态 `HTTP::Message`表示 HTTP 请求或响应。其`status`方法返回响应的 HTTP 状态代码。 `status.rb` ```ruby #!/usr/bin/ruby require 'httpclient' client = HTTPClient.new res = client.get 'http://www.something.com' puts res.status puts HTTP::Status::successful? res.status res = client.get 'http://www.something.com/news/' puts res.status puts HTTP::Status::successful? res.status res = client.get 'http://www.urbandicionary.com/define.php?term=Dog' puts res.status puts HTTP::Status::successful? res.status ``` 我们使用`get`方法执行三个 HTTP 请求，并检查返回的状态。 ```ruby puts HTTP::Status::successful? res.status ``` `HTTP::Status::successful?`方法指示状态代码是否成功。 ```ruby $ ./status.rb 200 true 404 false 302 false ``` 200 是对成功的 HTTP 请求的标准响应，404 指示找不到请求的资源，302 指示该资源已临时重定向。 ## `head`方法 `head`方法检索文档标题。标头由字段组成，包括日期，服务器，内容类型或上次修改时间。 `head.rb` ```ruby #!/usr/bin/ruby require 'httpclient' client = HTTPClient.new res = client.head 'http://www.something.com' puts "Server: " + res.header['Server'][0] puts "Last modified: " + res.header['Last-Modified'][0] puts "Content type: " + res.header['Content-Type'][0] puts "Content length: " + res.header['Content-Length'][0] ``` 该示例打印服务器，`www.something.com`网页的上次修改时间，内容类型和内容长度。 ```ruby $ ./head.rb Server: Apache/2.4.12 (FreeBSD) OpenSSL/1.0.1l-freebsd mod_fastcgi/mod_fastcgi-SNAP-0910052141 Last modified: Mon, 25 Oct 1999 15:36:02 GMT Content type: text/html Content length: 77 ``` 这是`head.rb`程序的输出。 ## `get`方法 `get`方法向服务器发出 GET 请求。 GET 方法请求指定资源的表示形式。 `greet.php` ```ruby <?php echo "Hello " . htmlspecialchars($_GET['name']); ?> ``` 在`/usr/share/nginx/html/`目录中，有此`greet.php`文件。该脚本返回`name`变量的值，该值是从客户端检索到的。 `htmlspecialchars()`函数将特殊字符转换为 HTML 实体；例如`&`。 `mget.rb` ```ruby #!/usr/bin/ruby require 'httpclient' client = HTTPClient.new res = client.get 'http://localhost/greet.php?name=Jan' puts res.body ``` 该脚本将带有值的变量发送到服务器上的 PHP 脚本。该变量直接在 URL 中指定。 ```ruby $ ./mget.rb Hello Jan ``` This is the output of the example. ```ruby $ tail -1 /var/log/nginx/access.log 127.0.0.1 - - [08/May/2016:13:15:31 +0200] "GET /greet.php?name=Jan HTTP/1.1" 200 19 "-" "HTTPClient/1.0 (2.8.0, ruby 1.9.3 (2013-11-22))" ``` 我们检查了 nginx 访问日志。 `get`方法采用第二个参数，我们可以在其中指定查询参数。 `mget2.rb` ```ruby #!/usr/bin/ruby require 'httpclient' client = HTTPClient.new query = {'name' => 'Jan'} res = client.get 'http://localhost/greet.php', query puts res.body ``` 该示例与上一个示例基本相同。 ```ruby $ ./mget2.rb Hello Jan ``` This is the output of the example. ## 重定向重定向是将一个 URL 转发到另一个 URL 的过程。 HTTP 响应状态代码 301“永久移动”用于永久 URL 重定向。 ```ruby location = /oldpage.html { return 301 /files/newpage.html; } ``` 将这些行添加到位于 Debian 上`/etc/nginx/sites-available/default`的 Nginx 配置文件中。 ```ruby $ sudo service nginx restart ``` 编辑完文件后，我们必须重新启动 nginx 才能应用更改。 `newpage.html` ```ruby <!DOCTYPE html> <html> <head> <title>New page</title> </head> <body> This is a new page </body> </html> ``` 这是位于 nginx 文档根目录中的`newpage.html`文件。 `redirect.rb` ```ruby #!/usr/bin/ruby require 'httpclient' client = HTTPClient.new res = client.get 'http://localhost/oldpage.html', :follow_redirect => true puts res.body ``` 该脚本访问旧页面并遵循重定向。 ```ruby res = client.get 'http://localhost/oldpage.html', :follow_redirect => true ``` `:follow_redirect`选项用于遵循重定向。 ```ruby $ ./redirect.rb <!DOCTYPE html> <html> <head> <title>New page</title> </head> <body> This is a new page </body> </html> ``` This is the output of the example. ```ruby $ tail -2 /var/log/nginx/access.log 127.0.0.1 - - [09/May/2016:14:08:50 +0200] "GET /oldpage.html HTTP/1.1" 301 193 "-" "HTTPClient/1.0 (2.8.0, ruby 1.9.3 (2013-11-22))" 127.0.0.1 - - [09/May/2016:14:08:50 +0200] "GET /files/newpage.html HTTP/1.1" 200 113 "-" "HTTPClient/1.0 (2.8.0, ruby 1.9.3 (2013-11-22))" ``` 从`access.log`文件中可以看到，该请求已重定向到新的文件名。通信包含两个 GET 消息。 ## 用户代理在本节中，我们指定用户代理的名称。 `agent.php` ```ruby <?php echo $_SERVER['HTTP_USER_AGENT']; ?> ``` 在 nginx 文档根目录中，我们有这个简单的 PHP 文件。它返回用户代理的名称。 `agent.rb` ```ruby #!/usr/bin/ruby require 'httpclient' client = HTTPClient.new default_header: {"User-Agent" => "Ruby script"} res = client.get 'http://localhost/agent.php' puts res.body ``` 该脚本为`agent.php`脚本创建一个简单的 GET 请求。 ```ruby client = HTTPClient.new default_header: {"User-Agent" => "Ruby script"} ``` 在`HTTPClient`的构造器中，我们指定用户代理。 ```ruby $ ./agent.rb Ruby script ``` 服务器使用我们随请求发送的代理名称进行了响应。 ## `post`方法 `post`方法在给定的 URL 上调度 POST 请求，为填写的表单内容提供键/值对。 `target.php` ```ruby <?php echo "Hello " . htmlspecialchars($_POST['name']); ?> ``` 在本地 Web 服务器上，我们有此`target.php`文件。它只是将过帐的值打印回客户。 `post_value.rb` ```ruby #!/usr/bin/ruby require 'httpclient' client = HTTPClient.new query = {"name" => "Jan"} res = client.post 'http://localhost/target.php', query puts res.body ``` 脚本使用具有`Jan`值的`name`键发送请求。 POST 请求通过`post`方法发出。 ```ruby $ ./mpost.rb Hello Jan ``` 这是`mpost.rb`脚本的输出。 ```ruby $ tail -1 /var/log/nginx/access.log 127.0.0.1 - - [08/May/2016:13:38:57 +0200] "POST /target.php HTTP/1.1" 200 19 "-" "HTTPClient/1.0 (2.8.0, ruby 1.9.3 (2013-11-22))" ``` 使用 POST 方法时，不会在请求 URL 中发送该值。 ## 从字典中检索定义在以下示例中，我们在 [www.dictionary.com](http://www.dictionary.com) 上找到术语的定义。要解析 HTML，我们使用`nokogiri`。可以使用`sudo gem install nokogiri`命令安装。 `get_term.rb` ```ruby #!/usr/bin/ruby require 'httpclient' require 'nokogiri' client = HTTPClient.new term = 'dog' res = client.get 'http://www.dictionary.com/browse/'+term doc = Nokogiri::HTML res.body doc.css("div.def-content").map do |node| puts node.text.strip!.gsub(/\s{3,}/, " ") end ``` 在此脚本中，我们在`www.dictionary.com`上找到了术语狗的定义。 `Nokogiri::HTML`用于解析 HTML 代码。 ```ruby res = client.get 'http://www.dictionary.com/browse/'+term ``` 为了执行搜索，我们在 URL 的末尾附加了该词。 ```ruby doc = Nokogiri::HTML res.body doc.css("div.def-content").map do |node| puts node.text.strip!.gsub(/\s{3,}/, " ") end ``` 我们使用`Nokogiri::HTML`类解析内容。定义位于`<div class="def-content">`标签内。我们通过删除过多的空白来改善格式。 ## cookie HTTP cookie 是从网站发送并在用户浏览时存储在用户的 Web 浏览器或程序数据子文件夹中的一小段数据。当用户访问网页时，浏览器/程序会将 cookie 发送回服务器，以通知用户先前的活动。 Cookies 的有效期限为有效期。接收到 HTTP 请求时，服务器可以发送带有响应的`Set-Cookie`标头。之后，cookie 值将以 Cookie HTTP 标头的形式与对同一服务器的所有请求一起发送。 `cookies.php` ```ruby <?php $theme = $_COOKIE['theme']; if (isset($theme)) { echo "Your theme is $theme"; } else { echo "You are using default theme"; setcookie('theme', 'black-and-white', time() + (86400 * 7)); } ?> ``` 该 PHP 文件读取 cookie。如果 cookie 不存在，则会创建它。 cookie 存储用户的主题。 `send_cookie.rb` ```ruby #!/usr/bin/ruby require 'httpclient' url = URI.parse "http://localhost/cookies.php" cookie = WebAgent::Cookie.new cookie.name = "theme" cookie.value = "green-and-black" cookie.url = url client = HTTPClient.new client.cookie_manager.add cookie res = client.get url puts res.body ``` 我们创建一个自定义 cookie，并将其发送到`cookies.php`页面。 ```ruby cookie = WebAgent::Cookie.new cookie.name = "theme" cookie.value = "green-and-black" cookie.url = url ``` 使用`WebAgent::Cookie`类创建一个 cookie。 ```ruby client = HTTPClient.new client.cookie_manager.add cookie ``` cookie 被添加到 cookie 管理器中。 ```ruby $ ./send_cookie.rb Your theme is green-and-black ``` This is the output of the example. 接下来，我们将读取 cookie 并将其本地存储在文件中。 `read_cookie.rb` ```ruby #!/usr/bin/ruby require 'httpclient' client = HTTPClient.new res = client.get 'http://localhost/cookies.php' client.set_cookie_store 'cookie.dat' p res.header["Set-Cookie"] client.save_cookie_store ``` 该脚本从 PHP 文件读取 cookie，并将其本地存储在`cookie.dat`文件中。最后，我们读取存储的 cookie，并将其发送到同一 PHP 文件。 `send_cookie2.rb` ```ruby #!/usr/bin/ruby require 'httpclient' client = HTTPClient.new cm = HTTPClient::CookieManager.new 'cookie.dat' cm.load_cookies client.cookie_manager = cm res = client.get 'http://localhost/cookies.php' p res.body ``` `HTTPClient::CookieManager`用于读取 cookie。 ```ruby $ ./send_cookie.rb Unknown key: Max-Age = 604800 "You are using default theme" $ ./read_cookie.rb Unknown key: Max-Age = 604800 ["theme=black-and-white; expires=Sun, 15-May-2016 16:00:08 GMT; Max-Age=604800"] $ ./send_cookie.rb "Your theme is black-and-white" ``` 我们运行脚本。作者认为，警告消息[应忽略](https://github.com/nahi/httpclient/issues/242)。 ## 证书客户端的`set_auth`方法设置用于领域的名称和密码。安全领域是一种用于保护 Web 应用资源的机制。 ```ruby $ sudo apt-get install apache2-utils $ sudo htpasswd -c /etc/nginx/.htpasswd user7 New password: Re-type new password: Adding password for user user7 ``` 我们使用`htpasswd`工具创建用于基本 HTTP 认证的用户名和密码。 ```ruby location /secure { auth_basic "Restricted Area"; auth_basic_user_file /etc/nginx/.htpasswd; } ``` 在 nginx `/etc/nginx/sites-available/default`配置文件中，我们创建一个安全页面。领域的名称是`"Restricted Area"`。 `index.html` ```ruby <!DOCTYPE html> <html lang="en"> <head> <title>Secure page</title> </head> <body> This is a secure page. </body> </html> ``` 在`/usr/share/nginx/html/secure`目录中，我们有这个 HTML 文件。 `credentials.rb` ```ruby #!/usr/bin/ruby require 'httpclient' user = 'user7' passwd = '7user' client = HTTPClient.new client.set_auth 'http://localhost/secure/', user, passwd cont = client.get_content 'http://localhost/secure/' puts cont ``` 该脚本连接到安全网页；它提供访问该页面所需的用户名和密码。 ```ruby $ ./credentials.rb <!DOCTYPE html> <html lang="en"> <head> <title>Secure page</title> </head> <body> This is a secure page. </body> </html> ``` 使用正确的凭据，`credentials.rb`脚本返回受保护的页面。在本教程中，我们使用了 Ruby `HTTPClient`模块。在 ZetCode 上有类似的 [Ruby Faraday 教程](/web/rubyfaraday/)和 [Ruby `Net::HTTP`教程](/web/rubynethttp/)。