Extract strings from a URL.

Text within <SCRIPT></SCRIPT> tags is removed.

Text within <STYLE></STYLE> tags is removed.

The text within <PRE></PRE> tags is not altered.
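The three rules above can be approximated with a few regular expressions. What follows is only a rough sketch using the JDK alone: the class name TagRuleSketch and its extract method are hypothetical, and collapsing whitespace outside PRE is an assumption made here to show the contrast. StringBean itself works as a NodeVisitor over parsed nodes, not with regular expressions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, NOT the StringBean implementation.
public class TagRuleSketch {
    // (?is): case-insensitive, and '.' also matches line breaks
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    private static final Pattern PRE =
        Pattern.compile("(?is)<pre\\b[^>]*>(.*?)</pre\\s*>");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");

    public static String extract(String html) {
        // Rules 1 and 2: SCRIPT and STYLE elements vanish together with their text
        String cleaned = SCRIPT_OR_STYLE.matcher(html).replaceAll("");
        StringBuilder out = new StringBuilder();
        Matcher pre = PRE.matcher(cleaned);
        int last = 0;
        while (pre.find()) {
            out.append(plain(cleaned.substring(last, pre.start())));
            out.append(pre.group(1)); // Rule 3: PRE text is copied verbatim
            last = pre.end();
        }
        out.append(plain(cleaned.substring(last)));
        return out.toString();
    }

    // Outside PRE: drop the tags and reduce whitespace runs to one space (assumption)
    private static String plain(String fragment) {
        return TAG.matcher(fragment).replaceAll("").replaceAll("\\s+", " ");
    }
}
```

Calling TagRuleSketch.extract on a page keeps the spacing inside PRE while the surrounding text is normalized. Entities and nested markup are deliberately ignored in this sketch.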

The Strings property, which is the output property, is null until a URL is set. A typical usage is:

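      import org.htmlparser.beans.StringBean;

      StringBean sb = new StringBean ();
      sb.setLinks (false);                   // do not include link URLs in the text
      sb.setReplaceNonBreakingSpaces (true); // turn non-breaking spaces into regular spaces
      sb.setCollapse (true);                 // reduce runs of whitespace to a single space
      sb.setURL ("http://www.netbeans.org"); // the HTTP is performed here
      String s = sb.getStrings ();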
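To make the two whitespace properties in the usage above more concrete, here is a small JDK-only sketch of what they plausibly do to the extracted text. The class WhitespaceSketch and its normalize method are hypothetical names invented for illustration, and the behavior is an assumption based on the property names, not a copy of the StringBean code.

```java
// Hypothetical illustration of the ReplaceNonBreakingSpaces and Collapse properties.
public class WhitespaceSketch {
    public static String normalize(String text, boolean replaceNbsp, boolean collapse) {
        String result = text;
        if (replaceNbsp) {
            // U+00A0 (what &nbsp; decodes to) becomes an ordinary space
            result = result.replace('\u00a0', ' ');
        }
        if (collapse) {
            // Java's \s does not match U+00A0, so the replacement above must come first
            result = result.replaceAll("\\s+", " ");
        }
        return result;
    }
}
```

The order matters in this sketch: with replaceNbsp switched off, non-breaking spaces survive the collapse step untouched.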
  
You can also use the StringBean as a NodeVisitor on your own parser, in which case you have to refetch your page if you change one of the properties, because changing a property resets the Strings property:

 

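      import org.htmlparser.Parser;
      import org.htmlparser.beans.StringBean;

      StringBean sb = new StringBean ();
      Parser parser = new Parser ("http://cbc.ca");
      // or: Parser parser = Parser.createParser ("<html>...</html>", "GBK");
      parser.visitAllNodesWith (sb);
      String s = sb.getStrings ();   // text without link URLs
      sb.setLinks (true);            // changing a property resets Strings
      parser.reset ();               // rewind the parser so the page can be visited again
      parser.visitAllNodesWith (sb);
      String sl = sb.getStrings ();  // text with link URLs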
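The reason a second visit is needed can be modelled with a tiny cache: the output is computed for one set of properties, so changing a property has to discard it. The sketch below is a hypothetical JDK-only model (CachedTextSketch, visit and the way the link is appended are all invented for illustration), not the StringBean source.

```java
// Hypothetical model of "changing a property resets the Strings property".
public class CachedTextSketch {
    private boolean links = false;
    private String strings = null; // null until some content has been visited

    public void setLinks(boolean links) {
        if (this.links != links) {
            this.links = links;
            strings = null; // the old output no longer matches the settings
        }
    }

    // Stands in for parser.visitAllNodesWith(sb): rebuilds the output
    public void visit(String pageText, String linkUrl) {
        strings = links ? pageText + " <" + linkUrl + ">" : pageText;
    }

    public String getStrings() {
        return strings;
    }
}
```

In this model getStrings returns null right after setLinks (true) until visit runs again, which mirrors why the example above resets the parser and visits the nodes a second time.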
  
According to Nick Burch, who contributed the patch, this is handy if you don't want StringBean to wander off and get the content itself, either because you already have it or because it's not on a website.