It is extremely easy to make Wget wander aimlessly around a web site, sucking up all the available data as it goes. ‘wget -r site’, and you're set. Great? Not for the server administrator.
As long as Wget is only retrieving static pages, and doing so at a reasonable rate (see the ‘--wait’ option), there is not much of a problem. The trouble is that Wget cannot tell the difference between a small static page and the most demanding CGI. A site I know has a section handled by a CGI Perl script that converts Info files to HTML. The script is slow, but it works well enough for human users viewing an occasional Info file. However, when someone's recursive Wget download stumbles upon the index page that links to all the Info files through the script, the system is brought to its knees without providing anything useful to the user. (This conversion of Info files should be done locally, and the Info documentation for all installed GNU software should be made available through the info command.)
To avoid this kind of accident, as well as to preserve the privacy of documents that need to be protected from well-behaved robots, the concept of robot exclusion was introduced. The idea is that server administrators and document authors can specify which portions of the site they wish to protect from robots and which they will permit access to.
The most popular mechanism, and the de facto standard supported by all the major robots, is the “Robots Exclusion Standard” (RES), written by Martijn Koster et al. in 1994. It specifies the format of a text file containing directives that tell robots which URL paths to avoid. To be found by robots, the specifications must be placed in /robots.txt in the server root, which robots are expected to download and parse.
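As a rough illustration of what such a file looks like and how a robot might interpret it, the following sketch parses a small robots.txt with Python's standard urllib.robotparser module. The file contents, the paths, and the robot name ‘ExampleBot’ are made up for this example; this is not how Wget itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths below are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A path that no Disallow line matches may be fetched.
print(parser.can_fetch("ExampleBot", "http://www.server.com/index.html"))        # True
# Anything under /cgi-bin/ is excluded for every robot.
print(parser.can_fetch("ExampleBot", "http://www.server.com/cgi-bin/info2html"))  # False
```

Each ‘Disallow’ line names a URL path prefix, and ‘User-agent: *’ applies the group of rules to every robot.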
Although Wget is not a web robot in the strictest sense of the word, it can download large parts of a site without the user intervening to download each individual page. Because of that, Wget honors RES when downloading recursively. For instance, when you issue:
wget -r http://www.server.com/
First the index of ‘www.server.com’ will be downloaded. If Wget finds that it wants to download more documents from that server, it will request ‘http://www.server.com/robots.txt’ and, if found, use it for further downloads. robots.txt is loaded only once per server.
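The "once per server" behavior can be sketched as a small cache keyed by server. This is a simplified illustration under stated assumptions, not Wget's actual code: the class name RobotsCache, the injected fetch function, and fake_fetch are all hypothetical, and the sketch fetches robots.txt on the first check rather than after the first document as Wget does.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Keeps one parsed robots.txt per server (scheme plus host[:port])."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        # fetch is a hypothetical callable: it takes the robots.txt URL and
        # returns the file's text, or "" when the server has no robots.txt.
        self._fetch = fetch
        self._parsers: Dict[str, RobotFileParser] = {}
        self.fetch_count = 0  # how many times robots.txt was actually requested

    def allowed(self, user_agent: str, url: str) -> bool:
        parts = urlsplit(url)
        server = f"{parts.scheme}://{parts.netloc}"
        parser = self._parsers.get(server)
        if parser is None:
            # First URL seen for this server: load and parse robots.txt once.
            self.fetch_count += 1
            parser = RobotFileParser()
            parser.parse(self._fetch(server + "/robots.txt").splitlines())
            self._parsers[server] = parser
        return parser.can_fetch(user_agent, url)


def fake_fetch(robots_url: str) -> str:
    """Stand-in for a real HTTP request; returns made-up rules."""
    return "User-agent: *\nDisallow: /cgi-bin/\n"


cache = RobotsCache(fake_fetch)
print(cache.allowed("ExampleBot", "http://www.server.com/docs/a.html"))   # True
print(cache.allowed("ExampleBot", "http://www.server.com/cgi-bin/info"))  # False
print(cache.fetch_count)  # 1: the second check reused the cached rules
```

Passing the fetch function in as a parameter keeps the sketch free of network access, so it can be run and checked offline.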
Until version 1.8, Wget supported the first version of the standard, written by Martijn Koster in 1994 and available at http://www.robotstxt.org/wc/norobots.html. As of version 1.8, Wget also supports the additional directives specified in the Internet Draft ‘<draft-koster-robots-00.txt>’, titled “A Method for Web Robots Control”. The draft, which as far as I know never became an RFC, is available at http://www.robotstxt.org/wc/norobots-rfc.txt.
This manual no longer includes the text of the Robot Exclusion Standard.
The second, less well-known mechanism lets the author of an individual document specify whether they want the links in the file to be followed by a robot. This is done with the META tag, like this:
<meta name="robots" content="nofollow">
This is explained in some detail at http://www.robotstxt.org/wc/meta-user.html. Wget supports this method of robot exclusion in addition to the usual /robots.txt exclusion.
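To make the META mechanism concrete, here is a minimal sketch of how a robot might detect the ‘nofollow’ directive using Python's standard html.parser module. The class RobotsMetaParser and the function has_nofollow are hypothetical names chosen for this example, and the sketch only looks for ‘nofollow’; it does not reflect Wget's own HTML parser.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a document carries <meta name="robots" content="...nofollow...">."""

    def __init__(self) -> None:
        super().__init__()
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalize to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        directives = {d.strip().lower() for d in attr.get("content", "").split(",")}
        if "nofollow" in directives:
            self.nofollow = True


def has_nofollow(html: str) -> bool:
    """Return True if the page asks robots not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.nofollow


page = '<html><head><meta name="robots" content="nofollow"></head><body></body></html>'
print(has_nofollow(page))  # True
```

A robot that honors the tag would still keep the page itself but would skip the links found in it.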
If you know what you are doing and really, really wish to turn off robot exclusion, set the robots variable to ‘off’ in your .wgetrc. You can achieve the same effect from the command line using the -e switch, e.g. ‘wget -e robots=off url...’.