Merging two a priori reliable lists of bots would improve their detection: see http://seenthis.net/messages/326903#message347304 for the suggestion and http://spip.pastebin.fr/39305 for the security screen (écran de sécurité) code once the two lists have been merged.
Problem: isn't the resulting regexp a bit large?
if (!defined('_IS_BOT'))
	define('_IS_BOT',
		isset($_SERVER['HTTP_USER_AGENT'])
		AND preg_match(
			// generic words
			',bot|slurp|crawler|spider|webvac|yandex|'
			// MSIE 6.0 is a botnet 99.9% of the time, so this USER_AGENT is treated as a bot
			. 'MSIE 6\.0|'
			// more targeted UAs
			. '200please|360spider|80legs|a6-indexer|abachobot|aboundex|aboutusbot|accoona|aciorobot|addsearchbot|addthis|adressendeutschland|adsbot-google|ahrefsbot|aihitbot|alexa|altavista|amznkassocbot|analyticsseo|antbot|arabot|archive|askpeterbot|aspseek|backlinkcrawler|baidu|baiduspider|begunadvertising|bingbot|bingpreview|bitlybot|bixocrawler|blekkobot|blexbot|bloglines|brainbrubot|browsershots|bubing|bufferbot|butterfly|careerbot|catchbot|ccbot|changedetection|charlotte|chilkat|china|claritybot|classbot|cliqzbot|coccoc|cococrawler|compspybot|crawler|crawler4j|crowsnest|crystalsemanticsbot|dataminr|daumoa|dlweb|dotbot|dumbot|easouspider|ec2linkfinder|estyle|exabot|ezooms|facebookexternalhit|facebookplatform|fairshare|fast-webcrawler|feedfetcher|feedfetcher-google|feedly|feedlybot|fetch|figleafbot|flipboardproxy|fyberspider|genieo|geonabot|gigabot|google|googlebot|grapeshot|hatena-useragent|head|hosttracker|hubspot|ia_archiver|icc-crawler|ichiro|idbot|iltrovatore-setaccio|immediatenet|ina|infegyatlas|infohelfer|instapaper|ixebot|jabse|james|java|jikespider|jyxobot|kumkie|linkdex|linkfluence|linkwalker|litefinder|loadimpactpageanalyzer|luminate|lycos|lycosa|magpie-crawler|meanpathbot|mediapartners-google|metageneratorcrawler|metajobbot|mj12bot|mojeekbot|msai|msnbot|msnbot-media|msrbot|musobot|najdi|nalezenczbot|nekstbot|netcraftsurveyagent|netestate|netseer|nuhk|obot|omgilibot|openwebspider|panscient|parsijoo|plukkie|proximic|psbot|qihoobot|qirina|qualidator|queryseekerspider|rambler|readability|rogerbot|ru_bot|sbsearch|scooter|scrapy|scrubby|scrubbybloglines|searchbot|searchmetricsbot|semrushbot|seocheckbot|seoengworldbot|seokicks-robot|seznambot|shareaholic|shopwiki|showyoubot|sistrix|sitechecker|siteexplorer|slurp|socialbm_bot|sogou|sosoimagespider|sosospider|spbot|special_archiver|speedy|spider|spiderling|spiderman|spinn3r|spreadtrum|steeler|subscriber|suggybot|suma|superdownloads|surveybot|svenska-webbsido|teoma|thumbshots|tineye|trendiction|turnitinbot|tweetedtimes|tweetmeme|twitterbot|uaslinkchecker|umbot|undrip|unisterbot|unwindfetchor|urlappendbot|vedma|vkshare|vm|voilabot|wbsearchbot|wch|web|webalta|webcookies|webthumbnail|wesee|wise-guys|woko|woobot|woriobot|wotbox|y!j-bri|y!j-bro|y!j-brw|y!j-bsc|yacybot|yahoo|yahoo!|yahooysmcm|yandexbot|yats|yeti|yioopbot|yodaobot|youdaobot|zb-1|zeerch|zing-bottabot|zumbot'
			. ',i', (string) $_SERVER['HTTP_USER_AGENT'])
	);
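Whether a pattern of this size is actually a problem can be checked empirically rather than guessed. The following is only a rough sketch: `average_match_time_us()` is a hypothetical helper, the pattern is a shortened stand-in for the full one, and the two user agents are arbitrary samples chosen for illustration.

```php
<?php
// Hypothetical helper: average duration of one preg_match() call, in microseconds.
function average_match_time_us(string $pattern, array $userAgents, int $rounds = 1000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        foreach ($userAgents as $ua) {
            preg_match($pattern, $ua);
        }
    }
    $elapsed = microtime(true) - $start;
    return $elapsed * 1e6 / ($rounds * count($userAgents));
}

// Shortened stand-in (assumption): replace with the full pattern to measure it.
$pattern = ',bot|slurp|crawler|spider|webvac|yandex|MSIE 6\.0|200please|80legs|zeerch,i';

// Sample user agents, for illustration only.
$userAgents = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0',
];

printf("%.3f microseconds per match\n", average_match_time_us($pattern, $userAgents));
```

Running the same measurement with the short and the long pattern on a set of real user agents from the server logs would give a concrete figure for the overhead per hit.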
Target version set to 3.2
Updated version, with the entries already covered by the generic words ',bot|slurp|crawler|spider|webvac|yandex|' removed from the end of the regexp: http://spip.pastebin.fr/52828
if (!defined('_IS_BOT'))
	define('_IS_BOT',
		isset($_SERVER['HTTP_USER_AGENT'])
		AND preg_match(
			// generic words
			',bot|slurp|crawler|spider|webvac|yandex|'
			// MSIE 6.0 is a botnet 99.9% of the time, so this USER_AGENT is treated as a bot
			. 'MSIE 6\.0|'
			// more targeted UAs
			. '200please|80legs|a6-indexer|aboundex|accoona|addthis|adressendeutschland|alexa|altavista|analyticsseo|archive|aspseek|baidu|begunadvertising|bingpreview|bloglines|browsershots|bubing|butterfly|changedetection|charlotte|chilkat|china|coccoc|crowsnest|dataminr|daumoa|dlweb|ec2linkfinder|estyle|ezooms|facebookexternalhit|facebookplatform|fairshare|feedfetcher|feedfetcher-google|feedly|fetch|flipboardproxy|genieo|google|grapeshot|hatena-useragent|head|hosttracker|hubspot|ia_archiver|ichiro|iltrovatore-setaccio|immediatenet|ina|infegyatlas|infohelfer|instapaper|jabse|james|kumkie|linkdex|linkfluence|linkwalker|litefinder|loadimpactpageanalyzer|luminate|lycos|lycosa|mediapartners-google|msai|najdi|netcraftsurveyagent|netestate|netseer|nuhk|panscient|parsijoo|plukkie|proximic|qirina|qualidator|rambler|readability|sbsearch|scooter|scrapy|scrubby|scrubbybloglines|shareaholic|shopwiki|sistrix|sitechecker|siteexplorer|sogou|special_archiver|speedy|spinn3r|spreadtrum|steeler|subscriber|suma|superdownloads|svenska-webbsido|teoma|thumbshots|tineye|trendiction|tweetedtimes|tweetmeme|uaslinkchecker|undrip|unwindfetchor|vedma|vkshare|vm|wch|web|webalta|webcookies|webthumbnail|wesee|wise-guys|woko|wotbox|y!j-bri|y!j-bro|y!j-brw|y!j-bsc|yahoo|yahoo!|yahooysmcm|yats|yeti|zeerch'
			. ',i', (string) $_SERVER['HTTP_USER_AGENT'])
	);
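The pruning described above (dropping targeted entries that the generic words already match) could be done mechanically instead of by hand. This is a minimal sketch under assumptions: `prune_targeted()` is a hypothetical helper, not part of SPIP, and the input is only a small excerpt of the merged list.

```php
<?php
// Hypothetical helper: keep only the entries the generic pattern does not already match.
function prune_targeted(array $targeted, string $genericPattern): array
{
    $kept = array_filter(
        $targeted,
        function ($entry) use ($genericPattern) {
            // Skip empty entries and anything the generic words already catch.
            return $entry !== '' && !preg_match($genericPattern, $entry);
        }
    );
    // Remove duplicates and reindex from 0.
    return array_values(array_unique($kept));
}

$generic = ',bot|slurp|crawler|spider|webvac|yandex,i';

// Small excerpt of the merged list, for illustration only.
$targeted = explode('|', '200please|360spider|80legs|abachobot|china|claritybot|coccoc|slurp|yandexbot|zeerch');

echo implode('|', prune_targeted($targeted, $generic)), "\n";
// prints: 200please|80legs|china|coccoc|zeerch
```

Filtering out empty entries before joining with `|` also avoids producing an empty alternative (`||`), which would match every user agent.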