1.2.0 • Published 4 months ago

crawler-user-agents v1.2.0

Weekly downloads
4,975
License
MIT
Repository
github
Last release
4 months ago

crawler-user-agents

This repository contains a list of of HTTP user-agents used by robots, crawlers, and spiders as in single JSON file.

Each pattern is a regular expression. It should work out-of-the-box wih your favorite regex library.

If you use this project in a commercial product, please sponsor it.

Install

Direct download

Download the crawler-user-agents.json file from this repository directly.

Javascript

crawler-user-agents is deployed on npmjs.com: https://www.npmjs.com/package/crawler-user-agents

To use it using npm or yarn:

npm install --save crawler-user-agents
# OR
yarn add crawler-user-agents

In Node.js, you can require the package to get an array of crawler user agents.

const crawlers = require('crawler-user-agents');
console.log(crawlers);

Python

Install with pip install crawler-user-agents

Then:

import crawleruseragents
if crawleruseragents.is_crawler("Googlebot/"):
   # do something

or:

import crawleruseragents
indices = crawleruseragents.matching_crawlers("bingbot/2.0")
print("crawlers' indices:", indices)
print(
    "crawler's URL:",
    crawleruseragents.CRAWLER_USER_AGENTS_DATA[indices[0]]["url"]
)

Note that matching_crawlers is much slower than is_crawler, if the given User-Agent does indeed match any crawlers.

Go

Go: use this package, it provides global variable Crawlers (it is synchronized with crawler-user-agents.json), functions IsCrawler and MatchingCrawlers.

Example of Go program:

package main

import (
	"fmt"

	"github.com/monperrus/crawler-user-agents"
)

func main() {
	userAgent := "Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)"

	isCrawler := agents.IsCrawler(userAgent)
	fmt.Println("isCrawler:", isCrawler)

	indices := agents.MatchingCrawlers(userAgent)
	fmt.Println("crawlers' indices:", indices)
	fmt.Println("crawler's URL:", agents.Crawlers[indices[0]].URL)
}

Output:

isCrawler: true
crawlers' indices: [237]
crawler' URL: https://discordapp.com

Contributing

I do welcome additions contributed as pull requests.

The pull requests should:

  • contain a single addition
  • specify a discriminant relevant syntactic fragment (for example "totobot" and not "Mozilla/5 totobot v20131212.alpha1")
  • contain the pattern (generic regular expression), the discovery date (year/month/day) and the official url of the robot
  • result in a valid JSON file (don't forget the comma between items)

Example:

{
  "pattern": "rogerbot",
  "addition_date": "2014/02/28",
  "url": "http://moz.com/help/pro/what-is-rogerbot-",
  "instances" : ["rogerbot/2.3 example UA"]
}

License

The list is under a MIT License. The versions prior to Nov 7, 2016 were under a CC-SA license.

Related work

There are a few wrapper libraries that use this data to detect bots:

Other systems for spotting robots, crawlers, and spiders that you may want to consider are:

1.2.0

4 months ago

1.0.165

5 months ago

1.1.0

4 months ago

1.0.166

4 months ago

1.0.164

5 months ago

1.0.161

6 months ago

1.0.160

6 months ago

1.0.163

6 months ago

1.0.162

6 months ago

1.0.159

6 months ago

1.0.158

7 months ago

1.0.156

7 months ago

1.0.157

7 months ago

1.0.155

8 months ago

1.0.154

9 months ago

1.0.153

9 months ago

1.0.152

9 months ago

1.0.145

11 months ago

1.0.144

11 months ago

1.0.147

11 months ago

1.0.146

11 months ago

1.0.149

11 months ago

1.0.148

11 months ago

1.0.150

10 months ago

1.0.151

10 months ago

1.0.143

12 months ago

1.0.142

1 year ago

1.0.141

1 year ago

1.0.140

1 year ago

1.0.139

1 year ago

1.0.138

1 year ago

1.0.137

1 year ago

1.0.136

1 year ago

1.0.135

1 year ago

1.0.134

1 year ago

1.0.133

1 year ago

1.0.132

1 year ago

1.0.131

1 year ago

1.0.130

1 year ago

1.0.129

1 year ago

1.0.128

1 year ago

1.0.125

1 year ago

1.0.124

1 year ago

1.0.127

1 year ago

1.0.126

1 year ago

1.0.123

1 year ago

1.0.122

1 year ago

1.0.121

1 year ago

1.0.120

1 year ago

1.0.119

1 year ago

1.0.118

1 year ago

1.0.117

2 years ago

1.0.109

2 years ago

1.0.108

2 years ago

1.0.110

2 years ago

1.0.112

2 years ago

1.0.111

2 years ago

1.0.114

2 years ago

1.0.113

2 years ago

1.0.116

2 years ago

1.0.115

2 years ago

1.0.107

2 years ago

1.0.106

2 years ago

1.0.105

2 years ago

1.0.104

2 years ago

1.0.101

2 years ago

1.0.103

2 years ago

1.0.102

2 years ago

1.0.100

3 years ago

1.0.95

3 years ago

1.0.94

3 years ago

1.0.93

3 years ago

1.0.99

3 years ago

1.0.98

3 years ago

1.0.97

3 years ago

1.0.96

3 years ago

1.0.91

3 years ago

1.0.90

3 years ago

1.0.92

3 years ago

1.0.89

3 years ago

1.0.88

4 years ago

1.0.87

4 years ago

1.0.86

4 years ago

1.0.85

4 years ago

1.0.84

4 years ago

1.0.83

4 years ago

1.0.80

4 years ago

1.0.82

4 years ago

1.0.81

4 years ago

1.0.1

5 years ago

1.0.79

5 years ago

1.0.77

5 years ago

1.0.78

5 years ago

1.0.76

6 years ago

1.0.75

6 years ago

1.0.74

6 years ago

1.0.73

6 years ago

1.0.72

6 years ago

1.0.71

6 years ago

1.0.70

6 years ago

1.0.69

6 years ago

1.0.68

6 years ago

1.0.67

6 years ago

1.0.66

6 years ago

1.0.65

6 years ago

1.0.64

6 years ago

1.0.63

6 years ago

1.0.62

6 years ago

1.0.61

6 years ago

1.0.60

6 years ago

1.0.59

6 years ago

1.0.58

6 years ago

1.0.57

6 years ago

1.0.56

6 years ago

1.0.55

6 years ago

1.0.54

6 years ago

1.0.53

6 years ago

1.0.52

6 years ago

1.0.51

6 years ago

1.0.50

6 years ago

1.0.49

6 years ago

1.0.48

6 years ago

1.0.47

6 years ago

1.0.46

6 years ago

1.0.45

6 years ago

1.0.44

6 years ago

1.0.43

6 years ago

1.0.42

6 years ago

1.0.41

6 years ago

1.0.40

6 years ago

1.0.39

6 years ago

1.0.38

6 years ago

1.0.37

6 years ago

1.0.36

6 years ago

1.0.35

6 years ago

1.0.34

6 years ago

1.0.33

6 years ago

1.0.32

6 years ago

1.0.31

6 years ago

1.0.30

6 years ago

1.0.29

6 years ago

1.0.28

6 years ago

1.0.27

6 years ago

1.0.26

6 years ago

1.0.25

6 years ago

1.0.24

6 years ago

1.0.23

6 years ago

1.0.22

6 years ago

1.0.21

6 years ago

1.0.20

6 years ago

1.0.19

6 years ago

1.0.18

6 years ago

1.0.17

6 years ago

1.0.16

6 years ago

1.0.15

6 years ago

1.0.14

6 years ago

1.0.13

6 years ago

1.0.12

6 years ago

1.0.11

6 years ago

1.0.10

6 years ago

1.0.9

6 years ago

1.0.8

6 years ago

1.0.7

6 years ago

1.0.6

6 years ago

1.0.5

6 years ago

1.0.4

6 years ago

1.0.3

6 years ago

1.0.2

6 years ago

1.0.0

7 years ago

0.0.1

10 years ago