crawler-user-agents v1.2.0 • MIT license • Published 3 months ago • 4,975 weekly downloads • Repository: github

crawler-user-agents

This repository contains a list of HTTP user-agents used by robots, crawlers, and spiders, as a single JSON file.

Each pattern is a regular expression and should work out of the box with your favorite regex library.

If you use this project in a commercial product, please sponsor it.

Install

Direct download

Download the crawler-user-agents.json file from this repository directly.

Javascript

crawler-user-agents is published on npmjs.com: https://www.npmjs.com/package/crawler-user-agents

To install it with npm or yarn:

npm install --save crawler-user-agents
# OR
yarn add crawler-user-agents

In Node.js, you can require the package to get an array of crawler user agents.

const crawlers = require('crawler-user-agents');
console.log(crawlers);
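Each entry in the exported array carries a `pattern` field holding a regular-expression string (see the Contributing example below for the full entry shape). A minimal sketch of how those patterns can be turned into a detector, using a small inline `entries` array as a stand-in for the package's export:

```javascript
// Sketch: the `entries` array stands in for require('crawler-user-agents').
// Each `pattern` string is compiled with the built-in RegExp constructor.
const entries = [
  { pattern: 'Googlebot\\/', url: 'http://www.google.com/bot.html' },
  { pattern: 'bingbot\\/', url: 'https://www.bing.com/bingbot.htm' },
];

// Returns true if any crawler pattern matches the given User-Agent header.
function isCrawler(userAgent) {
  return entries.some((e) => new RegExp(e.pattern).test(userAgent));
}

console.log(isCrawler('Mozilla/5.0 (compatible; Googlebot/2.1)')); // true
console.log(isCrawler('Mozilla/5.0 (Windows NT 10.0)'));           // false
```

With the real package, replace the inline `entries` with the required array; the matching logic is the same.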

Python

Install with pip install crawler-user-agents

Then:

import crawleruseragents
if crawleruseragents.is_crawler("Googlebot/"):
    pass  # do something

or:

import crawleruseragents
indices = crawleruseragents.matching_crawlers("bingbot/2.0")
print("crawlers' indices:", indices)
print(
    "crawler's URL:",
    crawleruseragents.CRAWLER_USER_AGENTS_DATA[indices[0]]["url"]
)

Note that matching_crawlers is much slower than is_crawler when the given User-Agent actually matches one or more crawlers.

Go

In Go, use this package: it provides the global variable Crawlers (synchronized with crawler-user-agents.json) and the functions IsCrawler and MatchingCrawlers.

Example of Go program:

package main

import (
	"fmt"

	"github.com/monperrus/crawler-user-agents" // the declared package name is agents
)

func main() {
	userAgent := "Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)"

	isCrawler := agents.IsCrawler(userAgent)
	fmt.Println("isCrawler:", isCrawler)

	indices := agents.MatchingCrawlers(userAgent)
	fmt.Println("crawlers' indices:", indices)
	fmt.Println("crawler's URL:", agents.Crawlers[indices[0]].URL)
}

Output:

isCrawler: true
crawlers' indices: [237]
crawler's URL: https://discordapp.com

Contributing

I do welcome additions contributed as pull requests.

The pull requests should:

  • contain a single addition
  • specify a discriminative and relevant syntactic fragment (for example "totobot", not the full "Mozilla/5 totobot v20131212.alpha1")
  • contain the pattern (generic regular expression), the discovery date (year/month/day) and the official url of the robot
  • result in a valid JSON file (don't forget the comma between items)

Example:

{
  "pattern": "rogerbot",
  "addition_date": "2014/02/28",
  "url": "http://moz.com/help/pro/what-is-rogerbot-",
  "instances" : ["rogerbot/2.3 example UA"]
}

License

The list is under an MIT license. Versions prior to Nov 7, 2016 were under a CC-SA license.

Related work

There are a few wrapper libraries that use this data to detect bots:

Other systems for spotting robots, crawlers, and spiders that you may want to consider are:
