1.0.135 • Published 7 days ago

crawler-user-agents v1.0.135

Weekly downloads
4,975
License
MIT
Repository
github
Last release
7 days ago

crawler-user-agents

This repository contains a list of of HTTP user-agents used by robots, crawlers, and spiders as in single JSON file.

Install

Direct download

Download the crawler-user-agents.json file from this repository directly.

Npm / Yarn

crawler-user-agents is deployed on npmjs.com: https://www.npmjs.com/package/crawler-user-agents

To use it using npm or yarn:

npm install --save crawler-user-agents
# OR
yarn add crawler-user-agents

In Node.js, you can require the package to get an array of crawler user agents.

const crawlers = require('crawler-user-agents');
console.log(crawlers);

Usage

Each pattern is a regular expression. It should work out-of-the-box wih your favorite regex library:

  • JavaScript: if (RegExp(entry.pattern).test(req.headers['user-agent']) { ... }
  • PHP: add a slash before and after the pattern: if (preg_match('/'.$entry['pattern'].'/', $_SERVER['HTTP_USER_AGENT'])): ...
  • Python: if re.search(entry['pattern'], ua): ...
  • Go: use this package, it provides global variable Crawlers (it is synchronized with crawler-user-agents.json), functions IsCrawler and MatchingCrawlers.

Example of Go program:

package main

import (
	"fmt"

	"github.com/monperrus/crawler-user-agents"
)

func main() {
	userAgent := "Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)"

	isCrawler := agents.IsCrawler(userAgent)
	fmt.Println("isCrawler:", isCrawler)

	indices := agents.MatchingCrawlers(userAgent)
	fmt.Println("crawlers' indices:", indices)
	fmt.Println("crawler' URL:", agents.Crawlers[indices[0]].URL)
}

Output:

isCrawler: true
crawlers' indices: [237]
crawler' URL: https://discordapp.com

Contributing

I do welcome additions contributed as pull requests.

The pull requests should:

  • contain a single addition
  • specify a discriminant relevant syntactic fragment (for example "totobot" and not "Mozilla/5 totobot v20131212.alpha1")
  • contain the pattern (generic regular expression), the discovery date (year/month/day) and the official url of the robot
  • result in a valid JSON file (don't forget the comma between items)

Example:

{
  "pattern": "rogerbot",
  "addition_date": "2014/02/28",
  "url": "http://moz.com/help/pro/what-is-rogerbot-",
  "instances" : ["rogerbot/2.3 example UA"]
}

License

The list is under a MIT License. The versions prior to Nov 7, 2016 were under a CC-SA license.

Related work

There are a few wrapper libraries that use this data to detect bots:

Other systems for spotting robots, crawlers, and spiders that you may want to consider are:

1.0.135

7 days ago

1.0.134

18 days ago

1.0.133

18 days ago

1.0.132

1 month ago

1.0.131

1 month ago

1.0.130

1 month ago

1.0.129

1 month ago

1.0.128

1 month ago

1.0.125

1 month ago

1.0.124

1 month ago

1.0.127

1 month ago

1.0.126

1 month ago

1.0.123

2 months ago

1.0.122

2 months ago

1.0.121

2 months ago

1.0.120

2 months ago

1.0.119

3 months ago

1.0.118

3 months ago

1.0.117

4 months ago

1.0.109

8 months ago

1.0.108

9 months ago

1.0.110

8 months ago

1.0.112

8 months ago

1.0.111

8 months ago

1.0.114

8 months ago

1.0.113

8 months ago

1.0.116

6 months ago

1.0.115

6 months ago

1.0.107

1 year ago

1.0.106

1 year ago

1.0.105

1 year ago

1.0.104

1 year ago

1.0.101

1 year ago

1.0.103

1 year ago

1.0.102

1 year ago

1.0.100

1 year ago

1.0.95

2 years ago

1.0.94

2 years ago

1.0.93

2 years ago

1.0.99

1 year ago

1.0.98

1 year ago

1.0.97

1 year ago

1.0.96

2 years ago

1.0.91

2 years ago

1.0.90

2 years ago

1.0.92

2 years ago

1.0.89

2 years ago

1.0.88

2 years ago

1.0.87

2 years ago

1.0.86

2 years ago

1.0.85

3 years ago

1.0.84

3 years ago

1.0.83

3 years ago

1.0.80

3 years ago

1.0.82

3 years ago

1.0.81

3 years ago

1.0.1

3 years ago

1.0.79

4 years ago

1.0.77

4 years ago

1.0.78

4 years ago

1.0.76

4 years ago

1.0.75

4 years ago

1.0.74

4 years ago

1.0.73

4 years ago

1.0.72

4 years ago

1.0.71

4 years ago

1.0.70

4 years ago

1.0.69

4 years ago

1.0.68

4 years ago

1.0.67

4 years ago

1.0.66

4 years ago

1.0.65

4 years ago

1.0.64

5 years ago

1.0.63

5 years ago

1.0.62

5 years ago

1.0.61

5 years ago

1.0.60

5 years ago

1.0.59

5 years ago

1.0.58

5 years ago

1.0.57

5 years ago

1.0.56

5 years ago

1.0.55

5 years ago

1.0.54

5 years ago

1.0.53

5 years ago

1.0.52

5 years ago

1.0.51

5 years ago

1.0.50

5 years ago

1.0.49

5 years ago

1.0.48

5 years ago

1.0.47

5 years ago

1.0.46

5 years ago

1.0.45

5 years ago

1.0.44

5 years ago

1.0.43

5 years ago

1.0.42

5 years ago

1.0.41

5 years ago

1.0.40

5 years ago

1.0.39

5 years ago

1.0.38

5 years ago

1.0.37

5 years ago

1.0.36

5 years ago

1.0.35

5 years ago

1.0.34

5 years ago

1.0.33

5 years ago

1.0.32

5 years ago

1.0.31

5 years ago

1.0.30

5 years ago

1.0.29

5 years ago

1.0.28

5 years ago

1.0.27

5 years ago

1.0.26

5 years ago

1.0.25

5 years ago

1.0.24

5 years ago

1.0.23

5 years ago

1.0.22

5 years ago

1.0.21

5 years ago

1.0.20

5 years ago

1.0.19

5 years ago

1.0.18

5 years ago

1.0.17

5 years ago

1.0.16

5 years ago

1.0.15

5 years ago

1.0.14

5 years ago

1.0.13

5 years ago

1.0.12

5 years ago

1.0.11

5 years ago

1.0.10

5 years ago

1.0.9

5 years ago

1.0.8

5 years ago

1.0.7

5 years ago

1.0.6

5 years ago

1.0.5

5 years ago

1.0.4

5 years ago

1.0.3

5 years ago

1.0.2

5 years ago

1.0.0

5 years ago

0.0.1

9 years ago