1.0.1 • Published 6 years ago

srt2corpus v1.0.1

License
MIT
Repository
github

srt2corpus

What can srt2corpus do?

It converts an SRT subtitle file into a CN-? parallel corpus, that is, Chinese text paired with its counterpart in another language.

Install

npm i srt2corpus

Usage

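var fs = require('fs')
var corpus = require('srt2corpus')

// Read the SRT file as UTF-8 text; file.path is the path to your .srt file
// (fs.readFile with a callback works as well)
var data = fs.readFileSync(file.path, 'utf8')

// Convert with the default options
var results = corpus(data)

// Or pass options as the second argument
var resultsWithOptions = corpus(data, {
  'verbose-each-line': false,
  'verbose-reason': false
})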
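
The result can then be saved for later use. The snippet below is a minimal sketch, assuming that `corpus(data)` returns an array of strings as shown under Sample Result; the helper name `saveCorpus` and the output file name are hypothetical and not part of srt2corpus. JSON is used because each entry may itself contain a newline.

```javascript
var fs = require('fs')
var os = require('os')
var path = require('path')

// Hypothetical helper (not part of srt2corpus): write the entries as a JSON array
function saveCorpus(entries, outPath) {
  fs.writeFileSync(outPath, JSON.stringify(entries, null, 2), 'utf8')
}

// Stand-in for the value returned by corpus(data)
var results = ['那麼  就賣給巴赫頭了\nBachmanity it is then.']

// Assumed output location for this example
var outPath = path.join(os.tmpdir(), 'corpus.json')
saveCorpus(results, outPath)
```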

Options

| Key | Description | Default |
| --- | --- | --- |
| skip-position-annotation | Skip SRT position annotations (not implemented) | true |
| skip-no-parallel-corpus | Skip subtitles that contain only Chinese (no parallel text) | true |
| skip-multiple-line | Skip multi-line subtitles | true |
| skip-not-acceptable-characters | Skip subtitles that contain special characters | true |
| acceptable-symbols | Whitelist of allowed special characters | '!!??「」,,.。 |
| auto-convert-to-traditional-chinese | Convert Chinese to Traditional Chinese | true |
| auto-convert-to-simplified-chinese | Convert Chinese to Simplified Chinese | false |
| verbose-each-line | Log each subtitle | true |
| verbose-reason | Log which rule caused a subtitle to be skipped | true |
| parallel-corpus-separate-symbol | Custom separator symbol/word between the two languages | \n |
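
To make the skip-no-parallel-corpus and skip-multiple-line rules more concrete, here is a minimal sketch of how such checks could be written. It is not the library's implementation: the function names `isParallel` and `isMultiLine` are hypothetical, and the sketch assumes that a parallel subtitle holds one Chinese line and one non-Chinese line.

```javascript
// Hypothetical helpers, not part of srt2corpus.
// Matches any Han (Chinese) character; requires Unicode property escape support.
var HAN = /\p{Script=Han}/u

// True when at least one line contains Chinese characters
// and at least one non-empty line contains none.
function isParallel(text) {
  var lines = text.split('\n')
  var hasChinese = lines.some(function (line) { return HAN.test(line) })
  var hasOther = lines.some(function (line) {
    return line.trim() !== '' && !HAN.test(line)
  })
  return hasChinese && hasOther
}

// True when the subtitle has more than two lines,
// i.e. more than one line per language (an assumption of this sketch).
function isMultiLine(text) {
  return text.split('\n').length > 2
}
```

Under these assumptions, a subtitle would be kept only when `isParallel(text)` is true and `isMultiLine(text)` is false.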

Sample Result

{ id: '16',
  startTime: '00:00:35,720',
  endTime: '00:00:39,500',
  text: '就算一點欺詐行為  風投公司都認為是死罪\nEven a whiff of fraud is a mortal sin for VCs.' }
{ id: '17',
  startTime: '00:00:40,320',
  endTime: '00:00:42,760',
  text: '\'魔笛手\'品牌及其名下的所有資產現在\nPied Piper and all of its assets are now officially' }
{ id: '18',
  startTime: '00:00:42,760',
  endTime: '00:00:44,700',
  text: '正式屬于\'巴赫頭有限責任公司\'了\nthe property of Bachmanity LLC.' }
{ id: '19',
  startTime: '00:00:44,700',
  endTime: '00:00:48,010',
  text: '-你剛才說\'巴赫頭\'  -他們出價最高\n- Did you just say "Bachmanity?" - They had the highest bid?' }
{ id: '20',
  startTime: '00:00:48,010',
  endTime: '00:00:49,880',
  text: '那么  就賣給巴赫頭了\nBachmanity it is then.' }
{ id: '21',
  startTime: '00:00:50,550',
  endTime: '00:00:52,790',
  text: '理查德  是你救了我們\nRichard, you pulled us out of a nosedive.' }
{ id: '22',
  startTime: '00:00:52,800',
  endTime: '00:00:56,300',
  text: '當然了  我們必定會失敗\nOf course, inevitably we will plummet towards the earth,' }
array = [
  '就算一點欺詐行為  風投公司都認為是死罪\nEven a whiff of fraud is a mortal sin for VCs.',
  '\'魔笛手\'品牌及其名下的所有資產現在\nPied Piper and all of its assets are now officially',
  '正式屬於\'巴赫頭有限責任公司\'了\nthe property of Bachmanity LLC.',
  '那麼  就賣給巴赫頭了\nBachmanity it is then.',
  '理查德  是你救了我們\nRichard, you pulled us out of a nosedive.',
  '當然了  我們必定會失敗\nOf course, inevitably we will plummet towards the earth,'
]
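
Each entry in the array appears to join the two languages with the parallel-corpus-separate-symbol (default \n). As a sketch of one way to consume the result, the snippet below splits entries into pairs. The helper name `toPairs` and the field names `zh` and `en` are illustrative assumptions, not part of srt2corpus.

```javascript
// Hypothetical helper (not part of srt2corpus):
// split each corpus entry at the first separator into a { zh, en } pair.
function toPairs(entries, separator) {
  separator = separator || '\n'
  return entries.map(function (entry) {
    var idx = entry.indexOf(separator)
    // No separator found: treat the whole entry as the Chinese side
    if (idx === -1) {
      return { zh: entry, en: '' }
    }
    return {
      zh: entry.slice(0, idx),
      en: entry.slice(idx + separator.length)
    }
  })
}

// Two entries copied from the sample result above
var sample = [
  '那麼  就賣給巴赫頭了\nBachmanity it is then.',
  '理查德  是你救了我們\nRichard, you pulled us out of a nosedive.'
]

var pairs = toPairs(sample)
console.log(pairs[0].en) // prints "Bachmanity it is then."
```

If a custom separator was passed to the converter, the same value would need to be passed as the second argument here.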