1.6.2 • Published 9 years ago

parliament-parser v1.6.2

parliament-parser

Data

The pdf folder contains original PDFs from the Myanmar Union Election Commission.

The Word documents are not publicly available, but they were essential for producing the organized-txt files and the final constituency lists.

The organized-txt folder contains a list of constituencies using the format:

1st_KNOWN_CONSTITUENCY_NAME
list of wards in the 1st constituency (typically separated by a ၊ mark)
additional lines with more wards in the 1st constituency
(BLANK LINE)
list of wards in the 2nd constituency
...
2nd_KNOWN_CONSTITUENCY_NAME
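
As a rough illustration of the layout above, the following sketch groups an organized-txt file into constituencies and splits each line into wards. It is not the code in parser.js. The function name parseOrganizedTxt is hypothetical, and the rule used to spot a name line (a first line with no ၊ separator, followed by at least one more line) is an assumption that may not hold for every file.

```javascript
// Hypothetical helper, not part of parliament-parser's documented API.
function parseOrganizedTxt(text) {
  const WARD_SEPARATOR = '\u104A'; // ၊ MYANMAR SIGN LITTLE SECTION
  const constituencies = [];

  // One or more blank lines separate constituencies.
  for (const block of text.split(/\r?\n\s*\r?\n/)) {
    const lines = block
      .split(/\r?\n/)
      .map((line) => line.trim())
      .filter(Boolean);
    if (lines.length === 0) continue;

    // Assumption: a known constituency name is a first line that contains
    // no ward separator and is followed by at least one ward line.
    let name = null;
    if (lines.length > 1 && !lines[0].includes(WARD_SEPARATOR)) {
      name = lines.shift();
    }

    // Every remaining line may hold several wards separated by ၊.
    const wards = lines
      .flatMap((line) => line.split(WARD_SEPARATOR))
      .map((ward) => ward.trim())
      .filter(Boolean);

    constituencies.push({ name, wards });
  }
  return constituencies;
}
```

Each returned entry has a name (null when the block has no recognizable name line) and a flat array of wards, which maps naturally onto one CSV row per ward.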

The csv folder contains CSVs with a row for each ward.

Install

Install the pdftotext command-line tool and pdftk (see https://github.com/nisaacson/pdf-extract for instructions).

Install parliament-parser

npm install

Run

node parser.js constituencies/AmyotharHulttaw.pdf constituencies/AmyotharHulttaw.csv

Language notes

The legislature PDFs use Unicode characters and the Myanmar3 font, but contain unusual errors in the order and placement of diacritics. Use the my-diacritic-sort module to reorder them.

Myanmar numerals are converted to the digits 0-9 for readability.
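
The numeral conversion can be sketched as below. This is a minimal illustration rather than the parser's own code, and the function name myanmarDigitsToAscii is hypothetical. It relies on the Myanmar digits ၀ to ၉ occupying the contiguous Unicode range U+1040 to U+1049, in the same order as 0 to 9.

```javascript
// Hypothetical helper; the name is illustrative, not the parser's actual function.
function myanmarDigitsToAscii(text) {
  // Shift each Myanmar digit (U+1040..U+1049) onto ASCII '0'..'9' (U+0030..U+0039).
  return text.replace(/[\u1040-\u1049]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - 0x1040 + 0x30)
  );
}

// Usage: ၁ is U+1041 and ၂ is U+1042.
console.log(myanmarDigitsToAscii('\u1041\u1042')); // prints "12"
```

All other characters pass through unchanged. The sketch does not handle the letter ဝ (U+101D), which is sometimes typed in place of the digit ၀.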

License

Open source under the MIT License.