@docktools/csv v1.0.1

License: ISC

csv

This project aims to simplify CSV parsing for developers.

It provides automatic data retrieval, data parsing (JSON), and dynamic writing, along with custom control over many features.

usage

This project is written in TypeScript. You can compile the code to JavaScript by running npm run build or tsc.

For development, use ts-node src instead.

Create an instance by passing the path of the CSV file and an options object:

const csv = new CSV('./files/test.csv', {
    // separator of csv columns, default: ;
    sep: ';'
})

reading data

data retrieval

To read from a file, pass the headers of the columns you want:

const data = await csv.read(['name', 'url', 'price', 'date', 'extra'])

The result looks something like this:

console.log(data)
>>>
[
    {
        value: {
            name: 'product 1',
            url: 'example.com/productPage',
            price: '99.99',
            date: 'Mon Nov 22 2021 15:01:59 GMT+0100 (GMT+01:00)',
            extra: ''
        }
    },
    ...
]

In this case the extra property is empty because the column is empty in the file. To exclude such properties, add the option excludeEmpty: true as follows:

const data = await csv.read(['name', 'url', 'price', 'date', 'extra'], {
    excludeEmpty: true
})
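To make the mapping concrete, here is a minimal sketch (not the package's code) of how one already-split line could become the value object, including the excludeEmpty behavior. It assumes the requested headers are looked up in the file's header row; toRecord and its parameters are hypothetical names.

```typescript
type Row = Record<string, string>;

// Hypothetical helper: maps the requested headers to their positions in the
// file's header row and builds the `value` object for one split line.
function toRecord(fileHeaders: string[], wanted: string[], columns: string[], excludeEmpty = false): Row {
  const record: Row = {};
  for (const header of wanted) {
    const index = fileHeaders.indexOf(header);
    // Unknown headers and missing trailing columns both become "".
    const value = index >= 0 ? columns[index] ?? "" : "";
    // excludeEmpty drops properties whose cell is empty.
    if (excludeEmpty && value === "") continue;
    record[header] = value;
  }
  return record;
}

const fileHeaders = ["name", "url", "price", "date", "extra"];
const cells = ["product 1", "example.com/productPage", "99.99", "Mon Nov 22 2021", ""];
console.log(toRecord(fileHeaders, fileHeaders, cells));       // extra: '' is present
console.log(toRecord(fileHeaders, fileHeaders, cells, true)); // extra is omitted
```

Looking headers up by name rather than by position means the order of the array passed to the function does not have to match the column order in the file.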

With large files, iterating over every line, pushing each one into an array, and then iterating over that array again on your side to use the parsed data is resource-consuming.

For that case, use the option ticks: true, which returns a cursor built on JavaScript generator functions so you can consume lines one at a time:

const cursor = await csv.read(['name', 'url', 'price', 'date', 'extra'], {
    ticks: true
});
while (true) {
    const tick = await cursor.next()
    if (tick.done) break
    const { value: info } = tick.value
    console.log(info)
}
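A minimal sketch of the generator idea, assuming lines arrive one at a time from any iterable (for example a line reader over a file stream). iterateRecords is a hypothetical name, it splits naively on the separator without quote handling, and unlike the package's cursor it is synchronous, so next() is not awaited:

```typescript
// Yields one record per data line instead of building the whole array first.
// The first line is treated as the header row; pos is 1-based, so the first
// data line has pos 2.
function* iterateRecords(
  lines: Iterable<string>,
  sep = ";"
): Generator<{ value: Record<string, string>; pos: number }> {
  let headers: string[] | null = null;
  let pos = 0;
  for (const line of lines) {
    pos++;
    if (headers === null) {
      headers = line.split(sep);
      continue;
    }
    if (line === "") continue; // skip blank lines, e.g. a trailing newline
    const columns = line.split(sep); // naive split, no quote handling
    const value: Record<string, string> = {};
    headers.forEach((header, i) => {
      value[header] = columns[i] ?? "";
    });
    yield { value, pos };
  }
}

// Same consumption pattern as above, minus the await.
const cursor = iterateRecords(["name;price", "product 1;99.99", "product 2;69.96"]);
while (true) {
  const tick = cursor.next();
  if (tick.done) break;
  console.log(tick.value.pos, tick.value.value);
}
```

Only the current line and its record are held at a time, which is what makes this approach cheaper than collecting everything into an array first.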

data parsing

You can parse data automatically by specifying column types keyed by CSV header.

For now, number and date are supported. More types will be added in the future, and an automatic type detector using machine learning for data labeling may be integrated later.

const data = await csv.read(['name', 'url', 'price', 'date', 'extra'], {
    types: {
        price: 'number',
        date: 'date'
    }
})
console.log(data)
>>>
[
    {
        value: {
            name: 'product 1', 
            url: 'example.com/productPage', 
            price: 99.99, 
            date: 2021-11-22T14:01:59.000Z
        }
    },
    ...
]
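The conversion itself could look like the sketch below. parseValue and applyTypes are hypothetical helpers, not the package's API; leaving empty cells as "" (rather than letting Number("") produce 0) is an assumption of this sketch:

```typescript
type ColumnType = "number" | "date";
type Parsed = string | number | Date;

function parseValue(raw: string, type?: ColumnType): Parsed {
  if (raw === "") return raw; // keep empty cells empty instead of 0 / Invalid Date
  switch (type) {
    case "number":
      return Number(raw); // "99.99" -> 99.99; non-numeric text -> NaN
    case "date":
      return new Date(raw); // unparseable text -> Invalid Date
    default:
      return raw; // columns without a type stay strings
  }
}

// Applies the per-header types to one record.
function applyTypes(record: Record<string, string>, types: Partial<Record<string, ColumnType>>): Record<string, Parsed> {
  const out: Record<string, Parsed> = {};
  for (const key of Object.keys(record)) {
    out[key] = parseValue(record[key], types[key]);
  }
  return out;
}

const typed = applyTypes(
  { name: "product 1", price: "99.99", date: "2021-11-22T14:01:59.000Z" },
  { price: "number", date: "date" }
);
console.log(typed);
```

A Date prints in UTC (the Z suffix), which is why the date above appears one hour earlier than the GMT+0100 time stored in the file.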

extra info

By default you get an object with a value property containing the column data.

Additionally, you can get pos, line, and columns:

pos is the line's position in the file, which makes debugging easier.

line is the whole raw line.

columns is an array of all columns, not labeled with headers.

Request the extra info with the getters option:

const data = await csv.read(['name', 'url', 'price', 'date', 'extra'], {
    // default: ['value']
    getters: ['value', 'pos', 'line', 'columns']
})
console.log(data)
>>>
[
    {
        value: {
            name: 'product 1',
            url: 'example.com/productPage',
            price: '99.99',
            date: 'Mon Nov 22 2021 15:01:59 GMT+0100 (GMT+01:00)',
            extra: ''
        },
        pos: 2,
        line: "product 1;example.com/productPage;99.99;Mon Nov 22 2021 15:01:59 GMT+0100 (GMT+01:00);",
        columns: ["product 1", "example.com/productPage", "99.99", "Mon Nov 22 2021 15:01:59 GMT+0100 (GMT+01:00)"]
    },
    ...
]
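A sketch of how the getters option could select fields from a fully built result. Tick, buildTick and pickGetters are hypothetical names, and buildTick splits naively on the separator (no quote handling):

```typescript
interface Tick {
  value: Record<string, string>;
  pos: number;
  line: string;
  columns: string[];
}
type Getter = keyof Tick;

// Builds every field for one line; pos is passed in by the caller.
function buildTick(headers: string[], line: string, pos: number, sep = ";"): Tick {
  const columns = line.split(sep); // naive split, no quote handling
  const value: Record<string, string> = {};
  headers.forEach((header, i) => {
    value[header] = columns[i] ?? "";
  });
  return { value, pos, line, columns };
}

// Keeps only the requested getters; defaults to ['value'] as described above.
function pickGetters(tick: Tick, getters: Getter[] = ["value"]): Partial<Tick> {
  const out: Partial<Tick> = {};
  // Generic copy keeps each key paired with its own value type.
  const copy = <K extends Getter>(key: K): void => {
    out[key] = tick[key];
  };
  getters.forEach(copy);
  return out;
}

const full = buildTick(["name", "price"], "product 1;99.99", 2);
console.log(pickGetters(full));
console.log(pickGetters(full, ["value", "pos", "line", "columns"]));
```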

Note:

As you can see in src/utils/splitLine.ts, I wrote a custom line parser that does not rely on the ; separator alone.

A column may contain the delimiter ; inside double quotes, like "product 1 ; for product 2".

That delimiter has to be ignored, and that's what this function does.
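The idea can be sketched as a small state machine that tracks whether it is inside double quotes. This is a hypothetical re-implementation, not the contents of src/utils/splitLine.ts; it additionally assumes that "" inside a quoted field is an escaped quote (RFC 4180 style) and that the separator is a single character:

```typescript
// Splits a line on `sep`, ignoring separators that appear inside double quotes.
// The surrounding quotes are removed from the returned columns.
function splitLine(line: string, sep = ";"): string[] {
  const columns: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === '"') {
      if (inQuotes && line[i + 1] === '"') {
        current += '"'; // "" inside quotes is an escaped quote
        i++;
      } else {
        inQuotes = !inQuotes; // opening or closing quote
      }
    } else if (ch === sep && !inQuotes) {
      columns.push(current); // separator outside quotes ends the column
      current = "";
    } else {
      current += ch;
    }
  }
  columns.push(current); // last column (empty if the line ends with sep)
  return columns;
}

console.log(splitLine('"product 1 ; for product 2";example.com/productPage;99.99'));
```

A plain line.split(";") would break the first column above into two; tracking the quote state keeps it whole.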

writing data

simplified writing

Writing data is simple: pass an array of objects, and the header positions in the file are recognized automatically.

await csv.write(['name', 'url', 'price', 'date'], [
    { name: 'product 2', url: "exp.com/productPage/2", price: 69.96, date: new Date() },
    ...
])
>>> test.csv
name;url;price;date
product 2;exp.com/productPage/2;69.96;Mon Nov 22 2021 15:01:59 GMT+0100 (GMT+01:00)
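Recognizing header positions amounts to emitting each object's values in the order of the file's header row. A minimal sketch with hypothetical names; quoting fields that contain the separator, a quote, or a line break is an addition of this sketch so the output stays readable by a quote-aware parser. Values are converted with String(), which turns a Date into the Mon Nov 22 2021 ... form shown above:

```typescript
// Converts one value to a CSV cell, quoting it when necessary.
function escapeField(value: unknown, sep: string): string {
  if (value === undefined || value === null) return ""; // missing key -> empty cell
  const text = String(value);
  if (text.includes(sep) || text.includes('"') || /[\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Emits each object's values in the order of the file's header row,
// regardless of the key order inside the object.
function toCsvLines(fileHeaders: string[], rows: Record<string, unknown>[], sep = ";"): string[] {
  return rows.map((row) => fileHeaders.map((header) => escapeField(row[header], sep)).join(sep));
}

const written = toCsvLines(
  ["name", "url", "price", "date"],
  [{ price: 69.96, date: new Date(), url: "exp.com/productPage/2", name: "product 2" }]
);
console.log(written.join("\n"));
```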

dynamic writing

Optionally, this function can handle new columns of data that don't exist in the file's headers.

To enable this, use the option dynamic: true so the function knows you want dynamic headers.

Note: this feature is resource-consuming because it rewrites the whole file instead of appending.

await csv.write(['name', 'url', 'price', 'date', 'newHeader'], [
    {
        name: 'product 3',
        url: "exp.com/productPage/3",
        price: 69.96,
        date: new Date(),
        newHeader: 'newData'
    },
    ...
], { dynamic: true }) // enable dynamic headers (options passed as the third argument)
>>> test.csv
name;url;price;date;newHeader
product 3;exp.com/productPage/3;69.96;Mon Nov 22 2021 15:01:59 GMT+0100 (GMT+01:00);newData
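Why dynamic writing costs more is easiest to see in a sketch: new keys extend the header row, every existing line is padded with empty cells, and the whole content is regenerated. rewriteWithDynamicHeaders is hypothetical, works on an in-memory string rather than a file, and splits naively on the separator (no quote handling):

```typescript
// Returns the full new file content; the caller would write it back over the file.
function rewriteWithDynamicHeaders(content: string, rows: Record<string, unknown>[], sep = ";"): string {
  const lines = content.split(/\r?\n/).filter((line) => line !== "");
  const headers = lines.length > 0 ? lines[0].split(sep) : [];
  // Append keys that are not yet in the header row, in first-seen order.
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!headers.includes(key)) headers.push(key);
    }
  }
  // Existing data lines get empty cells for the new columns.
  const existing = lines.slice(1).map((line) => {
    const cells = line.split(sep);
    while (cells.length < headers.length) cells.push("");
    return cells.join(sep);
  });
  // New rows are written in header order; missing keys become empty cells.
  const added = rows.map((row) => headers.map((h) => (row[h] == null ? "" : String(row[h]))).join(sep));
  return [headers.join(sep), ...existing, ...added].join("\n") + "\n";
}

const before = "name;price\nproduct 2;69.96\n";
console.log(rewriteWithDynamicHeaders(before, [{ name: "product 3", price: 69.96, newHeader: "newData" }]));
```

Because the header row is the first line, adding a column changes the start of the file, so every line after it has to be written again; appending alone cannot do that.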