
My Thoughts, Trials and Adventures

Reading CSV files in C

Posted at — Jun 14, 2020 | Last Modified on — May 11, 2023

Introduction

In this post, I will show you how to read CSV files in C.

Example.csv

Let’s take a look at example.csv

date,open,high,low,close,volume
2020-03-11 09:15:00,26272.75,26453.65,26241.5,26443.85,0
2020-03-11 09:16:00,26447.1,26542.2,26423.6,26520.45,0
2020-03-11 09:17:00,26526.75,26543.15,26465.55,26522.15,0
2020-03-11 09:18:00,26531.3,26556,26504.55,26543.95,0
2020-03-11 09:19:00,26544.55,26633.85,26544.55,26633.85,0
2020-03-11 09:20:00,26633.9,26676.1,26619.15,26647.05,0
2020-03-11 09:21:00,26647,26654.5,26611.7,26652.6,0
2020-03-11 09:22:00,26659.05,26698.1,26646.7,26690.25,0
2020-03-11 09:23:00,26691.9,26711.9,26649.5,26659.1,0
2020-03-11 09:24:00,26659,26784.2,26659,26753.85,0
2020-03-11 09:25:00,26751.9,26767.95,26721.1,26727.3,0
2020-03-11 09:26:00,26729.95,26742.45,26684.55,26688.4,0
2020-03-11 09:27:00,26684.25,26751.25,26670.4,26728.15,0
2020-03-11 09:28:00,26727.9,26768.1,26725.8,26755.95,0
2020-03-11 09:29:00,26755.65,26779.7,26755.65,26760.65,0

This is an example of OHLC (Open, High, Low, Close) data. This type of data is widely used to represent the price movements of financial instruments. We will use it to demonstrate how to read CSV files in C.

Setup

  • gcc version 10.1.0 (GCC)

Hello World

CSV stands for Comma-Separated Values. Reading a CSV file can be broken down into two steps, repeated for every line:

  • Read a line -> L
  • Split L on the comma character (,)

Now, you have the values of the row.


Reading a line

We use fgets() from stdio.h, which reads one line at a time.

Let’s demonstrate this.

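#include<stdio.h>

#define MAXCHAR 1000

int main(){

    FILE *fp;
    char row[MAXCHAR];

    fp = fopen("example.csv","r");
    if (fp == NULL)
    {
        perror("example.csv");   // the file could not be opened
        return 1;
    }

    // fgets() returns NULL at end of file (or on a read error), so test
    // its return value. Checking feof() before reading would print the
    // last line twice when the file ends with a newline.
    while (fgets(row, MAXCHAR, fp) != NULL)
    {
        printf("Row: %s", row);   // row still ends with the '\n' from the file
    }

    fclose(fp);
    return 0;
}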

Output

Row: date,open,high,low,close,volume
Row: 2020-03-11 09:15:00,26272.75,26453.65,26241.5,26443.85,0
Row: 2020-03-11 09:16:00,26447.1,26542.2,26423.6,26520.45,0
Row: 2020-03-11 09:17:00,26526.75,26543.15,26465.55,26522.15,0
Row: 2020-03-11 09:18:00,26531.3,26556,26504.55,26543.95,0
Row: 2020-03-11 09:19:00,26544.55,26633.85,26544.55,26633.85,0
Row: 2020-03-11 09:20:00,26633.9,26676.1,26619.15,26647.05,0
Row: 2020-03-11 09:21:00,26647,26654.5,26611.7,26652.6,0
Row: 2020-03-11 09:22:00,26659.05,26698.1,26646.7,26690.25,0
Row: 2020-03-11 09:23:00,26691.9,26711.9,26649.5,26659.1,0
Row: 2020-03-11 09:24:00,26659,26784.2,26659,26753.85,0
Row: 2020-03-11 09:25:00,26751.9,26767.95,26721.1,26727.3,0
Row: 2020-03-11 09:26:00,26729.95,26742.45,26684.55,26688.4,0
Row: 2020-03-11 09:27:00,26684.25,26751.25,26670.4,26728.15,0
Row: 2020-03-11 09:28:00,26727.9,26768.1,26725.8,26755.95,0
Row: 2020-03-11 09:29:00,26755.65,26779.7,26755.65,26760.65,0

Tokenizing

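Tokenizing is the process of splitting a string on a delimiter character. In our case, we want to split each row on commas (,).

We use strtok() from string.h. Note that strtok() modifies the string it splits. It also treats consecutive commas as a single separator, so empty fields are silently skipped, and it does not understand quoted fields that contain commas. That is fine for simple data like example.csv, but not for CSV in general.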

code

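#include<stdio.h>
#include<string.h>

#define MAXCHAR 1000

int main(){

    FILE *fp;
    char row[MAXCHAR];
    char *token;

    fp = fopen("example.csv","r");
    if (fp == NULL)
    {
        perror("example.csv");   // the file could not be opened
        return 1;
    }

    // loop until fgets() signals end of file by returning NULL
    while (fgets(row, MAXCHAR, fp) != NULL)
    {
        // print first: strtok() overwrites each ',' in row with '\0'
        printf("Row: %s", row);

        // first call: pass the string to split
        token = strtok(row, ",");

        while(token != NULL)
        {
            // the last token keeps the '\n' from the file, which is
            // why a blank line follows each row's tokens in the output
            printf("Token: %s\n", token);
            // later calls: pass NULL to continue where the last call stopped
            token = strtok(NULL, ",");
        }

    }

    fclose(fp);
    return 0;

}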

Output

Row: date,open,high,low,close,volume
Token: date
Token: open
Token: high
Token: low
Token: close
Token: volume

Row: 2020-03-11 09:15:00,26272.75,26453.65,26241.5,26443.85,0
Token: 2020-03-11 09:15:00
Token: 26272.75
Token: 26453.65
Token: 26241.5
Token: 26443.85
Token: 0

Row: 2020-03-11 09:16:00,26447.1,26542.2,26423.6,26520.45,0
Token: 2020-03-11 09:16:00
Token: 26447.1
Token: 26542.2
Token: 26423.6
Token: 26520.45
Token: 0

Row: 2020-03-11 09:17:00,26526.75,26543.15,26465.55,26522.15,0
Token: 2020-03-11 09:17:00
Token: 26526.75
Token: 26543.15
Token: 26465.55
Token: 26522.15
Token: 0

Row: 2020-03-11 09:18:00,26531.3,26556,26504.55,26543.95,0
Token: 2020-03-11 09:18:00
Token: 26531.3
Token: 26556
Token: 26504.55
Token: 26543.95
Token: 0

Row: 2020-03-11 09:19:00,26544.55,26633.85,26544.55,26633.85,0
Token: 2020-03-11 09:19:00
Token: 26544.55
Token: 26633.85
Token: 26544.55
Token: 26633.85
Token: 0

....

Conclusion

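For large CSV files, it’s advisable to also use fseek() and ftell(). For example, you can measure the file size up front so the whole file can be read into memory in one go, or record the byte offset of a row and jump back to it later without rereading the file.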