Reading CSV files in C
Introduction
In this post, I will show you how to read CSV files in C.
Example.csv
Let’s take a look at example.csv
date,open,high,low,close,volume
2020-03-11 09:15:00,26272.75,26453.65,26241.5,26443.85,0
2020-03-11 09:16:00,26447.1,26542.2,26423.6,26520.45,0
2020-03-11 09:17:00,26526.75,26543.15,26465.55,26522.15,0
2020-03-11 09:18:00,26531.3,26556,26504.55,26543.95,0
2020-03-11 09:19:00,26544.55,26633.85,26544.55,26633.85,0
2020-03-11 09:20:00,26633.9,26676.1,26619.15,26647.05,0
2020-03-11 09:21:00,26647,26654.5,26611.7,26652.6,0
2020-03-11 09:22:00,26659.05,26698.1,26646.7,26690.25,0
2020-03-11 09:23:00,26691.9,26711.9,26649.5,26659.1,0
2020-03-11 09:24:00,26659,26784.2,26659,26753.85,0
2020-03-11 09:25:00,26751.9,26767.95,26721.1,26727.3,0
2020-03-11 09:26:00,26729.95,26742.45,26684.55,26688.4,0
2020-03-11 09:27:00,26684.25,26751.25,26670.4,26728.15,0
2020-03-11 09:28:00,26727.9,26768.1,26725.8,26755.95,0
2020-03-11 09:29:00,26755.65,26779.7,26755.65,26760.65,0
This is an example of OHLC (Open, High, Low, Close) data. This type of data is widely used to represent the movement of a financial instrument. We will use this example data to demonstrate how to read CSV files in C.
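To make the column layout concrete, here is a rough sketch of how one row of this file could be held in a C struct. The names Candle and parse_candle are my own, chosen only for illustration, and the parsing uses sscanf() rather than the fgets()/strtok() approach covered later in this post, so treat it as one possible shape and not the only one.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical record for one data row of example.csv */
typedef struct {
    char date[20];   /* "YYYY-MM-DD HH:MM:SS" is 19 chars plus '\0' */
    double open;
    double high;
    double low;
    double close;
    long volume;
} Candle;

/* Fills *c from one row; returns true only if all 6 fields were parsed. */
bool parse_candle(const char *row, Candle *c)
{
    int n = sscanf(row, "%19[^,],%lf,%lf,%lf,%lf,%ld",
                   c->date, &c->open, &c->high, &c->low, &c->close,
                   &c->volume);
    return n == 6;
}
```

The header row fails the numeric conversions, so parse_candle() returns false for it, which gives a simple way to skip it.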
Setup
- gcc version 10.1.0 (GCC)
Hello World
CSV stands for Comma Separated Values. The process of reading CSV can be broken down into two steps:
- Read a line -> L
- Split L by ,
Now, you have the values of the row.
Reading a line
We use fgets() from stdio.h to read one line at a time. Let's demonstrate this.
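#include <stdio.h>
#define MAXCHAR 1000
int main(void) {
    FILE *fp;
    char row[MAXCHAR];
    fp = fopen("example.csv", "r");
    if (fp == NULL) {
        perror("example.csv");
        return 1;
    }
    /* fgets() returns NULL at end of file or on error, so it is the loop test.
       Testing feof() before reading can process the last row twice. */
    while (fgets(row, MAXCHAR, fp) != NULL) {
        printf("Row: %s", row);
    }
    fclose(fp);
    return 0;
}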
Output
Row: date,open,high,low,close,volume
Row: 2020-03-11 09:15:00,26272.75,26453.65,26241.5,26443.85,0
Row: 2020-03-11 09:16:00,26447.1,26542.2,26423.6,26520.45,0
Row: 2020-03-11 09:17:00,26526.75,26543.15,26465.55,26522.15,0
Row: 2020-03-11 09:18:00,26531.3,26556,26504.55,26543.95,0
Row: 2020-03-11 09:19:00,26544.55,26633.85,26544.55,26633.85,0
Row: 2020-03-11 09:20:00,26633.9,26676.1,26619.15,26647.05,0
Row: 2020-03-11 09:21:00,26647,26654.5,26611.7,26652.6,0
Row: 2020-03-11 09:22:00,26659.05,26698.1,26646.7,26690.25,0
Row: 2020-03-11 09:23:00,26691.9,26711.9,26649.5,26659.1,0
Row: 2020-03-11 09:24:00,26659,26784.2,26659,26753.85,0
Row: 2020-03-11 09:25:00,26751.9,26767.95,26721.1,26727.3,0
Row: 2020-03-11 09:26:00,26729.95,26742.45,26684.55,26688.4,0
Row: 2020-03-11 09:27:00,26684.25,26751.25,26670.4,26728.15,0
Row: 2020-03-11 09:28:00,26727.9,26768.1,26725.8,26755.95,0
Row: 2020-03-11 09:29:00,26755.65,26779.7,26755.65,26760.65,0
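Note that fgets() keeps the newline at the end of each row, which is why the output above needs no extra line break. If you want the row without it, a small wrapper can strip it. This is a minimal sketch; read_row is a hypothetical helper name, not something from the standard library.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf (capacity size)
 * and removes a trailing "\n" or "\r\n" if present.
 * Returns false at end of file or on a read error. */
bool read_row(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL) {
        return false;
    }
    buf[strcspn(buf, "\r\n")] = '\0';  /* cut at the first CR or LF */
    return true;
}
```

With this, the loop becomes while (read_row(fp, row, MAXCHAR)), and the printf() format needs its own "\n".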
Tokenizing
Tokenizing is the process of splitting a string on a given character. In our case, we want to split it by ,
We use strtok() from string.h.
Code
#include <stdio.h>
#include <string.h>
#define MAXCHAR 1000
int main(void) {
    FILE *fp;
    char row[MAXCHAR];
    char *token;
    fp = fopen("example.csv", "r");
    if (fp == NULL) {
        perror("example.csv");
        return 1;
    }
    /* Loop on the result of fgets() instead of feof() */
    while (fgets(row, MAXCHAR, fp) != NULL) {
        printf("Row: %s", row);
        /* "\n" is a delimiter too, so the last token loses its newline */
        token = strtok(row, ",\n");
        while (token != NULL) {
            printf("Token: %s\n", token);
            token = strtok(NULL, ",\n");
        }
    }
    fclose(fp);
    return 0;
}
Output
Row: date,open,high,low,close,volume
Token: date
Token: open
Token: high
Token: low
Token: close
Token: volume
Row: 2020-03-11 09:15:00,26272.75,26453.65,26241.5,26443.85,0
Token: 2020-03-11 09:15:00
Token: 26272.75
Token: 26453.65
Token: 26241.5
Token: 26443.85
Token: 0
Row: 2020-03-11 09:16:00,26447.1,26542.2,26423.6,26520.45,0
Token: 2020-03-11 09:16:00
Token: 26447.1
Token: 26542.2
Token: 26423.6
Token: 26520.45
Token: 0
Row: 2020-03-11 09:17:00,26526.75,26543.15,26465.55,26522.15,0
Token: 2020-03-11 09:17:00
Token: 26526.75
Token: 26543.15
Token: 26465.55
Token: 26522.15
Token: 0
Row: 2020-03-11 09:18:00,26531.3,26556,26504.55,26543.95,0
Token: 2020-03-11 09:18:00
Token: 26531.3
Token: 26556
Token: 26504.55
Token: 26543.95
Token: 0
Row: 2020-03-11 09:19:00,26544.55,26633.85,26544.55,26633.85,0
Token: 2020-03-11 09:19:00
Token: 26544.55
Token: 26633.85
Token: 26544.55
Token: 26633.85
Token: 0
....
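One thing to keep in mind: strtok() treats a run of delimiters as a single one, so an empty field such as the middle of a,,b is skipped and the columns shift. The example file has no empty fields, so it is not affected. If your data does, a hand-rolled splitter based on strchr() keeps them. The following is only a sketch under that assumption; split_fields is a hypothetical name, and quoted fields that contain commas are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place on ',' and stores up to
 * max_fields pointers in fields[]. Unlike strtok(), an empty field
 * (two commas in a row) is kept as an empty string.
 * Fields beyond max_fields are dropped. Quoted fields are NOT handled.
 * Returns the number of fields stored. */
size_t split_fields(char *line, char *fields[], size_t max_fields)
{
    size_t count = 0;
    char *start = line;

    while (count < max_fields) {
        fields[count++] = start;
        char *comma = strchr(start, ',');
        if (comma == NULL) {
            break;              /* last field reached */
        }
        *comma = '\0';          /* terminate the current field */
        start = comma + 1;      /* next field begins after the comma */
    }
    return count;
}
```

Like strtok(), this modifies the row buffer, so copy the row first if you still need the original text.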
Conclusion
For large CSV files, it’s advisable to read using a combination of fseek() and ftell().
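How exactly to combine the two is not spelled out here, so the following is just one plausible reading: use ftell() to remember where a row starts and fseek() to jump back to it later, or to find out how big the file is before reading. The helper names file_size and read_row_at are hypothetical. On POSIX systems the value from ftell() is a byte offset; the C standard does not promise that for text streams on every platform.

```c
#include <stdio.h>

/* Hypothetical helper: size of an open file in bytes, or -1 on error.
 * The current read position is restored before returning. */
long file_size(FILE *fp)
{
    long pos = ftell(fp);               /* remember where we are */
    if (pos < 0 || fseek(fp, 0L, SEEK_END) != 0) {
        return -1L;
    }
    long size = ftell(fp);              /* offset of the end of the file */
    if (fseek(fp, pos, SEEK_SET) != 0) {
        return -1L;
    }
    return size;
}

/* Hypothetical helper: reads the line that starts at a saved offset,
 * where offset is a value previously returned by ftell().
 * Returns buf on success, NULL on failure or end of file. */
char *read_row_at(FILE *fp, long offset, char *buf, int size)
{
    if (fseek(fp, offset, SEEK_SET) != 0) {
        return NULL;
    }
    return fgets(buf, size, fp);
}
```

For example, calling ftell() right after reading the header gives the offset of the first data row, which can be passed to read_row_at() later without rereading the file from the top.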