Using Ifstream to Read Files and Columns

As a data scientist, reading and writing data from/to CSV is one of the most common tasks I do on the daily. R, my language of choice, makes this easy with read.csv() and write.csv() (although I tend to use fread() and fwrite() from the data.table package).

Hot take: C++ is not R.

As far as I know, there is no CSV reader/writer built into the C++ STL. That's not a knock against C++; it's just a lower level language. If we want to read and write CSV files with C++, we'll have to deal with file I/O, data types, and some low level logic on how to read, parse, and write data. For me, this is a necessary step in order to build and test more fun programs like machine learning models.

Writing to CSV

We'll start by creating a simple CSV file with one column of integer data. And we'll give it the header Foo.

    #include <fstream>

    int main() {
        // Create an output filestream object
        std::ofstream myFile("foo.csv");

        // Send data to the stream
        myFile << "Foo\n";
        myFile << "1\n";
        myFile << "2\n";
        myFile << "3\n";

        // Close the file
        myFile.close();

        return 0;
    }

Here, ofstream is an "output file stream". Since it's derived from ostream, we can treat it just like cout (which is itself an ostream object). The result of executing this program is that we get a file called foo.csv in the current working directory, which is typically the directory we run the executable from. Let's wrap this into a write_csv() function that's a little more dynamic.

    #include <string>
    #include <fstream>
    #include <vector>

    void write_csv(std::string filename, std::string colname, std::vector<int> vals){
        // Make a CSV file with one column of integer values
        // filename - the name of the file
        // colname - the name of the one and only column
        // vals - an integer vector of values

        // Create an output filestream object
        std::ofstream myFile(filename);

        // Send the column name to the stream
        myFile << colname << "\n";

        // Send data to the stream
        for(int i = 0; i < vals.size(); ++i)
        {
            myFile << vals.at(i) << "\n";
        }

        // Close the file
        myFile.close();
    }

    int main() {
        // Make a vector of length 100 filled with 1s
        std::vector<int> vec(100, 1);

        // Write the vector to CSV
        write_csv("ones.csv", "Col1", vec);

        return 0;
    }
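As an aside on the ostream point made earlier: because std::ofstream derives from std::ostream, the writing logic does not have to know it is talking to a file. The sketch below is not part of the original program; write_foo is a hypothetical helper name introduced only for illustration. It takes any std::ostream by reference, so the same function can target a file, the terminal, or an in-memory buffer.

```cpp
#include <fstream>
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the article): writes the one-column "Foo"
// CSV to whatever output stream it is handed.
void write_foo(std::ostream& out) {
    // Header row
    out << "Foo\n";

    // Three data rows: 1, 2, 3
    for (int i = 1; i <= 3; ++i) {
        out << i << "\n";
    }
}
```

Calling write_foo(std::cout) prints the rows to the terminal, while constructing std::ofstream myFile("foo.csv") and calling write_foo(myFile) produces the same content in a file. Passing a std::ostringstream is a convenient way to check the output without touching the disk.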

Cool. Now we can use write_csv() to write a vector of integers to a CSV file with ease. Let's expand on this to support multiple vectors of integers and corresponding column names.

    #include <string>
    #include <fstream>
    #include <vector>
    #include <utility> // std::pair

    void write_csv(std::string filename, std::vector<std::pair<std::string, std::vector<int>>> dataset){
        // Make a CSV file with one or more columns of integer values
        // Each column of data is represented by the pair <column name, column data>
        //   as std::pair<std::string, std::vector<int>>
        // The dataset is represented as a vector of these columns
        // Note that all columns should be the same size

        // Create an output filestream object
        std::ofstream myFile(filename);

        // Send column names to the stream
        for(int j = 0; j < dataset.size(); ++j)
        {
            myFile << dataset.at(j).first;
            if(j != dataset.size() - 1) myFile << ","; // No comma at end of line
        }
        myFile << "\n";

        // Send data to the stream
        for(int i = 0; i < dataset.at(0).second.size(); ++i)
        {
            for(int j = 0; j < dataset.size(); ++j)
            {
                myFile << dataset.at(j).second.at(i);
                if(j != dataset.size() - 1) myFile << ","; // No comma at end of line
            }
            myFile << "\n";
        }

        // Close the file
        myFile.close();
    }

    int main() {
        // Make three vectors, each of length 100 filled with 1s, 2s, and 3s
        std::vector<int> vec1(100, 1);
        std::vector<int> vec2(100, 2);
        std::vector<int> vec3(100, 3);

        // Wrap into a vector
        std::vector<std::pair<std::string, std::vector<int>>> vals = {{"One", vec1}, {"Two", vec2}, {"Three", vec3}};

        // Write the vector to CSV
        write_csv("three_cols.csv", vals);

        return 0;
    }

Here we've represented each column of data as a std::pair of <column name, column values>, and the whole dataset as a std::vector of such columns. Now we can write a variable number of integer columns to a CSV file.
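The multi-column write_csv() takes its row count from the first column and indexes every other column with .at(i). A shorter column would therefore throw std::out_of_range partway through the write, and a longer one would be silently truncated. A small guard can make the "all columns should be the same size" assumption explicit. The following is only a sketch: the Column alias and the columns_same_size name are hypothetical, introduced here for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias for the <column name, column values> pair used above.
using Column = std::pair<std::string, std::vector<int>>;

// Hypothetical helper: returns true when every column holds the same number
// of values. An empty dataset is treated as consistent.
bool columns_same_size(const std::vector<Column>& dataset) {
    if (dataset.empty()) return true;

    // Use the first column's length as the reference row count
    const std::size_t expected = dataset.front().second.size();

    for (const Column& col : dataset) {
        if (col.second.size() != expected) return false;
    }
    return true;
}
```

One possible use is to call columns_same_size(dataset) at the top of write_csv() and throw or return early when it reports false, so that no half-written file is left behind.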

Reading from CSV

Now that we've written some CSV files, let's attempt to read them. For now let's correctly assume that our file contains integer data plus one row of column names at the top.

    #include <string>
    #include <fstream>
    #include <vector>
    #include <utility> // std::pair
    #include <stdexcept> // std::runtime_error
    #include <sstream> // std::stringstream

    std::vector<std::pair<std::string, std::vector<int>>> read_csv(std::string filename){
        // Reads a CSV file into a vector of <string, vector<int>> pairs where
        // each pair represents <column name, column values>

        // Create a vector of <string, int vector> pairs to store the result
        std::vector<std::pair<std::string, std::vector<int>>> result;

        // Create an input filestream
        std::ifstream myFile(filename);

        // Make sure the file is open
        if(!myFile.is_open()) throw std::runtime_error("Could not open file");

        // Helper vars
        std::string line, colname;
        int val;

        // Read the column names
        if(myFile.good())
        {
            // Extract the first line in the file
            std::getline(myFile, line);

            // Create a stringstream from line
            std::stringstream ss(line);

            // Extract each column name
            while(std::getline(ss, colname, ',')){

                // Initialize and add <colname, int vector> pairs to result
                result.push_back({colname, std::vector<int> {}});
            }
        }

        // Read data, line by line
        while(std::getline(myFile, line))
        {
            // Create a stringstream of the current line
            std::stringstream ss(line);

            // Keep track of the current column index
            int colIdx = 0;

            // Extract each integer
            while(ss >> val){

                // Add the current integer to the 'colIdx' column's values vector
                result.at(colIdx).second.push_back(val);

                // If the next token is a comma, ignore it and move on
                if(ss.peek() == ',') ss.ignore();

                // Increment the column index
                colIdx++;
            }
        }

        // Close file
        myFile.close();

        return result;
    }

    int main() {
        // Read three_cols.csv and ones.csv
        std::vector<std::pair<std::string, std::vector<int>>> three_cols = read_csv("three_cols.csv");
        std::vector<std::pair<std::string, std::vector<int>>> ones = read_csv("ones.csv");

        // Write to another file to check that this was successful
        // (write_csv() here is the multi-column version from the previous listing)
        write_csv("three_cols_copy.csv", three_cols);
        write_csv("ones_copy.csv", ones);

        return 0;
    }

This program reads our previously created CSV files and writes each dataset to a new file, essentially creating copies of our original files.
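The heart of read_csv() is the inner loop that pulls integers out of one line. To see how it behaves on its own, that loop can be isolated into a small function. This is a sketch rather than part of the original program, and parse_int_line is a hypothetical name chosen for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extracts the integers from one CSV line such as "1,2,3",
// using the same stringstream technique as the inner loop of read_csv().
std::vector<int> parse_int_line(const std::string& line) {
    std::vector<int> values;
    std::stringstream ss(line);
    int val;

    // operator>> reads one integer at a time and fails on anything else
    while (ss >> val) {
        values.push_back(val);

        // Skip the comma separating this value from the next one
        if (ss.peek() == ',') ss.ignore();
    }
    return values;
}
```

Note that extraction stops quietly at the first token that is not an integer, so a line like "1,x,3" yields only the value 1. That is acceptable under the integer-only assumption stated above, but a stricter reader might want to report such lines as errors.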

Going further

So far we've seen how to read and write datasets with integer values only. Extending this to read/write a dataset of only doubles or only strings should be fairly straightforward. Reading a dataset with unknown, mixed data types is another beast and beyond the scope of this article, but see this code review for possible solutions.

Special thanks to papagaga and Incomputable for helping me with this topic via codereview.stackexchange.com.

Source: https://www.gormanalysis.com/blog/reading-and-writing-csv-files-with-cpp/