Creating smaller files from one large file, and data is being replaced with NaN

Hello,
I am working on some data cleaning. I have a large metadata file. The metadata file contains many different data types. I have taken the metadata file and created 3 new, smaller files. Each of these three smaller files contains only one data type. This is where I encounter a problem.
In the new, smaller files, several columns of data are being replaced with "NaN," even though those columns contain data in the metadata file.
I cannot find a common denominator among the columns of data that don't transfer from the large file to the smaller ones. Also, a column that transfers to one of the smaller files may not transfer to another of the smaller files, even though both of the new files were created the same way. For example, the external ID (column B) transferred from the metadata file to the smaller metabolomics file I created. However, the external ID (column B) did not transfer from the metadata file to the smaller transcriptomics data file I created.
I attempted to use the 'fillmissing' function, but it was unsuccessful. I seem to be at a roadblock. I am using R2020b. I would greatly appreciate any help or suggestions on what to try next! (please let me know if I was unsuccessful in attaching my code to this question)
  1 Comment
Voss on 25 Jun 2022
Can you also attach the large metadata file (hmp2_metadata.csv)? If it's too large to attach, reduce its size by removing some data, but if at all possible do so in a way that the reduced metadata file still exhibits the NaN-columns-on-transfer problem you describe.
You might also try using readcell/writecell rather than readtable/writetable, to see if the problem still happens.
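To make the readcell/writecell suggestion concrete, here is a minimal sketch. It does not use the real hmp2_metadata.csv; instead it writes a tiny stand-in file with hypothetical column names and values (ProjectID, ExternalID, data_type), so treat those names as assumptions and adapt them to the actual file. readcell keeps each cell's own type, so a column mixing numbers and text is not forced to a single numeric type.

```matlab
% Build a tiny stand-in CSV (hypothetical columns and values, not the real data).
fname = fullfile(tempdir, 'demo_metadata_cell.csv');
fid = fopen(fname, 'w');
fprintf(fid, 'ProjectID,ExternalID,data_type\n');
fprintf(fid, 'P1,206615,metabolomics\n');
fprintf(fid, 'P2,CSM5FZ4M,metatranscriptomics\n');
fclose(fid);

% readcell returns every entry as its own cell, header row included.
C = readcell(fname);
header = C(1, :);
body   = C(2:end, :);

% Select the rows for one data type and write them to a smaller file.
typeCol = strcmp(header, 'data_type');
keep    = strcmp(body(:, typeCol), 'metabolomics');
writecell([header; body(keep, :)], fullfile(tempdir, 'demo_metabolomics.csv'));
```

If the NaN columns disappear with this approach, that would suggest the problem lies in how readtable assigns a type to each column rather than in the data itself.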


Answers (1)

Ayush Modi on 18 Oct 2023
Hi Emma,
As I understand it, you are getting NaN values in some columns of the smaller files, even though the original file has values in the corresponding places.
A few possible ways around this are as follows:
  • Use “readcell” instead of “readtable”. See the MathWorks documentation for the “readcell” function for more information.
  • Try importing the data with the Import Tool. There you can see which values are being replaced by NaN and change each column's data type to match the data it actually contains (for example, text instead of number).
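The data-type change can also be done in code rather than through the Import Tool. The sketch below is only an illustration under assumptions: it builds a small stand-in file with hypothetical column names (ProjectID, ExternalID, data_type) rather than using your hmp2_metadata.csv, and it assumes the affected column should be imported as text. When readtable treats a column as numeric, entries that are not numbers come back as NaN, so forcing the column to 'char' keeps the original text.

```matlab
% Build a tiny stand-in CSV (hypothetical columns and values, not the real data).
fname = fullfile(tempdir, 'demo_metadata.csv');
fid = fopen(fname, 'w');
fprintf(fid, 'ProjectID,ExternalID,data_type\n');
fprintf(fid, 'P1,206615,metabolomics\n');
fprintf(fid, 'P2,CSM5FZ4M,metatranscriptomics\n');
fclose(fid);

% Inspect the type that would be assigned to each column.
opts = detectImportOptions(fname);
disp([opts.VariableNames; opts.VariableTypes]);

% Force the ID column to be imported as text so nothing turns into NaN.
opts = setvartype(opts, 'ExternalID', 'char');
T = readtable(fname, opts);

% Split out one data type and write it to its own smaller file.
sub = T(strcmp(T.data_type, 'metatranscriptomics'), :);
writetable(sub, fullfile(tempdir, 'demo_transcriptomics.csv'));
```

If you read the smaller files back in later, it may be worth applying the same import options to them, since type detection runs separately on each file.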
I hope this resolves the issue you were facing.

Release

R2020b
