Conversion of PDF data to Spreadsheet
Hi, I have several reports in PDF format. I would like to write an M-script to capture the data into a spreadsheet. My plan is to add all the headers to an array, capture each page's data from the PDF into a separate sheet in Excel, and then populate the fields with the values corresponding to the headers. Is there a better way to achieve this?
Answers (1)
Guillaume
on 24 May 2017
Well, your first hurdle will be to capture the data from the PDF. There is no built-in tool for this in MATLAB, and depending on the structure of the PDF this will be either a fair amount of work (if the data is actually stored as continuous text in the file) or extremely hard (if the data is stored as text but scattered through the file, or if the data is just an image of the text, which will require OCR).
PDF is not really designed to transfer structured data to a computer. It's mostly meant to be read by a human.
2 Comments
Guillaume
on 25 May 2017
Shaili Bulusu's comment posted as an answer moved here:
I understand the difficulties, but I already have a script that reads the data from the PDF for me. My question is about the approach of sorting the headers as an array, or whether there is a better way to capture the data into a spreadsheet.
Guillaume
on 25 May 2017
More details on what "the approach of sorting the headers as an array" means would be required to answer your question. What form do the inputs come in, and what form of output do you want?
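For illustration only, here is a minimal sketch of the "one sheet per page" idea described in the question. It assumes the existing PDF-reading script already yields a cell array of header names plus one row of values per page. The variable names (headers, pageValues, outFile), the field names, and the sample values are all hypothetical placeholders, not anything taken from the actual reports.

```matlab
% Hypothetical inputs: the same headers on every page, one row of values per page.
headers = {'SampleID', 'Temperature', 'Pressure'};   % assumed field names
pageValues = {                                        % assumed parsed values
    {'A1', 21.5, 101.3}
    {'B7', 22.1, 100.9}
};
outFile = 'reports.xlsx';                             % assumed output file name

for k = 1:numel(pageValues)
    % Build a one-row table whose column names are the headers.
    T = cell2table(pageValues{k}, 'VariableNames', headers);
    % Write page k to worksheet k; the headers become the first row.
    writetable(T, outFile, 'Sheet', k);
end
```

If every page really shares the same headers, stacking the per-page tables with vertcat and writing a single sheet may be easier to work with afterwards, but that depends on the output format you want.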