Here is some C# code that reads the text of a PDF into a list of lines using iTextSharp. It's very simple, as you can see. As for the regex, I am not an expert, but there are many people on this site who can help if you give them some concrete examples.
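using System.Collections.Generic;
using System.Text.RegularExpressions;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// SourceFile is assumed to be a field or property holding the path to the PDF.
private List<string> ReadPdfFile()
{
    var result = new List<string>();
    using (var reader = new PdfReader(SourceFile))
    {
        // iTextSharp page numbers start at 1.
        for (var page = 1; page <= reader.NumberOfPages; page++)
        {
            // Extract the page's text, then split it into individual lines.
            var pageText = PdfTextExtractor.GetTextFromPage(reader, page);
            var lines = Regex.Split(pageText, "\n");
            result.AddRange(lines);
        }
    }
    return result;
}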
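Once the lines are in a list, a regex can be run over them one at a time. The following is only a rough sketch of that idea: the class name PdfLineFilter, the method LinesWithDates, and the date pattern are all made up for illustration, since I don't know what you actually need to match. Swap in your own pattern once you have concrete examples.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PdfLineFilter
{
    // Hypothetical pattern (an assumption, not from the question):
    // matches an ISO-style date such as 2019-02-25.
    private static readonly Regex DatePattern =
        new Regex(@"\b\d{4}-\d{2}-\d{2}\b");

    // Returns only the lines that contain at least one match for the pattern.
    public static List<string> LinesWithDates(IEnumerable<string> lines)
    {
        return lines.Where(line => DatePattern.IsMatch(line)).ToList();
    }
}
```

You could then call it on the extracted text with something like var dated = PdfLineFilter.LinesWithDates(ReadPdfFile());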