C# - Capturing an HTML DOM after a page has fully loaded - using a service


We have a requirement to load HTML pages and capture the full HTML DOM after the page has executed its page-load scripts and has loaded completely. This needs to be a server-based approach because of the potential volume, and we need to spread the load across multiple machines.

We're hoping to do this in .NET without using visual controls such as the WebBrowser control, which tie us to an STA environment and message pumps.

We can download the HTML pages with no problem, but we can't wait until the scripts etc. have completed executing and then capture the content at that stage.

Maybe it's possible:

  • to use parts of the new Edge library to capture the DOM without rendering to a canvas
  • to use custom components, if any are available, that emulate the hosting environment (aka the browser) and allow access to the DOM once it has loaded.

Any information on solving this problem would be appreciated, even if we need to move outside of the .NET world.

This sounds like functionality that would be included in a web crawler. It may be possible to use Abot.
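
Whichever crawler or headless browser ends up hosting the page, the "wait until the scripts have finished" part can be approximated by polling the DOM until it stops changing. The following is only a minimal sketch of that heuristic, not something taken from Abot or any other library: the `DomCapture` class, the `WaitForStableDom` method, and its parameters are hypothetical names, and the `readDom` delegate is assumed to return the current serialized DOM from whatever engine is in use.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class DomCapture
{
    // Hypothetical helper: polls readDom until two consecutive snapshots are
    // identical, or throws TimeoutException once the timeout has elapsed.
    public static string WaitForStableDom(
        Func<string> readDom, TimeSpan pollInterval, TimeSpan timeout)
    {
        if (readDom == null) throw new ArgumentNullException(nameof(readDom));

        var clock = Stopwatch.StartNew();
        string previous = readDom();

        while (clock.Elapsed < timeout)
        {
            Thread.Sleep(pollInterval);
            string current = readDom();

            // No change between two polls: treat the page as settled.
            if (string.Equals(current, previous, StringComparison.Ordinal))
            {
                return current;
            }

            previous = current;
        }

        throw new TimeoutException("The DOM did not settle within the timeout.");
    }
}
```

With Selenium WebDriver driving a headless browser, for example, the delegate could be `() => driver.PageSource`, called after `driver.Navigate().GoToUrl(url)`. The check is a heuristic: a page that updates continuously (timers, tickers) never settles and would hit the timeout, and a slow script could pause for longer than one poll interval.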

