extractHTMLText extracts only a part of a web page text

Sim on 21 Nov 2024 at 11:16
Answered: Umeshraja on 22 Nov 2024 at 4:39
I was trying to extract the full text of an open-access scientific article (I hope this is legal! :-) ), but extractHTMLText seems to extract only a small part of it. It looks like I get just the reference list (see my comment below for the exact output):
url = "https://www.nature.com/articles/s44172-024-00270-9";
code = webread(url);
str = extractHTMLText(code)
How can I extract the entire manuscript?
  1 Comment
Sim on 21 Nov 2024 at 11:16
Here is the result I get:
str =
'Gates, B. et al. New approaches to nanofabrication: molding, printing, and other techniques. Chem. Rev. 105, 1171–1196 (2005).
Article Google Scholar
Quake, S. & Scherer, A. From micro- to nanofabrication with soft materials. Science 290, 15361540 (2000).
Article Google Scholar
Douglas, S. M., Bachelet, I. & Church, G. M. A logic-gated nanorobot for targeted transport of molecular payloads. Science 335, 831834 (2012).
Article Google Scholar
Li, S. et al. A DNA nanorobot functions as a cancer therapeutic in response to a molecular trigger in vivo. Nat. Biotechnol. 36, 258+ (2018).
Article Google Scholar
Gratton, S. E. A. et al. The effect of particle design on cellular internalization pathways. Proc. Natl Acad. Sci. USA 105, 1161311618 (2008).
Article Google Scholar
Wang, J., Byrne, J. D., Napier, M. E. & DeSimone, J. M. More effective nanomedicines through particle design. Small 7, 19191931 (2011).
Article Google Scholar
Seiler, H. Secondary-electron emission in the scanning electron-microscope. J. Appl. Phys. 54, R1R18 (1983).
Article Google Scholar
Egerton, R., Li, P. & Malac, M. Radiation damage in the TEM and SEM. Micron 35, 399409 (2004).
Article Google Scholar
Binnig, G., Quate, C. F. & Gerber, C. Atomic force microscope. Phys. Rev. Lett. 56, 930933 (1986).
Article Google Scholar
Tian, F., Qian, X. & Villarrubia, J. S. Blind estimation of general tip shape in afm imaging. Ultramicroscopy 109, 4453 (2008).
Article Google Scholar
Golek, F., Mazur, P., Ryszka, Z. & Zuber, S. AFM image artifacts. Appl. Surf. Sci. 304, 1119 (2014).
Article Google Scholar
Velegol, S., Pardi, S., Li, X., Velegol, D. & Logan, B. AFM imaging artifacts due to bacterial cell height and AFM tip geometry. Langmuir 19, 851857 (2003).
Article Google Scholar
Westra, K., Mitchell, A. & Thomson, D. Tip artifacts in atomic-force microscope imaging of thin-film surfaces. J. Appl. Phys. 74, 36083610 (1993).
Article Google Scholar
Martin, Y. & Wickramasinghe, H. K. Method for imaging sidewalls by atomic-force microscopy. Appl. Phys. Lett. 64, 24982500 (1994).
Article Google Scholar
Orji, N. G. & Dixson, R. G. Higher order tip effects in traceable CD-AFM-based linewidth measurements. Meas. Sci. Technol. 18, 448455 (2007).
Article Google Scholar
Thiesler, J., Tutsch, R., Fromm, K. & Dai, G. True 3D-AFM sensor for nanometrology. Meas. Sci. Technol. 31, 074012 (2020).
Article Google Scholar
Geng, J., Zhang, H., Meng, X., Rong, W. & Xie, H. Sidewall imaging of microarray-based biosensor using an orthogonal cantilever probe. IEEE Trans. Instrum. Meas. 70, 18 (2021).
Article Google Scholar
Nguyen, C. et al. Carbon nanotube scanning probe for profiling of deep-ultraviolet and 193 nm photoresist patterns. Appl. Phys. Lett. 81, 901903 (2002).
Article Google Scholar
Cho, S.-J. et al. Three-dimensional imaging of undercut and sidewall structures by Atomic Force Microscopy. Rev. Sci. Instrum. 82, 23707 (2011).
Kizu, R., Misumi, I., Hirai, A., Kinoshita, K. & Gonda, S. Development of a metrological atomic force microscope with a tip-tilting mechanism for 3D nanometrology. Meas. Sci. Technol. 29, 075005 (2018).
Article Google Scholar
Xie, H., Hussain, D., Yang, F. & Sun, L. Development of three-dimensional atomic force microscope for sidewall structures imaging with controllable scanning density. IEEE/ASME Trans. Mechatron. 21, 316328 (2016).
Google Scholar
Wu, J.-W. et al. Effective tilting angles for a dual probes AFM system to achieve high-precision scanning. IEEE/ASME Trans. Mechatron. 21, 25122521 (2016).
Article Google Scholar
Xie, H., Hussain, D., Yang, F. & Sun, L. Atomic force microscope caliper for critical dimension measurements of micro and nanostructures through sidewall scanning. Ultramicroscopy 158, 816 (2015).
Article Google Scholar
Zhao, X., Fu, J., Chu, W., Nguyen, C. & Vorburger, T. V. An image stitching method to eliminate the distortion of the sidewall in linewidth measurement. In Metrology, Inspection, and Process Control for Microlithography XVIII, vol. 5375, 363373 (2004).
Pan, S.-P., Liou, H.-C., Chen, C.-C. A., Chen, J.-R. & Liu, T.-S. Precision measurement of sub-50 nm linewidth by stitching double-tilt images. Jpn J. Appl. Phys. 49, 06GK06 (2010).
Article Google Scholar
Kawata, S., Sun, H., Tanaka, T. & Takada, K. Finer features for functional microdevices - micromachines can be created with higher resolution using two-photon absorption. Nature 412, 697698 (2001).
Article Google Scholar
Jaiswal, A. et al. Two decades of two-photon lithography: materials science perspective for additive manufacturing of 2D/3D nano-microstructures. Iscience 26, 106374 (2023).
Li, J. & Pumera, M. 3D printing of functional microrobots. Chem. Soc. Rev. 50, 27942838 (2021).
Article Google Scholar
Dabbagh, S. R. et al. 3D-printed microrobots from design to translation. Nat. Commun. 13, 5875 (2022).
Jun, Y.-w, Choi, J.-s & Cheon, J. Shape control of semiconductor and metal oxide nanocrystals through nonhydrolytic colloidal routes. Angew. Chem. Int. Ed. 45, 34143439 (2006).
Article Google Scholar
Izadi, S. et al. Kinectfusion: Real-time 3D reconstruction and interaction using a moving depth camera. In Proceedings of the 24th annual ACM symposium on User interface software and technology, 559568 (2011).
Curless, B. & Levoy, M. A volumetric method for building complex models from range images. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, 303312 (1996).
Nießner, M., Zollhöfer, M., Izadi, S. & Stamminger, M. Real-time 3D reconstruction at scale using voxel hashing. ACM Trans. Graph. (ToG) 32, 111 (2013).
Google Scholar
Xie, Y. et al. Neural fields in visual computing and beyond. Computer Graph. Forum 41, 641676 (2022).
Article Google Scholar
Weder, S., Schonberger, J. L., Pollefeys, M. & Oswald, M. R. NeuralFusion: Online depth fusion in latent space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 31623172 (2021).
Li, K., Tang, Y., Prisacariu, V. A. & Torr, P. H. BNV-fusion: dense 3D reconstruction using bi-level neural volume fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 61666175 (2022).
Mildenhall, B. et al. NeRF: representing scenes as neural radiance fields for view synthesis. Commun. ACM 65, 99106 (2021).
Article Google Scholar
Oechsle, M., Peng, S. & Geiger, A. UNISURF: unifying neural implicit surfaces and radiance fields for multi-view reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 55895599 (2021).
Yariv, L. et al. Multiview neural surface reconstruction by disentangling geometry and appearance. Adv. Neural Inf. Process. Syst. 33, 24922502 (2020).
Google Scholar
Yariv, L., Gu, J., Kasten, Y. & Lipman, Y. Volume rendering of neural implicit surfaces. Adv. Neural Inf. Process. Syst. 34, 48054815 (2021).
Google Scholar
Sucar, E., Liu, S., Ortiz, J. & Davison, A. J. iMAP: Implicit mapping and positioning in real-time. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 62296238 (2021).
Wang, P. et al. NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. Adv. Neural. Inf. Process. Syst. 34, 2717127183 (2021).
Bettencourt, A. & Almeida, A. J. Poly(methyl methacrylate) particulate carriers in drug delivery. J. Microencapsul. 29, 353367 (2012).
Article Google Scholar
Tang, E., Cheng, G., Pang, X., Ma, X. & Xing, F. Synthesis of nano-ZnO/poly(methyl methacrylate) composite microsphere through emulsion polymerization and its UV-shielding property. Colloid Polym. Sci. 284, 422428 (2006).
Article Google Scholar
Zhu, A., Shi, Z., Cai, A., Zhao, F. & Liao, T. Synthesis of core-shell PMMA-SiO2 nanoparticles with suspension-dispersion-polymerization in an aqueous system and its effect on mechanical properties of PVC composites. Polym. Test. 27, 540547 (2008).
Article Google Scholar
Zhong, G., Liu, D. & Zhang, J. The application of ZIF-67 and its derivatives: adsorption, separation, electrochemistry and catalysts. J. Mater. Chem. A 6, 18871899 (2018).
Article Google Scholar
Qian, J., Sun, F. & Qin, L. Hydrothermal synthesis of zeolitic imidazolate framework-67 (ZIF-67) nanocrystals. Mater. Lett. 82, 220223 (2012).
Article Google Scholar
Wang, L. et al. Flexible solid-state supercapacitor based on a metal-organic framework interwoven by electrochemically-deposited PANI. J. Am. Chem. Soc. 137, 49204923 (2015).
Article Google Scholar
Yang, J. et al. Hollow Zn/Co ZIF particles derived from core-shell ZIF-67@ZIF-8 as selective catalyst for the semi-hydrogenation of acetylene. Angew. Chem.-Int. Ed. 54, 1088910893 (2015).
Article Google Scholar
Rusinkiewicz, S. & Levoy, M. Efficient variants of the ICP algorithm. In Proceedings third international conference on 3-D digital imaging and modeling, 145152 (2001).
Do, C. B. & Batzoglou, S. What is the expectation maximization algorithm? Nat. Biotechnol. 26, 897899 (2008).
Article Google Scholar
Moon, T. The expectation-maximization algorithm. IEEE Signal Process. Mag. 13, 4760 (1996).
Article Google Scholar
Gropp, A., Yariv, L., Haim, N., Atzmon, M. & Lipman, Y. Implicit geometric regularization for learning shapes. Proceedings of the 37th International Conference on Machine Learning 3789–3799 (2020).
Hecht-Nielsen, R. Theory of the backpropagation neural network. In Neural networks for perception, 6593 (1992).
Müller, T., Evans, A., Schied, C. & Keller, A. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. (ToG) 41, 115 (2022).
Article Google Scholar
Lorensen, W. E. & Cline, H. E. Marching cubes: a high resolution 3D surface construction algorithm. In Seminal Graphics: Pioneering Efforts That Shaped The Field, 347353 (1998).
Selimis, A., Mironov, V. & Farsari, M. Direct laser writing: principles and materials for scaffold 3D printing. Microelectron. Eng. 132, 8389 (2015).
Article Google Scholar
Shen, J., Zhang, D., Zhang, F.-H. & Gan, Y. AFM tip-sample convolution effects for cylinder protrusions. Appl. Surf. Sci. 422, 482491 (2017).
Article Google Scholar
Lee, J. H. et al. Electrically pumped sub-wavelength metallo-dielectric pedestal pillar lasers. Opt. Express 19, 2152421531 (2011).
Article Google Scholar
Chaubey, S. K. & Jain, N. K. State-of-art review of past research on manufacturing of meso and micro cylindrical gears. Precis. Eng. 51, 702728 (2018).
Article Google Scholar
Community, B. O. Blender - a 3D Modelling And Rendering Package (Blender Foundation, Stichting Blender Foundation, 2018).
Reis, C. P., Neufeld, R. J., Ribeiro, A. J. & Veiga, F. Nanoencapsulation I. Methods for preparation of drug-loaded polymeric nanoparticles. Nanomed. Nanotechnol. Biol. Med. 2, 821 (2006).
Article Google Scholar
Saliba, D., Ammar, M., Rammal, M., Al-Ghoul, M. & Hmadeh, M. Crystal growth of ZIF-8, ZIF-67, and their mixed-metal derivatives. J. Am. Chem. Soc. 140, 18121823 (2018).
Article Google Scholar
Nordin, N. A. H. M., Ismail, A. F., Mustafa, A., Murali, R. S. & Matsuura, T. The impact of ZIF-8 particle size and heat treatment on CO 2/CH 4 separation using asymmetric mixed matrix membrane. RSC Adv. 4, 5253052541 (2014).
Article Google Scholar
Xia, Y., Xiong, Y., Lim, B. & Skrabalak, S. E. Shape-controlled synthesis of metal nanocrystals: simple chemistry meets complex physics? Angew. Chem. Int. Ed. 48, 60103 (2009).
Article Google Scholar
Amyot, R. & Flechsig, H. BioAFMviewer: an interactive interface for simulated AFM scanning of biomolecular structures and dynamics. PLoS Comput. Biol. 16, e1008444 (2020).
Sitzmann, V., Martel, J., Bergman, A., Lindell, D. & Wetzstein, G. Implicit neural representations with periodic activation functions. Adv. Neural Inf. Process. Syst. 33, 74627473 (2020).
Google Scholar
Chen, Y., Liu, S. & Wang, X. Learning continuous image representation with local implicit image function. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 86288638 (2021).
Pumarola, A., Corona, E., Pons-Moll, G. & Moreno-Noguer, F. D-NeRF: neural radiance fields for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1031810327 (2021).
Uchihashi, T., Kodera, N. & Ando, T. Guide to video recording of structure dynamics and dynamic processes of proteins by high-speed atomic force microscopy. Nat. Protoc. 7, 11931206 (2012).
Article Google Scholar
Zhou, Q.-Y., Park, J. & Koltun, V. Open3D: a modern library for 3D data processing. arXiv https://doi.org/10.48550/arXiv.1801.09847 (2018).
Guo, Y.-C. Instant neural surface reconstruction. Github https://github.com/bennyguo/instant-nsr-pl (2022).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 80248035 (2019).
Markiewicz, P. & Goh, M. Identifying locations on a substrate for the repeated positioning of AFM samples. Ultramicroscopy 68, 215221 (1997).
Article Google Scholar
Abu Quba, A. A., Schaumann, G. E., Karagulyan, M. & Diehl, D. A new approach for repeated tip-sample relocation for AFM imaging of nano and micro sized particles and cells in liquid environment. Ultramicroscopy 211, 112945 (2020).
Liu, Z. et al. Mechanically engraved mica surface using the atomic force microscope tip facilitates return to a specific sample location. Microsc. Res. Tech. 66, 156162 (2005).
Article Google Scholar
Grupp, M. evo: Python package for the evaluation of odometry and SLAM. Github https://github.com/MichaelGrupp/evo (2017).
Dai, J. S. Eulerrodrigues formula variations, quaternion conjugation and intrinsic connections. Mech. Mach. Theory 92, 144152 (2015).
Article Google Scholar
Zeng, A. et al. Volumetric TSDF Fusion of RGB-D images in python. Github https://github.com/andyzeng/tsdf-fusion-python (2017).
Chen, S. et al. Multi-view neural 3D reconstruction of micro- and nanostructures with atomic force microscopy. Github https://github.com/zju3dv/MVN-AFM (2024).
Download references'

Answers (1)

Umeshraja on 22 Nov 2024 at 4:39
Hi @Sim,
I understand you want to extract the entire text from the HTML page. To do this, set the 'ExtractionMethod' option to 'all-text', which extracts all text within the HTML body, excluding scripts and CSS styles. Here's the call you can use:
str = extractHTMLText(code,"ExtractionMethod","all-text")
For more details, please refer to the MATLAB documentation on 'extractHTMLText'.
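To see how much the extraction method matters for this page, here is a minimal sketch that runs the three documented 'ExtractionMethod' values ('tree', which is the default, 'article', and 'all-text') on the URL from the question and prints how many characters each one returns. It assumes Text Analytics Toolbox is installed and the page is reachable; the variable name exMethods is only for illustration.

```matlab
% Sketch: compare the extraction methods of extractHTMLText on one page.
% Assumes Text Analytics Toolbox and network access to the URL below.
url  = "https://www.nature.com/articles/s44172-024-00270-9";
code = webread(url);

% 'tree' is the default method; 'all-text' keeps everything in the HTML body
% except scripts and CSS styles.
exMethods = ["tree" "article" "all-text"];   % illustrative variable name

for k = 1:numel(exMethods)
    str = extractHTMLText(code, "ExtractionMethod", exMethods(k));
    % sum(...) also covers the case where the result is not a scalar
    nChars = sum(strlength(string(str)), "all");
    fprintf("%-8s -> %d characters\n", exMethods(k), nChars);
end
```

If the default 'tree' method reports far fewer characters than 'all-text', that would be consistent with the behavior described in the question, where only one block of the page (the reference list) was returned.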
Hope it helps!
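Note that 'all-text' will probably also return menus, footers and other page furniture. If that is too noisy, one possible alternative is to parse the page with htmlTree and extract text only from selected elements. The sketch below assumes the manuscript text sits in <p> elements, which I have not verified for this particular page, so the selector may need adjusting; fullText is just an illustrative name.

```matlab
% Sketch: extract text from selected elements only.
% Assumes Text Analytics Toolbox and network access to the URL below.
url  = "https://www.nature.com/articles/s44172-024-00270-9";
code = webread(url);

tree     = htmlTree(code);            % parse the HTML into a tree
subtrees = findElement(tree, "p");    % assumption: body text is in <p> elements

str = string(extractHTMLText(subtrees));   % one entry per selected element
str(strlength(str) == 0) = [];             % drop empty paragraphs

fullText = join(str, newline);             % single text with one paragraph per line
```

You can inspect the first few entries of str to check whether the selector picks up the article body, and switch to a more specific selector if it also catches unrelated paragraphs.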
