Module fusus.about.howto
Install and update
Code and documentation fusus.about.install
- get code
- update docs
- update code
Run
Straight from the command line fusus.convert
- run the OCR pipeline from the command line
- run the PDF extraction from the command line
- convert TSV to TF
Contribute more sources
From "no comments" to "more comments" fusus.works
- add commentaries as works
Explore
Page by page in a notebook
- do example Run the pipeline in a notebook on the examples.
- do Afifi Run the pipeline in a notebook on the Afifi edition of the Fusus.
- inspect Inspect intermediate results in a notebook.
- ocr Read the proofs of Kraken-OCR.
- notebooks on nbviewer. All notebooks.
Tweak
Sickness and cure by parameters
- tweak Basic parameter tweaking.
- fusus.parameters All parameters.
- comma A ministudy in cleaning: tweak mark templates and parameters to wipe commas.
- lines Follow the line detection algorithm in a wide variety of cases.
- piece What to do if you have an image that is a small fragment of a page.
Engineer
Change the flow
- fusus.lakhnawi PDF reverse engineering.
- drilldown Narrow down to specific pages and lines and see what text is extracted from which portion.
- pages Work with pages, follow line division, extract text and save to disk.
- characters See which characters are in the PDF and how they are converted.
- final See the effect of final characters on spacing.
- border See how black borders get removed from a page. See also cropBorders() and removeBorders().
Work
Do data science with the results
- fusus Description of the TSV output of the pipeline and the PDF text extraction.
- useTsv Use the TSV output of the pipeline.
- useTf Use the Text-Fabric output of the pipeline.
- boxes Work with bounding boxes in the Text-Fabric data of the Lakhnawi.
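
As a rough illustration of what working with tab-separated output can look like, the sketch below reads a TSV string with the Python standard library and groups words by page and line. The column names (page, line, text) and the sample rows are hypothetical placeholders, not the actual layout of the pipeline output; consult the TSV description in fusus for the real columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample: column names and rows are made up for illustration
# and do not reflect the real TSV layout produced by the pipeline.
SAMPLE_TSV = (
    "page\tline\ttext\n"
    "1\t1\talpha\n"
    "1\t1\tbeta\n"
    "1\t2\tgamma\n"
    "2\t1\tdelta\n"
)


def words_per_line(tsv_text):
    """Group the text column by (page, line), preserving row order."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        key = (int(row["page"]), int(row["line"]))
        grouped[key].append(row["text"])
    return dict(grouped)


lines = words_per_line(SAMPLE_TSV)
for (page, line), words in sorted(lines.items()):
    # One output row per (page, line) with its words joined by spaces.
    print(page, line, " ".join(words))
```

To read a file on disk, the same reader can wrap an open file handle (opened with `newline=""` and `encoding="utf-8"`) instead of the in-memory string.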
"""
.. include:: ../docs/about/howto.md
"""