To load recognizers from file, use RecognizerRegistryProvider
to instantiate the recognizer registry and then pass it through to the analyzer engine:
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.recognizer_registry import RecognizerRegistryProvider
recognizer_registry_conf_file = "./analyzer/recognizers-config.yml"
provider = RecognizerRegistryProvider(
conf_file=recognizer_registry_conf_file
)
registry = provider.create_recognizer_registry()
analyzer = AnalyzerEngine(registry=registry)
results = analyzer.analyze(text="My name is Morris", language="en")
print(results)
global_regex_flags: 26
supported_languages:
- en
recognizers:
...
The configuration file consists of two parts:
global_regex_flags
: regex flags to be used in regex matching (see regex flags).supported_languages
: A list of supported languages that the registry will support.recognizers
: a list of recognizers to be loaded by the recognizer registry. This list consists of two different types of recognizers:
!!! note "Note"
supported_languages must be identical to the same field in analyzer_engine
The recognizer list comprises of both the predefined and custom recognizers, for example:
...
- name: CreditCardRecognizer
supported_languages:
- language: en
context: [credit, card, visa, mastercard, cc, amex, discover, jcb, diners, maestro, instapayment]
- language: es
context: [tarjeta, credito, visa, mastercard, cc, amex, discover, jcb, diners, maestro, instapayment]
- language: it
- language: pl
type: predefined
- name: UsBankRecognizer
supported_languages:
- en
type: predefined
- name: MedicalLicenseRecognizer
type: predefined
- name: ExampleCustomRecognizer
patterns:
- name: "zip code (weak)"
regex: "(\\b\\d{5}(?:\\-\\d{4})?\\b)"
score: 0.01
- name: "zip code (weak)"
regex: "(\\b\\d{5}(?:\\-\\d{4})?\\b)"
score: 0.01
supported_languages:
- language: en
context: [zip, code]
- language: es
context: [código, postal]
supported_entity: "ZIP"
type: custom
enabled: true
- name: "TitlesRecognizer"
supported_language: "en"
supported_entity: "TITLE"
deny_list: [Mr., Mrs., Ms., Miss, Dr., Prof.]
deny_list_score: 1
supported_languages
: A list of supported languages that the analyzer will support. In case this field is missing, a recognizer will be created for each supported language provided to the AnalyzerEngine
.
In addition to the language code, this field also contains a list of context words, which increases confidence in the detection in case it is found in the surroundings of a detected entity (as seen in the credit card example above).type
: this could be either predefined or custom. As this is optional, if not stated otherwise, the default type is custom.name
: Different per the type of the recognizer. For predefined recognizers, this is the class name as defined in presidio, while for custom recognizers, it will be set as the name of the recognizer.patterns
: a list of objects of type Pattern
that contains a name, score and regex that define matching patterns.enabled
: enables or disables the recognizer.supported_entity
: the detected entity associated by the recognizer.deny_list
: A list of words to detect, in case the recognizer uses a predefined list of words.deny_list_score
: confidence score for a term identified using a deny-list.Вы можете оставить комментарий после Вход в систему
Неприемлемый контент может быть отображен здесь и не будет показан на странице. Вы можете проверить и изменить его с помощью соответствующей функции редактирования.
Если вы подтверждаете, что содержание не содержит непристойной лексики/перенаправления на рекламу/насилия/вульгарной порнографии/нарушений/пиратства/ложного/незначительного или незаконного контента, связанного с национальными законами и предписаниями, вы можете нажать «Отправить» для подачи апелляции, и мы обработаем ее как можно скорее.
Опубликовать ( 0 )