Document how documents are filtered out before being processed by the ASRAEL workflow
Not all XML documents should be kept in the search engine. So far, we have decided to exclude documents that have:
- the value "Text Program" for the `FormalName` attribute of the `Genre` element
- one of the values ["agenda", "advisory", "COMMUNIQUÉ-BUSINESS-WIRE", "ephéméride"] for the `FormalName = Keyword` attribute of a `Property` element
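
The exclusion rules above could be sketched roughly as follows. This is only an illustration, not the actual ASRAEL code: the function name `should_exclude` is hypothetical, and the sketch assumes that the XML has no namespaces and that the keyword itself is stored in a `Value` attribute next to `FormalName="Keyword"` on the `Property` element.

```python
import xml.etree.ElementTree as ET

# Values taken from the rules listed in this issue.
EXCLUDED_GENRES = {"Text Program"}
EXCLUDED_KEYWORDS = {"agenda", "advisory", "COMMUNIQUÉ-BUSINESS-WIRE", "ephéméride"}


def should_exclude(xml_text: str) -> bool:
    """Return True if the document matches one of the exclusion rules.

    Hypothetical helper: assumes un-namespaced XML and that a keyword is
    encoded as <Property FormalName="Keyword" Value="..."/>.
    """
    root = ET.fromstring(xml_text)

    # Rule 1: a Genre element whose FormalName is an excluded genre.
    for genre in root.iter("Genre"):
        if genre.get("FormalName") in EXCLUDED_GENRES:
            return True

    # Rule 2: a Property element flagged as Keyword with an excluded value.
    for prop in root.iter("Property"):
        if prop.get("FormalName") == "Keyword" and prop.get("Value") in EXCLUDED_KEYWORDS:
            return True

    return False
```

Note that this sketch uses exact, case-sensitive matching, so a document carrying only `Genre = Program` or `keyword = PRESS-RELEASE-BUSINESS-WIRE` would be kept. Whether such near-matches should also be excluded is a decision the sketch does not make.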
@Teyssou What should we do with documents like:
- http://asrael.eurecom.fr/news/787097d5-caee-34f2-81da-865c9b208ba0 ... `Genre = Program` and `keyword = Editor's Choice Advisory`?
- http://asrael.eurecom.fr/news/b9dbfb60-4b46-3a8a-89ae-d40b15deb0a3 ... `keyword = PRESS-RELEASE-BUSINESS-WIRE`