converter issues
https://gitlab.eurecom.fr/asrael/converter/-/issues
Updated: 2021-01-15T18:04:02Z

Document how documents are filtered out before being processed by the ASRAEL workflow
https://gitlab.eurecom.fr/asrael/converter/-/issues/8
Updated: 2021-01-15T18:04:02Z
Author: Raphael Troncy (raphael.troncy@eurecom.fr)

Not ALL XML documents should be kept in the search engine. So far, we have decided to **exclude** the documents that have:
* the value "Text Program" for the `FormalName` attribute of the `Genre` element
* one of the values ["agenda", "advisory", "COMMUNIQUÉ-BUSINESS-WIRE", "ephéméride"] for the `FormalName = Keyword` attribute of a `Property` element
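The two exclusion rules above could be sketched as follows. This is only an illustration, not the converter's actual code: it assumes the documents carry no XML namespace, that the keyword is stored in a `Value` attribute of the `Property` element, and that matching is exact (so a keyword that merely contains "Advisory" is kept). The function name `should_exclude` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Values taken from the exclusion rules listed in this issue.
EXCLUDED_GENRES = {"Text Program"}
EXCLUDED_KEYWORDS = {"agenda", "advisory", "COMMUNIQUÉ-BUSINESS-WIRE", "ephéméride"}


def should_exclude(xml_text: str) -> bool:
    """Return True if the document matches one of the exclusion rules."""
    root = ET.fromstring(xml_text)

    # Rule 1: a Genre element whose FormalName is an excluded genre.
    for genre in root.iter("Genre"):
        if genre.get("FormalName") in EXCLUDED_GENRES:
            return True

    # Rule 2: a Property element with FormalName="Keyword" and an excluded keyword.
    # Assumption: the keyword itself is stored in a "Value" attribute.
    for prop in root.iter("Property"):
        if prop.get("FormalName") == "Keyword" and prop.get("Value") in EXCLUDED_KEYWORDS:
            return True

    return False
```

With exact matching, borderline keywords stay in the index until a decision is made on whether they should be excluded too.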
@Teyssou What should we do with documents like:
* http://asrael.eurecom.fr/news/787097d5-caee-34f2-81da-865c9b208ba0 ... `Genre = Program` and `keyword = Editor's Choice Advisory`?
* http://asrael.eurecom.fr/news/b9dbfb60-4b46-3a8a-89ae-d40b15deb0a3 ... `keyword = PRESS-RELEASE-BUSINESS-WIRE`

Assignee: Thibault Ehrhart (thibault.ehrhart@eurecom.fr)

How is the date of the event being described?
https://gitlab.eurecom.fr/asrael/converter/-/issues/7
Updated: 2018-09-21T17:19:47Z
Author: Raphael Troncy (raphael.troncy@eurecom.fr)

Properties such as ```rnews:dateCreated```, ```rnews:dateModified``` and ```rnews:datePublished``` are used to indicate when the press release has been created, modified and published.
However, the search engine needs to know when the event took place. What property is being used for this? How is this information recorded in the XML?

Assignee: Thibault Ehrhart (thibault.ehrhart@eurecom.fr)

Converting the genre of AFP press release
https://gitlab.eurecom.fr/asrael/converter/-/issues/6
Updated: 2018-09-14T09:52:27Z
Author: Raphael Troncy (raphael.troncy@eurecom.fr)

At the moment, the ```<Genre>``` element is mapped to the ```rnews:genre``` property by appending its value to ```http://cv.iptc.org/newscodes/genre/```, which can produce IPTC genre URIs that do not exist. Examples of Genre values in AFP releases include: Lead, Update, 2ndLead ... How should these be processed?
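One possible way to avoid emitting non-existent genre URIs is to check the value against a list of known IPTC genre codes before appending it. The sketch below is a suggestion, not the converter's current behaviour: the function name `genre_uri` is hypothetical, and the set of known genres has to be supplied by the caller (for instance, loaded from the IPTC controlled vocabulary).

```python
from typing import Optional, Set

IPTC_GENRE_BASE = "http://cv.iptc.org/newscodes/genre/"


def genre_uri(value: str, known_genres: Set[str]) -> Optional[str]:
    """Return the IPTC genre URI for `value`, or None if the genre is unknown.

    `known_genres` is supplied by the caller; this function does not
    know the IPTC vocabulary itself.
    """
    value = value.strip()
    if value in known_genres:
        return IPTC_GENRE_BASE + value
    # Unknown values (for example AFP-specific ones) are not mapped to an IPTC URI.
    return None


# Usage with a placeholder set (illustrative only, not the real IPTC list):
placeholder_genres = {"Interview"}
print(genre_uri("Interview", placeholder_genres))
print(genre_uri("2ndLead", placeholder_genres))
```

Values that return `None` could then be stored under a separate, AFP-specific property or dropped, depending on what is decided in this issue.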
Furthermore, the ```<NewsLineType>``` element is currently not mapped; it takes values such as ProductLine, AdvisoryLine ... Should we handle these as well?

Assignee: Thibault Ehrhart (thibault.ehrhart@eurecom.fr)

Exhaustive list of all elements not handled / mapped by the converter
https://gitlab.eurecom.fr/asrael/converter/-/issues/5
Updated: 2018-09-13T08:51:42Z
Author: Raphael Troncy (raphael.troncy@eurecom.fr)

What is the current list of ALL XML elements (attributes) not handled by the converter?

Assignee: Thibault Ehrhart (thibault.ehrhart@eurecom.fr)

Convert AFP documents for the 2015-2018 period
https://gitlab.eurecom.fr/asrael/converter/-/issues/1
Updated: 2018-09-12T13:49:45Z
Author: Raphael Troncy (raphael.troncy@eurecom.fr)

The tlp.limsi.fr FTP used to have problems but this has been resolved and we only lost "1 month" of data.
We should convert all press releases, both the EN and the FR ones, for the period 2015-2018 with an emphasis on 2015.

Assignee: Thibault Ehrhart (thibault.ehrhart@eurecom.fr)