Can XBMC or scrapers pre-process file-names before performing lookups?

  • Hi,

    XBMC has trouble finding the films in either IMDB or other scrapers because it gets confused by the file naming convention I am using.

    Let me explain. I name my files like this:
    Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre) - Jean-Pierre Jeunet (2001).avi
    That is:

    original name
    English name
    Director
    Year


    The thing is... when I send this info to a scraper, it gets confused. Is there a way I can "help" the scraper by telling it which information is which?

    Couldn't find anything in the Wiki.

    V.
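
For illustration, the naming convention described above can be split into its four fields with a single regular expression. This is only a sketch in Python, not anything XBMC does itself: the pattern `NAME_RE` and the helper `split_name` are hypothetical names, and the pattern assumes the director is always introduced by " - " and the year is always four digits in round brackets.

```python
import re

# Hypothetical pattern for "Original (English) - Director (Year).ext".
# The English title in brackets is optional.
NAME_RE = re.compile(
    r"^(?P<original>.+?)"               # original title, shortest match
    r"(?:\s+\((?P<english>[^)]*)\))?"   # optional English title in brackets
    r"\s+-\s+(?P<director>.+?)"         # director, introduced by " - "
    r"\s+\((?P<year>\d{4})\)"           # four-digit year in brackets
    r"\.\w+$"                           # file extension
)

def split_name(filename):
    """Return a dict of the four fields, or None if the name does not fit."""
    m = NAME_RE.match(filename)
    return m.groupdict() if m else None
```

A name without the bracketed English title, such as "The Rock - Michael Bay (1996).avi", still matches; its `english` field is simply `None`.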


  • Issue resolved!

    After I looked at the source based on your suggestion, I noticed that the content of $$1 is actually pre-processed a lot inside the code before it even reaches the scraper.

    So I started XBMC in debug mode, and found out that the query URL is actually printed, so I finally got it.

    In fact the file name "Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre) - Jean-Pierre Jeunet (2001).avi" is actually transformed to "Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre)   Jean-Pierre Jeunet". Note that the " - " in the middle becomes three spaces. So once it is escaped, it becomes %20%20%20.

    Finally, I managed to get what I want with the following regular expression:
    ([^(]+)(%20%20%20%20%28)

    The %20%28 is there to strip out everything after the first round bracket. In case the movie is already in English (like "The Rock - Michael Bay (1996).avi"), I don't have this first bracket, so I use the %20%20%20 to get rid of the movie director name.

    V.
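
The idea in the post above (cut the escaped query at the first " (" or at the three spaces left where " - " used to be) can be tried outside XBMC. The sketch below is in Python and is not the exact expression from the post; `TITLE_RE` and `title_part` are hypothetical names, and the escaped forms %20%28 and %20%20%20 are taken from the description above. The scraper's own regex engine may behave differently from Python's `re`.

```python
import re

# Stop at whichever comes first:
#   %20%28     an escaped " (" that opens the bracketed English title
#   %20%20%20  the three escaped spaces left where " - " used to be
TITLE_RE = re.compile(r"^(.+?)(?:%20%28|%20%20%20)")

def title_part(encoded_query):
    """Return the part of the escaped query that precedes the first marker."""
    m = TITLE_RE.match(encoded_query)
    # If neither marker is present, keep the query unchanged.
    return m.group(1) if m else encoded_query
```

Using an alternation covers both cases with one capture group: names with a bracketed English title are cut at the bracket, and names without one are cut before the director.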


  • search query input is url encoded
    Thank you for your reply, but I am not sure I understand what you mean.
    Do you mean I should replace ([^(]*) with (%5B%5E(%5D*)?

    Because according to the Scraper.xml (http://www.xboxmediacenter.com/wiki/index.php?title=Scraper.xml) page, none of my characters above need encoding. Also, if I look at the reference imdb.xml (http://xbmc.svn.sourceforge.net/viewvc/xbmc/trunk/XBMC/system/scrapers/video/imdb.xml?view=markup) file from SVN, it does contain un-encoded regexps (but that is for other functions, not CreateSearchUrl).

    I'll try this tonight, but if it works, I'm not sure I'll understand why... :sad:

    V.


  • I mean... Is there any other solution than playing with the scraper XML file (http://www.xboxmediacenter.com/wiki/index.php?title=How_To_Write_Media_Info_Scrapers)?

    V.


  • isn't this what the RegEx in Advanced Settings is for, or does that only apply to TV shows?
    Well, given the name of the property in AdvancedSettings.xml (http://www.xboxmediacenter.com/wiki/index.php?title=AdvancedSettings.xml), I doubt it...


    Contains regular expression to match the season and episode numbers in filenames.

    V.


  • Unfortunately, no luck :sad:

    Does anybody have any idea?

    V.


  • isn't this what the RegEx in Advanced Settings is for, or does that only apply to TV shows?


  • there is a scraper development environment in tools/scrap

    however it's broken (there's a binary which works) and we have been shouting at the author for months but he just does not want to respond :/

    otherwise, looking at the source itself is your best bet. this stuff would be taking place in CIMDB::GetURL()


  • INPUT, i.e. the contents of $$1


  • search query input is url encoded


  • OK, I think I'm lost. I tried to modify the scraper as follows:


    Original:






    Modified:


    [^(-]*



    But that does not seem to help. I guess that the scraper engine would take my file name "title (original title) - director (year).avi" and apply the above regex, which should clean it up into "title ", then send it to the search engine.

    Does not seem to work, though... :blush:

    Any idea?

    V.


  • INPUT, i.e. the contents of $$1
    Thank you for taking time to reply, spiff.

    Even if $$1 is URL encoded, the parenthesis still remains, so the file name "Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre) - Jean-Pierre Jeunet (2001).avi" becomes Le%20Fabuleux%20destin%20d'Am%E9lie%20Poulain%20(Amelie%20of%20Montmartre)%20-%20Jean-Pierre%20Jeunet%20(2001).avi

    If I apply the ([^(]*) regex to this string, I still get Le%20Fabuleux%20destin%20d'Am%E9lie%20Poulain%20, which should really return the proper movie in the search (this would be the resulting URL: http://www.allocine.fr/recherche/?motcle=Le%20Fabuleux%20destin%20d'Am%E9lie%20Poulain%20)

    By the way, is there any way to run the scraper engine in debug mode so that I can understand what it does and log the different values?

    Again, thank you for your patience.

    V.
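
One way to see why ([^(]*) may not behave as expected on an escaped query is to check what happens when the round brackets themselves are percent-encoded. The Python sketch below is only an illustration under that assumption: it uses `urllib.parse.quote`, which does encode "(" as %28, and a Latin-1 encoding so that "é" becomes %E9 as in the post. Whether XBMC escapes $$1 in exactly this way is not confirmed here.

```python
import re
from urllib.parse import quote

name = "Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre)"

# Assumption: brackets are percent-encoded together with the spaces.
# quote() leaves only letters, digits and "_.-~" (plus "/") unescaped.
encoded = quote(name, encoding="latin-1")

# No literal "(" is left in the string, so [^(]* runs to the very end
# and nothing is stripped.
whole = re.match(r"([^(]*)", encoded).group(1)

# Matching on the escaped bracket instead stops before the English title.
title = re.match(r"(.*?)%20%28", encoded).group(1)
```

Under this assumption `whole` is identical to `encoded`, while `title` ends just before the escaped bracket.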


  • OK, I've spent some time learning more about RegEx (never thought I'd need to someday), but now I am really confused as to why the below does not work, because it should really do what I want.


    ([^(]*)



    ([^(]*) should match all the first characters until it finds the first '(', then return this group in the URL, then build the search string.

    Any idea anybody?...

    V.






