From: Erick T. <ida...@us...> - 2007-05-15 05:57:33
|
I noticed that the code for markdown.py isn't consistent in how it uses spaces. I've tried to normalize it to the Python coding standard (PEP 8), from here:

http://www.python.org/dev/peps/pep-0008/

I've also made the classes subclass from object, if that's alright. This patch also assumes that my previous patch has been applied, so if you don't want the text preprocessors, we'll have to edit this patch.

I uploaded the patch here, since it's kind of big:

http://sourceforge.net/tracker/index.php?func=detail&aid=1719072&group_id=153041&atid=790200

-e
From: Yuri T. <qar...@gm...> - 2007-05-15 13:02:45
|
Thanks for this patch. About the preprocessors: did you actually get a noticeable performance improvement with this? If so, I will be happy to put it in.

- yuri

On 5/15/07, Erick Tryzelaar <ida...@us...> wrote:
> I noticed that the code for markdown.py isn't consistent in how it does
> spaces. I've tried to normalize it to the python coding standard, from here:
>
> http://www.python.org/dev/peps/pep-0008/
>
> I've also made the objects subclass from object, if that's alright. This
> also assumes that my previous patch has been applied, so if you don't
> want the text preprocessors, we'll have to edit this patch.
>
> I uploaded the patch here, since it's kind of big:
>
> http://sourceforge.net/tracker/index.php?func=detail&aid=1719072&group_id=153041&atid=790200
>
> -e
>
> _______________________________________________
> Python-markdown-discuss mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/python-markdown-discuss

--
Yuri Takhteyev
UC Berkeley School of Information
http://www.freewisdom.org/
From: Erick T. <ida...@us...> - 2007-05-15 17:08:21
|
With a whole-text preprocessor, I can use re.finditer to find all the matches in a string, instead of having to test a regex against each line and maintain state between lines, so it can be a little easier to use. I haven't done many performance tests, but on a large string it ought to be faster, since the string searching should remain in C code rather than in a Python loop.

-e
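The contrast described above can be sketched roughly as follows. This is only an illustration: the "~~~" block syntax, the pattern, and both function names are hypothetical and are not taken from markdown.py or from the patch. It also assumes every block is closed and contains at least one line.

```python
import re

# Hypothetical block syntax, used only for illustration: the text between
# two lines that consist of "~~~" is treated as one block.
BLOCK_RE = re.compile(r"^~~~\n(.*?)\n~~~$", re.MULTILINE | re.DOTALL)


def find_blocks_in_text(text):
    """Whole-text approach: a single finditer call, no state to track."""
    return [m.group(1) for m in BLOCK_RE.finditer(text)]


def find_blocks_in_lines(lines):
    """Line-based approach: test each line and remember whether we are inside a block."""
    blocks = []
    current = None  # None means "not inside a block"
    for line in lines:
        if line == "~~~":
            if current is None:
                current = []  # opening delimiter
            else:
                blocks.append("\n".join(current))  # closing delimiter
                current = None
        elif current is not None:
            current.append(line)
    return blocks


sample = "intro\n~~~\na\nb\n~~~\nmiddle\n~~~\nc\n~~~\nend"
print(find_blocks_in_text(sample))
print(find_blocks_in_lines(sample.split("\n")))
```

Both functions find the same blocks in this sample; the difference is only in how much bookkeeping the caller has to write.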
From: Yuri T. <qar...@gm...> - 2007-05-15 19:08:02
Attachments:
markdown-test.html
|
Yes, but how much performance do you gain compared to doing a "\n".join, storing the string, doing whatever you want to it, then returning s.split("\n")?

I mean it as a serious empirical question. If it makes a substantial difference, it would be worth making the API a bit more complicated. If it gains something like 1% in performance, I am not sure it makes sense to introduce a new type of post processor.

I applied your patch and then changed HTML_PREPROCESSOR to work as a "text" preprocessor, but this gave me no performance improvement on the "markdown-test". I am attaching the output of test-markdown.py with repeat set to 100 times (i.e., 10x the default value). (The gray numbers are the values for the pre-patch version.)

I do understand that this may not be a fair test. Can you send me one that shows more of a difference?

test-markdown.py by default runs all the files in a directory without any extensions. However, if the directory name starts with "ext-x-" then whatever follows "-x-" is taken as a "-"-delimited list of extensions. So, if you write an extension "foo" which uses text preprocessors, the test cases for this extension should go under "ext-x-foo". (The reason there are no "ext-x" test directories in SVN is that I started making them, discovered that the wikilinks extension is broken in the new version, and haven't had time to fix it since March.)

- yuri
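For reference, the join/split alternative mentioned above might look roughly like this. It is a sketch only: the class name, the blank-line transformation, and the timing helper are all hypothetical, and nothing here comes from markdown.py or test-markdown.py.

```python
import re
import timeit


class JoinSplitPreprocessor:
    """Hypothetical line-based preprocessor that works on joined text internally."""

    # Illustrative transformation only: collapse runs of 3+ newlines into two.
    BLANK_RUN_RE = re.compile(r"\n{3,}")

    def run(self, lines):
        text = "\n".join(lines)                     # lines -> one string
        text = self.BLANK_RUN_RE.sub("\n\n", text)  # whole-text regex work
        return text.split("\n")                     # string -> lines again


def join_split_overhead(lines, repeat=100):
    """Time only the join/split round trip, i.e. the cost a text API would save."""
    return timeit.timeit(lambda: "\n".join(lines).split("\n"), number=repeat)


print(JoinSplitPreprocessor().run(["a", "", "", "", "b"]))
print(join_split_overhead(["some line of text"] * 10000))
```

Comparing the second number against the total run time of a document gives a rough idea of whether the extra round trip matters; the actual figures depend on the machine and the input.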
From: Erick T. <ida...@us...> - 2007-05-15 21:17:56
|
Yuri Takhteyev wrote:
> Yes, but how much performance do you gain compared to doing a
> "\n".join, storing the string, doing whatever you want to it, then
> returning s.split("\n")?
>
> I mean it as a serious empirical question. If it makes a substantial
> difference, it would be worth making the API a bit more complicated.
> If it gains something like 1% in performance, I am not sure it makes
> sense to introduce a new type of post processor.

You're right, I don't see any performance benefits. I do find it semantically easier to work directly with regexes, but I guess I can do that in my extensions.
From: Yuri T. <qar...@gm...> - 2007-05-15 22:25:00
|
Actually, how about this:

We can add two subclasses of Preprocessor: TextPreprocessor and LinePreprocessor. (For now LinePreprocessor would behave as Preprocessor, but we can deprecate it later.) Each will have a get_input_type() method which would return "lines" or "text", signifying what input it expects. Either would be allowed to return a list of lines _or_ text. Markdown will check the type and do the conversion if adjacent preprocessors want different formats. So, you would be able to do this:

    class FooPreprocessor(TextPreprocessor):
        def run(self, text):
            return foo(text)  # foo could return a single string _or_ a list of lines

    class BarPreprocessor(LinePreprocessor):
        def run(self, lines):
            return bar(lines)  # bar could return a single string _or_ a list of lines

You could then insert the BarPreprocessor and the FooPreprocessor into the queue in any order. If you put Foo before Bar and Foo.run returns a string, it will be split into lines before being fed to Bar.run. If Foo.run returns a list, it will be fed into Bar.run as is.

- yuri
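One way the type check and conversion in this proposal could work is sketched below. The class names and get_input_type() follow the message; the convert() and run_preprocessors() helpers, the two example subclasses, and the assumption that the rest of the pipeline wants a list of lines are all hypothetical, so treat this as an illustration rather than the actual markdown.py code.

```python
class Preprocessor:
    def get_input_type(self):
        raise NotImplementedError

    def run(self, data):
        raise NotImplementedError


class TextPreprocessor(Preprocessor):
    def get_input_type(self):
        return "text"


class LinePreprocessor(Preprocessor):
    def get_input_type(self):
        return "lines"


def convert(data, wanted):
    """Convert between a string and a list of lines, only when needed."""
    if wanted == "text" and not isinstance(data, str):
        return "\n".join(data)
    if wanted == "lines" and isinstance(data, str):
        return data.split("\n")
    return data


def run_preprocessors(preprocessors, lines):
    """Hypothetical driver: feed each preprocessor the format it asks for."""
    data = lines
    for p in preprocessors:
        data = p.run(convert(data, p.get_input_type()))
    # Assumption: the rest of the pipeline expects a list of lines.
    return convert(data, "lines")


class UpperText(TextPreprocessor):
    def run(self, text):
        return text.upper()  # returns a single string


class DropBlankLines(LinePreprocessor):
    def run(self, lines):
        return [line for line in lines if line.strip()]  # returns a list


print(run_preprocessors([UpperText(), DropBlankLines()], ["a", "", "b"]))
print(run_preprocessors([DropBlankLines(), UpperText()], ["a", "", "b"]))
```

Because the driver converts only when the incoming type does not match what the next preprocessor asks for, the two example preprocessors can be queued in either order.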