Archived

Commit Graph

  • 4f2c046c9c rewrote parse_synonyms and new_synonym to use an internal dictionary structure RTB 2014-04-17 22:06:45 +02:00
  • 2e95d35283 modified parse_synonyms and new_synonym to include a Selector for future edits RTB 2014-04-17 21:30:53 +02:00
  • be63315ca2 regex Bas Vb 2014-04-16 17:01:35 +02:00
  • bd7fb38497 Merge branch 'feature/chemspider-parser' of code.giphouse.nl:giphouse/descartes-2 into feature/chemspider-parser Jip J. Dekker 2014-04-16 17:01:09 +02:00
  • 3aaed985fe Merge branch 'develop' into feature/chemspider-parser Jip J. Dekker 2014-04-16 17:00:59 +02:00
  • 873231439c Merge branch 'develop' into feature/Wikipedia Jip J. Dekker 2014-04-16 16:59:25 +02:00
  • d603e388e6 Merge branch 'hotfix/1-searchable' into develop Jip J. Dekker 2014-04-16 16:58:53 +02:00
  • 8cf307127a Merge branch 'hotfix/1-searchable' Jip J. Dekker 2014-04-16 16:57:43 +02:00
  • ab2a3fdc08 typo! Jip J. Dekker 2014-04-16 16:57:27 +02:00
  • f0d10902b5 Searchable can't be a list! Jip J. Dekker 2014-04-16 16:57:08 +02:00
  • efacc08a3d Merge branch 'develop' into feature/Wikipedia Jip J. Dekker 2014-04-16 16:49:03 +02:00
  • 6f82b117c9 new function to clean up the datapoints Bas Vb 2014-04-16 16:23:33 +02:00
  • 9a78e186bc chemspider parser now grabs data from ExtendedCompoundInfo() of chemspider API (no units) Rob tB 2014-04-16 16:22:47 +02:00
  • 74aa446f40 minor edits (comments etc.) Bas Vb 2014-04-16 15:27:36 +02:00
  • caf7d3df4e fixed ExtendedCompoundInfo url to have csid parameter instead of query Rob tB 2014-04-16 15:27:10 +02:00
  • 34c3a8b4d6 remove empty data points Bas Vb 2014-04-16 15:22:47 +02:00
  • 2d314aee6a created stub to parse ExtendedCompoundInfo from ChemSpider MassSpec API Rob tB 2014-04-16 15:21:33 +02:00
  • 7fc980befe chemspider should now only generate new Requests for wikipedia links from 'expert confirmed' synonyms Rob tB 2014-04-16 15:02:37 +02:00
  • ce3105f3c1 went to a general loop over all values, this way getting all elements from the Wikipedia infobox (except for those with a colspan, because these mess up) Bas Vb 2014-04-16 14:56:32 +02:00
  • 87282fc572 new properties in parse_properties now use dictionary syntax Rob tB 2014-04-16 14:26:27 +02:00
  • 93a6f098a9 log messages are now DEBUG instead of WARNING Rob tB 2014-04-16 13:28:59 +02:00
  • f1280dd66d get value not list from xpath Bas Vb 2014-04-16 13:23:50 +02:00
  • c1b5f810cb unused Result properties are now empty string instead of None Rob tB 2014-04-16 11:53:59 +02:00
  • 92a74de9e0 Added the include and exclude options. Jip J. Dekker 2014-04-16 11:17:48 +02:00
  • d99548e3b6 Added density, molar entropy and heat capacity Bas Vb 2014-04-16 11:14:02 +02:00
  • e0e64bd65a Implemented source exclusion Jip J. Dekker 2014-04-16 11:03:59 +02:00
  • d823c105e6 Implemented source inclusion Jip J. Dekker 2014-04-16 10:48:29 +02:00
  • d778050f36 Able to parse the weblinks to other databases, one example done Bas Vb 2014-04-16 10:37:57 +02:00
  • 7b57d86178 Removed redundant source loader Jip J. Dekker 2014-04-16 10:36:46 +02:00
  • 9dcb150356 Merge branch 'develop' into feature/chemspider-parser Jip J. Dekker 2014-04-16 10:24:52 +02:00
  • a06bf643f1 Made sourceloader a class and implemented the listing of all sources Jip J. Dekker 2014-04-16 10:14:29 +02:00
  • 8b7cfac2de Added an new command to the CLI, implementation will follow. Jip J. Dekker 2014-04-16 09:33:07 +02:00
  • cd1637b0fe Both Boiling point and melting point are now parsed from chemical Wikipedia pages, there's one error about different types of attributes in the Result-items, this needs to be fixed by cleaning up the retrieved data. Bas Vb 2014-04-16 00:50:50 +02:00
  • 1ca3593ae1 Parse is runnable now. Bas Vb 2014-04-16 00:35:19 +02:00
  • 6799a1a956 Merge branch 'release/v0.1.0' into develop 1-searchable Jip J. Dekker 2014-04-15 19:49:07 +02:00
  • 2d5e39de81 Merge branch 'release/v0.1.0' v0.1.0 Jip J. Dekker 2014-04-15 19:48:55 +02:00
  • 972e5da0d2 Removed debug code and typos. Jip J. Dekker 2014-04-15 19:48:27 +02:00
  • d770f79a7a Bumped version number Jip J. Dekker 2014-04-15 19:46:10 +02:00
  • 878d8e5efb Merge branch 'feature/CLI' into develop Jip J. Dekker 2014-04-15 19:44:41 +02:00
  • 61ca2520e3 Added feed export functionality Jip J. Dekker 2014-04-15 19:40:54 +02:00
  • e65d3a6898 Added the options for the Feed exports Jip J. Dekker 2014-04-15 18:57:51 +02:00
  • 8e46762a9e fix: if no experimental data, return predicted acd/labs data instead of None RTB 2014-04-15 18:56:38 +02:00
  • ffb3861034 Search for single compound, filename should be lowercase Jip J. Dekker 2014-04-15 18:49:30 +02:00
  • 91ed053ac5 Stopped log from interfering with STDOUT Jip J. Dekker 2014-04-15 18:17:35 +02:00
  • a4dd6e1835 Made logging work Jip J. Dekker 2014-04-14 21:31:20 +02:00
  • 2ad33080c6 First setup of the CLI, decided on a structure Jip J. Dekker 2014-04-14 20:45:07 +02:00
  • ee01e697d3 Added Docopt as an CLI framework Jip J. Dekker 2014-04-14 20:21:41 +02:00
  • ff0eb309da ChemSpider parser now handles the Predicted - ACD/Labs tab for scraping properties RTB 2014-04-14 17:27:02 +02:00
  • 2ae3ac9c51 added parse_properties to scrape the Experimental Physico-chemical Properties table if it exists RTB 2014-04-14 13:09:14 +02:00
  • 31a63829f8 chemspider parser now makes new synonym requests with the scraped synonyms RTB 2014-04-14 01:23:15 +02:00
  • e95df8eaa3 ignore_list now contains the intended names instead of Result objects RTB 2014-04-14 01:20:24 +02:00
  • 564dbc3292 added ignore list to new_compound_request for synonyms found by chemspider parser RTB 2014-04-14 00:33:25 +02:00
  • b1b969a16c corrected usage of __spider variable RTB 2014-04-14 00:28:47 +02:00
  • 0ad98905e3 added scraping for wikipedia links in synonym tab RTB 2014-04-13 23:35:25 +02:00
  • 5565c28a1e moved parsing of synonyms to 'parse_synonyms' function RTB 2014-04-13 23:14:23 +02:00
  • 859a18c61a added parsing of synonyms RTB 2014-04-12 22:27:28 +02:00
  • 22fa67735d added parse_searchrequest function RTB 2014-04-12 19:41:36 +02:00
  • 246463b450 simplified debug output, WARNING label should be temporary RTB 2014-04-12 19:19:56 +02:00
  • 423cb90a6a Merge branch 'develop' into feature/chemspider-parser RTB 2014-04-12 19:13:02 +02:00
  • 0e3ef9a792 hardcoded ChemSpider API token into ChemSpider.py RTB 2014-04-08 16:14:47 +02:00
  • f9799c30d8 Parse is runnable now. Bas Vb 2014-04-08 14:59:09 +02:00
  • a4dc8c8711 corrected Chemspider parser to be a subclass of Parser RTB 2014-04-08 13:10:02 +02:00
  • 0da286c907 created basic structure of ChemSpider search parser RTB 2014-04-08 12:08:45 +02:00
  • e10ac12d04 Merge branch 'develop' into feature/Wikipedia Jip J. Dekker 2014-04-08 11:45:23 +02:00
  • debbc5e62a Merge branch 'hotfix/none-requests' into develop Jip J. Dekker 2014-04-08 11:44:42 +02:00
  • 199fa5419e Merge branch 'hotfix/none-requests' Jip J. Dekker 2014-04-08 11:44:26 +02:00
  • 622dd4ad00 Small fix to ensure unique classes and load all parsers Jip J. Dekker 2014-04-08 11:43:32 +02:00
  • da17a149c0 Spider is now able to handle none-request from parsers while handling new compounds Jip J. Dekker 2014-04-08 11:42:43 +02:00
  • 4b0c4acf96 Updated the wikipedia parser as an rightful subclass of Parser Jip J. Dekker 2014-04-08 11:40:30 +02:00
  • f3807c3018 Fixed the errors, but still not able to run/test the parse() function Bas Vb 2014-04-06 20:28:03 +02:00
  • add4a13a4d Trying to make a start with the WikipediaParser, but I can't find out with the Scrapy website (or another way) what the structure of the file should be, and how I can test/run the crawling on a page. Bas Vb 2014-04-06 18:02:09 +02:00
  • 81a93c44bb added author Nout van Deijck 2014-04-03 12:19:17 +02:00
  • 60c409da3d New file and branch for the Wikipedia parser Bas Vb 2014-04-03 12:05:06 +02:00
  • b4ff4a3c3b New file and branch for the Wikipedia parser Bas Vb 2014-04-03 12:00:27 +02:00
  • 3a074467e6 Merge branch 'hotfix/No_TABs' into develop Jip J. Dekker 2014-04-02 14:22:13 +02:00
  • 9805bb5adb Merge branch 'hotfix/No_TABs' Jip J. Dekker 2014-04-02 14:21:34 +02:00
  • f6981057df Changed everything to spaces Jip J. Dekker 2014-04-02 14:20:05 +02:00
  • 595f0253e2 Merge branch 'release/v0.0.1' into develop Jip J. Dekker 2014-04-01 21:44:31 +02:00
  • 254e8db3aa Merge branch 'release/v0.0.1' v0.0.1 Jip J. Dekker 2014-04-01 21:44:08 +02:00
  • c9e09f8ab9 Added an version message Jip J. Dekker 2014-04-01 21:42:54 +02:00
  • 2e8017c590 Merge branch 'feature/parsing-scheme' into develop Jip J. Dekker 2014-04-01 21:40:26 +02:00
  • 7bc160f676 The spider is now able to start using the synonym generator Jip J. Dekker 2014-04-01 21:38:11 +02:00
  • cd421cc2fb Replaced literal for testing with a variable fix. Jip J. Dekker 2014-04-01 21:24:04 +02:00
  • 0bf2d102c6 Fixed parser importation, so it doesn't import imported classes. Jip J. Dekker 2014-04-01 21:21:30 +02:00
  • 683f8c09d4 Quick fix, python errors Jip J. Dekker 2014-04-01 21:12:54 +02:00
  • f93dc2d160 Added an structure to get requests for all websites for a new synonym Jip J. Dekker 2014-04-01 21:07:36 +02:00
  • e39ed3b681 Added a way for parsers to access the spider. Jip J. Dekker 2014-04-01 20:56:32 +02:00
  • 4d9e5307bf Written an loader for all parsers in the parser directory. Jip J. Dekker 2014-03-31 00:48:45 +02:00
  • 0cc1b23353 Added the functionality to add parsers and automatically use them. Jip J. Dekker 2014-03-30 23:37:42 +02:00
  • 6e2df64fe4 Merge branch 'hotfix/spider-import-error' into develop Jip J. Dekker 2014-03-30 23:08:14 +02:00
  • a6d3d4a716 Merge branch 'hotfix/spider-import-error' spider-import-error Jip J. Dekker 2014-03-30 23:07:52 +02:00
  • 14c27458fc Fixed an import error Jip J. Dekker 2014-03-30 23:07:28 +02:00
  • e0556bbf16 Merge branch 'release/basic-scraper-structure' basic-scraper-structure Jip J. Dekker 2014-03-30 22:16:13 +02:00
  • e210ce8558 Merge branch 'develop', remote-tracking branch 'origin/develop' into develop Jip J. Dekker 2014-03-30 22:08:21 +02:00
  • 6bbee865c4 Merge branch 'feature/basic-structure' into develop Jip J. Dekker 2014-03-28 14:46:43 +01:00
  • 1e730e77ce Merge branch 'feature/basic-structure' of code.giphouse.nl:giphouse/descartes-2 into feature/basic-structure Jip J. Dekker 2014-03-28 14:44:29 +01:00
  • 32cedecf2e Added an basic parser class to extend, next step implementing the global function Jip J. Dekker 2014-03-28 14:43:22 +01:00
  • 325febe834 Added an basic parser class to extend, next step implementing the global function Jip J. Dekker 2014-03-28 14:43:22 +01:00
  • d91706d6e5 The script should stop sometime, added a stopping signal Jip J. Dekker 2014-03-28 14:14:39 +01:00
  • 87d1041517 Made all Python files PEP-8 Compatible Jip J. Dekker 2014-03-28 14:11:36 +01:00