Archived

71 Commits

Author SHA1 Message Date
Bas Vb
cd1637b0fe Both Boiling point and melting point are now parsed from chemical Wikipedia pages, there's one error about different types of attributes in the Result-items, this needs to be fixed by cleaning up the retrieved data. 2014-04-16 00:50:50 +02:00
Bas Vb
1ca3593ae1 Parse is runnable now. 2014-04-16 00:35:19 +02:00
Jip J. Dekker
91ed053ac5 Stopped log from interfering with STDOUT 2014-04-15 18:17:35 +02:00
Bas Vb
f9799c30d8 Parse is runnable now. 2014-04-08 14:59:09 +02:00
Jip J. Dekker
e10ac12d04 Merge branch 'develop' into feature/Wikipedia 2014-04-08 11:45:23 +02:00
Jip J. Dekker
debbc5e62a Merge branch 'hotfix/none-requests' into develop 2014-04-08 11:44:42 +02:00
Jip J. Dekker
622dd4ad00 Small fix to ensure unique classes and load all parsers 2014-04-08 11:43:32 +02:00
Jip J. Dekker
da17a149c0 Spider is now able to handle none-request from parsers while handling new compounds 2014-04-08 11:42:43 +02:00
Jip J. Dekker
4b0c4acf96 Updated the wikipedia parser as an rightful subclass of Parser 2014-04-08 11:40:30 +02:00
Bas Vb
f3807c3018 Fixed the errors, but still not able to run/test the parse() function 2014-04-06 20:28:03 +02:00
Bas Vb
add4a13a4d Trying to make a start with the WikipediaParser, but I can't find out with the Scrapy website (or another way) what the structure of the file should be, and how I can test/run the crawling on a page. 2014-04-06 18:02:09 +02:00
Nout van Deijck
81a93c44bb added author 2014-04-03 12:19:17 +02:00
Bas Vb
60c409da3d New file and branch for the Wikipedia parser 2014-04-03 12:05:06 +02:00
Bas Vb
b4ff4a3c3b New file and branch for the Wikipedia parser 2014-04-03 12:00:27 +02:00
Jip J. Dekker
3a074467e6 Merge branch 'hotfix/No_TABs' into develop 2014-04-02 14:22:13 +02:00
Jip J. Dekker
9805bb5adb Merge branch 'hotfix/No_TABs' 2014-04-02 14:21:34 +02:00
Jip J. Dekker
f6981057df Changed everything to spaces 2014-04-02 14:20:05 +02:00
Jip J. Dekker
595f0253e2 Merge branch 'release/v0.0.1' into develop 2014-04-01 21:44:31 +02:00
Jip J. Dekker
254e8db3aa Merge branch 'release/v0.0.1' v0.0.1 2014-04-01 21:44:08 +02:00
Jip J. Dekker
c9e09f8ab9 Added an version message 2014-04-01 21:42:54 +02:00
Jip J. Dekker
2e8017c590 Merge branch 'feature/parsing-scheme' into develop 2014-04-01 21:40:26 +02:00
Jip J. Dekker
7bc160f676 The spider is now able to start using the synonym generator 2014-04-01 21:38:11 +02:00
Jip J. Dekker
cd421cc2fb Replaced literal for testing with a variable fix. 2014-04-01 21:24:04 +02:00
Jip J. Dekker
0bf2d102c6 Fixed parser importation, so it doesn't import imported classes. 2014-04-01 21:21:30 +02:00
Jip J. Dekker
683f8c09d4 Quick fix, python errors 2014-04-01 21:12:54 +02:00
Jip J. Dekker
f93dc2d160 Added an structure to get requests for all websites for a new synonym 2014-04-01 21:07:36 +02:00
Jip J. Dekker
e39ed3b681 Added a way for parsers to access the spider. 2014-04-01 20:56:32 +02:00
Jip J. Dekker
4d9e5307bf Written an loader for all parsers in the parser directory. 2014-03-31 00:48:45 +02:00
Jip J. Dekker
0cc1b23353 Added the functionality to add parsers and automatically use them. 2014-03-30 23:37:42 +02:00
Jip J. Dekker
6e2df64fe4 Merge branch 'hotfix/spider-import-error' into develop 2014-03-30 23:08:14 +02:00
Jip J. Dekker
a6d3d4a716 Merge branch 'hotfix/spider-import-error' spider-import-error 2014-03-30 23:07:52 +02:00
Jip J. Dekker
14c27458fc Fixed an import error 2014-03-30 23:07:28 +02:00
Jip J. Dekker
e0556bbf16 Merge branch 'release/basic-scraper-structure' basic-scraper-structure 2014-03-30 22:16:13 +02:00
Jip J. Dekker
e210ce8558 Merge branch 'develop', remote-tracking branch 'origin/develop' into develop 2014-03-30 22:08:21 +02:00
Jip J. Dekker
6bbee865c4 Merge branch 'feature/basic-structure' into develop 2014-03-28 14:46:43 +01:00
Jip J. Dekker
1e730e77ce Merge branch 'feature/basic-structure' of code.giphouse.nl:giphouse/descartes-2 into feature/basic-structure 2014-03-28 14:44:29 +01:00
Jip J. Dekker
32cedecf2e Added an basic parser class to extend, next step implementing the global function 2014-03-28 14:44:17 +01:00
Jip J. Dekker
325febe834 Added an basic parser class to extend, next step implementing the global function 2014-03-28 14:43:22 +01:00
Jip J. Dekker
d91706d6e5 The script should stop sometime, added a stopping signal 2014-03-28 14:14:39 +01:00
Jip J. Dekker
87d1041517 Made all Python files PEP-8 Compatible 2014-03-28 14:11:36 +01:00
Jip J. Dekker
5b17627504 The parsers however could use their own folder 2014-03-27 13:23:03 +01:00
Jip J. Dekker
8e9314e753 One spider should have it's own folder 2014-03-27 13:18:55 +01:00
Jip J. Dekker
bdcf359da7 Logical fixes to have some "working" case 2014-03-27 13:12:27 +01:00
Jip J. Dekker
8175e02f6c New Structure, splitting on parsers instead of Spiders 2014-03-27 13:08:46 +01:00
Jip J. Dekker
306a37db1a A better structure which is able to start multiple spiders. 2014-03-22 15:48:08 +01:00
Jip J. Dekker
aa65bbd459 Merge branch 'feature/basic-structure' into develop 2014-03-18 18:10:03 +01:00
Jip J. Dekker
847b4f201b Merge branch 'feature/basic-structure' of code.giphouse.nl:giphouse/descartes-2 into feature/basic-structure 2014-03-18 18:07:38 +01:00
Jip J. Dekker
328cb3808c Unix machine should be able to execute this without any problems. 2014-03-18 18:07:00 +01:00
Jip J. Dekker
826937e25e Unix machine should be able to execute this without any problems. 2014-03-18 18:05:44 +01:00
Jip J. Dekker
7355de1b20 Added an simple script to run a spider 2014-03-18 18:03:22 +01:00