230 Commits

Author SHA1 Message Date
carlos.mesquita
895aaa1b33 Merged develop into feature/training-content 2024-09-22 22:27:02 +00:00
Carlos Mesquita
aa1433e9ea UUID wasn't being converted to string, before it used the firebase id and when transitioning to mongo this bug was introduced 2024-09-22 23:25:54 +01:00
carlos.mesquita
8eb5fb6d5f Merged master into feature/training-content 2024-09-08 20:47:50 +00:00
Carlos Mesquita
c004d9c83c Pydantic was causing validation errors when passportID was an int 2024-09-08 21:47:02 +01:00
carlos.mesquita
66abc42abb Merged in feature/training-content (pull request #29)
And this is why llm code shouldn't be copy pasted blindly

Approved-by: Tiago Ribeiro
2024-09-08 08:46:06 +00:00
Carlos Mesquita
2b59119eca And this is why llm code shouldn't be copy pasted blindly 2024-09-08 02:29:56 +01:00
Tiago Ribeiro
b9a35281ec Merge branch 'master' into develop 2024-09-08 00:59:33 +01:00
carlos.mesquita
2bbc1f456d Merged in feature/training-content (pull request #28)
Forgot to str() on a uuid

Approved-by: Tiago Ribeiro
2024-09-07 23:48:39 +00:00
Carlos Mesquita
e8ec862f86 Merge remote-tracking branch 'origin/master' into feature/training-content 2024-09-08 00:39:00 +01:00
Carlos Mesquita
8d4584b8b7 Forgot to str() on a uuid 2024-09-08 00:38:35 +01:00
carlos.mesquita
7a0424aa33 Merged in feature/training-content (pull request #27)
Feature/training content

Approved-by: Tiago Ribeiro
2024-09-07 22:10:55 +00:00
Carlos Mesquita
24ce198dfd Forgot to change the tips script to mongo 2024-09-07 23:09:00 +01:00
Carlos Mesquita
81911e635c Merge remote-tracking branch 'origin/master' into feature/training-content 2024-09-07 23:04:20 +01:00
Carlos Mesquita
849db06760 Merge branch 'feature/training-content' of https://bitbucket.org/ecropdev/ielts-be into feature/training-content 2024-09-07 23:04:18 +01:00
Carlos Mesquita
6a38164f9b Merge remote-tracking branch 'origin/master' into feature/training-content 2024-09-07 23:03:25 +01:00
Tiago Ribeiro
8ae9b64f1a Merged in release/mongodb-migration (pull request #26)
Release/mongodb migration
2024-09-07 21:54:25 +00:00
Tiago Ribeiro
676f660f3e Merged master into release/mongodb-migration 2024-09-07 21:54:00 +00:00
carlos.mesquita
ddf050d692 Merged in feature/training-content (pull request #25)
ENCOA-69 Pathways 2 Reading and Writing Tips

Approved-by: Tiago Ribeiro
2024-09-07 21:50:21 +00:00
Carlos Mesquita
6cb7c07f57 Firestore to Mongodb 2024-09-07 19:14:40 +01:00
carlos.mesquita
8c60f4596f Merged master into feature/training-content 2024-09-07 10:43:53 +00:00
Carlos Mesquita
cd11fa38ae Pathways 2 Reading and Writing Tips 2024-09-07 11:42:31 +01:00
carlos.mesquita
a328f01d2e Merged in feature/level-file-upload (pull request #24)
Added missing fillBlanks mc variant that was in UTAS to custom level

Approved-by: Tiago Ribeiro
2024-09-06 08:52:42 +00:00
Carlos Mesquita
a931c5ec2e Added missing fillBlanks mc variant that was in UTAS to custom level 2024-09-06 09:36:24 +01:00
carlos.mesquita
bfc9565e85 Merged in develop (pull request #23)
Develop

Approved-by: Tiago Ribeiro
2024-09-05 11:29:08 +00:00
carlos.mesquita
3d70bcbfd1 Merged in feature/level-file-upload (pull request #22)
Feature/level file upload

Approved-by: Tiago Ribeiro
2024-09-05 10:51:26 +00:00
carlos.mesquita
a2cfa335d7 Merged develop into feature/level-file-upload 2024-09-05 10:48:22 +00:00
Carlos Mesquita
0427d6e1b4 Deleted google creds ENV from Dockerfile since those will be supplied by cloud run 2024-09-05 11:47:34 +01:00
Carlos Mesquita
31c6ed570a Merge remote-tracking branch 'origin/bug/create-default-groups-if-not-already' into feature/level-file-upload 2024-09-05 11:43:11 +01:00
Carlos Mesquita
3a27c42a69 Removed .env, will add it to gitignore in next commit 2024-09-05 11:41:56 +01:00
Tiago Ribeiro
260dba1ee6 Merged in bug/create-default-groups-if-not-already (pull request #21)
Updated the code to create the Students/Teachers group if it does not exist yet
2024-09-05 10:11:16 +00:00
Tiago Ribeiro
a88d6bb568 Updated the code to create the Students/Teachers group if it does not exist yet 2024-09-05 10:56:58 +01:00
carlos.mesquita
f0f904f2e4 Merged in feature/level-file-upload (pull request #20)
Feature/level file upload

Approved-by: Tiago Ribeiro
2024-09-04 16:14:20 +00:00
Carlos Mesquita
a23bbe581a Merge branch 'feature/level-file-upload' of https://bitbucket.org/ecropdev/ielts-be into feature/level-file-upload 2024-09-04 17:10:16 +01:00
Carlos Mesquita
bb26282d25 Forgot to change this, should not affect, but still 2024-09-04 17:09:51 +01:00
carlos.mesquita
73c29cda25 Merged master into feature/level-file-upload 2024-09-04 16:07:48 +00:00
carlos.mesquita
aaa3361575 Merged master into feature/level-file-upload 2024-09-04 16:01:12 +00:00
Carlos Mesquita
94a16b636d Merge branch 'feature/level-file-upload' of https://bitbucket.org/ecropdev/ielts-be into feature/level-file-upload 2024-09-04 17:00:03 +01:00
Carlos Mesquita
cffec795a7 Swapped .env vars 2024-09-04 16:59:47 +01:00
carlos.mesquita
b2b4dfb74e Merged in feature/level-file-upload (pull request #18)
Switched cli token to GOOGLE_APPLICATION_CREDENTIALS
2024-09-04 11:00:22 +00:00
carlos.mesquita
2716f52a0a Merged develop into feature/level-file-upload 2024-09-04 10:57:11 +00:00
Carlos Mesquita
4099d99f80 Merge branch 'feature/level-file-upload' of https://bitbucket.org/ecropdev/ielts-be into feature/level-file-upload 2024-09-04 11:56:18 +01:00
Carlos Mesquita
ab4db36445 Switched cli token to GOOGLE_APPLICATION_CREDENTIALS 2024-09-04 11:55:58 +01:00
Tiago Ribeiro
59f047afba Merge branch 'develop' 2024-09-03 22:12:23 +01:00
carlos.mesquita
09b57cb346 Merged in feature/level-file-upload (pull request #17)
Upload batches of users onto firebase

Approved-by: Tiago Ribeiro
2024-09-03 20:43:40 +00:00
carlos.mesquita
bfc3e3f083 Merged develop into feature/level-file-upload 2024-09-03 19:27:52 +00:00
Carlos Mesquita
7b5e10fd79 Upload batches of users onto firebase 2024-09-03 20:09:19 +01:00
Tiago Ribeiro
a2a160f61b Merged in develop (pull request #16)
Develop
2024-09-02 13:12:04 +00:00
carlos.mesquita
5d5cd21e1e Merged in feature/level-file-upload (pull request #15)
ENCOA-94: Added user to training content docs, added support for shuffles, tweaked training prompt

Approved-by: Tiago Ribeiro
2024-08-27 21:43:26 +00:00
Carlos Mesquita
06a8384f42 Forgot to remove comment, already tested it in a container 2024-08-26 20:15:03 +01:00
Carlos Mesquita
dd74a3d259 Removed unused latex packages, texlive already includes the needed packages for level upload 2024-08-26 20:14:22 +01:00
Carlos Mesquita
efff0b904e ENCOA-94: Added user to training content docs, added support for shuffles, tweaked training prompt 2024-08-26 18:14:57 +01:00
carlos.mesquita
cf7a966141 Merged in feature/training-content (pull request #14)
Feature/training content
2024-08-19 15:57:09 +00:00
Carlos Mesquita
03f5b7d72c Upload level exam without hooking up to firestore and running in thread, will do this when I have the edit view done 2024-08-17 09:29:58 +01:00
Cristiano Ferreira
d68617f33b Add regular ielts modules to custom level. 2024-08-15 13:58:07 +01:00
Carlos Mesquita
eeaa04f856 Added support for speaking exercises in training content 2024-08-07 10:19:56 +01:00
Cristiano Ferreira
beccf8b501 Change model on speaking 2 grading to 4o. 2024-08-06 20:28:56 +01:00
Cristiano Ferreira
470f4cc83b Minor speaking improvements. 2024-08-05 21:57:42 +01:00
Carlos Mesquita
3ad411ed71 Forgot to remove some debugging lines 2024-08-05 21:47:17 +01:00
Carlos Mesquita
7144a3f3ca Supports now 1 exam multiple exercises, and level exercises 2024-08-05 21:41:49 +01:00
carlos.mesquita
b795a3fb79 Merged in feature/training-content (pull request #13)
Feature/training content

Approved-by: Tiago Ribeiro
2024-08-03 09:49:22 +00:00
Carlos Mesquita
034be25e8e Added created_at and score to training docs 2024-08-01 20:49:22 +01:00
Carlos Mesquita
a931f06c47 Forgot to add __name__ in getLogger() don't know if it is harmless grabbing the root logger, added __name__ just to be safe 2024-07-31 15:03:00 +01:00
Carlos Mesquita
8e56a3228b Finished training content backend 2024-07-31 14:56:33 +01:00
Cristiano Ferreira
14c5914420 Add default text size blank space custom level. 2024-07-30 22:40:26 +01:00
Tiago Ribeiro
6878e0a276 Added the ability to send the ID for the listening 2024-07-30 22:34:31 +01:00
Cristiano Ferreira
1f29ac6ee5 Fix id on custom level. 2024-07-30 19:53:17 +01:00
Cristiano Ferreira
a1ee7e47da Can now generate lots of mc in level custom. 2024-07-28 14:33:08 +01:00
Cristiano Ferreira
adfc027458 Add excerpts to reading 3. 2024-07-26 23:46:46 +01:00
Cristiano Ferreira
3a7bb7764f Writing improvements. 2024-07-26 23:33:42 +01:00
Cristiano Ferreira
19f204d74d Add default for topic on custom level and random reorder for multiple choice options. 2024-07-26 15:59:11 +01:00
carlos.mesquita
88ba9ab561 Merged in feature/ai-detection (pull request #12)
Feature/ai detection

Approved-by: Tiago Ribeiro
2024-07-25 21:02:57 +00:00
Carlos Mesquita
34afb5d1e8 Logging when GPT's Zero response != 200 2024-07-25 17:11:14 +01:00
Carlos Mesquita
eb904f836a Forgot to change the .env 2024-07-25 17:01:09 +01:00
Carlos Mesquita
ca12ad1161 Used main as base branch in the last time 2024-07-25 16:55:42 +01:00
Cristiano Ferreira
8b8460517c Merged in level-utas-custom-tests (pull request #11)
Add endpoint for custom level exams.
2024-07-24 19:00:13 +00:00
Cristiano Ferreira
9be9bfce0e Add endpoint for custom level exams. 2024-07-24 19:58:53 +01:00
Cristiano Ferreira
4776f24229 Fix speaking grading overall. 2024-07-23 13:22:52 +01:00
Cristiano Ferreira
bf9251eebb Fix array index out of bounds. 2024-07-22 15:29:01 +01:00
Cristiano Ferreira
1ecda04c6b Fix array index out of bounds. 2024-07-22 14:54:01 +01:00
Cristiano Ferreira
d5621c1793 Added new ideaMatch exercise type. 2024-07-18 23:22:23 +01:00
Cristiano Ferreira
4c41942dfe Added new ideaMatch exercise type. 2024-07-18 23:21:24 +01:00
Cristiano Ferreira
bef606fe14 Added new ideaMatch exercise type. 2024-07-18 23:20:06 +01:00
Cristiano Ferreira
358f240d16 Update reading fill the blanks. 2024-07-18 19:07:38 +01:00
Cristiano Ferreira
e7d84b9704 Fix paragraph match bug. 2024-07-16 23:38:35 +01:00
Cristiano Ferreira
b4dc6be927 Add comment to grading of writing. 2024-07-16 21:35:36 +01:00
Cristiano Ferreira
afca610c09 Fix level test generation. 2024-07-15 18:21:06 +01:00
Tiago Ribeiro
495502bc93 Merge branch 'develop' of bitbucket.org:ecropdev/ielts-be into develop 2024-07-09 12:11:46 +01:00
Cristiano Ferreira
565874ad41 Minor improvements to speaking. 2024-06-28 18:33:42 +01:00
Cristiano Ferreira
e693f5ee2a Make speaking 1 questions simple. 2024-06-27 22:48:42 +01:00
Cristiano Ferreira
a8b46160d4 Minor fixes to speaking. 2024-06-27 22:31:57 +01:00
Cristiano Ferreira
640039d372 Merged in listening-revamp (pull request #10)
Listening revamp
2024-06-27 21:13:29 +00:00
Cristiano Ferreira
a3cd1cdf59 Listening part 3 and 4. 2024-06-27 22:03:59 +01:00
Cristiano Ferreira
9a696bbeb5 Listening part 2. 2024-06-27 21:29:22 +01:00
Cristiano Ferreira
2adb7d1847 Listening part 1. 2024-06-25 20:49:27 +01:00
Cristiano Ferreira
b93ead3a7b Update speaking generation endpoints. 2024-06-25 20:47:49 +01:00
Cristiano Ferreira
ad3a32ce45 Merged in speaking-improvements (pull request #9)
Speaking improvements
2024-06-17 13:06:15 +00:00
Cristiano Ferreira
ee5f23b3d7 Update speaking 3 to have 5 questions. 2024-06-17 14:03:21 +01:00
Cristiano Ferreira
545aee1a19 Improve prompts and add suffix to speaking 2. 2024-06-17 14:03:21 +01:00
Cristiano Ferreira
3f749f1ff5 Update speaking 1 to be like interactive with 5 questions and 2 topics. 2024-06-17 14:03:21 +01:00
Cristiano Ferreira
32ac2149f5 Improve comments for each criteria in speaking grading. 2024-06-17 14:03:21 +01:00
Cristiano Ferreira
64cc207fe8 Add comment for each criteria in speaking grading. 2024-06-17 14:03:21 +01:00
Cristiano Ferreira
a4caecdb4f Merged in utas-stuff (pull request #8)
Utas stuff
2024-06-13 17:32:48 +00:00
Cristiano Ferreira
20dfd5be78 Add exercises for utas level. 2024-06-13 18:30:58 +01:00
Cristiano Ferreira
1d110d5fa9 Add exercises for utas level. 2024-06-13 18:24:42 +01:00
Cristiano Ferreira
7633822916 Add exercises for utas level. 2024-06-12 23:10:55 +01:00
Cristiano Ferreira
9bc06d8340 Start on level exam for utas. 2024-06-11 22:07:09 +01:00
Cristiano Ferreira
4ff3b02a1d Double check for english words in writing grading. 2024-06-11 21:49:27 +01:00
Cristiano Ferreira
7637322239 Double check for english words in writing grading. 2024-06-11 21:45:56 +01:00
Cristiano Ferreira
3676d7ad39 Fix check for blacklisted on free form answers. 2024-06-10 19:39:08 +01:00
Cristiano Ferreira
b7c18517de All tested except grading speaking. 2024-05-22 21:07:48 +01:00
Cristiano Ferreira
fe753fe72c Fix generating speaking task 2. 2024-05-20 18:40:29 +01:00
Cristiano Ferreira
a0a193844d Speaking on api latest version. 2024-05-20 15:24:05 +01:00
Cristiano Ferreira
9654d9ff64 Reformat code. 2024-05-20 14:40:09 +01:00
Cristiano Ferreira
e568aff4e4 Initial updates to most recent openai api version. 2024-05-20 14:33:05 +01:00
Cristiano Ferreira
070e8808b1 Add logging to speaking grading. 2024-05-19 15:42:31 +01:00
Cristiano Ferreira
c77f7178ae Add logging to speaking grading. 2024-05-19 15:38:57 +01:00
Cristiano Ferreira
5f7fe23afd Fix writing overall to avoid grades that don't make sense. 2024-05-13 14:45:44 +01:00
Tiago Ribeiro
ca93129082 (Mostly for prompting a new build) 2024-05-12 12:29:29 +01:00
Cristiano Ferreira
6e2355ee4c Clean up the code. 2024-04-10 22:21:30 +01:00
Cristiano Ferreira
f1d2ec3bf8 Fix repeated voices in listening 2024-03-26 23:35:26 +00:00
Tiago Ribeiro
08f05ac3e0 Oops 2024-03-25 00:46:38 +00:00
Tiago Ribeiro
8ec72ff539 Updated the matchSentences to work correctly 2024-03-25 00:46:27 +00:00
Cristiano Ferreira
373867d520 minor improvement to reading generation 2024-03-24 23:54:00 +00:00
Pedro Fonseca
894cabdeb0 bullet points section aware 2024-03-24 23:44:12 +00:00
Pedro Fonseca
94c2b5a052 Adding bullet points to grading_summary endpoint 2024-03-24 23:21:38 +00:00
Cristiano Ferreira
3aa33f10b4 Check if answer has enough words 2024-03-24 16:00:21 +00:00
Pedro Fonseca
cc3371c597 Updated has_10_words to has_50_words 2024-03-24 10:37:02 +00:00
Cristiano Ferreira
7049fd86d4 Improve speaking grading 2024-03-24 01:33:21 +00:00
Cristiano Ferreira
6aba83f3bb Fix reading exercise with more than 3 words. 2024-03-24 00:42:11 +00:00
Tiago Ribeiro
73532d5fed Merge branch 'master' of bitbucket.org:ecropdev/ielts-be 2024-03-21 10:50:51 +00:00
Tiago Ribeiro
f02d113f40 Updated the Firebase Storage Bucket to be from ENV Variables 2024-03-21 10:50:48 +00:00
Cristiano Ferreira
6e65732e94 Add paragraphMatch. 2024-03-19 23:05:55 +00:00
Tiago Ribeiro
bed07ca819 Added the Service Account for the Staging environment 2024-03-19 22:38:35 +00:00
Cristiano Ferreira
274bd79c6a Add prompts to video generation and final message in listening audios. 2024-03-19 19:38:09 +00:00
Cristiano Ferreira
8b83a4163d Remove multiple choice questions from reading. 2024-03-18 21:34:25 +00:00
Cristiano Ferreira
1bd012d340 Fix level exam generation 2024-02-17 15:40:19 +00:00
Cristiano Ferreira
a200b29dba Add topic choice for writing and speaking 2024-02-12 18:54:58 +00:00
Cristiano Ferreira
f3f9415665 Add a few more black listed words. 2024-02-09 21:00:12 +00:00
cristiano.ferreira
d4694e55bf Add difficulty settings for generation 2024-02-09 18:59:00 +00:00
Cristiano Ferreira
b46f6011d3 Change speaking to receive avatar from frontend. 2024-02-09 00:13:07 +00:00
Cristiano Ferreira
d532f7deb4 Filter topics and words on exercises. 2024-02-08 23:42:02 +00:00
Tiago Ribeiro
9149e4b197 Added word boundaries 2024-02-05 22:48:59 +00:00
Tiago Ribeiro
3a1dc33e1b Added the start_id to the build_write_blanks_text_form 2024-02-05 12:19:33 +00:00
Tiago Ribeiro
678ef4b6c0 Did the same thing elsewhere 2024-02-05 11:11:44 +00:00
Tiago Ribeiro
fcf2993de0 Solved a problem with the generation of listening 2024-02-05 11:08:40 +00:00
Cristiano Ferreira
45a4dbe018 Verify for duplicate exercises in level exam generation. 2024-02-04 22:37:57 +00:00
Tiago Ribeiro
81d7167cbf Revert "Updated it to use GPT-4"
This reverts commit 1c888f22e2.
2024-02-04 01:24:35 +00:00
Tiago Ribeiro
1c888f22e2 Updated it to use GPT-4 2024-02-04 01:03:26 +00:00
Cristiano Ferreira
7bbb03e4b2 Merge remote-tracking branch 'origin/master' 2024-02-03 15:59:09 +00:00
Cristiano Ferreira
97f30ea881 Verify for duplicate exercises in level exam generation. 2024-02-03 15:58:51 +00:00
Tiago Ribeiro
ad2e7a6322 Updated the .env to the new file 2024-01-25 22:53:42 +00:00
Cristiano Ferreira
bc2cedb821 Improve grading to be more strict and give 0 if the question is not addressed. 2024-01-23 23:23:17 +00:00
Cristiano Ferreira
64a4759fbc Improve correction to not add anything to the answer. 2024-01-23 22:22:21 +00:00
Tiago Ribeiro
54950e11d2 Updated the prompts to only add if there are already 2024-01-23 18:42:14 +00:00
Tiago Ribeiro
ac7ba2edfa Added two more endpoints for the Speaking generation 2024-01-23 17:32:15 +00:00
Tiago Ribeiro
a577eed013 Updated the Listening template to allow for a dynamic amount of parts 2024-01-23 11:09:55 +00:00
Cristiano Ferreira
6c03e3590c Add mini test compatibility. 2024-01-22 17:10:22 +00:00
Cristiano Ferreira
1591f8d9fb Improve speaking corrections to return fixed_text. 2024-01-17 16:37:59 +00:00
Cristiano Ferreira
92c92dfd98 Improve writing spellcheck to return fixed_text. 2024-01-17 16:11:29 +00:00
Pedro Fonseca
ccc606d5de Improving Speaking Grading Performance 2024-01-16 23:03:10 +00:00
Cristiano Ferreira
d8da4d0348 Improve generated level tests quality. 2024-01-15 22:53:42 +00:00
Cristiano Ferreira
de4042efac Add corrections for speaking. 2024-01-12 19:45:58 +00:00
Pedro Fonseca
5aedd1864d Merge branch 'master' of https://bitbucket.org/ecropdev/ielts-be 2024-01-12 17:40:32 +00:00
Pedro Fonseca
555d5e55b0 fixed the response of the level test grading summary 2024-01-12 16:44:34 +00:00
Cristiano Ferreira
61f876b3e4 Improve spellchecking for writing 2024-01-11 19:10:56 +00:00
Tiago Ribeiro
a40ce04ad2 requirements.txt edited online with Bitbucket 2024-01-08 10:12:55 +00:00
Pedro Fonseca
6baf669216 Merged in grading-summary (pull request #7)
Grading Summary Endpoint Logic

Approved-by: Tiago Ribeiro
Approved-by: Cristiano Ferreira
2024-01-07 22:47:00 +00:00
Cristiano Ferreira
e7a96c6880 Reformat. 2024-01-07 19:38:24 +00:00
Cristiano Ferreira
75df686cd1 Refactored grading summary to fit previous existing files. 2024-01-07 19:36:57 +00:00
Pedro Fonseca
046606a8ec updated collection with new endpoint 2024-01-06 19:03:44 +00:00
Pedro Fonseca
efef92343a comment 2024-01-06 19:01:31 +00:00
Pedro Fonseca
ac27239787 Calculate Grading Summary Logic 2024-01-06 18:46:29 +00:00
Pedro Fonseca
f2e8497756 Added a playground 2024-01-06 16:07:46 +00:00
Cristiano Ferreira
63823a01de Add misspelled pairs to writing grading. 2024-01-03 17:40:48 +00:00
Cristiano Ferreira
9b3997f65e Fix speaking by using heygen api v2. 2024-01-03 16:23:26 +00:00
Cristiano Ferreira
479620116d Replace logging with app.logger. 2023-12-12 23:02:00 +00:00
Cristiano Ferreira
2b91cfe26d Replace prints with proper logs. 2023-12-12 22:20:22 +00:00
Cristiano Ferreira
9f4aed52ae Save speaking asynchronously 2023-12-10 15:38:26 +00:00
Cristiano Ferreira
50c39e5f9c Add perfect answers to speaking 2023-12-05 21:43:19 +00:00
cristiano.ferreira
57d6e7ffde Merge remote-tracking branch 'origin/master' 2023-11-30 17:39:38 +00:00
cristiano.ferreira
171d72109e Add perfect answer to writing grading 2023-11-30 17:38:44 +00:00
Tiago Ribeiro
34154b1e5f Another try 2023-11-29 15:44:39 +00:00
Tiago Ribeiro
760fe27411 Revert "Made it so the save_to_db also returns the ID of the document"
This reverts commit 4a0ae88fed.
2023-11-29 15:23:30 +00:00
Tiago Ribeiro
4a0ae88fed Made it so the save_to_db also returns the ID of the document 2023-11-29 14:55:40 +00:00
Tiago Ribeiro
869e74f384 app.py edited online with Bitbucket 2023-11-29 14:21:11 +00:00
cristiano.ferreira
22de63c346 Fix save listening 2023-11-29 11:36:05 +00:00
Cristiano Ferreira
05202c5cf0 Merged in actually-save (pull request #6)
Actually save questions.
2023-11-28 11:11:42 +00:00
Cristiano Ferreira
70e442a97e Actually save questions. 2023-11-24 22:59:11 +00:00
Cristiano Ferreira
362c9f4737 Merged in save-questions (pull request #5)
Add save endpoints but don't actually save.
2023-11-24 22:52:20 +00:00
Cristiano Ferreira
0bcf362b3f Add save endpoints but don't actually save. 2023-11-24 22:50:13 +00:00
Cristiano Ferreira
73324909f6 Make level exam generation more consistent. 2023-11-23 22:59:31 +00:00
Cristiano Ferreira
0684314cef Add generate level exam endpoint. 2023-11-22 23:09:41 +00:00
Cristiano Ferreira
223a7dfd11 Update multiple choice questions generation. 2023-11-20 22:51:29 +00:00
Tiago Ribeiro
75985a4077 Changed the OpenAI's version to a specific one 2023-11-15 00:01:57 +00:00
Tiago Ribeiro
d1b8793885 Added one more import 2023-11-14 23:17:25 +00:00
Tiago Ribeiro
589909cd3c Added a missing dependency 2023-11-14 22:55:07 +00:00
Tiago Ribeiro
d6a008b353 Added a simple healthcheck endpoint 2023-11-14 16:32:05 +00:00
Cristiano Ferreira
695d9b589a Generate questions endpoints working for all. 2023-11-12 23:40:24 +00:00
Cristiano Ferreira
274252bf92 Endpoint generate reading kinda working. 2023-10-19 23:39:45 +01:00
Cristiano Ferreira
c3957403f6 Add new speaking version. 2023-10-10 11:45:58 +01:00
Tiago Ribeiro
6416582ee0 Updated the format of the WriteBlanks 2023-09-28 15:31:42 +01:00
Cristiano Ferreira
d6b75de856 Add new writing question on generate questions 2023-09-27 22:24:55 +01:00
Cristiano Ferreira
51085619ee Update questions to new formula 2023-09-27 20:23:41 +01:00
Cristiano Ferreira
e7eb7c96ba Update video generation to generate from template. 2023-09-25 22:37:55 +01:00
Cristiano Ferreira
035939e9a7 update postman audio. 2023-09-16 15:50:25 +01:00
Cristiano Ferreira
8d9cd2949c Add speaking task 3 grading endpoint. 2023-09-16 11:44:08 +01:00
Cristiano Ferreira
f77fafa864 Improve speaking grading. 2023-09-15 00:01:52 +01:00
Cristiano Ferreira
8f9b65281e Add endpoints to save questions to db. 2023-09-06 22:53:09 +01:00
Cristiano Ferreira
680ec00885 Fix mistake on grade speaking task 1. 2023-09-06 22:19:42 +01:00
Cristiano Ferreira
c275cb887d Add verification for words in writing grading. 2023-09-05 21:18:42 +01:00
Cristiano Ferreira
00375489e8 Add verification for words in speaking grading. 2023-09-05 21:07:44 +01:00
Cristiano Ferreira
8e043104ad Add verification for words in speaking grading. 2023-09-05 20:31:30 +01:00
Cristiano Ferreira
eb6e9b4ef7 Merged in ft-cf-3-speaking-videos (pull request #4)
Add script to create videos for speaking questions.
2023-09-05 13:35:20 +00:00
Cristiano Ferreira
64776617f2 Add script to create videos for speaking questions. 2023-09-03 18:05:13 +01:00
Pedro Fonseca
685fde0b77 Merged in feature/fetch-tips (pull request #3)
Added endpoint for /fetch_tips

Approved-by: Cristiano Ferreira
2023-09-03 15:20:14 +00:00
Pedro Fonseca
cfff3ee6dd Small bugfix to not call OpenAI twice and File Reformat 2023-09-03 15:13:41 +01:00
Pedro Fonseca
fcd7483fd9 Added endpoint for /fetch_tips 2023-09-03 11:38:12 +01:00
Cristiano Ferreira
a31489d850 Fix grading parse. 2023-08-31 09:55:04 +01:00
Cristiano Ferreira
ca6094c3e7 Add question db insert. 2023-08-24 21:28:59 +01:00
Tiago Ribeiro
18bf6d59e0 Added FFMPEG to Dockerfile 2023-07-14 10:24:26 +01:00
Tiago Ribeiro
df30edd45e Updated the requirements.txt to the correct whisper package 2023-07-13 16:27:19 +01:00
Tiago Ribeiro
7d08cd9608 Renamed the folder because of a mistake 2023-07-13 16:20:04 +01:00
Tiago Ribeiro
a67e7a4abc Added a file just to create this folder in the repo 2023-07-13 15:51:20 +01:00
Cristiano Ferreira
0a53f2c1b8 Add Speaking parts. 2023-06-29 22:44:03 +01:00
Cristiano Ferreira
0b661fe108 Merged in ft-cf-2-add-writing-task-1-grading (pull request #2)
Add writing task 1 grading.
2023-06-29 21:06:31 +00:00
Cristiano Ferreira
7a1dbb76de Add writing task 1 grading. 2023-06-29 22:06:34 +01:00
Cristiano Ferreira
a784400568 Merged in ft-cf-1-add-speaking-endpoints (pull request #1)
Add speaking endpoints and clean code.
2023-06-29 20:40:12 +00:00
Cristiano Ferreira
55ae1b28c7 Update to always use role user as role system is useless pos. 2023-06-26 22:50:40 +01:00
Cristiano Ferreira
f0b85fa500 Update postman collection with speaking endpoint requests. 2023-06-23 00:09:12 +01:00
Cristiano Ferreira
4e1ad6dc67 Add speaking endpoints and clean code. 2023-06-23 00:05:48 +01:00
80 changed files with 44810 additions and 638 deletions

@@ -5,3 +5,4 @@ README.md
*.pyd
__pycache__
.pytest_cache
/scripts

3
.env

@@ -1,3 +0,0 @@
OPENAI_API_KEY=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
JWT_SECRET_KEY=6e9c124ba92e8814719dcb0f21200c8aa4d0f119a994ac5e06eb90a366c83ab2
JWT_TEST_TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ0ZXN0In0.Emrs2D3BmMP4b3zMjw0fJTPeyMwWEBDbxx2vvaWguO0

6
.gitignore vendored

@@ -1,2 +1,6 @@
__pycache__
.idea
.idea
.env
.DS_Store
/firebase-configs/test_firebase.json
/scripts

8
.idea/.gitignore generated vendored

@@ -1,8 +0,0 @@
# Default ignored files
/shelf/
/workspace.xml
# Editor-based HTTP Client requests
/httpRequests/
# Datasource local storage ignored files
/dataSources/
/dataSources.local.xml

20
.idea/ielts-be.iml generated

@@ -1,21 +1,17 @@
<?xml version="1.0" encoding="UTF-8"?>
<module type="PYTHON_MODULE" version="4">
<component name="Flask">
<option name="enabled" value="true" />
</component>
<component name="NewModuleRootManager">
<content url="file://$MODULE_DIR$">
<excludeFolder url="file://$MODULE_DIR$/venv" />
<excludeFolder url="file://$MODULE_DIR$/.venv" />
</content>
<orderEntry type="jdk" jdkName="Python 3.9" jdkType="Python SDK" />
<orderEntry type="jdk" jdkName="Python 3.11 (ielts-be)" jdkType="Python SDK" />
<orderEntry type="sourceFolder" forTests="false" />
</component>
<component name="TemplatesService">
<option name="TEMPLATE_CONFIGURATION" value="Jinja2" />
<option name="TEMPLATE_FOLDERS">
<list>
<option value="$MODULE_DIR$/../flaskProject\templates" />
</list>
</option>
<component name="PackageRequirementsSettings">
<option name="versionSpecifier" value="Don't specify version" />
</component>
<component name="PyDocumentationSettings">
<option name="format" value="GOOGLE" />
<option name="myDocStringFormat" value="Google" />
</component>
</module>

8
.idea/misc.xml generated

@@ -1,4 +1,10 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="ProjectRootManager" version="2" project-jdk-name="Python 3.9" project-jdk-type="Python SDK" />
<component name="Black">
<option name="sdkName" value="Python 3.11 (ielts-be)" />
</component>
<component name="ProjectRootManager" version="2" project-jdk-name="Python 3.11 (ielts-be)" project-jdk-type="Python SDK" />
<component name="PyCharmProfessionalAdvertiser">
<option name="shown" value="true" />
</component>
</project>

2
.idea/vcs.xml generated

@@ -1,6 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="VcsDirectoryMappings">
<mapping directory="$PROJECT_DIR$" vcs="Git" />
<mapping directory="" vcs="Git" />
</component>
</project>

@@ -11,6 +11,24 @@ ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
RUN apt update && apt install -y \
ffmpeg \
poppler-utils \
texlive-latex-base \
texlive-fonts-recommended \
texlive-latex-extra \
texlive-xetex \
pandoc \
librsvg2-bin \
curl \
&& rm -rf /var/lib/apt/lists/*
RUN curl -sL https://deb.nodesource.com/setup_20.x | bash - \
&& apt-get install -y nodejs
RUN npm install -g firebase-tools
# Install production dependencies.
RUN pip install --no-cache-dir -r requirements.txt

1805
app.py

File diff suppressed because it is too large

Binary file not shown.

@@ -0,0 +1 @@
THIS FILE ONLY EXISTS TO KEEP THIS FOLDER IN THE REPO

Binary file not shown.

Binary file not shown.

@@ -0,0 +1 @@
THIS FILE ONLY EXISTS TO KEEP THIS FOLDER IN THE REPO

@@ -0,0 +1 @@
THIS FILE ONLY EXISTS TO KEEP THIS FOLDER IN THE REPO

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

BIN
faiss/tips_metadata.pkl Normal file

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

@@ -0,0 +1,13 @@
{
"type": "service_account",
"project_id": "encoach-staging",
"private_key_id": "5718a649419776df9637589f8696a258a6a70f6c",
"private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC2C6Es2gY8lLvH\ndVilNtRNm9glSaPXMNw2PzZZbSGuG1uGPFaCzlq1lOb2u17YfMG4GriKIMjIQKXF\nqdvxA8CAmAFRuDjUGmpbO/X1ZW7amOs5Bjed2BYmL01dEqzzwwh7rEfNDjeghRPx\n1uKzH8A6TLT5xq+74I5K1CIgiljBpZimsERu2SDawjkdtZfA7qoylA46Nq66LuwQ\nVyv9CK2SZNpBcT3sunCmRsrCzmSTzKdbcqRPdqUKgZOH/Rjp0sw9VuUgwoxdGZV3\n5SJjObo5ceZ1OSiJm7GwLzp7uq16sqycgSYwppNLI5OtzOfSuWbGD4+a044t2Mlq\n9PHXv7H/AgMBAAECggEAAfhKlFwq8MaL6PggRJq9HbaKgQ4fcOmCmy8AQmPNF1UM\nyVKSKGndjxUfPLCWsaaunUnjZlHoKkvndKXxDyttuVaBE9EiWEqNjRLZ3KpuJ9Jm\nH+CtLbmUCnISQb1n1AlvvZAwhLZbLBL/PhYyWiLapybZAdJAaOWLVKGgBD8gVRQW\nJFCqnszX1O2YlpWHutb979R4qoY/XAf94gyMkTpXZwuETvFqZbau2vxRZ8qARix3\nmic881PwiF6Cod8UPCS9yMK+Q+Se6SomwXU9PCmlummn9xmQBAxYy8gIAVs/J9Fg\n5SvhnImAPDd+zIzzw2cHCiruNWIhroMVZDZJgWdY1QKBgQDjTKKeFOur3ijJJL2/\nWg1SE2jLP0GpXzM5YMx6jdOCNDCzugPngRucRXiTkJ2FnUgyMcQyi6hyrbWXN/6z\nXhx5fwLB4tnTcqOMvNfcay5mDk3RW9ZZJxayB54Sf1Nm/4xiDBnGPT+iHQvK+/pT\nwScWznFkmk60E796o76OLn3PEwKBgQDNCC2uPq+uOcCopIO8HH88gqdxTvpbeHUU\nrdJOmr1VtGNuvay/mfpva9+VEtGbZTFzjhfvfCEIjpj3Llh8Flb9EYa6BmscBiyp\ngszEeFuB3zHndlSCZPnGJ7JiRAdPAEgG3Gl/r9th6PDaEMq0MFS5i7GGhPBIRYCG\nUtmY5eVy5QKBgH5Nuls/YsnJFD7ZNLscziQadvPhvZnhNbSfjmBXaP2EBMAKEFtX\nCcGndN4C0RVLFbAWqWAw7LR0xGA4FEcVd5snsZ+Nb98oZ6sv0H9B67F4J1O7xXsa\n1mitBPBgYjbsr9RXxwa6SB7MJx5vMGXUAeWRZ78wY6V7B76dOKkHOo+TAoGBAJf5\nBOsPueZZFm2qK58GPGVcrsI0+StNuPLP+H+dANQC9mTCIMaQWmm2Oq5jmYwmUKZH\nX4R6rH2MPOOSrbGkWWwRTpyaX1ARX49xzVefoqw8BOB8/Bz+vYjcKcPeitBK9Bhp\nzaUAc4s6PzRTl/xBirtRSQ/df8ECC0cFKBbF6PHlAoGAGqnlpo+k8vAtg6ulCuGu\nx2Y/c5UmvXGHk60pccnW3UtENSDnl99OgMfBz8/qLAMWs6DUQ/kvSlHQPmMBHRWZ\nNTr6ceGXyNs4KdYoj1K7AU3c0Lm0wyQ2giQMoOOUQAm98Xr8z5aiihj10hHPmzzL\n9kwpOmZpjNmC/ERD69imWhY=\n-----END PRIVATE KEY-----\n",
"client_email": "firebase-adminsdk-8rs9e@encoach-staging.iam.gserviceaccount.com",
"client_id": "108221424237414412378",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/firebase-adminsdk-8rs9e%40encoach-staging.iam.gserviceaccount.com",
"universe_domain": "googleapis.com"
}

@@ -0,0 +1,13 @@
{
"type": "service_account",
"project_id": "mti-ielts",
"private_key_id": "626a2dcf60916a1b5011f388495b8f9c4fc065ef",
"private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQDuaLgLNa5yb5LI\nPZYa7qav0URgCF7miK3dUXIBoABQ+U6y1LwdsIiJqHZ4Cm2lotTqeTGOIV83PuA6\n9H/TwnvsHH8jilmsPxO5OX7AyZSDPvN45nJrgQ21RKZCYQGVetBMGhclCRbYFraS\nE6X/p6gSOpSqZ5fLz8BbdCMfib6HSfDmBkYTK42X6d2eNNwLM1wLbE8RmCGwRATC\nQFfMhjlvQcSJ1EDMfkMUUE9U/ux77wfHqs1d+7utVcQTIMFAP9fo1ynJlwp8D1HQ\ntalB6kkpuDQetUR0A1FHMMJekhmuRDUMfokX1F9JfUjR0OetuD3KEH5y2asxC2+0\n8JYcwbvlAgMBAAECggEAKaaW3LJ8rxZp/NyxkDP4YAf9248q0Ti4s00qzzjeRUdA\n5gI/eSphuDb7t34O6NyZOPuCWlPfOB4ee35CpMK59qaF2bYuc2azseznBZRSA1no\nnEsaW0i5Fd2P9FHRPoWtxVXbjEdZu9e//qY7Hn5yYPjmBx1BCkTZ1MBl8HkWlbjR\nbu18uveg5Vg6Wc+rnPmH/gMRLLpq9iQBpzXWT8Mj+k48O8GnW6v8S3R027ymqUou\n3W5b69xDGn0nwxgLIVzdxjoo7RnpjD3mP0x4faiBhScVgFhwZP8hqBeVyqbV5dMh\nfF+p9zLOeilFLJEjH1lZbZAb8wwP23LozIXJWFG3oQKBgQD6COCJ7hNSx9/AzDhO\nh73hKH/KSOJtxHc8795hcZjy9HJkoM45Fm7o2QGZzsZmV+N6VU0BjoDQAyftCq+G\ndIX0wcAGJIsLuQ9K00WI2hn7Uq1gjUl0d9XEorogKa1ZNTLL/9By/xnA7sEpI6Ng\nIsKQ4R2CfqNFU4bs1nyKWCWudQKBgQD0GNYwZt3xV2YBATVYsrvg1OGO/tmkCJ8Y\nLOdM0L+8WMCgw0uQcNFF9uqq6/oFgq7tOvpeZDsY8onRy55saaMT+Lr4xs0sj5B0\ns5Hqc0L37tdXXXXEne8WABMBF9injNgNbAm9W0kqME2Stc53OJQPj2DBdYxWSr8v\n36imCwoJsQKBgH0BBSlQQo7naKFeOGRijvbLpZ//clzIlYh8r+Rtw7brqWlPz+pQ\noeB95cP80coG9K6LiPVXRmU4vrRO3FRPW01ztEod6PpSaifRmnkB+W1h91ZHLMsy\nwkgNxxofXBA2fY/p9FAZ48lGVIH51EtS9Y0zTuqX347gZJtx3E/aI/SlAoGBAJer\nCwM+F2+K352GM7BuNiDoBVLFdVPf64Ko+/sVxdzwxJffYQdZoh634m3bfBmKbsiG\nmeSmoLXKlenefAxewu544SwM0pV6isaIgQTNI3JMXE8ziiZl/5WK7EQEniDVebU1\nSQP4QYjORJUBFE2twQm+C9+I+27uuMa1UOQC/fSxAoGBANuWloacqGfws6nbHvqF\nLZKlkKNPI/0sC+6VlqjoHn5LQz3lcFM1+iKSQIGJvJyru2ODgv2Lmq2W+cx+HMeq\n0BSetK4XtalmO9YflH7uMgvOEVewf4uJ2d+4I1pbY9aI1gHaZ1EUiiy6Ds4kAK8s\nTQqp88pfTbOnkdJBVi0AWs5B\n-----END PRIVATE KEY-----\n",
"client_email": "firebase-adminsdk-dyg6p@mti-ielts.iam.gserviceaccount.com",
"client_id": "104980563453519094431",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/firebase-adminsdk-dyg6p%40mti-ielts.iam.gserviceaccount.com",
"universe_domain": "googleapis.com"
}


@@ -0,0 +1,13 @@
{
"type": "service_account",
"project_id": "storied-phalanx-349916",
"private_key_id": "c9e05f6fe413b1031a71f981160075ff4b044444",
"private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQDdgavFB63nMHyb\n38ncwijTrUmqU9UyzNJ8wlZCWAWuoz25Gng988fkKNDXnHY+ap9esHyNYg9IdSA7\nAuZeHpzTZmKiWZzFWq61KWSTgIn1JwKHGHJJdmVhTYfCe9I51cFLa5q2lTFzJ0ce\nbP7/X/7kw53odgva+M8AhDTbe60akpemgZc+LFwO0Abm7erH2HiNyjoNZzNw525L\n933PCaQwhZan04s1u0oRdVlBIBwMk+J0ojgVEpUiJOzF7gkN+UpDXujalLYdlR4q\nhkGgScXQhDYJkECC3GuvOnEo1YXGNjW9D73S6sSH+Lvqta4wW1+sTn0kB6goiQBI\n7cA1G6x3AgMBAAECggEAZPMwAX/adb7XS4LWUNH8IVyccg/63kgSteErxtiu3kRv\nYOj7W+C6fPVNGLap/RBCybjNSvIh3PfkVICh1MtG1eGXmj4VAKyvaskOmVq/hQbe\nVAuEKo7W7V2UPcKIsOsGSQUlYYjlHIIOG4O5Q1HQrRmp4cPK62Txkl6uaEkZPz4u\nbvIK2BJI8aHRwxE3Phw09blwlLqQQQ8nrhK29x5puaN+ft++IlzIOVsLz+n4kTdB\n6qkG/dhenn3K8o3+NkmSN6eNRbdJd36zXTo4Oatbvqb7r0E8vYn/3Llawo2X75zn\nec7jMHrOmcwtiu9H3PsrTWtzdSjxPHy0UtEn1HWK4QKBgQD+c/V8tAvbaUGVoZf6\ntKtDSKF6IHuY2vUO33v950mVdjrTursqOG2d+SLfSnKpc+sjDlj7/S5u4uRP+qUN\ng1rb2U7oIA7tsDa2ZTSkIx6HkPUzS+fBOxELLrbgMoJ2RLzgkiPhS95YgXJ/rYG5\nWQTehzCT5roes0RvtgM0gl3EhQKBgQDe2m7PRIU4g3RJ8HTx92B4ja8W9FVCYDG5\nPOAdZB8WB6Bvu4BJHBDLr8vDi930pKj+vYObRqBDQuILW4t8wZQJ834dnoq6EpUz\nhbVEURVBP4A/nEHrQHfq0Lp+cxThy2rw7obRQOLPETtC7p3WFgSHT6PRTcpGzCCX\n+76a30yrywKBgC/5JNtyBppDaf4QDVtTHMb+tpMT9LmI7pLzR6lDJfhr5gNtPURk\nhyY1hoGaw6t3E2n0lopL3alCVdFObDfz//lbKylQggAGLQqOYjJf/K2KgvA862Df\nBgOZtxjl7PrnUsT0SJd9elotbazsxXxwcB6UVnBMG+MV4V0+b7RCr/MRAoGBAIfp\nTcVIs7roqOZjKN9dEE/VkR/9uXW2tvyS/NfP9Ql5c0ZRYwazgCbJOwsyZRZLyek6\naWYsp5b91mA435QhdwiuoI6t30tmA+qdNBTLIpxdfvjMcoNoGPpzfBmcU/L1HW58\n+mnqGalRiAPlBQvI99ASKQWAXMnaulIWrYNEhj0LAoGBALi+QZ2pp+hDeC59ezWr\nbP1zbbONceHKGgJcevChP2k1OJyIOIqmBYeTuM4cPc5ofZYQNaMC31cs8SVeSRX1\nNTxQZmvCjMyTe/WYWYNFXdgkVz4egFXbeochCGzMYo57HV1PCkPBrARRZO8OfdDD\n8sDu//ohb7nCzceEI0DnWs13\n-----END PRIVATE KEY-----\n",
"client_email": "firebase-adminsdk-3ml0u@storied-phalanx-349916.iam.gserviceaccount.com",
"client_id": "114163760341944984396",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/firebase-adminsdk-3ml0u%40storied-phalanx-349916.iam.gserviceaccount.com",
"universe_domain": "googleapis.com"
}
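
The three service-account files above carry their private keys inline. A minimal sketch of one alternative, assuming the key file is kept outside version control and its path is supplied through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The helper name `load_service_account` and the list of required fields are illustrative assumptions, not part of this commit:

```python
import json
import os

# Fields that every service-account file in this commit contains.
# Treating them as required is an assumption made for this sketch.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def load_service_account(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> dict:
    """Read a service-account JSON file whose path is stored in an environment variable."""
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    with open(path, encoding="utf-8") as handle:
        info = json.load(handle)
    # Fail early if the file does not look like a service-account key.
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError(f"service account file is missing fields: {missing}")
    return info
```

The returned dictionary has the same shape as the JSON files in this diff, so it could be handed to whatever code currently reads those files from disk.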

441
helper/api_messages.py Normal file

@@ -0,0 +1,441 @@
from enum import Enum
from typing import List
class QuestionType(Enum):
LISTENING_SECTION_1 = "Listening Section 1"
LISTENING_SECTION_2 = "Listening Section 2"
LISTENING_SECTION_3 = "Listening Section 3"
LISTENING_SECTION_4 = "Listening Section 4"
WRITING_TASK_1 = "Writing Task 1"
WRITING_TASK_2 = "Writing Task 2"
SPEAKING_1 = "Speaking Task Part 1"
SPEAKING_2 = "Speaking Task Part 2"
READING_PASSAGE_1 = "Reading Passage 1"
READING_PASSAGE_2 = "Reading Passage 2"
READING_PASSAGE_3 = "Reading Passage 3"
def get_grading_messages(question_type: QuestionType, question: str, answer: str, context: str = None):
if QuestionType.WRITING_TASK_1 == question_type:
messages = [
{
"role": "user",
"content": "You are an IELTS examiner.",
},
{
"role": "user",
"content": f"The question you have to grade is of type Writing Task 1 and is the following: {question}",
}
]
if context:
messages.append({
"role": "user",
"content": f"To grade the previous question, bear in mind the following context: {context}",
})
messages.extend([
{
"role": "user",
"content": "It is mandatory for you to provide your response with the overall grade and breakdown grades, "
"with just the following json format: {'comment': 'comment about answer quality', 'overall': 7.0, "
"'task_response': {'Task Achievement': 8.0, 'Coherence and Cohesion': 6.5, 'Lexical Resource': 7.5, "
"'Grammatical Range and Accuracy': 6.0}}",
},
{
"role": "user",
"content": "Example output: { 'comment': 'Overall, the response is good but there are some areas that need "
"improvement.\n\nIn terms of Task Achievement, the writer has addressed all parts of the question "
"and has provided a clear opinion on the topic. However, some of the points made are not fully "
"developed or supported with examples.\n\nIn terms of Coherence and Cohesion, there is a clear "
"structure to the response with an introduction, body paragraphs and conclusion. However, there "
"are some issues with cohesion as some sentences do not flow smoothly from one to another.\n\nIn "
"terms of Lexical Resource, there is a good range of vocabulary used throughout the response and "
"some less common words have been used effectively.\n\nIn terms of Grammatical Range and Accuracy, "
"there are some errors in grammar and sentence structure which affect clarity in places.\n\nOverall, "
"this response would score a band 6.5.', 'overall': 6.5, 'task_response': "
"{ 'Coherence and Cohesion': 6.5, 'Grammatical Range and Accuracy': 6.0, 'Lexical Resource': 7.0, "
"'Task Achievement': 7.0}}",
},
{
"role": "user",
"content": f"Evaluate this answer according to the IELTS grading system: {answer}",
},
])
return messages
elif QuestionType.WRITING_TASK_2 == question_type:
return [
{
"role": "user",
"content": "You are an IELTS examiner.",
},
{
"role": "user",
"content": f"The question you have to grade is of type Writing Task 2 and is the following: {question}",
},
{
"role": "user",
"content": "It is mandatory for you to provide your response with the overall grade and breakdown grades, "
"with just the following json format: {'comment': 'comment about answer quality', 'overall': 7.0, "
"'task_response': {'Task Achievement': 8.0, 'Coherence and Cohesion': 6.5, 'Lexical Resource': 7.5, "
"'Grammatical Range and Accuracy': 6.0}}",
},
{
"role": "user",
"content": "Example output: { 'comment': 'Overall, the response is good but there are some areas that need "
"improvement.\n\nIn terms of Task Achievement, the writer has addressed all parts of the question "
"and has provided a clear opinion on the topic. However, some of the points made are not fully "
"developed or supported with examples.\n\nIn terms of Coherence and Cohesion, there is a clear "
"structure to the response with an introduction, body paragraphs and conclusion. However, there "
"are some issues with cohesion as some sentences do not flow smoothly from one to another.\n\nIn "
"terms of Lexical Resource, there is a good range of vocabulary used throughout the response and "
"some less common words have been used effectively.\n\nIn terms of Grammatical Range and Accuracy, "
"there are some errors in grammar and sentence structure which affect clarity in places.\n\nOverall, "
"this response would score a band 6.5.', 'overall': 6.5, 'task_response': "
"{ 'Coherence and Cohesion': 6.5, 'Grammatical Range and Accuracy': 6.0, 'Lexical Resource': 7.0, "
"'Task Achievement': 7.0}}",
},
{
"role": "user",
"content": f"Evaluate this answer according to the IELTS grading system: {answer}",
},
]
elif QuestionType.SPEAKING_1 == question_type:
return [
{
"role": "user",
"content": "You are an IELTS examiner."
},
{
"role": "user",
"content": f"The question you need to grade is a Speaking Task Part 1 question, and it is as follows: {question}"
},
{
"role": "user",
"content": "Please provide your assessment using the following JSON format: {'comment': 'Comment about answer "
"quality will go here', 'overall': 7.0, 'task_response': {'Fluency and "
"Coherence': 8.0, 'Lexical Resource': 6.5, 'Grammatical Range and Accuracy': 7.5, 'Pronunciation': 6.0}}"
},
{
"role": "user",
"content": "Example output: {'comment': 'Comment about answer quality will go here', 'overall': 6.5, "
"'task_response': {'Fluency and Coherence': 7.0, "
"'Lexical Resource': 6.5, 'Grammatical Range and Accuracy': 7.0, 'Pronunciation': 6.0}}"
},
{
"role": "user",
"content": "Please assign a grade of 0 if the answer provided does not address the question."
},
{
"role": "user",
"content": f"Assess this answer according to the IELTS grading system: {answer}"
},
{
"role": "user",
"content": "Remember to consider Fluency and Coherence, Lexical Resource, Grammatical Range and Accuracy, "
"and Pronunciation when grading the response."
}
]
elif QuestionType.SPEAKING_2 == question_type:
return [
{
"role": "user",
"content": "You are an IELTS examiner."
},
{
"role": "user",
"content": f"The question you need to grade is a Speaking Task Part 2 question, and it is as follows: {question}"
},
{
"role": "user",
"content": "Please provide your assessment using the following JSON format: {\"comment\": \"Comment about "
"answer quality\", \"overall\": 7.0, \"task_response\": {\"Fluency and Coherence\": 8.0, \"Lexical "
"Resource\": 6.5, \"Grammatical Range and Accuracy\": 7.5, \"Pronunciation\": 6.0}}"
},
{
"role": "user",
"content": "Example output: {\"comment\": \"The candidate has provided a clear response to the question "
"and has given examples of how they spend their weekends. However, there are some issues with "
"grammar and pronunciation that affect the overall score. In terms of fluency and coherence, "
"the candidate speaks clearly and smoothly with only minor hesitations. They have also provided "
"a well-organized response that is easy to follow. Regarding lexical resource, the candidate "
"has used a range of vocabulary related to weekend activities but there are some errors in "
"word choice that affect the meaning of their sentences. In terms of grammatical range and "
"accuracy, the candidate has used a mix of simple and complex sentence structures but there "
"are some errors in subject-verb agreement and preposition use. Finally, regarding pronunciation, "
"the candidate's speech is generally clear but there are some issues with stress and intonation "
"that make it difficult to understand at times.\", \"overall\": 6.5, \"task_response\": {\"Fluency "
"and Coherence\": 7.0, \"Lexical Resource\": 6.5, \"Grammatical Range and Accuracy\": 7.0, "
"\"Pronunciation\": 6.0}}"
},
{
"role": "user",
"content": "Please assign a grade of 0 if the answer provided does not address the question."
},
{
"role": "user",
"content": f"Assess this answer according to the IELTS grading system: {answer}"
},
{
"role": "user",
"content": "Remember to consider Fluency and Coherence, Lexical Resource, Grammatical Range and Accuracy, "
"and Pronunciation when grading the response."
}
]
else:
raise NotImplementedError("Question type not implemented: " + question_type.value)
def get_speaking_grading_messages(answers: List):
messages = [
{
"role": "user",
"content": "You are an IELTS examiner."
},
{
"role": "user",
"content": "The exercise you need to grade is a Speaking Task, and it has the following questions and answers:"
}
]
for item in answers:
question = item["question"]
answer = item["answer_text"]
messages.append({
"role": "user",
"content": f"Question: {question}; Answer: {answer}"
})
messages.extend([
{
"role": "user",
"content": "Assess these answers according to the IELTS grading system."
},
{
"role": "user",
"content": "Please provide your assessment using the following JSON format: {'comment': 'Comment about answer "
"quality will go here', 'overall': 7.0, 'task_response': {'Fluency and "
"Coherence': 8.0, 'Lexical Resource': 6.5, 'Grammatical Range and Accuracy': 7.5, 'Pronunciation': 6.0}}"
},
{
"role": "user",
"content": "Example output: {'comment': 'Comment about answer quality will go here', 'overall': 6.5, "
"'task_response': {'Fluency and Coherence': 7.0, "
"'Lexical Resource': 6.5, 'Grammatical Range and Accuracy': 7.0, 'Pronunciation': 6.0}}"
},
{
"role": "user",
"content": "Please assign a grade of 0 if the answer provided does not address the question."
},
{
"role": "user",
"content": "Remember to consider Fluency and Coherence, Lexical Resource, Grammatical Range and Accuracy, "
"and Pronunciation when grading the response."
}
])
return messages
def get_question_gen_messages(question_type: QuestionType):
if QuestionType.LISTENING_SECTION_1 == question_type:
return [
{
"role": "user",
"content": "You are an IELTS program that generates questions for the exams.",
},
{
"role": "user",
"content": "Provide me with a transcript similar to the ones in the IELTS exam Listening Section 1. "
"Create an engaging transcript simulating a conversation related to a unique type of service "
"that requires getting the customer's details. Make sure to include specific details "
"and descriptions to bring "
"the scenario to life. After the transcript, please "
"generate a 'form like' fill in the blanks exercise with 6 form fields (ex: name, date of birth)"
" to fill related to the customer's details. Finally, "
"provide the answers for the exercise. The response must be a json following this format: "
"{ 'type': '<type of registration (ex: hotel, gym, english course, etc)>', "
"'transcript': '<transcript of just the conversation about a registration of some sort, "
"identify the person talking in each speech line>', "
"'exercise': { 'form field': { '1': '<form field 1>', '2': '<form field 2>', "
"'3': '<form field 3>', '4': '<form field 4>', "
"'5': '<form field 5>', '6': '<form field 6>' }, "
"'answers': {'1': '<answer to fill blank space in form field 1>', '2': '<answer to fill blank "
"space in form field 2>', '3': '<answer to fill blank space in form field 3>', "
"'4': '<answer to fill blank space in form field 4>', '5': '<answer to fill blank space in form field 5>',"
" '6': '<answer to fill blank space in form field 6>'}}}",
}
]
elif QuestionType.LISTENING_SECTION_2 == question_type:
return [
{
"role": "user",
"content": "You are an IELTS program that generates questions for the exams.",
},
{
"role": "user",
"content": "Provide me with a transcript similar to the ones in the IELTS exam Listening Section 2. After the transcript, please "
"generate a fill in the blanks exercise with 6 statements related to the text content. Finally, "
"provide the answers for the exercise. The response must be a json following this format: "
"{ 'transcript': 'transcript about some subject', 'exercise': { 'statements': { '1': 'statement 1 "
"with a blank space to fill', '2': 'statement 2 with a blank space to fill', '3': 'statement 3 with a "
"blank space to fill', '4': 'statement 4 with a blank space to fill', '5': 'statement 5 with a blank "
"space to fill', '6': 'statement 6 with a blank space to fill' }, "
"'answers': {'1': 'answer to fill blank space in statement 1', '2': 'answer to fill blank "
"space in statement 2', '3': 'answer to fill blank space in statement 3', "
"'4': 'answer to fill blank space in statement 4', '5': 'answer to fill blank space in statement 5',"
" '6': 'answer to fill blank space in statement 6'}}}",
}
]
elif QuestionType.LISTENING_SECTION_3 == question_type:
return [
{
"role": "user",
"content": "You are an IELTS program that generates questions for the exams.",
},
{
"role": "user",
"content": "Provide me with a transcript similar to the ones in the IELTS exam Listening Section 3. After the transcript, please "
"generate 4 multiple choice questions related to the text content. Finally, "
"provide the answers for the exercise. The response must be a json following this format: "
"{ 'transcript': 'generated transcript similar to the ones in the IELTS exam Listening Section 3', "
"'exercise': { 'questions': [ { 'question': "
"'question 1', 'options': ['option 1', 'option 2', 'option 3', 'option 4'], 'answer': 1}, "
"{'question': 'question 2', 'options': ['option 1', 'option 2', 'option 3', 'option 4'], "
"'answer': 3}, {'question': 'question 3', 'options': ['option 1', 'option 2', 'option 3', "
"'option 4'], 'answer': 0}, {'question': 'question 4', 'options': ['option 1', 'option 2', "
"'option 3', 'option 4'], 'answer': 2}]}}",
}
]
elif QuestionType.LISTENING_SECTION_4 == question_type:
return [
{
"role": "user",
"content": "You are an IELTS program that generates questions for the exams.",
},
{
"role": "user",
"content": "Provide me with a transcript similar to the ones in the IELTS exam Listening Section 4. After the transcript, please "
"generate 4 completion-type questions related to the text content to complete with 1 word. Finally, "
"provide the answers for the exercise. The response must be a json following this format: "
"{ 'transcript': 'generated transcript similar to the ones in the IELTS exam Listening Section 4', "
"'exercise': [ { 'question': 'question 1', 'answer': 'answer 1'}, "
"{'question': 'question 2', 'answer': 'answer 2'}, {'question': 'question 3', 'answer': 'answer 3'}, "
"{'question': 'question 4', 'answer': 'answer 4'}]}",
}
]
elif QuestionType.WRITING_TASK_2 == question_type:
return [
{
"role": "user",
"content": "You are an IELTS program that generates questions for the exams.",
},
{
"role": "user",
"content": "The question you have to generate is of type Writing Task 2.",
},
{
"role": "user",
"content": "It is mandatory for you to provide your response with the question "
"just with the following json format: {'question': 'question'}",
},
{
"role": "user",
"content": "Example output: { 'question': 'We are becoming increasingly dependent on computers. "
"They are used in businesses, hospitals, crime detection and even to fly planes. What things will "
"they be used for in the future? Is this dependence on computers a good thing or should we be more "
"suspicious of their benefits?'}",
},
{
"role": "user",
"content": "Generate a question for IELTS exam Writing Task 2.",
},
]
elif QuestionType.SPEAKING_1 == question_type:
return [
{
"role": "user",
"content": "You are an IELTS program that generates questions for the exams.",
},
{
"role": "user",
"content": "The question you have to generate is of type Speaking Task Part 1.",
},
{
"role": "user",
"content": "It is mandatory for you to provide your response with the question "
"just with the following json format: {'question': 'question'}",
},
{
"role": "user",
"content": "Example output: { 'question': 'Lets talk about your home town or village. "
"What kind of place is it? Whats the most interesting part of your town/village? "
"What kind of jobs do the people in your town/village do? "
"Would you say its a good place to live? (Why?)'}",
},
{
"role": "user",
"content": "Generate a question for IELTS exam Speaking Task.",
},
]
elif QuestionType.SPEAKING_2 == question_type:
return [
{
"role": "user",
"content": "You are an IELTS program that generates questions for the exams.",
},
{
"role": "user",
"content": "The question you have to generate is of type Speaking Task Part 2.",
},
{
"role": "user",
"content": "It is mandatory for you to provide your response with the question "
"just with the following json format: {'question': 'question'}",
},
{
"role": "user",
"content": "Example output: { 'question': 'Describe something you own which is very important to you. "
"You should say: where you got it from how long you have had it what you use it for and "
"explain why it is important to you.'}",
},
{
"role": "user",
"content": "Generate a question for IELTS exam Speaking Task.",
},
]
else:
raise NotImplementedError("Question type not implemented: " + question_type.value)
def get_question_tips(question: str, answer: str, correct_answer: str, context: str = None):
messages = [
{
"role": "user",
"content": "You are an IELTS exam program that analyzes incorrect answers to questions and gives tips to "
"help students understand why it was a wrong answer and gives helpful insight for the future. "
"The tip should refer to the context and question.",
}
]
if context:
messages.append({
"role": "user",
"content": f"This is the context for the question: {context}",
})
messages.extend([
{
"role": "user",
"content": f"This is the question: {question}",
},
{
"role": "user",
"content": f"This is the answer: {answer}",
},
{
"role": "user",
"content": f"This is the correct answer: {correct_answer}",
}
])
return messages
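
The grading prompts above ask for a reply shaped like `{'comment': ..., 'overall': 7.0, 'task_response': {...}}`. Most of the examples use single quotes, which is not strict JSON, so a reply that imitates them would be rejected by `json.loads`. A minimal sketch of a tolerant parser, assuming the reply arrives as a plain string. The function name `parse_grading_reply` is hypothetical, and the field list mirrors `GRADING_FIELDS` from `helper/constants.py`:

```python
import ast
import json

# Mirrors GRADING_FIELDS in helper/constants.py.
GRADING_FIELDS = ["comment", "overall", "task_response"]


def parse_grading_reply(reply: str) -> dict:
    """Parse a grading reply that may be strict JSON or a Python-style dict literal."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        # Single-quoted replies are not valid JSON. literal_eval only accepts
        # literals, so it does not execute arbitrary code.
        try:
            data = ast.literal_eval(reply)
        except (ValueError, SyntaxError) as exc:
            raise ValueError("reply is neither JSON nor a dict literal") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be an object")
    missing = [field for field in GRADING_FIELDS if field not in data]
    if missing:
        raise ValueError(f"reply is missing fields: {missing}")
    # Normalise the overall band to a float so 7 and 7.0 compare equal downstream.
    data["overall"] = float(data["overall"])
    return data
```

Asking for double-quoted JSON in every prompt, as the Speaking Task Part 2 branch already does, would make the fallback unnecessary.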

661
helper/constants.py Normal file

@@ -0,0 +1,661 @@
AUDIO_FILES_PATH = 'download-audio/'
FIREBASE_LISTENING_AUDIO_FILES_PATH = 'listening_recordings/'
VIDEO_FILES_PATH = 'download-video/'
FIREBASE_SPEAKING_VIDEO_FILES_PATH = 'speaking_videos/'
GRADING_TEMPERATURE = 0.1
TIPS_TEMPERATURE = 0.2
GEN_QUESTION_TEMPERATURE = 0.7
GPT_3_5_TURBO = "gpt-3.5-turbo"
GPT_4_TURBO = "gpt-4-turbo"
GPT_4_O = "gpt-4o"
GPT_3_5_TURBO_16K = "gpt-3.5-turbo-16k"
GPT_3_5_TURBO_INSTRUCT = "gpt-3.5-turbo-instruct"
GPT_4_PREVIEW = "gpt-4-turbo-preview"
GRADING_FIELDS = ['comment', 'overall', 'task_response']
GEN_FIELDS = ['topic']
GEN_TEXT_FIELDS = ['title']
LISTENING_GEN_FIELDS = ['transcript', 'exercise']
READING_EXERCISE_TYPES = ['fillBlanks', 'writeBlanks', 'trueFalse', 'paragraphMatch']
READING_3_EXERCISE_TYPES = ['fillBlanks', 'writeBlanks', 'trueFalse', 'paragraphMatch', 'ideaMatch']
LISTENING_EXERCISE_TYPES = ['multipleChoice', 'writeBlanksQuestions', 'writeBlanksFill', 'writeBlanksForm']
LISTENING_1_EXERCISE_TYPES = ['multipleChoice', 'writeBlanksQuestions', 'writeBlanksFill', 'writeBlanksFill',
'writeBlanksForm', 'writeBlanksForm', 'writeBlanksForm', 'writeBlanksForm']
LISTENING_2_EXERCISE_TYPES = ['multipleChoice', 'writeBlanksQuestions']
LISTENING_3_EXERCISE_TYPES = ['multipleChoice3Options', 'writeBlanksQuestions']
LISTENING_4_EXERCISE_TYPES = ['multipleChoice', 'writeBlanksQuestions', 'writeBlanksFill', 'writeBlanksForm']
TOTAL_READING_PASSAGE_1_EXERCISES = 13
TOTAL_READING_PASSAGE_2_EXERCISES = 13
TOTAL_READING_PASSAGE_3_EXERCISES = 14
TOTAL_LISTENING_SECTION_1_EXERCISES = 10
TOTAL_LISTENING_SECTION_2_EXERCISES = 10
TOTAL_LISTENING_SECTION_3_EXERCISES = 10
TOTAL_LISTENING_SECTION_4_EXERCISES = 10
LISTENING_MIN_TIMER_DEFAULT = 30
WRITING_MIN_TIMER_DEFAULT = 60
SPEAKING_MIN_TIMER_DEFAULT = 14
BLACKLISTED_WORDS = ["jesus", "sex", "gay", "lesbian", "homosexual", "god", "angel", "pornography", "beer", "wine",
"cocaine", "alcohol", "nudity", "lgbt", "casino", "gambling", "catholicism",
"discrimination", "politic", "christianity", "islam", "christian", "christians",
"jews", "jew", "discriminatory"]
EN_US_VOICES = [
{'Gender': 'Female', 'Id': 'Salli', 'LanguageCode': 'en-US', 'LanguageName': 'US English', 'Name': 'Salli',
'SupportedEngines': ['neural', 'standard']},
{'Gender': 'Male', 'Id': 'Matthew', 'LanguageCode': 'en-US', 'LanguageName': 'US English', 'Name': 'Matthew',
'SupportedEngines': ['neural', 'standard']},
{'Gender': 'Female', 'Id': 'Kimberly', 'LanguageCode': 'en-US', 'LanguageName': 'US English', 'Name': 'Kimberly',
'SupportedEngines': ['neural', 'standard']},
{'Gender': 'Female', 'Id': 'Kendra', 'LanguageCode': 'en-US', 'LanguageName': 'US English', 'Name': 'Kendra',
'SupportedEngines': ['neural', 'standard']},
{'Gender': 'Male', 'Id': 'Justin', 'LanguageCode': 'en-US', 'LanguageName': 'US English', 'Name': 'Justin',
'SupportedEngines': ['neural', 'standard']},
{'Gender': 'Male', 'Id': 'Joey', 'LanguageCode': 'en-US', 'LanguageName': 'US English', 'Name': 'Joey',
'SupportedEngines': ['neural', 'standard']},
{'Gender': 'Female', 'Id': 'Joanna', 'LanguageCode': 'en-US', 'LanguageName': 'US English', 'Name': 'Joanna',
'SupportedEngines': ['neural', 'standard']},
{'Gender': 'Female', 'Id': 'Ivy', 'LanguageCode': 'en-US', 'LanguageName': 'US English', 'Name': 'Ivy',
'SupportedEngines': ['neural', 'standard']}]
EN_GB_VOICES = [
{'Gender': 'Female', 'Id': 'Emma', 'LanguageCode': 'en-GB', 'LanguageName': 'British English', 'Name': 'Emma',
'SupportedEngines': ['neural', 'standard']},
{'Gender': 'Male', 'Id': 'Brian', 'LanguageCode': 'en-GB', 'LanguageName': 'British English', 'Name': 'Brian',
'SupportedEngines': ['neural', 'standard']},
{'Gender': 'Female', 'Id': 'Amy', 'LanguageCode': 'en-GB', 'LanguageName': 'British English', 'Name': 'Amy',
'SupportedEngines': ['neural', 'standard']}]
EN_GB_WLS_VOICES = [
{'Gender': 'Male', 'Id': 'Geraint', 'LanguageCode': 'en-GB-WLS', 'LanguageName': 'Welsh English', 'Name': 'Geraint',
'SupportedEngines': ['standard']}]
EN_AU_VOICES = [{'Gender': 'Male', 'Id': 'Russell', 'LanguageCode': 'en-AU', 'LanguageName': 'Australian English',
'Name': 'Russell', 'SupportedEngines': ['standard']},
{'Gender': 'Female', 'Id': 'Nicole', 'LanguageCode': 'en-AU', 'LanguageName': 'Australian English',
'Name': 'Nicole', 'SupportedEngines': ['standard']}]
ALL_VOICES = EN_US_VOICES + EN_GB_VOICES + EN_GB_WLS_VOICES + EN_AU_VOICES
NEURAL_EN_US_VOICES = [
{'Gender': 'Female', 'Id': 'Danielle', 'LanguageCode': 'en-US', 'LanguageName': 'US English', 'Name': 'Danielle',
'SupportedEngines': ['neural']},
{'Gender': 'Male', 'Id': 'Gregory', 'LanguageCode': 'en-US', 'LanguageName': 'US English', 'Name': 'Gregory',
'SupportedEngines': ['neural']},
{'Gender': 'Male', 'Id': 'Kevin', 'LanguageCode': 'en-US', 'LanguageName': 'US English', 'Name': 'Kevin',
'SupportedEngines': ['neural']},
{'Gender': 'Female', 'Id': 'Ruth', 'LanguageCode': 'en-US', 'LanguageName': 'US English', 'Name': 'Ruth',
'SupportedEngines': ['neural']},
{'Gender': 'Male', 'Id': 'Stephen', 'LanguageCode': 'en-US', 'LanguageName': 'US English', 'Name': 'Stephen',
'SupportedEngines': ['neural']}]
NEURAL_EN_GB_VOICES = [
{'Gender': 'Male', 'Id': 'Arthur', 'LanguageCode': 'en-GB', 'LanguageName': 'British English', 'Name': 'Arthur',
'SupportedEngines': ['neural']}]
NEURAL_EN_AU_VOICES = [
{'Gender': 'Female', 'Id': 'Olivia', 'LanguageCode': 'en-AU', 'LanguageName': 'Australian English',
'Name': 'Olivia', 'SupportedEngines': ['neural']}]
NEURAL_EN_ZA_VOICES = [
{'Gender': 'Female', 'Id': 'Ayanda', 'LanguageCode': 'en-ZA', 'LanguageName': 'South African English',
'Name': 'Ayanda', 'SupportedEngines': ['neural']}]
NEURAL_EN_NZ_VOICES = [
{'Gender': 'Female', 'Id': 'Aria', 'LanguageCode': 'en-NZ', 'LanguageName': 'New Zealand English', 'Name': 'Aria',
'SupportedEngines': ['neural']}]
NEURAL_EN_IN_VOICES = [
{'Gender': 'Female', 'Id': 'Kajal', 'LanguageCode': 'en-IN', 'LanguageName': 'Indian English', 'Name': 'Kajal',
'SupportedEngines': ['neural']}]
NEURAL_EN_IE_VOICES = [
{'Gender': 'Female', 'Id': 'Niamh', 'LanguageCode': 'en-IE', 'LanguageName': 'Irish English', 'Name': 'Niamh',
'SupportedEngines': ['neural']}]
ALL_NEURAL_VOICES = NEURAL_EN_US_VOICES + NEURAL_EN_GB_VOICES + NEURAL_EN_AU_VOICES + NEURAL_EN_ZA_VOICES + NEURAL_EN_NZ_VOICES + NEURAL_EN_IE_VOICES
MALE_VOICES = [item for item in ALL_VOICES if item.get('Gender') == 'Male']
FEMALE_VOICES = [item for item in ALL_VOICES if item.get('Gender') == 'Female']
MALE_NEURAL_VOICES = [item for item in ALL_NEURAL_VOICES if item.get('Gender') == 'Male']
FEMALE_NEURAL_VOICES = [item for item in ALL_NEURAL_VOICES if item.get('Gender') == 'Female']
difficulties = ["easy", "medium", "hard"]
mti_topics = [
"Education",
"Technology",
"Environment",
"Health and Fitness",
"Engineering",
"Work and Careers",
"Travel and Tourism",
"Culture and Traditions",
"Social Issues",
"Arts and Entertainment",
"Climate Change",
"Social Media",
"Sustainable Development",
"Health Care",
"Immigration",
"Artificial Intelligence",
"Consumerism",
"Online Shopping",
"Energy",
"Oil and Gas",
"Poverty and Inequality",
"Cultural Diversity",
"Democracy and Governance",
"Mental Health",
"Ethics and Morality",
"Population Growth",
"Science and Innovation",
"Poverty Alleviation",
"Cybersecurity and Privacy",
"Human Rights",
"Food and Agriculture",
"Cyberbullying and Online Safety",
"Linguistic Diversity",
"Urbanization",
"Artificial Intelligence in Education",
"Youth Empowerment",
"Disaster Management",
"Mental Health Stigma",
"Internet Censorship",
"Sustainable Fashion",
"Indigenous Rights",
"Water Scarcity",
"Social Entrepreneurship",
"Privacy in the Digital Age",
"Sustainable Transportation",
"Gender Equality",
"Automation and Job Displacement",
"Digital Divide",
"Education Inequality"
]
topics = [
"Art and Creativity",
"History of Ancient Civilizations",
"Environmental Conservation",
"Space Exploration",
"Artificial Intelligence",
"Climate Change",
"The Human Brain",
"Renewable Energy",
"Cultural Diversity",
"Modern Technology Trends",
"Sustainable Agriculture",
"Natural Disasters",
"Cybersecurity",
"Philosophy of Ethics",
"Robotics",
"Health and Wellness",
"Literature and Classics",
"World Geography",
"Social Media Impact",
"Food Sustainability",
"Economics and Markets",
"Human Evolution",
"Political Systems",
"Mental Health Awareness",
"Quantum Physics",
"Biodiversity",
"Education Reform",
"Animal Rights",
"The Industrial Revolution",
"Future of Work",
"Film and Cinema",
"Genetic Engineering",
"Climate Policy",
"Space Travel",
"Renewable Energy Sources",
"Cultural Heritage Preservation",
"Modern Art Movements",
"Sustainable Transportation",
"The History of Medicine",
"Artificial Neural Networks",
"Climate Adaptation",
"Philosophy of Existence",
"Augmented Reality",
"Yoga and Meditation",
"Literary Genres",
"World Oceans",
"Social Networking",
"Sustainable Fashion",
"Prehistoric Era",
"Democracy and Governance",
"Postcolonial Literature",
"Geopolitics",
"Psychology and Behavior",
"Nanotechnology",
"Endangered Species",
"Education Technology",
"Renaissance Art",
"Renewable Energy Policy",
"Modern Architecture",
"Climate Resilience",
"Artificial Life",
"Fitness and Nutrition",
"Classic Literature Adaptations",
"Ethical Dilemmas",
"Internet of Things (IoT)",
"Meditation Practices",
"Literary Symbolism",
"Marine Conservation",
"Sustainable Tourism",
"Ancient Philosophy",
"Cold War Era",
"Behavioral Economics",
"Space Colonization",
"Clean Energy Initiatives",
"Cultural Exchange",
"Modern Sculpture",
"Climate Mitigation",
"Mindfulness",
"Literary Criticism",
"Wildlife Conservation",
"Renewable Energy Innovations",
"History of Mathematics",
"Human-Computer Interaction",
"Global Health",
"Cultural Appropriation",
"Traditional cuisine and culinary arts",
"Local music and dance traditions",
"History of the region and historical landmarks",
"Traditional crafts and artisanal skills",
"Wildlife and conservation efforts",
"Local sports and athletic competitions",
"Fashion trends and clothing styles",
"Education systems and advancements",
"Healthcare services and medical innovations",
"Family values and social dynamics",
"Travel destinations and tourist attractions",
"Environmental sustainability projects",
"Technological developments and innovations",
"Entrepreneurship and business ventures",
"Youth empowerment initiatives",
"Art exhibitions and cultural events",
"Philanthropy and community development projects"
]
two_people_scenarios = [
"Booking a table at a restaurant",
"Making a doctor's appointment",
"Asking for directions to a tourist attraction",
"Inquiring about public transportation options",
"Discussing weekend plans with a friend",
"Ordering food at a café",
"Renting a bicycle for a day",
"Arranging a meeting with a colleague",
"Talking to a real estate agent about renting an apartment",
"Discussing travel plans for an upcoming vacation",
"Checking the availability of a hotel room",
"Talking to a car rental service",
"Asking for recommendations at a library",
"Inquiring about opening hours at a museum",
"Discussing the weather forecast",
"Shopping for groceries",
"Renting a movie from a video store",
"Booking a flight ticket",
"Discussing a school assignment with a classmate",
"Making a reservation for a spa appointment",
"Talking to a customer service representative about a product issue",
"Discussing household chores with a family member",
"Planning a surprise party for a friend",
"Talking to a coworker about a project deadline",
"Inquiring about a gym membership",
"Discussing the menu options at a fast-food restaurant",
"Talking to a neighbor about a community event",
"Asking for help with computer problems",
"Discussing a recent sports game with a sports enthusiast",
"Talking to a pet store employee about buying a pet",
"Asking for information about a local farmer's market",
"Discussing the details of a home renovation project",
"Talking to a coworker about office supplies",
"Making plans for a family picnic",
"Inquiring about admission requirements at a university",
"Discussing the features of a new smartphone with a salesperson",
"Talking to a mechanic about car repairs",
"Making arrangements for a child's birthday party",
"Discussing a new diet plan with a nutritionist",
"Asking for information about a music concert",
"Talking to a hairdresser about getting a haircut",
"Inquiring about a language course at a language school",
"Discussing plans for a weekend camping trip",
"Talking to a bank teller about opening a new account",
"Ordering a drink at a coffee shop",
"Discussing a new book with a book club member",
"Talking to a librarian about library services",
"Asking for advice on finding a job",
"Discussing plans for a garden makeover with a landscaper",
"Talking to a travel agent about a cruise vacation",
"Inquiring about a fitness class at a gym",
"Ordering flowers for a special occasion",
"Discussing a new exercise routine with a personal trainer",
"Talking to a teacher about a child's progress in school",
"Asking for information about a local art exhibition",
"Discussing a home improvement project with a contractor",
"Talking to a babysitter about childcare arrangements",
"Making arrangements for a car service appointment",
"Inquiring about a photography workshop at a studio",
"Discussing plans for a family reunion with a relative",
"Talking to a tech support representative about computer issues",
"Asking for recommendations on pet grooming services",
"Discussing weekend plans with a significant other",
"Talking to a counselor about personal issues",
"Inquiring about a music lesson with a music teacher",
"Ordering a pizza for delivery",
"Making a reservation for a taxi",
"Discussing a new recipe with a chef",
"Talking to a fitness trainer about weight loss goals",
"Inquiring about a dance class at a dance studio",
"Ordering a meal at a food truck",
"Discussing plans for a weekend getaway with a partner",
"Talking to a florist about wedding flower arrangements",
"Asking for advice on home decorating",
"Discussing plans for a charity fundraiser event",
"Talking to a pet sitter about taking care of pets",
"Making arrangements for a spa day with a friend",
"Asking for recommendations on home improvement stores",
"Discussing weekend plans with a travel enthusiast",
"Talking to a car mechanic about car maintenance",
"Inquiring about a cooking class at a culinary school",
"Ordering a sandwich at a deli",
"Discussing plans for a family holiday party",
"Talking to a personal assistant about organizing tasks",
"Asking for information about a local theater production",
"Discussing a new DIY project with a home improvement expert",
"Talking to a wine expert about wine pairing",
"Making arrangements for a pet adoption",
"Asking for advice on planning a wedding"
]
social_monologue_contexts = [
"A guided tour of a historical museum",
"An introduction to a new city for tourists",
"An orientation session for new university students",
"A safety briefing for airline passengers",
"An explanation of the process of recycling",
"A lecture on the benefits of a healthy diet",
"A talk on the importance of time management",
"A monologue about wildlife conservation",
"An overview of local public transportation options",
"A presentation on the history of cinema",
"An introduction to the art of photography",
"A discussion about the effects of climate change",
"An overview of different types of cuisine",
"A lecture on the principles of financial planning",
"A monologue about sustainable energy sources",
"An explanation of the process of online shopping",
"A guided tour of a botanical garden",
"An introduction to a local wildlife sanctuary",
"A safety briefing for hikers in a national park",
"A talk on the benefits of physical exercise",
"A lecture on the principles of effective communication",
"A monologue about the impact of social media",
"An overview of the history of a famous landmark",
"An introduction to the world of fashion design",
"A discussion about the challenges of global poverty",
"An explanation of the process of organic farming",
"A presentation on the history of space exploration",
"An overview of traditional music from different cultures",
"A lecture on the principles of effective leadership",
"A monologue about the influence of technology",
"A guided tour of a famous archaeological site",
"An introduction to a local wildlife rehabilitation center",
"A safety briefing for visitors to a science museum",
"A talk on the benefits of learning a new language",
"A lecture on the principles of architectural design",
"A monologue about the impact of renewable energy",
"An explanation of the process of online banking",
"A presentation on the history of a famous art movement",
"An overview of traditional clothing from various regions",
"A lecture on the principles of sustainable agriculture",
"A discussion about the challenges of urban development",
"A monologue about the influence of social norms",
"A guided tour of a historical battlefield",
"An introduction to a local animal shelter",
"A safety briefing for participants in a charity run",
"A talk on the benefits of community involvement",
"A lecture on the principles of sustainable tourism",
"A monologue about the impact of alternative medicine",
"An explanation of the process of wildlife tracking",
"A presentation on the history of a famous inventor",
"An overview of traditional dance forms from different cultures",
"A lecture on the principles of ethical business practices",
"A discussion about the challenges of healthcare access",
"A monologue about the influence of cultural traditions",
"A guided tour of a famous lighthouse",
"An introduction to a local astronomy observatory",
"A safety briefing for participants in a team-building event",
"A talk on the benefits of volunteering",
"A lecture on the principles of wildlife protection",
"A monologue about the impact of space exploration",
"An explanation of the process of wildlife photography",
"A presentation on the history of a famous musician",
"An overview of traditional art forms from different cultures",
"A lecture on the principles of effective education",
"A discussion about the challenges of sustainable development",
"A monologue about the influence of cultural diversity",
"A guided tour of a famous national park",
"An introduction to a local marine conservation project",
"A safety briefing for participants in a hot air balloon ride",
"A talk on the benefits of cultural exchange programs",
"A lecture on the principles of wildlife conservation",
"A monologue about the impact of technological advancements",
"An explanation of the process of wildlife rehabilitation",
"A presentation on the history of a famous explorer",
"A lecture on the principles of effective marketing",
"A discussion about the challenges of environmental sustainability",
"A monologue about the influence of social entrepreneurship",
"A guided tour of a famous historical estate",
"An introduction to a local marine life research center",
"A safety briefing for participants in a zip-lining adventure",
"A talk on the benefits of cultural preservation",
"A lecture on the principles of wildlife ecology",
"A monologue about the impact of space technology",
"An explanation of the process of wildlife conservation",
"A presentation on the history of a famous scientist",
"An overview of traditional crafts and artisans from different cultures",
"A lecture on the principles of effective intercultural communication"
]
four_people_scenarios = [
"A university lecture on history",
"A physics class discussing Newton's laws",
"A medical school seminar on anatomy",
"A training session on computer programming",
"A business school lecture on marketing strategies",
"A chemistry lab experiment and discussion",
"A language class practicing conversational skills",
"A workshop on creative writing techniques",
"A high school math lesson on calculus",
"A training program for customer service representatives",
"A lecture on environmental science and sustainability",
"A psychology class exploring human behavior",
"A music theory class analyzing compositions",
"A nursing school simulation for patient care",
"A computer science class on algorithms",
"A workshop on graphic design principles",
"A law school lecture on constitutional law",
"A geology class studying rock formations",
"A vocational training program for electricians",
"A history seminar focusing on ancient civilizations",
"A biology class dissecting specimens",
"A financial literacy course for adults",
"A literature class discussing classic novels",
"A training session for emergency response teams",
"A sociology lecture on social inequality",
"An art class exploring different painting techniques",
"A medical school seminar on diagnosis",
"A programming bootcamp teaching web development",
"An economics class analyzing market trends",
"A chemistry lab experiment on chemical reactions",
"A language class practicing pronunciation",
"A workshop on public speaking skills",
"A high school physics lesson on electromagnetism",
"A training program for IT professionals",
"A lecture on climate change and its effects",
"A psychology class studying cognitive psychology",
"A music class composing original songs",
"A nursing school simulation for patient assessment",
"A computer science class on data structures",
"A workshop on 3D modeling and animation",
"A law school lecture on contract law",
"A geography class examining world maps",
"A vocational training program for plumbers",
"A history seminar discussing revolutions",
"A biology class exploring genetics",
"A financial literacy course for teens",
"A literature class analyzing poetry",
"A training session for public speaking coaches",
"A sociology lecture on cultural diversity",
"An art class creating sculptures",
"A medical school seminar on surgical techniques",
"A programming bootcamp teaching app development",
"An economics class on global trade policies",
"A chemistry lab experiment on chemical bonding",
"A language class discussing idiomatic expressions",
"A workshop on conflict resolution",
"A high school biology lesson on evolution",
"A training program for project managers",
"A lecture on renewable energy sources",
"A psychology class on abnormal psychology",
"A music class rehearsing for a performance",
"A nursing school simulation for emergency response",
"A computer science class on cybersecurity",
"A workshop on digital marketing strategies",
"A law school lecture on intellectual property",
"A geology class analyzing seismic activity",
"A vocational training program for carpenters",
"A history seminar on the Renaissance",
"A chemistry class synthesizing compounds",
"A financial literacy course for seniors",
"A literature class interpreting Shakespearean plays",
"A training session for negotiation skills",
"A sociology lecture on urbanization",
"An art class creating digital art",
"A medical school seminar on patient communication",
"A programming bootcamp teaching mobile app development",
"An economics class on fiscal policy",
"A physics lab experiment on electromagnetism",
"A language class on cultural immersion",
"A workshop on time management",
"A high school chemistry lesson on stoichiometry",
"A training program for HR professionals",
"A lecture on space exploration and astronomy",
"A psychology class on human development",
"A music class practicing for a recital",
"A nursing school simulation for triage",
"A computer science class on web development frameworks",
"A workshop on team-building exercises",
"A law school lecture on criminal law",
"A geography class studying world cultures",
"A vocational training program for HVAC technicians",
"A history seminar on ancient civilizations",
"A biology class examining ecosystems",
"A financial literacy course for entrepreneurs",
"A literature class analyzing modern literature",
"A training session for leadership skills",
"A sociology lecture on gender studies",
"An art class exploring multimedia art",
"A medical school seminar on patient diagnosis",
"A programming bootcamp teaching software architecture"
]
academic_subjects = [
"Astrophysics",
"Microbiology",
"Political Science",
"Environmental Science",
"Literature",
"Biochemistry",
"Sociology",
"Art History",
"Geology",
"Economics",
"Psychology",
"History of Architecture",
"Linguistics",
"Neurobiology",
"Anthropology",
"Quantum Mechanics",
"Urban Planning",
"Philosophy",
"Marine Biology",
"International Relations",
"Medieval History",
"Geophysics",
"Finance",
"Educational Psychology",
"Graphic Design",
"Paleontology",
"Macroeconomics",
"Cognitive Psychology",
"Renaissance Art",
"Archaeology",
"Microeconomics",
"Social Psychology",
"Contemporary Art",
"Meteorology",
"Political Philosophy",
"Space Exploration",
"Cognitive Science",
"Classical Music",
"Oceanography",
"Public Health",
"Gender Studies",
"Baroque Art",
"Volcanology",
"Business Ethics",
"Music Composition",
"Environmental Policy",
"Media Studies",
"Ancient History",
"Seismology",
"Marketing",
"Human Development",
"Modern Art",
"Astronomy",
"International Law",
"Developmental Psychology",
"Film Studies",
"American History",
"Soil Science",
"Entrepreneurship",
"Clinical Psychology",
"Contemporary Dance",
"Space Physics",
"Political Economy",
"Cognitive Neuroscience",
"20th Century Literature",
"Public Administration",
"European History",
"Atmospheric Science",
"Supply Chain Management",
"Social Work",
"Japanese Literature",
"Planetary Science",
"Labor Economics",
"Industrial-Organizational Psychology",
"French Philosophy",
"Biogeochemistry",
"Strategic Management",
"Educational Sociology",
"Postmodern Literature",
"Public Relations",
"Middle Eastern History",
"Oceanography",
"International Development",
"Human Resources Management",
"Educational Leadership",
"Russian Literature",
"Quantum Chemistry",
"Environmental Economics",
"Environmental Psychology",
"Ancient Philosophy",
"Immunology",
"Comparative Politics",
"Child Development",
"Fashion Design",
"Geological Engineering",
"Macroeconomic Policy",
"Media Psychology",
"Byzantine Art",
"Ecology",
"International Business"
]

6
helper/exam_variant.py Normal file

@@ -0,0 +1,6 @@
from enum import Enum


class ExamVariant(Enum):
    FULL = "full"
    PARTIAL = "partial"
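A minimal sketch of how a value-backed enum like this is typically resolved from a raw request string. The `parse_variant` helper and its fall-back to `FULL` are assumptions made for illustration; they are not part of this commit.

```python
from enum import Enum


class ExamVariant(Enum):
    FULL = "full"
    PARTIAL = "partial"


def parse_variant(raw: str) -> ExamVariant:
    # Hypothetical helper: look the member up by value and
    # fall back to FULL when the input is not a known variant.
    try:
        return ExamVariant(raw.strip().lower())
    except ValueError:
        return ExamVariant.FULL
```

Looking members up by value keeps the string literals in one place; callers compare against `ExamVariant.PARTIAL` instead of `"partial"`.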

2152
helper/exercises.py Normal file

File diff suppressed because it is too large

17
helper/file_helper.py Normal file

@@ -0,0 +1,17 @@
import datetime
import os
from pathlib import Path


def delete_files_older_than_one_day(directory):
    current_time = datetime.datetime.now()

    for entry in os.scandir(directory):
        if entry.is_file():
            file_path = Path(entry)
            file_name = file_path.name
            file_modified_time = datetime.datetime.fromtimestamp(file_path.stat().st_mtime)
            time_difference = current_time - file_modified_time
            # timedelta.days is floored, so `> 1` would only match files at least two days old.
            if time_difference.days >= 1 and "placeholder" not in file_name:
                file_path.unlink()
                print(f"Deleted file: {file_path}")
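The age-based cleanup above can be exercised without waiting a day by backdating a file's modification time. The following is a sketch under stated assumptions: `delete_stale_files`, its parameters, and the file names are hypothetical and only mirror the logic of `delete_files_older_than_one_day`.

```python
import os
import tempfile
import time
from pathlib import Path


def delete_stale_files(directory, max_age_seconds=24 * 60 * 60, keep_marker="placeholder"):
    """Remove regular files older than max_age_seconds; return the deleted names."""
    now = time.time()
    deleted = []
    # Materialise the listing first so entries are not removed mid-iteration.
    for entry in list(os.scandir(directory)):
        if not entry.is_file() or keep_marker in entry.name:
            continue
        if now - entry.stat().st_mtime > max_age_seconds:
            Path(entry.path).unlink()
            deleted.append(entry.name)
    return sorted(deleted)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        two_days_ago = time.time() - 2 * 24 * 60 * 60
        for name in ("old.mp3", "fresh.mp3", "placeholder.txt"):
            path = Path(tmp) / name
            path.write_text("x")
            if name != "fresh.mp3":
                # Backdate access and modification times by two days.
                os.utime(path, (two_days_ago, two_days_ago))
        deleted = delete_stale_files(tmp)
        remaining = sorted(os.listdir(tmp))
    return deleted, remaining
```

Comparing ages in seconds avoids the flooring behaviour of `timedelta.days`, which is easy to get wrong at the one-day boundary.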

65
helper/firebase_helper.py Normal file

@@ -0,0 +1,65 @@
import logging

from google.cloud import storage
from pymongo.database import Database


def download_firebase_file(bucket_name, source_blob_name, destination_file_name):
    # Downloads a file from Firebase Storage.
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)
    logging.info(f"File downloaded to {destination_file_name}")
    return destination_file_name


def upload_file_firebase(bucket_name, destination_blob_name, source_file_name):
    # Uploads a file to Firebase Storage.
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    try:
        blob = bucket.blob(destination_blob_name)
        blob.upload_from_filename(source_file_name)
        logging.info(f"File uploaded to {destination_blob_name}")
        return True
    except Exception as e:
        import app
        app.app.logger.error("Error uploading file to Google Cloud Storage: " + str(e))
        return False


def upload_file_firebase_get_url(bucket_name, destination_blob_name, source_file_name):
    # Uploads a file to Firebase Storage.
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    try:
        blob = bucket.blob(destination_blob_name)
        blob.upload_from_filename(source_file_name)
        logging.info(f"File uploaded to {destination_blob_name}")
        # Make the file public
        blob.make_public()
        # Get the public URL
        url = blob.public_url
        return url
    except Exception as e:
        import app
        app.app.logger.error("Error uploading file to Google Cloud Storage: " + str(e))
        return None


def save_to_db_with_id(mongo_db: Database, collection: str, item, id: str):
    collection_ref = mongo_db[collection]
    document_ref = collection_ref.insert_one({"id": id, **item})
    if document_ref:
        logging.info(f"Document added with ID: {document_ref.inserted_id}")
        return (True, document_ref.inserted_id)
    else:
        return (False, None)


def get_all(mongo_db: Database, collection: str):
    return list(mongo_db[collection].find())


@@ -1,6 +1,6 @@
-import jwt
 import os
+import jwt
 from dotenv import load_dotenv
 load_dotenv()

50
helper/gpt_zero.py Normal file

@@ -0,0 +1,50 @@
from logging import getLogger
from typing import Dict, Optional

import requests


class GPTZero:
    _GPT_ZERO_ENDPOINT = 'https://api.gptzero.me/v2/predict/text'

    def __init__(self, gpt_zero_key: Optional[str]):
        self._logger = getLogger(__name__)
        if gpt_zero_key is None:
            self._logger.warning('GPTZero key was not provided! Skipping AI detection when grading.')
        self._gpt_zero_key = gpt_zero_key
        self._header = {
            'x-api-key': gpt_zero_key
        }

    def run_detection(self, text: str):
        # Without an API key, detection is skipped and callers receive None.
        if self._gpt_zero_key is None:
            return None
        data = {
            'document': text,
            'version': '',
            'multilingual': False
        }
        response = requests.post(self._GPT_ZERO_ENDPOINT, headers=self._header, json=data)
        if response.status_code != 200:
            self._logger.error(f'GPTZero endpoint returned {response.status_code}: {response.json()}')
            return None
        return self._parse_detection(response.json())

    def _parse_detection(self, response: Dict) -> Optional[Dict]:
        try:
            text_scan = response["documents"][0]
            filtered_sentences = [
                {
                    "sentence": item["sentence"],
                    "highlight_sentence_for_ai": item["highlight_sentence_for_ai"]
                }
                for item in text_scan["sentences"]
            ]
            return {
                "class_probabilities": text_scan["class_probabilities"],
                "confidence_category": text_scan["confidence_category"],
                "predicted_class": text_scan["predicted_class"],
                "sentences": filtered_sentences
            }
        except Exception as e:
            self._logger.error(f'Failed to parse GPTZero response: {str(e)}')
            return None
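To make the shape that `_parse_detection` relies on concrete, here is a standalone sketch of the same filtering applied to a hand-written payload. Only the key names come from the code above; every value in `sample_response`, and the `extra_field` key, are invented for illustration and do not describe real GPTZero output.

```python
from typing import Dict, Optional


def parse_detection(response: Dict) -> Optional[Dict]:
    """Keep only the document-level verdict and the per-sentence AI highlight flag."""
    try:
        text_scan = response["documents"][0]
        return {
            "class_probabilities": text_scan["class_probabilities"],
            "confidence_category": text_scan["confidence_category"],
            "predicted_class": text_scan["predicted_class"],
            "sentences": [
                {
                    "sentence": item["sentence"],
                    "highlight_sentence_for_ai": item["highlight_sentence_for_ai"],
                }
                for item in text_scan["sentences"]
            ],
        }
    except (KeyError, IndexError, TypeError):
        # Any missing key or empty document list is treated as "no result".
        return None


# Hypothetical payload: values are made up, only the key names mirror the code above.
sample_response = {
    "documents": [
        {
            "class_probabilities": {"human": 0.9, "ai": 0.1},
            "confidence_category": "high",
            "predicted_class": "human",
            "sentences": [
                {
                    "sentence": "I enjoy reading.",
                    "highlight_sentence_for_ai": False,
                    "extra_field": 1,
                },
            ],
        }
    ]
}

parsed = parse_detection(sample_response)
```

Catching the specific lookup errors rather than a bare `Exception` is a design choice in this sketch, not what the committed code does.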

179
helper/heygen_api.py Normal file

@@ -0,0 +1,179 @@
import os
import random
import time
from logging import getLogger
import requests
from dotenv import load_dotenv
from helper.constants import *
from helper.firebase_helper import upload_file_firebase_get_url, save_to_db_with_id
from heygen.AvatarEnum import AvatarEnum
load_dotenv()
logger = getLogger(__name__)
# Get HeyGen token
TOKEN = os.getenv("HEY_GEN_TOKEN")
FIREBASE_BUCKET = os.getenv('FIREBASE_BUCKET')
# POST TO CREATE VIDEO
CREATE_VIDEO_URL = 'https://api.heygen.com/v1/template.generate'
GET_VIDEO_URL = 'https://api.heygen.com/v1/video_status.get'
POST_HEADER = {
    'X-Api-Key': TOKEN,
    'Content-Type': 'application/json'
}
GET_HEADER = {
    'X-Api-Key': TOKEN
}
def create_videos_and_save_to_db(exercises, template, id):
    avatar = random.choice(list(AvatarEnum))
    # Speaking 1
    # Using list comprehension to find the element with the desired value in the 'type' field
    found_exercises_1 = [element for element in exercises if element.get('type') == 1]
    # Check if any elements were found
    if found_exercises_1:
        exercise_1 = found_exercises_1[0]
        sp1_questions = []
        logger.info('Creating video for speaking part 1')
        for question in exercise_1["questions"]:
            sp1_result = create_video(question, avatar)
            if sp1_result is not None:
                sound_file_path = VIDEO_FILES_PATH + sp1_result
                firebase_file_path = FIREBASE_SPEAKING_VIDEO_FILES_PATH + sp1_result
                url = upload_file_firebase_get_url(FIREBASE_BUCKET, firebase_file_path, sound_file_path)
                video = {
                    "text": question,
                    "video_path": firebase_file_path,
                    "video_url": url
                }
                sp1_questions.append(video)
            else:
                # Log the question that failed; part 1 exercises carry "questions", not "question".
                logger.error("Failed to create video for part 1 question: " + question)
        template["exercises"][0]["prompts"] = sp1_questions
        template["exercises"][0]["first_title"] = exercise_1["first_topic"]
        template["exercises"][0]["second_title"] = exercise_1["second_topic"]

    # Speaking 2
    # Using list comprehension to find the element with the desired value in the 'type' field
    found_exercises_2 = [element for element in exercises if element.get('type') == 2]
    # Check if any elements were found
    if found_exercises_2:
        exercise_2 = found_exercises_2[0]
        logger.info('Creating video for speaking part 2')
        sp2_result = create_video(exercise_2["question"], avatar)
        if sp2_result is not None:
            sound_file_path = VIDEO_FILES_PATH + sp2_result
            firebase_file_path = FIREBASE_SPEAKING_VIDEO_FILES_PATH + sp2_result
            url = upload_file_firebase_get_url(FIREBASE_BUCKET, firebase_file_path, sound_file_path)
            sp2_video_path = firebase_file_path
            sp2_video_url = url
            template["exercises"][1]["prompts"] = exercise_2["prompts"]
            template["exercises"][1]["text"] = exercise_2["question"]
            template["exercises"][1]["title"] = exercise_2["topic"]
            template["exercises"][1]["video_url"] = sp2_video_url
            template["exercises"][1]["video_path"] = sp2_video_path
        else:
            logger.error("Failed to create video for part 2 question: " + exercise_2["question"])

    # Speaking 3
    # Using list comprehension to find the element with the desired value in the 'type' field
    found_exercises_3 = [element for element in exercises if element.get('type') == 3]
    # Check if any elements were found
    if found_exercises_3:
        exercise_3 = found_exercises_3[0]
        sp3_questions = []
        logger.info('Creating videos for speaking part 3')
        for question in exercise_3["questions"]:
            result = create_video(question, avatar)
            if result is not None:
                sound_file_path = VIDEO_FILES_PATH + result
                firebase_file_path = FIREBASE_SPEAKING_VIDEO_FILES_PATH + result
                url = upload_file_firebase_get_url(FIREBASE_BUCKET, firebase_file_path, sound_file_path)
                video = {
                    "text": question,
                    "video_path": firebase_file_path,
                    "video_url": url
                }
                sp3_questions.append(video)
            else:
                logger.error("Failed to create video for part 3 question: " + question)
        template["exercises"][2]["prompts"] = sp3_questions
        template["exercises"][2]["title"] = exercise_3["topic"]

    # Drop unused template slots from the end so earlier indices stay valid.
    if not found_exercises_3:
        template["exercises"].pop(2)
    if not found_exercises_2:
        template["exercises"].pop(1)
    if not found_exercises_1:
        template["exercises"].pop(0)

    # NOTE: save_to_db_with_id in helper/firebase_helper.py takes (mongo_db, collection, item, id);
    # this call passes no database handle.
    save_to_db_with_id("speaking", template, id)
    logger.info('Saved speaking to DB with id ' + id + " : " + str(template))
def create_video(text, avatar):
    # POST TO CREATE VIDEO
    create_video_url = 'https://api.heygen.com/v2/template/' + avatar + '/generate'
    data = {
        "test": False,
        "caption": False,
        "title": "video_title",
        "variables": {
            "script_here": {
                "name": "script_here",
                "type": "text",
                "properties": {
                    "content": text
                }
            }
        }
    }
    response = requests.post(create_video_url, headers=POST_HEADER, json=data)
    logger.info(response.status_code)
    logger.info(response.json())

    # GET TO CHECK STATUS AND GET VIDEO WHEN READY
    video_id = response.json()["data"]["video_id"]
    params = {
        'video_id': video_id
    }
    response = {}
    status = "processing"
    error = None

    # Poll until the video is completed or the API reports an error.
    while status != "completed" and error is None:
        response = requests.get(GET_VIDEO_URL, headers=GET_HEADER, params=params)
        response_data = response.json()

        status = response_data["data"]["status"]
        error = response_data["data"]["error"]

        if status != "completed" and error is None:
            logger.info(f"Status: {status}")
            time.sleep(10)  # Wait for 10 seconds before the next request

    logger.info(response.status_code)
    logger.info(response.json())

    # DOWNLOAD VIDEO
    download_url = response.json()['data']['video_url']
    output_directory = 'download-video/'
    output_filename = video_id + '.mp4'

    response = requests.get(download_url)

    if response.status_code == 200:
        os.makedirs(output_directory, exist_ok=True)  # Create the directory if it doesn't exist
        output_path = os.path.join(output_directory, output_filename)
        with open(output_path, 'wb') as f:
            f.write(response.content)
        logger.info(f"File '{output_filename}' downloaded successfully.")
        return output_filename
    else:
        logger.error(f"Failed to download file. Status code: {response.status_code}")
        return None


@@ -1,53 +1,250 @@
import json
import openai
import os
import re
from dotenv import load_dotenv
from openai import OpenAI
from helper.constants import BLACKLISTED_WORDS, GPT_3_5_TURBO
from helper.token_counter import count_tokens
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
MAX_TOKENS = 4097
TOP_P = 0.9
FREQUENCY_PENALTY = 0.5
-TRY_LIMIT = 1
+TRY_LIMIT = 2
try_count = 0
def process_response(input_string):
    json_obj = {}
    parsed_string = input_string.replace("'", "\"")
    parsed_string = parsed_string.replace("\n\n", " ")
    try:
        json_obj = json.loads(parsed_string)
    except json.JSONDecodeError:
        print("Invalid JSON string!")
    return json_obj
# GRADING SUMMARY
chat_config = {'max_tokens': 1000, 'temperature': 0.2}
section_keys = ['reading', 'listening', 'writing', 'speaking', 'level']
grade_top_limit = 9
tools = [{
    "type": "function",
    "function": {
        "name": "save_evaluation_and_suggestions",
        "description": "Saves the evaluation and suggestions requested by input.",
        "parameters": {
            "type": "object",
            "properties": {
                "evaluation": {
                    "type": "string",
                    "description": "A comment on the IELTS section grade obtained in the specific section and what it could mean without suggestions.",
                },
                "suggestions": {
                    "type": "string",
                    "description": "A small paragraph text with suggestions on how to possibly get a better grade than the one obtained.",
                },
                "bullet_points": {
                    "type": "string",
                    "description": "Text with four bullet points to improve the english speaking ability. Only include text for the bullet points separated by a paragraph. ",
                },
            },
            "required": ["evaluation", "suggestions"],
        },
    }
}]
def check_fields(obj, fields):
    return all(field in obj for field in fields)
-def make_openai_call(messages, token_count, fields_to_check, temperature):
+def make_openai_call(model, messages, token_count, fields_to_check, temperature, check_blacklisted=True):
     global try_count
-    result = openai.ChatCompletion.create(
-        model="gpt-3.5-turbo",
+    result = client.chat.completions.create(
+        model=model,
         max_tokens=int(MAX_TOKENS - token_count - 300),
         temperature=float(temperature),
         top_p=float(TOP_P),
         frequency_penalty=float(FREQUENCY_PENALTY),
-        messages=messages
+        messages=messages,
+        response_format={"type": "json_object"}
     )
-    processed_response = process_response(result["choices"][0]["message"]["content"])
-    if check_fields(processed_response, fields_to_check) is False and try_count < TRY_LIMIT:
+    result = result.choices[0].message.content
+    if check_blacklisted:
+        found_blacklisted_word = get_found_blacklisted_words(result)
+        if found_blacklisted_word is not None and try_count < TRY_LIMIT:
+            from app import app
+            app.logger.warning("Result contains blacklisted words: " + str(found_blacklisted_word))
+            try_count = try_count + 1
+            return make_openai_call(model, messages, token_count, fields_to_check, temperature)
+        elif found_blacklisted_word is not None and try_count >= TRY_LIMIT:
+            return ""
+    if fields_to_check is None:
+        return json.loads(result)
+    if check_fields(result, fields_to_check) is False and try_count < TRY_LIMIT:
         try_count = try_count + 1
-        return make_openai_call(messages, token_count, fields_to_check)
+        return make_openai_call(model, messages, token_count, fields_to_check, temperature)
     elif try_count >= TRY_LIMIT:
         try_count = 0
-        return result["choices"][0]["message"]["content"]
+        return json.loads(result)
     else:
         try_count = 0
-        return processed_response
+        return json.loads(result)
# GRADING SUMMARY
def calculate_grading_summary(body):
    extracted_sections = extract_existing_sections_from_body(body, section_keys)
    ret = []
    for section in extracted_sections:
        openai_response_dict = calculate_section_grade_summary(section)
        ret = ret + [{'code': section['code'], 'name': section['name'], 'grade': section['grade'],
                      'evaluation': openai_response_dict['evaluation'],
                      'suggestions': openai_response_dict['suggestions'],
                      'bullet_points': parse_bullet_points(openai_response_dict['bullet_points'], section['grade'])}]
    return {'sections': ret}
def calculate_section_grade_summary(section):
    messages = [
        {
            "role": "user",
            "content": "You are an IELTS test section grade evaluator. You will receive an IELTS test section name and the grade obtained in the section. You should offer an evaluation comment on this grade and separately suggestions on how to possibly get a better grade.",
        },
        {
            "role": "user",
            "content": "Section: " + str(section['name']) + " Grade: " + str(section['grade']),
        },
        {"role": "user", "content": "Speak in third person."},
        {"role": "user",
         "content": "Don't offer suggestions in the evaluation comment. Only in the suggestions section."},
        {"role": "user",
         "content": "Your evaluation comment on the grade should enunciate the grade, be insightful, be speculative, be one paragraph long. "},
        {"role": "user", "content": "Please save the evaluation comment and suggestions generated."},
        {"role": "user", "content": f"Offer bullet points to improve the English {str(section['name'])} ability."},
    ]
    # Insert section-specific context right after the "Section: ... Grade: ..." message.
    if section['code'] == "level":
        messages[2:2] = [{
            "role": "user",
            "content": "This section is comprised of multiple choice questions that measure the user's overall English level. These multiple choice questions are about knowledge on vocabulary, syntax, grammar rules, and contextual usage. The grade obtained measures the ability in these areas and English language overall."
        }]
    elif section['code'] == "speaking":
        messages[2:2] = [{"role": "user",
                          "content": "This section is designed to assess the English language proficiency of individuals who want to study or work in English-speaking countries. The speaking section evaluates a candidate's ability to communicate effectively in spoken English."}]
    res = client.chat.completions.create(
        model="gpt-3.5-turbo",
        max_tokens=chat_config['max_tokens'],
        temperature=chat_config['temperature'],
        tools=tools,
        messages=messages)
    return parse_openai_response(res)
def parse_openai_response(response):
    # client.chat.completions.create returns a typed object in the v1 OpenAI client,
    # so fields are read as attributes; dict-style membership tests never match.
    empty = {'evaluation': "", 'suggestions': "", 'bullet_points': []}
    if not response.choices:
        return empty
    tool_calls = response.choices[0].message.tool_calls
    if tool_calls and tool_calls[0].function.arguments:
        # Merge over the defaults: 'bullet_points' is not a required tool argument.
        return {**empty, **json.loads(tool_calls[0].function.arguments)}
    return empty
def extract_existing_sections_from_body(my_dict, keys_to_extract):
    if 'sections' in my_dict and isinstance(my_dict['sections'], list) and len(my_dict['sections']) > 0:
        return list(filter(
            lambda item: 'code' in item and item['code'] in keys_to_extract and 'grade' in item and 'name' in item,
            my_dict['sections']))
    # Return an empty list so callers can iterate safely when no sections are present.
    return []
def parse_bullet_points(bullet_points_str, grade):
    max_grade_for_suggestions = 9
    if isinstance(bullet_points_str, str) and grade < max_grade_for_suggestions:
        # Split the string by '\n'
        lines = bullet_points_str.split('\n')
        # Remove '-' and trim whitespace from each line
        cleaned_lines = [line.replace('-', '').strip() for line in lines]
        # Add '.' to lines that don't end with it
        return [line + '.' if line and not line.endswith('.') else line for line in cleaned_lines]
    else:
        return []
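As an illustration of what `parse_bullet_points` produces, the function is repeated below as a standalone sketch and applied to a sample string. The sample text is invented, and the behaviour on hyphenated words is an observation about `str.replace`, not something the commit documents.

```python
def parse_bullet_points(bullet_points_str, grade):
    max_grade_for_suggestions = 9
    if isinstance(bullet_points_str, str) and grade < max_grade_for_suggestions:
        lines = bullet_points_str.split('\n')
        # replace('-', '') removes every hyphen, not only the leading bullet marker.
        cleaned_lines = [line.replace('-', '').strip() for line in lines]
        return [line + '.' if line and not line.endswith('.') else line for line in cleaned_lines]
    return []


# Hypothetical model output, one bullet per line.
sample = "- Practise daily\n- Read aloud.\n- Use well-known phrases"
bullets = parse_bullet_points(sample, 6.5)
```

Because every hyphen is stripped, `well-known` comes back as `wellknown`; removing only a leading marker (for example with `line.lstrip('- ')`) would be one way to avoid that, if it matters for the rendered feedback.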
def get_fixed_text(text):
messages = [
{"role": "system", "content": ('You are a helpful assistant designed to output JSON on this format: '
'{"fixed_text": "fixed text with no misspelling errors"}')
},
{"role": "user", "content": (
'Fix the errors in the given text and put it in a JSON. Do not complete the answer, only replace what '
'is wrong. \n The text: "' + text + '"')
}
]
token_count = count_total_tokens(messages)
response = make_openai_call(GPT_3_5_TURBO, messages, token_count, ["fixed_text"], 0.2, False)
return response["fixed_text"]
def get_speaking_corrections(text):
messages = [
{"role": "system", "content": ('You are a helpful assistant designed to output JSON on this format: '
'{"fixed_text": "fixed transcription with no misspelling errors"}')
},
{"role": "user", "content": (
'Fix the errors in the provided transcription and put it in a JSON. Do not complete the answer, only '
'replace what is wrong. \n The text: "' + text + '"')
}
]
token_count = count_total_tokens(messages)
response = make_openai_call(GPT_3_5_TURBO, messages, token_count, ["fixed_text"], 0.2, False)
return response["fixed_text"]
def has_blacklisted_words(text: str):
text_lower = text.lower()
return any(word in text_lower for word in BLACKLISTED_WORDS)
def get_found_blacklisted_words(text: str):
text_lower = text.lower()
for word in BLACKLISTED_WORDS:
if re.search(r'\b' + re.escape(word) + r'\b', text_lower):
return word
return None
def remove_special_characters_from_beginning(string):
cleaned_string = string.lstrip('\n')
if cleaned_string.startswith("'") or cleaned_string.startswith('"'):
cleaned_string = cleaned_string[1:]
if cleaned_string.endswith('"'):
return cleaned_string[:-1]
else:
return cleaned_string
def replace_expression_in_object(obj, expression, replacement):
if isinstance(obj, dict):
for key in obj:
if isinstance(obj[key], str):
obj[key] = obj[key].replace(expression, replacement)
elif isinstance(obj[key], list):
obj[key] = [replace_expression_in_object(item, expression, replacement) for item in obj[key]]
elif isinstance(obj[key], dict):
obj[key] = replace_expression_in_object(obj[key], expression, replacement)
return obj
def count_total_tokens(messages):
total_tokens = 0
for message in messages:
total_tokens += count_tokens(message["content"])["n_tokens"]
return total_tokens
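
Note on `replace_expression_in_object`: as far as the flattened listing shows, it rewrites only string values that sit directly under a dictionary key, so a string stored inside a list is passed to the recursive call and comes back without the replacement. A minimal sketch of a variant that handles strings at any depth is given here. The name `replace_in_nested` is hypothetical, and unlike the original it returns a new structure instead of mutating its argument.

```python
from typing import Any


def replace_in_nested(obj: Any, expression: str, replacement: str) -> Any:
    """Return a copy of obj with expression replaced in every string, at any depth."""
    if isinstance(obj, str):
        return obj.replace(expression, replacement)
    if isinstance(obj, list):
        return [replace_in_nested(item, expression, replacement) for item in obj]
    if isinstance(obj, dict):
        return {
            key: replace_in_nested(value, expression, replacement)
            for key, value in obj.items()
        }
    # Numbers, booleans, None and other types are passed through untouched.
    return obj


payload = {"title": "Part 1", "questions": ["Part 1 intro", {"text": "See Part 1"}], "count": 3}
cleaned = replace_in_nested(payload, "Part 1", "Section 1")
```

Returning a copy keeps the caller's data unchanged, which may or may not be what the surrounding code expects.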

1237
helper/question_templates.py Normal file

File diff suppressed because it is too large

@@ -0,0 +1,138 @@
import os
import random
import boto3
import nltk
import whisper
nltk.download('words')
from nltk.corpus import words
from helper.constants import *
def speech_to_text(file_path):
if os.path.exists(file_path):
model = whisper.load_model("base")
result = model.transcribe(file_path, fp16=False, language='English', verbose=False)
return result["text"]
else:
print("File not found:", file_path)
raise Exception("File " + file_path + " not found.")
def text_to_speech(text: str, file_name: str):
# Initialize the Amazon Polly client
client = boto3.client(
'polly',
region_name='eu-west-1',
aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY")
)
voice = random.choice(ALL_NEURAL_VOICES)['Id']
# Initialize an empty list to store audio segments
audio_segments = []
for part in divide_text(text):
tts_response = client.synthesize_speech(
Engine="neural",
Text=part,
OutputFormat="mp3",
VoiceId=voice
)
audio_segments.append(tts_response['AudioStream'].read())
# Add finish message
audio_segments.append(client.synthesize_speech(
Engine="neural",
Text="This audio recording, for the listening exercise, has finished.",
OutputFormat="mp3",
VoiceId="Stephen"
)['AudioStream'].read())
# Combine the audio segments into a single audio file
combined_audio = b"".join(audio_segments)
# Save the combined audio to a single file
with open(file_name, "wb") as f:
f.write(combined_audio)
print("Speech segments saved to " + file_name)
def conversation_text_to_speech(conversation: list, file_name: str):
# Initialize the Amazon Polly client
client = boto3.client(
'polly',
region_name='eu-west-1',
aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY")
)
# Initialize an empty list to store audio segments
audio_segments = []
# Iterate through the text segments, convert to audio segments, and store them
for segment in conversation:
response = client.synthesize_speech(
Engine="neural",
Text=segment["text"],
OutputFormat="mp3",
VoiceId=segment["voice"]
)
audio_segments.append(response['AudioStream'].read())
# Add finish message
audio_segments.append(client.synthesize_speech(
Engine="neural",
Text="This audio recording, for the listening exercise, has finished.",
OutputFormat="mp3",
VoiceId="Stephen"
)['AudioStream'].read())
# Combine the audio segments into a single audio file
combined_audio = b"".join(audio_segments)
# Save the combined audio to a single file
with open(file_name, "wb") as f:
f.write(combined_audio)
print("Speech segments saved to " + file_name)
def has_words(text: str):
if not has_common_words(text):
return False
english_words = set(words.words())
words_in_input = text.split()
return any(word.lower() in english_words for word in words_in_input)
def has_x_words(text: str, quantity):
if not has_common_words(text):
return False
english_words = set(words.words())
words_in_input = text.split()
english_word_count = sum(1 for word in words_in_input if word.lower() in english_words)
return english_word_count >= quantity
def has_common_words(text: str):
english_words = {"the", "be", "to", "of", "and", "a", "in", "that", "have", "i"}
words_in_input = text.split()
english_word_count = sum(1 for word in words_in_input if word.lower() in english_words)
return english_word_count >= 10
def divide_text(text, max_length=3000):
if len(text) <= max_length:
return [text]
divisions = []
current_position = 0
while current_position < len(text):
next_position = min(current_position + max_length, len(text))
next_period_position = text.rfind('.', current_position, next_position)
if next_period_position != -1 and next_period_position > current_position:
divisions.append(text[current_position:next_period_position + 1])
current_position = next_period_position + 1
else:
# If no '.' found in the next chunk, split at max_length
divisions.append(text[current_position:next_position])
current_position = next_position
return divisions
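
Note on `divide_text`: it splits long input at the last period inside each window, presumably so that every speech-synthesis request stays under a text-size limit. The sketch below restates the function with its indentation restored (an assumption about the original layout) and runs it with a deliberately tiny `max_length` to show where the cuts fall.

```python
from typing import List


def divide_text(text: str, max_length: int = 3000) -> List[str]:
    """Split text into chunks of at most max_length characters, preferring sentence ends."""
    if len(text) <= max_length:
        return [text]
    divisions = []
    current_position = 0
    while current_position < len(text):
        next_position = min(current_position + max_length, len(text))
        # Last period inside the current window, or -1 if there is none.
        next_period_position = text.rfind('.', current_position, next_position)
        if next_period_position != -1 and next_period_position > current_position:
            divisions.append(text[current_position:next_period_position + 1])
            current_position = next_period_position + 1
        else:
            # No usable period in this window: fall back to a hard cut.
            divisions.append(text[current_position:next_position])
            current_position = next_position
    return divisions


text = "One. Two. Three."
chunks = divide_text(text, max_length=10)
hard_cut = divide_text("abcdefghijkl", max_length=5)
```

Because every chunk is a contiguous slice, joining the chunks reproduces the input exactly, including the leading space that ends up at the start of the second chunk.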


@@ -1,5 +1,4 @@
# This is a work in progress. There are still bugs. Once it is production-ready this will become a full repo.
import os
def count_tokens(text, model_name="gpt-3.5-turbo", debug=False):
@@ -86,4 +85,4 @@ class TokenBuffer:
self.buffer = self.buffer.split(" ", removed_tokens)[-1]
def get_buffer(self):
return self.buffer
return self.buffer

11
heygen/AvatarEnum.py Normal file

@@ -0,0 +1,11 @@
from enum import Enum
class AvatarEnum(Enum):
MATTHEW_NOAH = "5912afa7c77c47d3883af3d874047aaf"
VERA_CERISE = "9e58d96a383e4568a7f1e49df549e0e4"
EDWARD_TONY = "d2cdd9c0379a4d06ae2afb6e5039bd0c"
TANYA_MOLLY = "045cb5dcd00042b3a1e4f3bc1c12176b"
KAYLA_ABBI = "1ae1e5396cc444bfad332155fdb7a934"
JEROME_RYAN = "0ee6aa7cc1084063a630ae514fccaa31"
TYLER_CHRISTOPHER = "5772cff935844516ad7eeff21f839e43"

8572
heygen/avatars.json Normal file

File diff suppressed because it is too large

3313
heygen/english_voices.json Normal file

File diff suppressed because it is too large

18
heygen/filter_json.py Normal file

@@ -0,0 +1,18 @@
import json
# Read JSON from a file
input_filename = "english_voices.json"
output_filename = "free_english_voices.json"
with open(input_filename, "r") as json_file:
data = json.load(json_file)
# Keep only the free voices (entries whose "is_paid" flag is false)
filtered_list = [entry for entry in data["data"]["list"] if not entry["is_paid"]]
data["data"]["list"] = filtered_list
# Write filtered JSON to a new file
with open(output_filename, "w") as json_file:
json.dump(data, json_file, indent=2)
print(f"Filtered JSON written to '{output_filename}'.")

File diff suppressed because it is too large

13777
heygen/voices.json Normal file

File diff suppressed because it is too large

5
modules/__init__.py Normal file

@@ -0,0 +1,5 @@
from .gpt import GPT
__all__ = [
"GPT"
]


@@ -0,0 +1,5 @@
from .service import BatchUsers
__all__ = [
"BatchUsers"
]


@@ -0,0 +1,31 @@
import uuid
from typing import Optional
from pydantic import BaseModel, Field
from datetime import datetime
class DemographicInfo(BaseModel):
phone: str
passport_id: Optional[str] = None
country: Optional[str] = None
class UserDTO(BaseModel):
id: uuid.UUID = Field(default_factory=uuid.uuid4)
email: str
name: str
type: str
passport_id: str
passwordHash: str
passwordSalt: str
groupName: Optional[str] = None
corporate: Optional[str] = None
studentID: Optional[str | int] = None
expiryDate: Optional[str] = None
demographicInformation: Optional[DemographicInfo] = None
class BatchUsersDTO(BaseModel):
makerID: str
users: list[UserDTO]


@@ -0,0 +1,275 @@
import os
import subprocess
import time
import uuid
from datetime import datetime
from logging import getLogger
import pandas as pd
from typing import Dict
import shortuuid
from pymongo.database import Database
from modules.batch_users.batch_users import BatchUsersDTO, UserDTO
from modules.helper.file_helper import FileHelper
class BatchUsers:
_DEFAULT_DESIRED_LEVELS = {
"reading": 9,
"listening": 9,
"writing": 9,
"speaking": 9,
}
_DEFAULT_LEVELS = {
"reading": 0,
"listening": 0,
"writing": 0,
"speaking": 0,
}
def __init__(self, mongo: Database):
self._db: Database = mongo
self._logger = getLogger(__name__)
def batch_users(self, request_data: Dict):
batch_dto = self._map_to_batch(request_data)
file_name = f'{uuid.uuid4()}.csv'
path = f'./tmp/{file_name}'
self._generate_firebase_auth_csv(batch_dto, path)
result = self._upload_users('./tmp', file_name)
if result.returncode != 0:
error_msg = f"Couldn't upload users. Failed to run command firebase auth import -> ```cmd {result.stdout}```"
self._logger.error(error_msg)
return error_msg
self._init_users(batch_dto)
FileHelper.remove_file(path)
return {"ok": True}
@staticmethod
def _map_to_batch(request_data: Dict) -> BatchUsersDTO:
users_list = [{**user} for user in request_data["users"]]
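for user in users_list:
# studentID is optional: convert it only when present, so a missing key does not raise and None does not become "None"
if user.get("studentID") is not None:
user["studentID"] = str(user["studentID"])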
users: list[UserDTO] = [UserDTO(**user) for user in users_list]
return BatchUsersDTO(makerID=request_data["makerID"], users=users)
@staticmethod
def _generate_firebase_auth_csv(batch_dto: BatchUsersDTO, path: str):
# https://firebase.google.com/docs/cli/auth#file_format
columns = [
'UID', 'Email', 'Email Verified', 'Password Hash', 'Password Salt', 'Name',
'Photo URL', 'Google ID', 'Google Email', 'Google Display Name', 'Google Photo URL',
'Facebook ID', 'Facebook Email', 'Facebook Display Name', 'Facebook Photo URL',
'Twitter ID', 'Twitter Email', 'Twitter Display Name', 'Twitter Photo URL',
'GitHub ID', 'GitHub Email', 'GitHub Display Name', 'GitHub Photo URL',
'User Creation Time', 'Last Sign-In Time', 'Phone Number'
]
users_data = []
current_time = int(time.time() * 1000)
for user in batch_dto.users:
user_data = {
'UID': str(user.id),
'Email': user.email,
'Email Verified': False,
'Password Hash': user.passwordHash,
'Password Salt': user.passwordSalt,
'Name': '',
'Photo URL': '',
'Google ID': '',
'Google Email': '',
'Google Display Name': '',
'Google Photo URL': '',
'Facebook ID': '',
'Facebook Email': '',
'Facebook Display Name': '',
'Facebook Photo URL': '',
'Twitter ID': '',
'Twitter Email': '',
'Twitter Display Name': '',
'Twitter Photo URL': '',
'GitHub ID': '',
'GitHub Email': '',
'GitHub Display Name': '',
'GitHub Photo URL': '',
'User Creation Time': current_time,
'Last Sign-In Time': '',
'Phone Number': ''
}
users_data.append(user_data)
df = pd.DataFrame(users_data, columns=columns)
df.to_csv(path, index=False, header=False)
@staticmethod
def _upload_users(directory: str, file_name: str):
command = (
f'firebase auth:import {file_name} '
f'--hash-algo=SCRYPT '
f'--hash-key={os.getenv("FIREBASE_SCRYPT_B64_SIGNER_KEY")} '
f'--salt-separator={os.getenv("FIREBASE_SCRYPT_B64_SALT_SEPARATOR")} '
f'--rounds={os.getenv("FIREBASE_SCRYPT_ROUNDS")} '
f'--mem-cost={os.getenv("FIREBASE_SCRYPT_MEM_COST")} '
f'--project={os.getenv("FIREBASE_PROJECT_ID")} '
)
result = subprocess.run(command, shell=True, cwd=directory, capture_output=True, text=True)
return result
def _init_users(self, batch_users: BatchUsersDTO):
maker_id = batch_users.makerID
for user in batch_users.users:
self._insert_new_user(user)
code = self._create_code(user, maker_id)
if user.type == "corporate":
self._set_corporate_default_groups(user)
if user.corporate:
self._assign_corporate_to_user(user, code)
if user.groupName and len(user.groupName.strip()) > 0:
self._assign_user_to_group_by_name(user, maker_id)
def _insert_new_user(self, user: UserDTO):
new_user = {
**user.dict(exclude={
'passport_id', 'groupName', 'expiryDate',
'corporate', 'passwordHash', 'passwordSalt'
}),
'id': str(user.id),
'bio': "",
'focus': "academic",
'status': "active",
'desiredLevels': self._DEFAULT_DESIRED_LEVELS,
'profilePicture': "/defaultAvatar.png",
'levels': self._DEFAULT_LEVELS,
'isFirstLogin': False,
'isVerified': True,
'registrationDate': datetime.now(),
'subscriptionExpirationDate': user.expiryDate
}
self._db.users.insert_one(new_user)
def _create_code(self, user: UserDTO, maker_id: str) -> str:
code = shortuuid.ShortUUID().random(length=6)
self._db.codes.insert_one({
'id': code,
'code': code,
'creator': maker_id,
'expiryDate': user.expiryDate,
'type': user.type,
'creationDate': datetime.now(),
'userId': str(user.id),
'email': user.email,
'name': user.name,
'passport_id': user.passport_id
})
return code
def _set_corporate_default_groups(self, user: UserDTO):
user_id = str(user.id)
default_groups = [
{
'admin': user_id,
'id': str(uuid.uuid4()),
'name': "Teachers",
'participants': [],
'disableEditing': True,
},
{
'admin': user_id,
'id': str(uuid.uuid4()),
'name': "Students",
'participants': [],
'disableEditing': True,
},
{
'admin': user_id,
'id': str(uuid.uuid4()),
'name': "Corporate",
'participants': [],
'disableEditing': True,
}
]
for group in default_groups:
self._db.groups.insert_one(group)
def _assign_corporate_to_user(self, user: UserDTO, code: str):
user_id = str(user.id)
corporate_user = self._db.users.find_one(
{"email": user.corporate}
)
if corporate_user:
self._db.codes.update_one(
{"id": code},
{"$set": {"creator": corporate_user["id"]}},
upsert=True
)
group_type = "Students" if user.type == "student" else "Teachers"
group = self._db.groups.find_one(
{
"admin": corporate_user["id"],
"name": group_type
}
)
if group:
participants = group['participants']
if user_id not in participants:
participants.append(user_id)
self._db.groups.update_one(
{"id": group["id"]},
{"$set": {"participants": participants}}
)
else:
group = {
'admin': corporate_user["id"],
'id': str(uuid.uuid4()),
'name': group_type,
'participants': [user_id],
'disableEditing': True,
}
self._db.groups.insert_one(group)
def _assign_user_to_group_by_name(self, user: UserDTO, maker_id: str):
user_id = str(user.id)
groups = list(self._db.groups.find(
{
"admin": maker_id,
"name": user.groupName.strip()
}
))
if len(groups) == 0:
new_group = {
'id': str(uuid.uuid4()),
'admin': maker_id,
'name': user.groupName.strip(),
'participants': [user_id],
'disableEditing': False,
}
self._db.groups.insert_one(new_group)
else:
group = groups[0]
participants = group["participants"]
if user_id not in participants:
participants.append(user_id)
self._db.groups.update_one(
{"id": group["id"]},
{"$set": {"participants": participants}}
)
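
Note on `_upload_users`: it interpolates environment values into one string and runs it with `shell=True`, so any value containing spaces or shell metacharacters would be reinterpreted by the shell. A sketch of the same call built as an argument list follows. The flags are the ones used in the method above; the helper names `build_import_command` and `upload_users` are hypothetical, and the empty-string fallback for missing variables is an assumption.

```python
import os
import subprocess
from typing import List


def build_import_command(file_name: str) -> List[str]:
    """Assemble the firebase auth:import call as a list, one element per argument."""
    def env(name: str) -> str:
        # Missing variables become empty strings here; a real caller may prefer to fail fast.
        return os.getenv(name, "")

    return [
        "firebase", "auth:import", file_name,
        "--hash-algo=SCRYPT",
        f"--hash-key={env('FIREBASE_SCRYPT_B64_SIGNER_KEY')}",
        f"--salt-separator={env('FIREBASE_SCRYPT_B64_SALT_SEPARATOR')}",
        f"--rounds={env('FIREBASE_SCRYPT_ROUNDS')}",
        f"--mem-cost={env('FIREBASE_SCRYPT_MEM_COST')}",
        f"--project={env('FIREBASE_PROJECT_ID')}",
    ]


def upload_users(directory: str, file_name: str) -> subprocess.CompletedProcess:
    # No shell=True: each list element reaches the CLI as a single argument.
    return subprocess.run(
        build_import_command(file_name), cwd=directory, capture_output=True, text=True
    )


os.environ["FIREBASE_PROJECT_ID"] = "demo-project"  # placeholder value for the demonstration
command = build_import_command("users.csv")
```

Only `build_import_command` is exercised here; `upload_users` needs the Firebase CLI on the PATH and is shown for shape only.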

66
modules/gpt.py Normal file

@@ -0,0 +1,66 @@
import json
from logging import getLogger
from typing import List, Optional, Callable, TypeVar
from openai.types.chat import ChatCompletionMessageParam
from pydantic import BaseModel
T = TypeVar('T', bound=BaseModel)
class GPT:
def __init__(self, openai_client):
self._client = openai_client
self._default_model = "gpt-4o-2024-08-06"
self._logger = getLogger(__name__)
def prediction(
self,
messages: List[ChatCompletionMessageParam],
map_to_model: Callable,
json_scheme: str,
*,
model: Optional[str] = None,
temperature: Optional[float] = None,
max_retries: int = 3
) -> List[T] | T | None:
params = {
"messages": messages,
"response_format": {"type": "json_object"},
"model": model if model else self._default_model
}
if temperature is not None:
params["temperature"] = temperature
attempt = 0
while attempt < max_retries:
result = self._client.chat.completions.create(**params)
result_content = result.choices[0].message.content
try:
result_json = json.loads(result_content)
return map_to_model(result_json)
except Exception as e:
attempt += 1
self._logger.info(f"GPT returned malformed response: {result_content}\n {str(e)}")
params["messages"] = [
{
"role": "user",
"content": (
"Your previous response wasn't in the json format I've explicitly told you to output. "
f"In your next response, you will fix it and return me just the json I've asked."
)
},
{
"role": "user",
"content": (
f"Previous response: {result_content}\n"
f"JSON format: {json_scheme}"
)
}
]
if attempt >= max_retries:
self._logger.error(f"Max retries exceeded!")
return None
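
Note on `GPT.prediction`: on a malformed reply it replaces the whole message list with two correction messages, so the retry no longer carries the original request. The sketch below shows one alternative, appending the bad reply and a correction to the existing history. It is not the module's implementation: `predict_json` and the `create` callable (a stand-in for the chat completion call that returns the reply text) are hypothetical, and the stub replies are made up for the demonstration.

```python
import json
from typing import Any, Callable, Dict, List, Optional


def predict_json(
    create: Callable[..., str],
    messages: List[Dict[str, str]],
    map_to_model: Callable[[Dict[str, Any]], Any],
    *,
    temperature: Optional[float] = None,
    max_retries: int = 3,
) -> Optional[Any]:
    params: Dict[str, Any] = {"messages": list(messages)}
    if temperature is not None:  # 0.0 is a valid temperature and must not be dropped
        params["temperature"] = temperature
    for _ in range(max_retries):
        content = create(**params)
        try:
            return map_to_model(json.loads(content))
        except (ValueError, KeyError, TypeError) as error:
            # Keep the history and append the bad reply plus a correction request.
            params["messages"] = params["messages"] + [
                {"role": "assistant", "content": content},
                {"role": "user", "content": f"That reply could not be parsed ({error}). Reply with the JSON only."},
            ]
    return None


replies = iter(["not json", '{"tip_ids": ["a1"]}'])
seen_params: List[Dict[str, Any]] = []


def fake_create(**params: Any) -> str:
    # Stub in place of a real API call: records the parameters and returns canned text.
    seen_params.append(params)
    return next(replies)


result = predict_json(
    fake_create,
    [{"role": "user", "content": "Return the tip ids as JSON."}],
    lambda data: data["tip_ids"],
    temperature=0.0,
)
```

Keeping the history costs more tokens per retry, which is a trade-off the original may have been avoiding on purpose.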


@@ -0,0 +1,5 @@
from .logger import LoggerHelper
__all__ = [
"LoggerHelper"
]


@@ -0,0 +1,97 @@
import base64
import io
import os
import shutil
import subprocess
import uuid
from typing import Optional, Tuple
import numpy as np
import pypandoc
from PIL import Image
class FileHelper:
# Pandoc supposedly covers a wide range of file extensions, but this has only been tested with docx
@staticmethod
def convert_file_to_pdf(input_path: str, output_path: str):
pypandoc.convert_file(input_path, 'pdf', outputfile=output_path, extra_args=[
'-V', 'geometry:paperwidth=5.5in',
'-V', 'geometry:paperheight=8.5in',
'-V', 'geometry:margin=0.5in',
'-V', 'pagestyle=empty'
])
@staticmethod
def convert_file_to_html(input_path: str, output_path: str):
pypandoc.convert_file(input_path, 'html', outputfile=output_path)
@staticmethod
def pdf_to_png(path_id: str):
to_png = f"pdftoppm -png exercises.pdf page"
result = subprocess.run(to_png, shell=True, cwd=f'./tmp/{path_id}', capture_output=True, text=True)
if result.returncode != 0:
raise Exception(
f"Couldn't convert pdf to png. Failed to run command '{to_png}' -> ```cmd {result.stderr}```")
@staticmethod
def is_page_blank(image_bytes: bytes, image_threshold=10) -> bool:
with Image.open(io.BytesIO(image_bytes)) as img:
img_gray = img.convert('L')
img_array = np.array(img_gray)
non_white_pixels = np.sum(img_array < 255)
return non_white_pixels <= image_threshold
@classmethod
def _encode_image(cls, image_path: str, image_threshold=10) -> Optional[str]:
with open(image_path, "rb") as image_file:
image_bytes = image_file.read()
if cls.is_page_blank(image_bytes, image_threshold):
return None
return base64.b64encode(image_bytes).decode('utf-8')
@classmethod
def b64_pngs(cls, path_id: str, files: list[str]):
png_messages = []
for filename in files:
b64_string = cls._encode_image(os.path.join(f'./tmp/{path_id}', filename))
if b64_string:
png_messages.append({
"type": "image_url",
"image_url": {
"url": f"data:image/png;base64,{b64_string}"
}
})
return png_messages
@staticmethod
def remove_directory(path):
try:
if os.path.exists(path):
if os.path.isdir(path):
shutil.rmtree(path)
except Exception as e:
print(f"An error occurred while trying to remove {path}: {str(e)}")
@staticmethod
def remove_file(file_path):
try:
if os.path.exists(file_path):
if os.path.isfile(file_path):
os.remove(file_path)
except Exception as e:
print(f"An error occurred while trying to remove the file {file_path}: {str(e)}")
@staticmethod
def save_upload(file) -> Tuple[str, str]:
ext = file.filename.split('.')[-1]
path_id = str(uuid.uuid4())
os.makedirs(f'./tmp/{path_id}', exist_ok=True)
tmp_filename = f'./tmp/{path_id}/uploaded.{ext}'
file.save(tmp_filename)
return ext, path_id
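
Note on `save_upload`: `file.filename.split('.')[-1]` returns the whole name when there is no dot, and that value is then placed into the temporary path. A small sketch of a stricter extension check is shown here. The function name `upload_extension` and the allow-list are hypothetical; the actual set of accepted formats is not stated in the diff.

```python
from pathlib import Path
from typing import Optional

ALLOWED_EXTENSIONS = {"docx", "pdf"}  # hypothetical allow-list, adjust to the formats actually supported


def upload_extension(filename: str) -> Optional[str]:
    """Return the lower-cased extension if it is allowed, otherwise None."""
    suffix = Path(filename).suffix  # "" when the final path component has no extension
    ext = suffix[1:].lower()
    return ext if ext in ALLOWED_EXTENSIONS else None


accepted = upload_extension("Exercises.DOCX")
no_extension = upload_extension("README")
traversal = upload_extension("../../etc/passwd")
double_suffix = upload_extension("archive.tar.gz")
```

A caller could reject the upload when the function returns `None` instead of building a path from untrusted input.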

23
modules/helper/logger.py Normal file

@@ -0,0 +1,23 @@
import logging
from functools import wraps
class LoggerHelper:
@staticmethod
def suppress_loggers():
def decorator(f):
@wraps(f)
def wrapped(*args, **kwargs):
root_logger = logging.getLogger()
original_level = root_logger.level
root_logger.setLevel(logging.ERROR)
try:
return f(*args, **kwargs)
finally:
root_logger.setLevel(original_level)
return wrapped
return decorator
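
Usage sketch for `suppress_loggers`: the decorator raises the root logger to `ERROR` for the duration of the call and restores the previous level afterwards, even if the call raises. The example restates it as a module-level function (the original is a static method on `LoggerHelper`) so that it runs on its own; `level_inside_call` is a hypothetical function used only to observe the effect.

```python
import logging
from functools import wraps


def suppress_loggers():
    def decorator(f):
        @wraps(f)
        def wrapped(*args, **kwargs):
            root_logger = logging.getLogger()
            original_level = root_logger.level
            root_logger.setLevel(logging.ERROR)
            try:
                return f(*args, **kwargs)
            finally:
                # Runs on both normal return and exception.
                root_logger.setLevel(original_level)
        return wrapped
    return decorator


@suppress_loggers()
def level_inside_call() -> int:
    # Report the root logger level while the decorator is active.
    return logging.getLogger().level


logging.getLogger().setLevel(logging.INFO)
inside = level_inside_call()
after = logging.getLogger().level
```

Only loggers that inherit their level from the root logger are quietened this way; a library logger with its own explicit level is unaffected.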


@@ -0,0 +1,7 @@
from .kb import TrainingContentKnowledgeBase
from .service import TrainingContentService
__all__ = [
"TrainingContentService",
"TrainingContentKnowledgeBase"
]


@@ -0,0 +1,29 @@
from pydantic import BaseModel
from typing import List
class QueryDTO(BaseModel):
category: str
text: str
class DetailsDTO(BaseModel):
exam_id: str
date: int
performance_comment: str
detailed_summary: str
class WeakAreaDTO(BaseModel):
area: str
comment: str
class TrainingContentDTO(BaseModel):
details: List[DetailsDTO]
weak_areas: List[WeakAreaDTO]
queries: List[QueryDTO]
class TipsDTO(BaseModel):
tip_ids: List[str]


@@ -0,0 +1,85 @@
import json
import os
from logging import getLogger
from typing import Any, Dict, List
import faiss
import pickle
class TrainingContentKnowledgeBase:
def __init__(self, embeddings, path: str = 'pathways_2_rw_with_ids.json'):
self._embedding_model = embeddings
self._tips = None # self._read_json(path)
self._category_metadata = None
self._indices = None
self._logger = getLogger(__name__)
@staticmethod
def _read_json(path: str) -> Dict[str, Any]:
with open(path, 'r', encoding="utf-8") as json_file:
return json.loads(json_file.read())
def print_category_count(self):
category_tips = {}
for unit in self._tips['units']:
for page in unit['pages']:
for tip in page['tips']:
category = tip['category'].lower().replace(" ", "_")
# Count every tip, including the first one seen for a category
category_tips[category] = category_tips.get(category, 0) + 1
print(category_tips)
def create_embeddings_and_save_them(self) -> None:
category_embeddings = {}
category_metadata = {}
for unit in self._tips['units']:
for page in unit['pages']:
for tip in page['tips']:
category = tip['category'].lower().replace(" ", "_")
if category not in category_embeddings:
category_embeddings[category] = []
category_metadata[category] = []
category_embeddings[category].append(tip['embedding'])
category_metadata[category].append({"id": tip['id'], "text": tip['text']})
category_indices = {}
for category, embeddings in category_embeddings.items():
embeddings_array = self._embedding_model.encode(embeddings)
index = faiss.IndexFlatL2(embeddings_array.shape[1])
index.add(embeddings_array)
category_indices[category] = index
faiss.write_index(index, f"./faiss/{category}_tips_index.faiss")
with open("./faiss/tips_metadata.pkl", "wb") as f:
pickle.dump(category_metadata, f)
def load_indices_and_metadata(
self,
directory: str = './faiss',
suffix: str = '_tips_index.faiss',
metadata_path: str = './faiss/tips_metadata.pkl'
):
files = os.listdir(directory)
self._indices = {}
for file in files:
if file.endswith(suffix):
self._indices[file[:-len(suffix)]] = faiss.read_index(f'{directory}/{file}')
self._logger.info(f'Loaded embeddings for {file[:-len(suffix)]} category.')
with open(metadata_path, 'rb') as f:
self._category_metadata = pickle.load(f)
self._logger.info("Loaded tips metadata")
def query_knowledge_base(self, query: str, category: str, top_k: int = 5) -> List[Dict[str, str]]:
query_embedding = self._embedding_model.encode([query])
index = self._indices[category]
D, I = index.search(query_embedding, top_k)
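# FAISS pads the result with -1 when the index holds fewer than top_k vectors; skip those labels
results = [self._category_metadata[category][i] for i in I[0] if i != -1]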
return results
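
Note on `query_knowledge_base`: a FAISS search pads its label array with -1 when an index holds fewer than `top_k` vectors, and in Python an index of -1 silently selects the last list element. The sketch below shows the label-to-metadata mapping with such padding skipped. It uses plain lists so that it runs without FAISS; `pick_metadata` and the sample data are hypothetical.

```python
from typing import Dict, List, Sequence


def pick_metadata(labels: Sequence[int], metadata: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Map search labels to metadata entries, ignoring padding and out-of-range labels."""
    return [metadata[i] for i in labels if 0 <= i < len(metadata)]


metadata = [
    {"id": "t1", "text": "Skim the passage first"},
    {"id": "t2", "text": "Scan for dates and names"},
]
labels = [1, 0, -1, -1, -1]  # shape of a top_k=5 result over a two-vector index
results = pick_metadata(labels, metadata)
```

With a real index the labels would come from the second array returned by `index.search`, as in the method above.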


@@ -0,0 +1,407 @@
import json
import uuid
from datetime import datetime
from logging import getLogger
from typing import Dict, List
from pymongo.database import Database
from modules.training_content.dtos import TrainingContentDTO, WeakAreaDTO, QueryDTO, DetailsDTO, TipsDTO
class TrainingContentService:
TOOLS = [
'critical_thinking',
'language_for_writing',
'reading_skills',
'strategy',
'words',
'writing_skills'
]
# strategy word_link ct_focus reading_skill word_partners writing_skill language_for_writing
def __init__(self, kb, openai, mongo: Database):
self._training_content_module = kb
self._db: Database = mongo
self._logger = getLogger(__name__)
self._llm = openai
def get_tips(self, training_content):
user, stats = training_content["userID"], training_content["stats"]
exam_data, exam_map = self._sort_out_solutions(stats)
training_content = self._get_exam_details_and_tips(exam_data)
tips = self._query_kb(training_content.queries)
usefull_tips = self._get_usefull_tips(exam_data, tips)
exam_map = self._merge_exam_map_with_details(exam_map, training_content.details)
weak_areas = {"weak_areas": []}
for area in training_content.weak_areas:
weak_areas["weak_areas"].append(area.dict())
new_id = str(uuid.uuid4())
training_doc = {
'id': new_id,
'created_at': int(datetime.now().timestamp() * 1000),
**exam_map,
**usefull_tips.dict(),
**weak_areas,
"user": user
}
self._db.training.insert_one(training_doc)
return {
"id": new_id
}
@staticmethod
def _merge_exam_map_with_details(exam_map: Dict[str, any], details: List[DetailsDTO]):
new_exam_map = {"exams": []}
for detail in details:
new_exam_map["exams"].append({
"id": detail.exam_id,
"date": detail.date,
"performance_comment": detail.performance_comment,
"detailed_summary": detail.detailed_summary,
**exam_map[detail.exam_id]
})
return new_exam_map
def _query_kb(self, queries: List[QueryDTO]):
map_categories = {
"critical_thinking": "ct_focus",
"language_for_writing": "language_for_writing",
"reading_skills": "reading_skill",
"strategy": "strategy",
"writing_skills": "writing_skill"
}
tips = {"tips": []}
for query in queries:
if query.category == "words":
tips["tips"].extend(
self._training_content_module.query_knowledge_base(query.text, "word_link")
)
tips["tips"].extend(
self._training_content_module.query_knowledge_base(query.text, "word_partners")
)
else:
if query.category in map_categories:
tips["tips"].extend(
self._training_content_module.query_knowledge_base(query.text, map_categories[query.category])
)
else:
self._logger.info(f"GPT tried to query knowledge base for {query.category} and it doesn't exist.")
return tips
def _get_exam_details_and_tips(self, exam_data: Dict[str, any]) -> TrainingContentDTO:
json_schema = (
'{ "details": [{"exam_id": "", "date": 0, "performance_comment": "", "detailed_summary": ""}],'
' "weak_areas": [{"area": "", "comment": ""}], "queries": [{"text": "", "category": ""}] }'
)
messages = [
{
"role": "user",
"content": (
f"I'm going to provide you with exam data, you will take the exam data and fill this json "
f'schema : {json_schema}. "performance_comment" is a short sentence that describes the '
'student\'s performance and main mistakes in a single exam, "detailed_summary" is a detailed '
'summary of the student\'s performance, "weak_areas" are identified areas'
' across all exams which need to be improved upon, for example, area "Grammar and Syntax" comment "Issues'
' with sentence structure and punctuation.", the "queries" field is where you will write queries '
'for tips that will be displayed to the student, the category attribute is a collection of '
'embeddings and the text will be the text used to query the knowledge base. The categories are '
f'the following [{", ".join(self.TOOLS)}]. The exam data will be a json where the key of the field '
'"exams" is the exam id, an exam can be composed of multiple modules or single modules. The student'
' will see your response so refrain from using phrasing like "The student" did x, y and z. If the '
'field "answer" in a question is an empty array "[]", then the student didn\'t answer any question '
'and you must address that in your response. Also questions aren\'t modules, the only modules are: '
'level, speaking, writing, reading and listening. The details array needs to be tailored to the '
'exam attempt, even if you receive the same exam you must treat the attempts as different exams by their id. '
'Don\'t make references to an exam by its id, the GUI will handle that so the student knows '
'which exam your comments and summary are referencing. Even if the student hasn\'t '
'submitted any answers for an exam, you must still fill the details structure addressing that fact.'
)
},
{
"role": "user",
"content": f'Exam Data: {str(exam_data)}'
}
]
return self._llm.prediction(messages, self._map_gpt_response, json_schema)
def _get_usefull_tips(self, exam_data: Dict[str, any], tips: Dict[str, any]) -> TipsDTO:
json_schema = (
'{ "tip_ids": [] }'
)
messages = [
{
"role": "user",
"content": (
f"I'm going to provide you with tips and I want you to return to me the tips that "
f"can be useful for the student who took the exam that I'm going to send you, return "
f"me the tip ids in this json format {json_schema}."
)
},
{
"role": "user",
"content": f'Exam Data: {str(exam_data)}'
},
{
"role": "user",
"content": f'Tips: {str(tips)}'
}
]
return self._llm.prediction(messages, lambda response: TipsDTO(**response), json_schema)
@staticmethod
def _map_gpt_response(response: Dict[str, any]) -> TrainingContentDTO:
parsed_response = {
"details": [DetailsDTO(**detail) for detail in response["details"]],
"weak_areas": [WeakAreaDTO(**area) for area in response["weak_areas"]],
"queries": [QueryDTO(**query) for query in response["queries"]]
}
return TrainingContentDTO(**parsed_response)
def _sort_out_solutions(self, stats):
grouped_stats = {}
for stat in stats:
session_key = f'{str(stat["date"])}-{stat["user"]}'
module = stat["module"]
exam_id = stat["exam"]
if session_key not in grouped_stats:
grouped_stats[session_key] = {}
if module not in grouped_stats[session_key]:
grouped_stats[session_key][module] = {
"stats": [],
"exam_id": exam_id
}
grouped_stats[session_key][module]["stats"].append(stat)
exercises = {}
exam_map = {}
for session_key, modules in grouped_stats.items():
exercises[session_key] = {}
for module, module_stats in modules.items():
exercises[session_key][module] = {}
exam_id = module_stats["exam_id"]
if exam_id not in exercises[session_key][module]:
exercises[session_key][module][exam_id] = {"date": None, "exercises": []}
exam_total_questions = 0
exam_total_correct = 0
for stat in module_stats["stats"]:
exam_total_questions += stat["score"]["total"]
exam_total_correct += stat["score"]["correct"]
exercises[session_key][module][exam_id]["date"] = stat["date"]
if session_key not in exam_map:
exam_map[session_key] = {"stat_ids": [], "score": 0}
exam_map[session_key]["stat_ids"].append(stat["id"])
exam = self._get_doc_by_id(module, exam_id)
if module == "listening":
exercises[session_key][module][exam_id]["exercises"].extend(
self._get_listening_solutions(stat, exam))
elif module == "reading":
exercises[session_key][module][exam_id]["exercises"].extend(
self._get_reading_solutions(stat, exam))
elif module == "writing":
exercises[session_key][module][exam_id]["exercises"].extend(
self._get_writing_prompts_and_answers(stat, exam)
)
elif module == "speaking":
exercises[session_key][module][exam_id]["exercises"].extend(
self._get_speaking_solutions(stat, exam)
)
elif module == "level":
exercises[session_key][module][exam_id]["exercises"].extend(
self._get_level_solutions(stat, exam)
)
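# Guard against a zero question total so an empty exam scores 0 instead of raising ZeroDivisionError
exam_map[session_key]["score"] = round((exam_total_correct / exam_total_questions) * 100) if exam_total_questions else 0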
exam_map[session_key]["module"] = module
return {"exams": exercises}, exam_map
def _get_writing_prompts_and_answers(self, stat, exam):
result = []
try:
exercises = []
for solution in stat['solutions']:
answer = solution['solution']
exercise_id = solution['id']
exercises.append({
"exercise_id": exercise_id,
"answer": answer
})
for exercise in exercises:
for exam_exercise in exam["exercises"]:
if exam_exercise["id"] == exercise["exercise_id"]:
result.append({
"exercise": exam_exercise["prompt"],
"answer": exercise["answer"]
})
except KeyError as e:
self._logger.warning(f"Malformed stat object: {str(e)}")
return result
@staticmethod
def _get_mc_question(exercise, stat):
shuffle_maps = stat.get("shuffleMaps", [])
answer = stat["solutions"] if len(shuffle_maps) == 0 else []
if len(shuffle_maps) != 0:
for solution in stat["solutions"]:
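# Pick the single map for this question; a missing map raises KeyError below, which callers log
shuffle_map = next(
(item["map"] for item in shuffle_maps if item["questionID"] == solution["question"]),
{}
)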
answer.append({
"question": solution["question"],
"option": shuffle_map[solution["option"]]
})
return {
"question": exercise["prompt"],
"exercise": exercise["questions"],
"answer": answer
}
@staticmethod
def _swap_key_name(d, original_key, new_key):
d[new_key] = d.pop(original_key)
return d
def _get_level_solutions(self, stat, exam):
result = []
try:
for part in exam["parts"]:
for exercise in part["exercises"]:
if exercise["id"] == stat["exercise"]:
if stat["type"] == "fillBlanks":
result.append({
"prompt": exercise["prompt"],
"template": exercise["text"],
"words": exercise["words"],
"solutions": exercise["solutions"],
"answer": [
self._swap_key_name(item, 'solution', 'option')
for item in stat["solutions"]
]
})
elif stat["type"] == "multipleChoice":
result.append(self._get_mc_question(exercise, stat))
except KeyError as e:
self._logger.warning(f"Malformed stat object: {str(e)}")
return result
def _get_listening_solutions(self, stat, exam):
result = []
try:
for part in exam["parts"]:
for exercise in part["exercises"]:
if exercise["id"] == stat["exercise"]:
if stat["type"] == "writeBlanks":
result.append({
"question": exercise["prompt"],
"template": exercise["text"],
"solution": exercise["solutions"],
"answer": stat["solutions"]
})
elif stat["type"] == "fillBlanks":
result.append({
"question": exercise["prompt"],
"template": exercise["text"],
"words": exercise["words"],
"solutions": exercise["solutions"],
"answer": stat["solutions"]
})
elif stat["type"] == "multipleChoice":
result.append(self._get_mc_question(exercise, stat))
except KeyError as e:
self._logger.warning(f"Malformed stat object: {str(e)}")
return result
@staticmethod
def _find_shuffle_map(shuffle_maps, question_id):
return next((item["map"] for item in shuffle_maps if item["questionID"] == question_id), None)
def _get_speaking_solutions(self, stat, exam):
result = {}
try:
result = {
"comments": {
key: value['comment'] for key, value in stat['solutions'][0]['evaluation']['task_response'].items()}
,
"exercises": {}
}
for exercise in exam["exercises"]:
if exercise["id"] == stat["exercise"]:
if stat["type"] == "interactiveSpeaking":
for i in range(len(exercise["prompts"])):
result["exercises"][f"exercise_{i+1}"] = {
"question": exercise["prompts"][i]["text"]
}
for i in range(len(exercise["prompts"])):
answer = stat['solutions'][0]["evaluation"].get(f'transcript_{i+1}', '')
result["exercises"][f"exercise_{i+1}"]["answer"] = answer
elif stat["type"] == "speaking":
result["exercises"]["exercise_1"] = {
"question": exercise["text"],
"answer": stat['solutions'][0]["evaluation"].get('transcript', '')
}
except KeyError as e:
self._logger.warning(f"Malformed stat object: {str(e)}")
return [result]
def _get_reading_solutions(self, stat, exam):
result = []
try:
for part in exam["parts"]:
text = part["text"]
for exercise in part["exercises"]:
if exercise["id"] == stat["exercise"]:
if stat["type"] == "fillBlanks":
result.append({
"text": text,
"question": exercise["prompt"],
"template": exercise["text"],
"words": exercise["words"],
"solutions": exercise["solutions"],
"answer": stat["solutions"]
})
elif stat["type"] == "writeBlanks":
result.append({
"text": text,
"question": exercise["prompt"],
"template": exercise["text"],
"solutions": exercise["solutions"],
"answer": stat["solutions"]
})
elif stat["type"] == "trueFalse":
result.append({
"text": text,
"questions": exercise["questions"],
"answer": stat["solutions"]
})
elif stat["type"] == "matchSentences":
result.append({
"text": text,
"question": exercise["prompt"],
"sentences": exercise["sentences"],
"options": exercise["options"],
"answer": stat["solutions"]
})
except KeyError as e:
self._logger.warning(f"Malformed stat object: {str(e)}")
return result
def _get_doc_by_id(self, collection: str, doc_id: str):
doc = self._db[collection].find_one({"id": doc_id})
return doc


@@ -0,0 +1,67 @@
# Adding new training content
If you're ever tasked with the grueling task of adding more tips from manuals, my condolences.
There are 4 components of a training content tip: the tip itself, the question, the additional and the segment.
The tip is the actual tip. If the manual doesn't have an exercise that relates to that tip, fill this out:
```json
{
"category": "<the category of the tip that will be used to categorize the embeddings and also used in the tip header>",
"embedding": "<the relevant part of the tip that is needed to make the embedding (clean the tip of useless info that might mislead the queries)>",
"text": "<The text that the llm will use to assess whether the tip is relevant according to the performance of the student (most of the time just include all the text of the tip)>",
"html": "<The html that will be rendered in the tip component>",
"id": "<a uuid4>",
"verified": <this is just to keep track of the tips that were manually confirmed by you>,
"standalone": <if the tip doesn't have an exercise this is true else it's false>
}
```
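
A minimal sketch of building such a standalone tip record in Python, assuming you assemble the JSON by script rather than by hand. The helper name `make_standalone_tip` and the sample values are hypothetical and not part of this repo; only the field names come from the template above.

```python
import json
from uuid import uuid4


def make_standalone_tip(category, embedding, text, html):
    """Build a tip record for a tip that has no related exercise (hypothetical helper)."""
    return {
        "category": category,
        "embedding": embedding,
        "text": text,
        "html": html,
        "id": str(uuid4()),  # keep the id as a string, not a UUID object
        "verified": False,   # flip to True once you have confirmed the tip by hand
        "standalone": True,  # no exercise attached
    }


# Sample values are made up for illustration only.
tip = make_standalone_tip(
    category="Skimming",
    embedding="Read the first sentence of each paragraph to get the main idea.",
    text="Strategy: read the first sentence of each paragraph to get the main idea.",
    html="<p>Read the first sentence of each paragraph to get the main idea.</p>",
)
print(json.dumps(tip, indent=2))
```

Generating the id with `str(uuid4())` keeps the record JSON-serializable without any extra encoder.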
If the manual does have an exercise that relates to the tip:
```json
{
// ...
"question": "<the exercise question(s) html>",
"additional": "<context of the question html>",
"segments": [
{
"html": "<the html of a segment, you MUST wrap the html in a single <div> >",
"wordDelay": <the delay between words as they are placed on the segment, 200ms is a good one>,
"holdDelay": <the total time that the segment will be paused before moving onto the next segment, 5000ms is a good one>,
"highlight": [
{
"targets": ["<the target of the highlight can be: question, additional, segment, all>"],
"phrases": ["<the words/phrases/raw html you want to highlight>"]
}
],
"insertHTML": [
{
"target": "<the target of the insert can be: question, additional>",
"targetId": "<the id of an html element>",
"position": "<the position of the inserted html can be: replace, prepend and append. Most of the time you will only use replace>",
"html": "<the html to replace the element with targetId>"
}
]
}
]
}
```
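
The constraints in the template above (single wrapper `<div>`, allowed targets, allowed positions) can be checked before a tip reaches the frontend. The following is only a sketch under stated assumptions: `segment_problems` and `_RootCounter` are hypothetical names, the check assumes the segment html has balanced tags, and the real component may enforce different rules.

```python
from html.parser import HTMLParser

# Allowed values taken from the template above.
HIGHLIGHT_TARGETS = {"question", "additional", "segment", "all"}
INSERT_TARGETS = {"question", "additional"}
INSERT_POSITIONS = {"replace", "prepend", "append"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class _RootCounter(HTMLParser):
    """Records top-level element names; assumes tags are balanced."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.roots = []
        self.stray_text = False

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            self.roots.append(tag)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text outside any element means the html is not fully wrapped.
        if self.depth == 0 and data.strip():
            self.stray_text = True


def segment_problems(segment):
    """Return a list of human-readable problems found in one segment dict."""
    problems = []
    parser = _RootCounter()
    parser.feed(segment.get("html", ""))
    parser.close()
    if parser.roots != ["div"] or parser.stray_text:
        problems.append("html must be wrapped in a single <div>")
    for highlight in segment.get("highlight", []):
        unknown = set(highlight.get("targets", [])) - HIGHLIGHT_TARGETS
        if unknown:
            problems.append(f"unknown highlight targets: {sorted(unknown)}")
    for insert in segment.get("insertHTML", []):
        if insert.get("target") not in INSERT_TARGETS:
            problems.append(f"unknown insert target: {insert.get('target')}")
        if insert.get("position") not in INSERT_POSITIONS:
            problems.append(f"unknown insert position: {insert.get('position')}")
    return problems


example = {
    "html": "<div><p>Look at the <b>first sentence</b>.</p></div>",
    "wordDelay": 200,
    "holdDelay": 5000,
    "highlight": [{"targets": ["question"], "phrases": ["first sentence"]}],
    "insertHTML": [],
}
print(segment_problems(example))
```

Running this over every segment of a tip gives a quick list of what to fix before you check how the tip renders.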
In order to create these structures you will have to manually screenshot the tips, exercises and context and send them to an llm (gpt-4o or claude)
with a prompt like "get me the html for this". You will have to check whether the html is properly structured, then
paste it into the prompt.txt file of this directory and send it
back to an llm.
Afterwards you will have to check whether the default styles in /src/components/TrainingContent/FormatTip.ts are adequate. Div
(except for the wrapper div of a segment) and span styles are not overridden, but you should aim to use the least amount of
styles in the tip itself and create custom reusable html elements
in FormatTip.ts.
After checking that all of the tips render, you will have to create new embeddings in the backend. You CAN'T change ids of existing tips since there
might be training tips that are already stored in firebase.
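
The rule about keeping existing ids can be checked mechanically before regenerating embeddings. This is a sketch, assuming you have the previous and the updated tip lists loaded as lists of dicts with an "id" field; the function names are hypothetical.

```python
from collections import Counter


def removed_ids(existing_tips, updated_tips):
    """Ids that were present before but are missing (or were changed) in the update."""
    existing = {tip["id"] for tip in existing_tips}
    updated = {tip["id"] for tip in updated_tips}
    return sorted(existing - updated)


def duplicated_ids(tips):
    """Ids that appear more than once in a tip list."""
    counts = Counter(tip["id"] for tip in tips)
    return sorted(tip_id for tip_id, count in counts.items() if count > 1)


# Made-up ids for illustration only.
old = [{"id": "a1"}, {"id": "b2"}]
new = [{"id": "a1"}, {"id": "c3"}, {"id": "c3"}]
print(removed_ids(old, new))
print(duplicated_ids(new))
```

Both lists should be empty before you create the new embeddings.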
This is a very tedious task, so here's a recommendation for [background noise](https://www.youtube.com/watch?v=lDnva_3fcTc).
GL HF

File diff suppressed because it is too large


@@ -0,0 +1,62 @@
I am going to give you an exercise and a tip, explain how to solve the exercise and how the tip is beneficial,
your response must be with this format:
{
"segments": [
{
"html": "",
"wordDelay": 0,
"holdDelay": 0,
"highlight": [
{
"targets": [],
"phrases": []
}
],
"insertHTML": [
{
"target": "",
"targetId": "",
"position": "replace",
"html": ""
}
]
}
]
}
Basically you are going to produce multiple objects and place it in data with the format above to integrate with a react component that highlights passages and inserts html,
these objects are segments of your explanation that will be presented to a student.
In the html field place a segment of your response that will be streamed to the component with a delay of "wordDelay" ms between words. At the end of that segment's stream, the phrases or words inside
"highlight" will be highlighted for "holdDelay" ms, and the cycle repeats until the whole data array is iterated. Make it so
that the delays are reasonable, so the student has time to process the message you're trying to send. Take note that
"wordDelay" is the time between words to display (always 200), and "holdDelay" (no less than 5000) is the total time the highlighter will highlight what you put
inside "highlight".
There are 3 target areas:
- "question": where the question is placed
- "additional": where additional content is placed required to answer the question (this section is optional)
- "segment": a particular segment
You can use these targets in highlight and insertHTML. In order for insertHTML to work, you will have to place an html element with an "id" attribute
in the targets you will reference and provide the id via the "targetId", by this I mean if you want to use insert you will need to provide me the
html I've sent you with either a placeholder element with an id set or set an id in an existent element.
If there are already ids in the html I'm giving you then you must use insertHTML.
Each segment html will be rendered in a div that has margins, so you should condense the information; don't give me just single short phrases that occupy a whole div.
As previously said this will be seen by a student, so show some train of thought to solve the exercise.
All the segment's html must be wrapped in a div element, and again since this div element will be rendered with some margins make proper use of the segments html.
Try to make bullet points.
Don't explicitly mention the tip right away at the beginning, aim more towards the end.
Tip:
Target: "question"
Target: "additional"


@@ -0,0 +1,34 @@
import json
import os
from dotenv import load_dotenv
from pymongo import MongoClient
load_dotenv()
# staging: encoach-staging.json
# prod: storied-phalanx-349916.json
mongo_db = MongoClient(os.getenv('MONGODB_URI'))[os.getenv('MONGODB_DB')]
if __name__ == "__main__":
with open('pathways_2_rw.json', 'r', encoding='utf-8') as file:
book = json.load(file)
tips = []
for unit in book["units"]:
for page in unit["pages"]:
for tip in page["tips"]:
new_tip = {
"id": tip["id"],
"standalone": tip["standalone"],
"tipCategory": tip["category"],
"tipHtml": tip["html"]
}
if not tip["standalone"]:
new_tip["exercise"] = tip["exercise"]
tips.append(new_tip)
for tip in tips:
doc_ref = mongo_db.walkthrough.insert_one(tip)


@@ -0,0 +1,5 @@
from .service import UploadLevelService
__all__ = [
"UploadLevelService"
]


@@ -0,0 +1,57 @@
from pydantic import BaseModel, Field
from typing import List, Dict, Union, Optional, Any
from uuid import uuid4, UUID
class Option(BaseModel):
id: str
text: str
class MultipleChoiceQuestion(BaseModel):
id: str
prompt: str
variant: str = "text"
solution: str
options: List[Option]
class MultipleChoiceExercise(BaseModel):
id: UUID = Field(default_factory=uuid4)
type: str = "multipleChoice"
prompt: str = "Select the appropriate option."
questions: List[MultipleChoiceQuestion]
userSolutions: List = Field(default_factory=list)
class FillBlanksWord(BaseModel):
id: str
options: Dict[str, str]
class FillBlanksSolution(BaseModel):
id: str
solution: str
class FillBlanksExercise(BaseModel):
id: UUID = Field(default_factory=uuid4)
type: str = "fillBlanks"
variant: str = "mc"
prompt: str = "Click a blank to select the appropriate word for it."
text: str
solutions: List[FillBlanksSolution]
words: List[FillBlanksWord]
userSolutions: List = Field(default_factory=list)
Exercise = Union[MultipleChoiceExercise, FillBlanksExercise]
class Part(BaseModel):
exercises: List[Exercise]
context: Optional[str] = Field(default=None)
class Exam(BaseModel):
parts: List[Part]


@@ -0,0 +1,66 @@
from typing import Dict, Any
from pydantic import ValidationError
from modules.upload_level.exam_dtos import (
MultipleChoiceExercise,
FillBlanksExercise,
Part, Exam
)
from modules.upload_level.sheet_dtos import Sheet, Option, MultipleChoiceQuestion, FillBlanksWord
class ExamMapper:
@staticmethod
def map_to_exam_model(response: Dict[str, Any]) -> Exam:
parts = []
for part in response['parts']:
part_exercises = part['exercises']
context = part.get('context', None)
exercises = []
for exercise in part_exercises:
exercise_type = exercise['type']
if exercise_type == 'multipleChoice':
exercise_model = MultipleChoiceExercise(**exercise)
elif exercise_type == 'fillBlanks':
exercise_model = FillBlanksExercise(**exercise)
else:
# pydantic's ValidationError cannot be built from a plain message, so raise ValueError instead
raise ValueError(f"Unknown exercise type: {exercise_type}")
exercises.append(exercise_model)
part_kwargs = {"exercises": exercises}
if context is not None:
part_kwargs["context"] = context
part_model = Part(**part_kwargs)
parts.append(part_model)
return Exam(parts=parts)
@staticmethod
def map_to_sheet(response: Dict[str, Any]) -> Sheet:
components = []
for item in response["components"]:
component_type = item["type"]
if component_type == "multipleChoice":
options = [Option(id=opt["id"], text=opt["text"]) for opt in item["options"]]
components.append(MultipleChoiceQuestion(
id=item["id"],
prompt=item["prompt"],
variant=item.get("variant", "text"),
options=options
))
elif component_type == "fillBlanks":
components.append(FillBlanksWord(
id=item["id"],
options=item["options"]
))
else:
components.append(item)
return Sheet(components=components)


@@ -0,0 +1,385 @@
import json
import os
import uuid
from logging import getLogger
from typing import Dict, Any, Tuple, Callable
import pdfplumber
from modules import GPT
from modules.helper.file_helper import FileHelper
from modules.helper import LoggerHelper
from modules.upload_level.exam_dtos import Exam
from modules.upload_level.mapper import ExamMapper
from modules.upload_level.sheet_dtos import Sheet
class UploadLevelService:
def __init__(self, openai: GPT):
self._logger = getLogger(__name__)
self._llm = openai
def generate_level_from_file(self, file) -> Dict[str, Any] | None:
ext, path_id = FileHelper.save_upload(file)
FileHelper.convert_file_to_pdf(
f'./tmp/{path_id}/uploaded.{ext}', f'./tmp/{path_id}/exercises.pdf'
)
file_has_images = self._check_pdf_for_images(f'./tmp/{path_id}/exercises.pdf')
if not file_has_images:
FileHelper.convert_file_to_html(f'./tmp/{path_id}/uploaded.{ext}', f'./tmp/{path_id}/exercises.html')
completion: Callable[[str], Exam] = self._png_completion if file_has_images else self._html_completion
response = completion(path_id)
FileHelper.remove_directory(f'./tmp/{path_id}')
if response:
return self.fix_ids(response.dict(exclude_none=True))
return None
@staticmethod
@LoggerHelper.suppress_loggers()
def _check_pdf_for_images(pdf_path: str) -> bool:
with pdfplumber.open(pdf_path) as pdf:
for page in pdf.pages:
if page.images:
return True
return False
def _level_json_schema(self):
return {
"parts": [
{
"context": "<this attribute is optional you may exclude it if not required>",
"exercises": [
self._multiple_choice_html(),
self._passage_blank_space_html()
]
}
]
}
def _html_completion(self, path_id: str) -> Exam:
with open(f'./tmp/{path_id}/exercises.html', 'r', encoding='utf-8') as f:
html = f.read()
return self._llm.prediction(
[self._gpt_instructions_html(),
{
"role": "user",
"content": html
}
],
ExamMapper.map_to_exam_model,
str(self._level_json_schema())
)
def _gpt_instructions_html(self):
return {
"role": "system",
"content": (
'You are GPT Scraper and your job is to clean dirty html into clean usable JSON formatted data. '
'Your current task is to scrape html english questions sheets.\n\n'
'In the question sheet you will only see 4 types of question:\n'
'- blank space multiple choice\n'
'- underline multiple choice\n'
'- reading passage blank space multiple choice\n'
'- reading passage multiple choice\n\n'
'For the first two types of questions the template is the same but the question prompts differ, '
'whilst in the blank space multiple choice you must include in the prompt the blank spaces with '
'multiple "_", in the underline you must include in the prompt the <u></u> to '
'indicate the underline and the options a, b, c, d must be the ordered underlines in the prompt.\n\n'
'For the reading passage exercise you must handle the formatting of the passages. If it is a '
'reading passage with blank spaces you will see blanks represented with (question id) followed by a '
'line and your job is to replace the brackets with the question id and line with "{{question id}}" '
'with 2 newlines between paragraphs. For the reading passages without blanks you must remove '
'any numbers that may be there to specify paragraph numbers or line numbers, and place 2 newlines '
'between paragraphs.\n\n'
'IMPORTANT: Note that for the reading passages, the html might not reflect the actual paragraph '
'structure, don\'t format the reading passages paragraphs only by the <p></p> tags, try to figure '
'out the best paragraph separation possible. '
'You will place all the information in a single JSON: {"parts": [{"exercises": [{...}], "context": ""}]}\n '
'Where {...} are the exercises templates for each part of a question sheet and the optional field '
'context. '
'IMPORTANT: The question sheet may be divided by sections but you need to only consider the parts, '
'so that you can group the exercises by the parts that are in the html, this is crucial since only '
'reading passage multiple choice require context and if the context is included in parts where it '
'is not required the UI will be messed up. So make sure to correctly group the exercises by parts.\n'
'The templates for the exercises are the following:\n'
'- blank space multiple choice, underline multiple choice and reading passage multiple choice: '
f'{self._multiple_choice_html()}\n'
f'- reading passage blank space multiple choice: {self._passage_blank_space_html()}\n'
'IMPORTANT: For the reading passage multiple choice the context field must be set with the reading '
'passages without paragraphs or line numbers, with 2 newlines between paragraphs, for the other '
'exercises exclude the context field.'
)
}
@staticmethod
def _multiple_choice_html():
return {
"type": "multipleChoice",
"prompt": "Select the appropriate option.",
"questions": [
{
"id": "<the question id>",
"prompt": "<the question>",
"solution": "<the option id solution>",
"options": [
{
"id": "A",
"text": "<the a option>"
},
{
"id": "B",
"text": "<the b option>"
},
{
"id": "C",
"text": "<the c option>"
},
{
"id": "D",
"text": "<the d option>"
}
]
}
]
}
@staticmethod
def _passage_blank_space_html():
return {
"type": "fillBlanks",
"variant": "mc",
"prompt": "Click a blank to select the appropriate word for it.",
"text": (
"<The whole text for the exercise with replacements for blank spaces and their "
"ids with {{<question id>}} with 2 newlines between paragraphs>"
),
"solutions": [
{
"id": "<question id>",
"solution": "<the option that holds the solution>"
}
],
"words": [
{
"id": "<question id>",
"options": {
"A": "<a option>",
"B": "<b option>",
"C": "<c option>",
"D": "<d option>"
}
}
]
}
def _png_completion(self, path_id: str) -> Exam:
FileHelper.pdf_to_png(path_id)
tmp_files = os.listdir(f'./tmp/{path_id}')
pages = [f for f in tmp_files if f.startswith('page-') and f.endswith('.png')]
pages.sort(key=lambda f: int(f.split('-')[1].split('.')[0]))
json_schema = {
"components": [
{"type": "part", "part": "<name or number of the part>"},
self._multiple_choice_png(),
{"type": "blanksPassage", "text": (
"<The whole text for the exercise with replacements for blank spaces and their "
"ids with {{<question id>}} with 2 newlines between paragraphs>"
)},
{"type": "passage", "context": (
"<reading passages without paragraphs or line numbers, with 2 newlines between paragraphs>"
)},
self._passage_blank_space_png()
]
}
components = []
for i in range(len(pages)):
current_page = pages[i]
next_page = pages[i + 1] if i + 1 < len(pages) else None
batch = [current_page, next_page] if next_page else [current_page]
sheet = self._png_batch(path_id, batch, json_schema)
sheet.batch = i + 1
components.append(sheet.dict())
batches = {"batches": components}
with open('output.json', 'w') as json_file:
json.dump(batches, json_file, indent=4)
return self._batches_to_exam_completion(batches)
def _png_batch(self, path_id: str, files: list[str], json_schema) -> Sheet:
return self._llm.prediction(
[self._gpt_instructions_png(),
{
"role": "user",
"content": [
*FileHelper.b64_pngs(path_id, files)
]
}
],
ExamMapper.map_to_sheet,
str(json_schema)
)
def _gpt_instructions_png(self):
return {
"role": "system",
"content": (
'You are GPT OCR and your job is to scan image text data and format it to JSON format. '
'Your current task is to scan english questions sheets.\n\n'
'You will place all the information in a single JSON: {"components": [{...}]} where {...} is a set of '
'sheet components you will retrieve from the images, the components and their corresponding JSON '
'templates are as follows:\n'
'- Part, a standalone part or part of a section of the question sheet: '
'{"type": "part", "part": "<name or number of the part>"}\n'
'- Multiple Choice Question, there are three types of multiple choice questions that differ on '
'the prompt field of the template: blanks, underlines and normal. '
'In the blanks prompt you must leave 5 underscores to represent the blank space. '
'In the underlines questions the objective is to pick the words that are incorrect in the given '
'sentence, for these questions you must wrap the answer to the question with the html tag <u></u>, '
'choose 3 other words to wrap in <u></u>, place them in the prompt field and use the underlined words '
'in the order they appear in the question for the options A to D, disregard options that might be '
'included underneath the underlines question and use the ones you wrapped in <u></u>. '
'In normal you just leave the question as is. '
f'The template for multiple choice questions is the following: {self._multiple_choice_png()}.\n'
'- Reading Passages, there are two types of reading passages. Reading passages where you will see '
'blanks represented by a (question id) followed by a line, you must format these types of reading '
'passages to be only the text with the brackets that have the question id and line replaced with '
'"{{question id}}", also place 2 newlines between paragraphs. For the reading passages without blanks '
'you must remove any numbers that may be there to specify paragraph numbers or line numbers, '
'and place 2 newlines between paragraphs. '
'For the reading passages with blanks the template is: {"type": "blanksPassage", '
'"text": "<The whole text for the exercise with replacements for blank spaces and their '
'ids that are enclosed in brackets with {{<question id>}} also place 2 newlines between paragraphs>"}. '
'For the reading passage without blanks is: {"type": "passage", "context": "<reading passages without '
'paragraphs or line numbers, with 2 newlines between paragraphs>"}\n'
'- Blanks Options, options for a blanks reading passage exercise, this type of component is a group of '
'options with the question id and the options from a to d. The template is: '
f'{self._passage_blank_space_png()}\n'
'IMPORTANT: You must place the components in the order that they were given to you. If an exercise or '
'reading passages are cut off don\'t include them in the JSON.'
)
}
def _multiple_choice_png(self):
multiple_choice = self._multiple_choice_html()["questions"][0]
multiple_choice["type"] = "multipleChoice"
multiple_choice.pop("solution")
return multiple_choice
def _passage_blank_space_png(self):
passage_blank_space = self._passage_blank_space_html()["words"][0]
passage_blank_space["type"] = "fillBlanks"
return passage_blank_space
def _batches_to_exam_completion(self, batches: Dict[str, Any]) -> Exam:
return self._llm.prediction(
[self._gpt_instructions_batches(),
{
"role": "user",
"content": str(batches)
}
],
ExamMapper.map_to_exam_model,
str(self._level_json_schema())
)
def _gpt_instructions_batches(self):
return {
"role": "system",
"content": (
'You are a helpful assistant. Your task is to merge multiple batches of english question sheet '
'components and solve the questions. Each batch may contain overlapping content with the previous '
'batch, or close enough content which needs to be excluded. The components are as follows:\n'
'- Part, a standalone part or part of a section of the question sheet: '
'{"type": "part", "part": "<name or number of the part>"}\n'
'- Multiple Choice Question, there are three types of multiple choice questions that differ on '
'the prompt field of the template: blanks, underlines and normal. '
'In a blanks question, the prompt has underscores to represent the blank space, you must select the '
'appropriate option to solve it. '
'In an underlines question, the prompt has 4 underlines represented by the html tags <u></u>, you must '
'select the option that makes the prompt incorrect to solve it. If the options order doesn\'t reflect '
'the order in which the underlines appear in the prompt you will need to fix it. '
'In a normal question there aren\'t any blanks or underlines in the prompt, you should just '
'select the appropriate solution. '
f'The template for these questions is the same: {self._multiple_choice_png()}\n'
'- Reading Passages, there are two types of reading passages with different templates. The one with '
'type "blanksPassage" where the text field holds the passage and a blank is represented by '
'{{<some number>}} and the other one with type "passage" that has the context field with just '
'reading passages. For both of these components you will have to remove any additional data that might '
'be related to a question description and also remove some "(<question id>)" and "_" from blanksPassage'
' if there are any. These components are used in conjunction with other ones.\n'
'- Blanks Options, options for a blanks reading passage exercise, this type of component is a group of '
'options with the question id and the options from a to d. The template is: '
f'{self._passage_blank_space_png()}\n\n'
'Now that you know the possible components here\'s what I want you to do:\n'
'1. Remove duplicates. A batch will have duplicates of other batches and the components of '
'the next batch should always take precedence over the previous batch, what I mean by this is that '
'if batch 1 has, for example, multiple choice question with id 10 and the next one also has id 10, '
'you pick the next one.\n'
'2. Solve the exercises. There are 4 types of exercises, the 3 multipleChoice variants + a fill blanks '
'exercise. For the multiple choice question follow the previous instruction to solve them and place '
f'them in this format: {self._multiple_choice_html()}. For the fill blanks exercises you need to match '
'the correct blanksPassage to the correct fillBlanks options and then pick the correct option. Here is '
f'the template for this exercise: {self._passage_blank_space_html()}.\n'
f'3. Restructure the JSON to match this template: {self._level_json_schema()}. You must group the exercises by '
'the parts in the order they appear in the batches components. The context field of a part is the '
'context of a passage component that has text relevant to normal multiple choice questions.\n'
'Do your utmost to fulfill the requisites, make sure you include all non-duplicate questions '
'in your response and correctly structure the JSON.'
)
}
@staticmethod
def fix_ids(response):
counter = 1
for part in response["parts"]:
for exercise in part["exercises"]:
if exercise["type"] == "multipleChoice":
for question in exercise["questions"]:
question["id"] = counter
counter += 1
if exercise["type"] == "fillBlanks":
for i in range(len(exercise["words"])):
exercise["words"][i]["id"] = counter
exercise["solutions"][i]["id"] = counter
counter += 1
return response


@@ -0,0 +1,29 @@
from pydantic import BaseModel
from typing import List, Dict, Union, Any, Optional
class Option(BaseModel):
id: str
text: str
class MultipleChoiceQuestion(BaseModel):
type: str = "multipleChoice"
id: str
prompt: str
variant: str = "text"
options: List[Option]
class FillBlanksWord(BaseModel):
type: str = "fillBlanks"
id: str
options: Dict[str, str]
Component = Union[MultipleChoiceQuestion, FillBlanksWord, Dict[str, Any]]
class Sheet(BaseModel):
batch: Optional[int] = None
components: List[Component]

File diff suppressed because one or more lines are too long

Binary file not shown.

run.py

@@ -1,5 +0,0 @@
from streamlit.web import bootstrap
real_s_script = 'sp1_playground.py'
real_w_script = 'wt2_playground.py'
bootstrap.run(real_s_script, f'run.py {real_s_script}', [], {})


@@ -1,109 +0,0 @@
import openai
import os
from dotenv import load_dotenv
import whisper
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
def correct_answer(
max_tokens,
temperature,
top_p,
frequency_penalty,
question_type,
question,
answer_path
):
model = whisper.load_model("base")
# result = model.transcribe("audio-samples/mynameisjeff.wav", fp16=False, language='English', verbose=True)
if os.path.exists(answer_path):
result = model.transcribe(answer_path, fp16=False, language='English', verbose=True)
answer = result["text"]
print(answer)
res = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
max_tokens=int(max_tokens),
temperature=float(temperature),
top_p=float(top_p),
frequency_penalty=float(frequency_penalty),
messages=
[
{
"role": "system",
"content": "You are an IELTS examiner.",
},
{
"role": "system",
"content": f"The question you have to grade is of type {question_type} and is the following: {question}",
},
{
"role": "system",
"content": "Please provide a JSON object response with the overall grade and breakdown grades, "
"formatted as follows: {'overall': 7.0, 'task_response': {'Fluency and Coherence': 8.0, "
"'Lexical Resource': 6.5, 'Grammatical Range and Accuracy': 7.5, 'Pronunciation': "
"6.0}}",
},
{
"role": "system",
"content": "Don't give explanations for the grades, just provide the json with the grades.",
},
{
"role": "system",
"content": "If the answer is unrelated to the question give it the minimum grade.",
},
{
"role": "user",
"content": f"Evaluate this answer according to ielts grading system: {answer}",
},
],
)
return res["choices"][0]["message"]["content"]
else:
print("File not found:", answer_path)
import streamlit as st
# Set the application title
st.title("GPT-3.5 IELTS Examiner")
# Selection box to select the question type
question_type = st.selectbox(
"What is the question type?",
(
"Speaking Part 1",
"Speaking Part 2",
"Speaking Part 3"
),
)
# Provide the input area for question to be answered
# PT-1: How do you usually spend your weekends? Why?
# PT-2: Describe someone you know who does something well. You should say who this person is, how do you know this person, what they do well and explain why you think this person is so good at doing this.
question = st.text_area("Enter the question:", height=100)
# Provide the input area for the path to the recorded answer
# audio-samples/mynameisjeff.wav
answer_path = st.text_area("Enter the answer path:", height=100)
# Initiate two columns for section to be side-by-side
# col1, col2 = st.columns(2)
# Slider to control the model hyperparameter
# with col1:
token = st.slider("Token", min_value=0.0, max_value=2000.0, value=1000.0, step=1.0)
temp = st.slider("Temperature", min_value=0.0, max_value=1.0, value=0.7, step=0.01)
top_p = st.slider("Top_p", min_value=0.0, max_value=1.0, value=0.9, step=0.01)
f_pen = st.slider("Frequency Penalty", min_value=-1.0, max_value=1.0, value=0.5, step=0.01)
# Showing the current parameter used for the model
# with col2:
with st.expander("Current Parameter"):
st.write("Current Token :", token)
st.write("Current Temperature :", temp)
st.write("Current Nucleus Sampling :", top_p)
st.write("Current Frequency Penalty :", f_pen)
# Button that triggers the grading
if st.button("Grade"):
st.write(correct_answer(token, temp, top_p, f_pen, question_type, question, answer_path))


@@ -1,102 +0,0 @@
import os

import openai
from dotenv import load_dotenv

# Load OPENAI_API_KEY from a local .env file
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")


def generate_summarizer(
    max_tokens,
    temperature,
    top_p,
    frequency_penalty,
    question_type,
    question,
    answer
):
    # Despite its name, this function asks the model to grade an IELTS answer.
    # openai.ChatCompletion is the legacy (pre-1.0) interface of the openai package.
    res = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        max_tokens=int(max_tokens),
        temperature=float(temperature),
        top_p=float(top_p),
        frequency_penalty=float(frequency_penalty),
        messages=[
            {
                "role": "system",
                "content": "You are an IELTS examiner.",
            },
            {
                "role": "system",
                "content": f"The question you have to grade is of type {question_type} and is the following: {question}",
            },
            {
                "role": "system",
                "content": "Please provide a JSON object response with the overall grade and breakdown grades, "
                           "formatted as follows: {'overall': 7.0, 'task_response': {'Task Achievement': 8.0, "
                           "'Coherence and Cohesion': 6.5, 'Lexical Resource': 7.5, 'Grammatical Range and Accuracy': "
                           "6.0}}",
            },
            {
                "role": "system",
                "content": "Don't give explanations for the grades, just provide the json with the grades.",
            },
            {
                "role": "user",
                "content": f"Evaluate this answer according to ielts grading system: {answer}",
            },
        ],
    )
    # Return only the text of the first completion
    return res["choices"][0]["message"]["content"]
import streamlit as st
# Set the application title
st.title("GPT-3.5 IELTS Examiner")
# qt_col, q_col = st.columns(2)
# Selection box to select the question type
# with qt_col:
question_type = st.selectbox(
    "What is the question type?",
    (
        "Listening",
        "Reading",
        "Writing Task 1",
        "Writing Task 2",
        "Speaking Part 1",
        "Speaking Part 2",
    ),
)
# Input area for the question to be answered
# with q_col:
question = st.text_area("Enter the question:", height=100)
# Input area for the answer to be graded
answer = st.text_area("Enter the answer:", height=100)
# Initiate two columns for section to be side-by-side
# col1, col2 = st.columns(2)
# Sliders to control the model's sampling parameters
# with col1:
token = st.slider("Token", min_value=0.0, max_value=2000.0, value=1000.0, step=1.0)
temp = st.slider("Temperature", min_value=0.0, max_value=1.0, value=0.7, step=0.01)
top_p = st.slider("Top_p", min_value=0.0, max_value=1.0, value=0.9, step=0.01)
f_pen = st.slider("Frequency Penalty", min_value=-1.0, max_value=1.0, value=0.5, step=0.01)
# Show the parameters currently used for the model
# with col2:
with st.expander("Current Parameter"):
    st.write("Current Token :", token)
    st.write("Current Temperature :", temp)
    st.write("Current Nucleus Sampling :", top_p)
    st.write("Current Frequency Penalty :", f_pen)
# Button that triggers the grading
if st.button("Grade"):
    st.write(generate_summarizer(token, temp, top_p, f_pen, question_type, question, answer))
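
The grading function above returns the model's reply as a raw string. The following is a minimal sketch of turning that string into a dictionary, assuming the reply follows the format shown in the system prompt. `parse_grades` is a hypothetical helper that is not part of the original file. Because the prompt's example uses single quotes, the sketch falls back to `ast.literal_eval` when strict JSON parsing fails.

```python
import ast
import json


def parse_grades(reply: str) -> dict:
    """Parse a grading reply into a dict (hypothetical helper, not in the original file)."""
    # Keep only the outermost {...} in case the model adds surrounding text.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    payload = reply[start:end + 1]
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        # The prompt's example uses single quotes, so the reply may be a
        # Python-style dict literal rather than strict JSON.
        parsed = ast.literal_eval(payload)
        if not isinstance(parsed, dict):
            raise ValueError("reply did not contain an object")
        return parsed


example = ("{'overall': 7.0, 'task_response': {'Task Achievement': 8.0, "
           "'Coherence and Cohesion': 6.5}}")
grades = parse_grades(example)
print(grades["overall"])
```

In the Streamlit script, the parsed dictionary could be passed to `st.json` instead of writing the raw string, although the original code does not do this.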


@@ -1,37 +0,0 @@
from helper.token_counter import count_tokens

# Commented-out Whisper transcription code:
# model = whisper.load_model("base")
# file_path = "audio-samples/mynameisjeff.wav"
# audio_file = AudioSegment.from_file(file_path)
# if os.path.exists(file_path):
#     result = model.transcribe(file_path, fp16=False, language='English', verbose=True)
#     print(result["text"])
# else:
#     print("File not found:", file_path)

# Grading prompt with the question type, question and answer left blank,
# so the count below covers only the fixed prompt text
messages = [
    {
        "role": "system",
        "content": "You are an IELTS examiner.",
    },
    {
        "role": "system",
        "content": "The question you have to grade is of type and is the following: ",
    },
    {
        "role": "system",
        "content": "Please provide a JSON object response with the overall grade and breakdown grades, "
                   "formatted as follows: {'overall': 7.0, 'task_response': {'Task Achievement': 8.0, "
                   "'Coherence and Cohesion': 6.5, 'Lexical Resource': 7.5, 'Grammatical Range and Accuracy': "
                   "6.0}}",
    },
    {
        "role": "user",
        "content": "Evaluate this answer according to ielts grading system:",
    },
]
# Sum the token counts of every message that has a "content" field
token_count = sum(
    count_tokens(message["content"])["n_tokens"]
    for message in messages
    if "content" in message
)
print(token_count)
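
The script above depends on `helper.token_counter.count_tokens`, which is not shown in this diff. The sketch below illustrates the same summation with a stand-in counter. The stand-in's whitespace split and the `total_tokens` name are assumptions made for illustration; the only thing taken from the script is that the helper returns a mapping with an `n_tokens` key.

```python
def count_tokens(text: str) -> dict:
    """Stand-in for helper.token_counter.count_tokens (assumed return shape).

    Splits on whitespace, which only approximates a real tokenizer.
    """
    return {"n_tokens": len(text.split())}


def total_tokens(messages: list[dict]) -> int:
    """Sum token counts over every message that has a 'content' field."""
    return sum(
        count_tokens(message["content"])["n_tokens"]
        for message in messages
        if "content" in message
    )


messages = [
    {"role": "system", "content": "You are an IELTS examiner."},
    {"role": "user", "content": "Evaluate this answer:"},
    {"role": "system"},  # no content, so it is skipped
]
print(total_tokens(messages))
```

A whitespace count will differ from what the model's tokenizer reports, so it is only useful for checking the summation logic, not for budgeting `max_tokens`.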


@@ -1,5 +0,0 @@
QUESTION,ANSWER,GRADE
"News editors decide what to broadcast on television and what to print in newspapers. What factors do you think influence these decisions? Do we become used to bad news? Would it be better if more good news was reported?","It has often been said that “Good news is bad news” because it does not sell newspapers. A radio station that once decided to present only good news soon found that it had gone out of business for lack of listeners. Bad news on the other hand is so common that in order to cope with it, we often simply ignore it. We have become immune to bad news and the newspapers and radio stations are aware of this. While newspapers and TV stations may aim to report world events accurately, be they natural or human disasters, political events or the horrors of war, it is also true that their main objective is to sell newspapers and attract listeners and viewers to their stations. For this reason TV and radio stations attempt to reflect the flavour of their station by providing news broadcasts tailor-made to suit their listeners’ preferences. Programmes specialising in pop music or TV soap operas focus more on local news, home issues and up-to-date traffic reports. The more serious stations and newspapers like to provide “so called” objective news reports with editorial comment aimed at analysing the situation. If it is true, then, that newspapers and TV stations are tailoring their news to their readers’ and viewers’ requirements, how can they possibly be reporting real world events in an honest and objective light? Many radio and TV stations do, in fact, report items of good news but they no longer call this news. They refer to these as human interest stories and package them in programmes specialising, for instance, in consumer affairs or local issues. Good news now comes to us in the form of documentaries the fight against children’s cancer or AIDS, or the latest developments in the fight to save the planet from environmental pollution.",6
"We are becoming increasingly dependent on computers. They are used in businesses, hospitals, crime detection and even to fly planes. What things will they be used for in the future? Is this dependence on computers a good thing or should we be more suspicious of their benefits?","Computers are a relatively new invention. The first computers were built fifty years ago and it is only in the last thirty or so years that their influence has affected our everyday life. Personal computers were introduced as recently as the early eighties. In this short time they have made a tremendous impact on our lives. We are now so dependent on computers that it is hard to imagine what things would be like today without them. You have only got to go into a bank when their main computer is broken to appreciate the chaos that would occur if computers were suddenly removed world-wide. In the future computers will be used to create bigger and even more sophisticated computers. The prospects for this are quite alarming. They will be so complex that no individual could hope to understand how they work. They will bring a lot of benefits but they will also increase the potential for unimaginable chaos. They will, for example, be able to fly planes and they will be able to co-ordinate the movements of several planes in the vicinity of an airport. Providing all the computers are working correctly nothing can go wrong. If one small program fails — disaster. There is a certain inevitability that technology will progress and become increasingly complex. We should, however, ensure that we are still in a position where we are able to control technology. It will be all too easy to suddenly discover that technology is controlling us. By then it might be too late. I believe that it is very important to be suspicious of the benefits that computers will bring and to make sure that we never become totally dependent on a completely technological world.",6
"The chart below shows the amount of money per week spent on fast foods in Britain. The graph shows the trends in consumption of fast-foods. Write a report for a university lecturer describing the information shown below.","The chart shows that high income earners consumed considerably more fast foods than the other income groups, spending more than twice as much on hamburgers (43 pence per person per week) than on fish and chips or pizza (both under 20 pence). Average income earners also favoured hamburgers, spending 33 pence per person per week, followed by fish and chips at 24 pence, then pizza at 11 pence. Low income earners appear to spend less than other income groups on fast foods, though fish and chips remains their most popular fast food, followed by hamburgers and then pizza. From the graph we can see that in 1970, fish and chips were twice as popular as burgers, pizza being at that time the least popular fast food. The consumption of hamburgers and pizza has risen steadily over the 20 year period to 1990 while the consumption of fish and chips has been in decline over that same period with a slight increase in popularity since 1985.",7.5
"You have had a bank account for a few years. Recently you received a letter from the bank stating that your account is $240 overdrawn and that you will be charged $70 which will be taken directly from your account. You know that this information is incorrect. Write a letter to the bank. Explain what has happened and say what you would like them to do about it.","Dear Sir, I am writing in reply to a letter I received from you a few days ago. In your letter you state that I am $240 overdrawn and that you will be charging me $70. I would like to point out that the reason I am overdrawn is because of a mistake made by your bank. If you look through your records you will see that I wrote several weeks ago explaining the situation. For the last twelve months, I have been paying $300 a month for a car I bought last summer. The monthly payments were taken directly from my bank account. However, two months ago I sold the car and I wrote to you instructing you to stop paying the monthly instalments. I received a letter from you acknowledging my request, but, for some reason, nothing was done about it. Another $300 instalment has been paid this month and this is the reason why I am overdrawn. I would like you to contact the garage where I bought the car explaining your error. I would also like you to ask them to return the money. Yours faithfully, P Stoft",8


@@ -1,5 +0,0 @@
QUESTION,ANSWER,GRADE
"The table below shows the consumer durables (telephone, refrigerator, etc.) owned in Britain from 1972 to 1983. Write a report for a university lecturer describing the information shown below","The chart shows that the percentage of British households with a range of consumer durables steadily increased between 1972 and 1983. The greatest increase was in telephone ownership, rising from 42% in 1972 to 77% in 1983. Next came central heating ownership, rising from 37% of households in 1972 to 64% in 1983. The percentage of households with a refrigerator rose 21% over the same period and of those with a washing machine by 14%. Households with vacuum-cleaners, televisions and dishwashers increased by 8%, 5% and 2% respectively. In 1983, the year of their introduction, 18% of households had a video recorder. The significant social changes reflected in the statistics are that over the period the proportion of British houses with central heating rose from one to two thirds, and of those with a phone from under a half to over three-quarters. Together with the big increases in the ownership of washing machines and refrigerators, they are evidence of both rising living standards and the trend to lifestyles based on comfort and convenience.",8
"Fatherhood ought to be emphasised as much as motherhood. The idea that women are solely responsible for deciding whether or not to have babies leads on to the idea that they are also responsible for bringing the children up. To what extent do you agree or disagree?","I believe that child-rearing should be the responsibility of both parents and that, whilst the roles within that partnership may be different, they are nevertheless equal in importance. In some societies, it has been made easier over the years for single parents to raise children on their own. However, this does not mean that the traditional family, with both parents providing emotional support and role-models for their children, is not the most satisfactory way of bringing up children. Of crucial importance, in my opinion, is how we define 'responsible for bringing the children up'. At its simplest, it could mean giving the financial support necessary to provide a home, food and clothes and making sure the child is safe and receives an adequate education. This would be the basic definition. There is, however, another possible way of defining that part of the quotation. That would say it is not just the father's responsibility to provide the basics for his children, while his wife involves herself in the everyday activity of bringing them up. Rather, he should share those daily duties, spend as much time as his job allows with his children, play with them, read to them, help directly with their education, participate very fully in their lives and encourage them to share his. It is this second, fuller, concept of 'fatherhood' that I am in favour of, although I also realise how difficult it is to achieve sometimes. The economic and employment situation in many countries means that jobs are getting more, not less, stressful, requiring long hours and perhaps long journeys to work as well. Therefore it may remain for many a desirable ideal rather than an achievable reality.",8
"The chart below shows the amount of leisure time enjoyed by men and women of different employment status. Write a report for a university lecturer describing the information shown below.","The chart shows the number of hours of leisure enjoyed by men and women in a typical week in 1998-9, according to gender and employment status. Among those employed full-time, men on average had fifty hours of leisure, whereas women had approximately thirty-seven hours. There were no figures given for male part-time workers, but female part-timers had forty hours of leisure time, only slightly more than women in full-time employment, perhaps reflecting their work in the home. In the unemployed and retired categories, leisure time showed an increase for both sexes, as might have been expected. Here too, men enjoyed more leisure time - over eighty hours, compared with seventy hours for women, perhaps once again reflecting the fact that women spend more time working in the home than men. Lastly, housewives enjoyed approximately fifty-four hours of leisure, on average. There were no figures given for househusbands! Overall, the chart demonstrates that in the categories for which statistics on male leisure time were available, men enjoyed at least ten hours of extra leisure time.",6
"Prevention is better than cure. Out of a country's health budget, a large proportion should be diverted from treatment to spending on health education and preventative measures. To what extent do you agree or disagree with this statement?","Of course it goes without saying that prevention is better than cure. That is why, in recent years, there has been a growing body of opinion in favour of putting more resources into health education and preventive measures. The argument is that ignorance of, for example, basic hygiene or the dangers of an unhealthy diet or lifestyle needs to be combatted by special nationwide publicity campaigns, as well as longer-term health education. Obviously, there is a strong human argument for catching any medical condition as early as possible. There is also an economic argument for doing so. Statistics demonstrate the cost-effectiveness of treating a condition in the early stages, rather than delaying until more expensive and prolonged treatment is necessary. Then there are social or economic costs, perhaps in terms of loss of earnings for the family concerned or unemployed benefit paid by the state. So far so good, but the difficulties start when we try to define what the 'proportion' of the budget should be, particularly if the funds will be diverted from treatment. Decisions on exactly how much of the total health budget should be spent in this way are not a matter for the non-specialist, but should be made on the basis of an accepted health service model. This is the point at which real problems occur: the formulation of the model. How do we accurately measure which health education campaigns are effective in both medical and financial terms? How do we agree about the medical efficacy of various screening programmes, for example, when the medical establishment itself does not agree? A very rigorous process of evaluation is called for, so that we can make informed decisions.",6.5
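
The CSV fixtures above pair each question and answer with a reference grade. One plausible use, assumed here and not confirmed by the diff, is to compare grades produced by the model against the `GRADE` column. The sketch below loads rows with the standard `csv` module and computes the mean absolute error. `load_samples`, `mean_absolute_error`, the inline sample rows and the predicted grades are all made up for illustration.

```python
import csv
import io


def load_samples(csv_text: str) -> list[dict]:
    """Read QUESTION/ANSWER/GRADE rows; extra columns such as COMMENT are ignored."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "question": row["QUESTION"],
            "answer": row["ANSWER"],
            "grade": float(row["GRADE"]),
        }
        for row in reader
    ]


def mean_absolute_error(reference: list[float], predicted: list[float]) -> float:
    """Average absolute difference between reference and predicted grades."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("grade lists must be non-empty and of equal length")
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


# Tiny made-up fixture in the same layout as the files above.
sample_csv = 'QUESTION,ANSWER,GRADE\n"Q1","A1",6\n"Q2","A2",7.5\n'
samples = load_samples(sample_csv)
reference = [sample["grade"] for sample in samples]
predicted = [6.5, 7.0]  # hypothetical model outputs, not real results
print(mean_absolute_error(reference, predicted))
```

With the real files, the CSV text would be read from disk and `predicted` would come from the overall grade returned by the grading function for each row.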


@@ -1,6 +0,0 @@
QUESTION,ANSWER,GRADE,COMMENT
"You live in a room in college which you share with another student. However, there are many problems with this arrangement and you find it very difficult to work. Write a letter to the accommodation officer at the college. In your letter describe the situation, explain your problems and why it is difficult to work, say what kind of accommodation you would prefer","Dear Sir/Madam, I am writing to express my dissatisfaction with my room-mate. As you know we share one room, I can not study in the room at all any more if I still stay there. She always has friend visiting and has parties in the room. They make lots of noise and switch on the radio very loudly, for me this environment is very difficult to study and I need a quiet room. Even borrows my things without asking, it is very impolite. I request you can give me a new room next term because I have been asked her has parties in other place many times they still have parties in the room. I really can not stay in the same room with her. I would be grateful if you could change me a single room. Your faithfully, Catherine",5,"The answer is below the word limit and there is some repetition of the task rubric. (Length is a common problem in General Training scripts.) Answers that are short lose marks because of inadequate content and may also lose marks because there is insufficient material in the answer for the examiner to give credit for accuracy and coherence. Despite these problems, the introduction to the letter is appropriate and the purpose of the writer is clear. The points are not always linked together well and punctuation is sometimes faulty. The sentences are kept quite simple and mistakes occur as soon as more complex structures are attempted."
"In Britain, when someone gets old they often go to live in a home with other old people where there are nurses to look after them. Sometimes the government has to pay for this care.","Who should be responsible for our people. It is true that the old Peoples situation gets worse in the many countries. The first question must be what they wants and what they needs? Especially their necessity are more benefit more respect more quiet life. If they have been working for a long time in the any company or in the Public Sector and when they get old thats means during their retires time company or Government must be responsible of their welfare, it is just my opinion. They should take care of them. In addition to company or Government. If they have good money they can look after themselves. We can do something to make easier their life for example an organization or a voluntary association, unions. The families or Relatives responsibility depends on their wealthy situations. If they could do they should do anything. Governments or their former place could supply them with life insurance and a good Social Security Policy. The Social community center or old age pensioner like in the Britain are very useful for them. For all of them life is hard and gets harder, in the their old ages. They expect more attention and good life. The old people, if dont want lost them. We should do anything that what we able to do.",5,"There are quite a lot of relevant ideas in the answer but they are not always well supported and sometimes they are unclear. There are some areas in the answer where the organisation becomes weak and the reader finds the message difficult to follow. Nevertheless, the writer's view is apparent and there is a logical flow to the points given. There are a lot of mistakes in the answer and some parts, such as the conclusion, are very hard to follow because of these errors. Although there is some appropriate vocabulary, sentence control is very weak."
"These days, more and more people move away from the area where they were born and brought up when they become adults. Do the advantages of this development outweigh the disadvantages?","It is certainly the case in my local area that many young people choose to leave their home village or town as soon as they finish college or when they first get full-time employment. There are several advantages to this. Firstly, it gives the individuals better opportunities to find more suitable jobs. This means they have much greater flexibility in the careers they can choose and are no longer forced to take the work available in the local area. A second benefit is that they have the chance to meet and work alongside a wider variety of people, which enriches their social and professional lives. Another relevant point is that moving to a place where they are anonymous allows people greater freedom to behave as they wish, without worrying about what those around them think. However, there are a number of drawbacks to this development, the most serious being loss of support. It is important for humans to feel that they are part of a community and can rely on family and friends for help, on a day-to-day basis. In a place where individuals know few people it is easy to become isolated and lonely. Related to this point is the fact that when people know very little about their neighbours, it is hard for mutual trust to develop. When people have lived in the same place or village all their lives, their personal and family backgrounds are widely known and this information can help others make reliable judgements, building personal and business relationships. On balance, I feel that this trend brings more negative outcomes than advantages and that it is leading to real problems of isolation and erosion of identity",6.5,"Overall, your answer demonstrates a clear understanding of the question and presents relevant ideas and arguments. 
Your essay structure is well-organized with an introduction, body paragraphs, and a conclusion. You have effectively addressed both the advantages and disadvantages of people moving away from their birthplace when they become adults."
"The average standard of people's health is likely to be lower in the future than it is now. To what extent do you agree or disagree with this statement?","I completly disagree with the written statment. I believe that most of the people in the world have more information about their health and also about how they can improve their healthy conditions. Nowadays, information about how harmful is to smoke for our bodies can be seen in many packets of cigars. This is a clear example how things can change from our recent past. There is a clear trend in the diminishing of smokers and if this continues it will have a positive impact in our health. On the other hand, the alimentation habbits are changing all over the world and this can affect peoples health. However every one can choose what to eat every day. Mostly everybody, from developed societies, know the importance of having a healthy diet. Advances such as the information showed in the menus of fast food restaurants will help people to have a clever choice before they choose what to eat. Another important issue that I would like to mention is how medicine is changing. There are new discovers and treatments almost every week and that is an inequivoque sintom of how things are changing in order to improve the worlds health.",5.5,"A clear position is presented from the outset, supported by relevant ideas. These would require further development to achieve a higher score. The response is under-length, however. Information and ideas are generally arranged coherently and there is a clear overall progression. Cohesive devices are used effectively, but paragraphing is not always logical. A range of vocabulary is attempted, although there are some errors in spelling, word choice and word formation. There also appears to be some interference from the test takers first language, e.g. alimentation, but these features do not make the answer difficult to understand. 
There is a mix of sentence forms, but the level of error is too high to achieve a higher band score."
"The average standard of people's health is likely to be lower in the future than it is now. To what extent do you agree or disagree with this statement?","Recently, there have been a lot of discussions about health and whether it is going to improve or not. In my opinion, I think that people will become unhealthier in the future than they are now. There are many reasons that support the idea of people becoming unhealthy in the future. Firstly, one reason is that of food. People tend to eat more fast food nowadays. They tend to treat themselves with sweets and chocolate whenever they want. This appears to be because people are busier now than they used to be. So, people dont have a chance to cook or even learn the art of cookery. Also, having a lot of unhealthy food can lead to obesity and it could be a serious issue in the future. Another reason is that technology is developing everyday. Young people enjoy buying new gadgets and the latest devices. This has a negative impact on their health, especially when they enjoy video games. Spending long hours looking at a screen can lead to bad eyesight and obesity as well. Yet another reason is that laziness is a big issue. Different forms of exercise might disappear in the future because people dont like sports. Also, people prefer spending most of their time on the internet and the internet is growing every single day. Other people might disagree and say that health will improve in the future. They believe that new sports and new ways to exercise will appear in the future. However, I dont think it can happen since the majority of people spend less time outdoors. Moreover, other people believe that technology will try and help people improve their health. For example, there have been some games released on the Wii console that makes people exercise but technology is developing more in a negative way. 
For instance, many phone industries are developing new applications everyday and todays generation likes to follow every trend. This prevents people to go outside to exercise. They like to spend more time on the internet downloading new programmes or reading gossips about celebraties. This affects peoples health badly. In conclusion, I believe that peoples health is affected negatively by fast food, technology and sports and it will be a problem in the future.",7.5,"The test taker presents a clear position at the outset and explores some ideas to support this. An alternative position is also considered, but rejected. This is a strong response, but there is rather too much emphasis on technology: other aspects of the proposition could also be considered, e.g. less physical work and more sedentary work, greater reliance on cars meaning less exercise, aging populations in some countries leading to more complex health issues. Ideas are organised logically and there is a clear progression throughout the response, with good use of cohesive devices and logical paragraphing. The response could perhaps be improved by breaking down paragraphs 2 and 3. There is a wide range of vocabulary with good use of less common items as well as evidence of higher level features, such as softening, e.g. They tend to, This appears to be, and might disagree. Errors in spelling and word formation are rare. There is also a variety of complex structures with frequent error-free sentences, though some errors do occur and there is some overuse of rather short sentence forms."


@@ -1,5 +0,0 @@
PT-1: How do you usually spend your weekends? Why?
A: audio-samples/weekends.m4a
PT-2: Describe someone you know who does something well. You should say who this person is, how do you know this person, what they do well and explain why you think this person is so good at doing this.
A: audio-samples/speakingpt2.m4a


@@ -1,9 +0,0 @@
Q: It is important for children to learn the difference between right and wrong at an early age. Punishment is necessary to help them learn this distinction.
To what extent do you agree or disagree with this opinion?
What sort of punishment should parents and teachers be allowed to use to teach good behaviour to children?
A: In today's world, moral values and ethics play a vital role in shaping the character of an individual. Children are the building blocks of society, and it is important to inculcate a sense of right and wrong in them from an early age. While punishment can be an effective tool in teaching the difference between right and wrong, it should not be the only approach. In my opinion, punishment should be used in moderation, and parents and teachers should focus on positive reinforcement and guidance to teach good behavior.
Punishment can be used to correct behavior and to help children understand the consequences of their actions. However, excessive punishment can be counterproductive and can even have harmful effects on children. Physical punishment, such as hitting or spanking, should be avoided as it can lead to physical and emotional trauma. Instead, parents and teachers should consider alternative forms of punishment such as time-outs, loss of privileges or extra chores. These methods can be effective in conveying the message without causing physical harm.
Furthermore, parents and teachers should focus on positive reinforcement to encourage good behavior. Praising children when they exhibit good behavior can be an effective way to motivate them to continue behaving well. Teachers can use stickers or small rewards to encourage students to work hard and behave well in class. Parents can use similar methods at home to reinforce good behavior.
In conclusion, it is important for children to learn the difference between right and wrong at an early age. Punishment can be a useful tool in teaching this distinction, but it should not be the only approach. Parents and teachers should use positive reinforcement and guidance to encourage good behavior, and should only resort to punishment in moderation and when necessary. Any form of punishment should be non-violent and should not cause physical or emotional harm to the child.

tmp/placeholder.txt Normal file

@@ -0,0 +1 @@
THIS FILE ONLY EXISTS TO KEEP THIS FOLDER IN THE REPO


@@ -1,102 +0,0 @@
import openai
import os
from dotenv import load_dotenv
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
def generate_summarizer(
    max_tokens,
    temperature,
    top_p,
    frequency_penalty,
    question_type,
    question,
    answer
):
    # Ask the chat model to grade an IELTS answer and return its raw reply text.
    # Note: openai.ChatCompletion is the pre-1.0 interface of the openai package.
    res = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        max_tokens=int(max_tokens),
        temperature=float(temperature),
        top_p=float(top_p),
        frequency_penalty=float(frequency_penalty),
        messages=[
            {
                "role": "system",
                "content": "You are an IELTS examiner.",
            },
            {
                "role": "system",
                "content": f"The question you have to grade is of type {question_type} and is the following: {question}",
            },
            {
                "role": "system",
                "content": "Please provide a JSON object response with the overall grade and breakdown grades, "
                           "formatted as follows: {'overall': 7.0, 'task_response': {'Task Achievement': 8.0, "
                           "'Coherence and Cohesion': 6.5, 'Lexical Resource': 7.5, 'Grammatical Range and Accuracy': "
                           "6.0}}",
            },
            {
                "role": "system",
                "content": "Don't give explanations for the grades, just provide the json with the grades.",
            },
            {
                "role": "user",
                "content": f"Evaluate this answer according to IELTS grading system: {answer}",
            },
        ],
    )
    return res["choices"][0]["message"]["content"]
import streamlit as st
# Set the application title
st.title("GPT-3.5 IELTS Examiner")
# qt_col, q_col = st.columns(2)
# Selection box to select the question type
# with qt_col:
question_type = st.selectbox(
    "What is the question type?",
    (
        "Listening",
        "Reading",
        "Writing Task 1",
        "Writing Task 2",
        "Speaking Part 1",
        "Speaking Part 2"
    ),
)
# Provide the input area for the question to be answered
# with q_col:
question = st.text_area("Enter the question:", height=100)
# Provide the input area for the answer to be graded
answer = st.text_area("Enter the answer:", height=100)
# Initiate two columns for sections to be side by side
# col1, col2 = st.columns(2)
# Sliders to control the model hyperparameters
# with col1:
token = st.slider("Token", min_value=0.0, max_value=2000.0, value=1000.0, step=1.0)
temp = st.slider("Temperature", min_value=0.0, max_value=1.0, value=0.7, step=0.01)
top_p = st.slider("Top_p", min_value=0.0, max_value=1.0, value=0.9, step=0.01)
f_pen = st.slider("Frequency Penalty", min_value=-1.0, max_value=1.0, value=0.5, step=0.01)
# Show the current parameters used for the model
# with col2:
with st.expander("Current Parameter"):
    st.write("Current Token :", token)
    st.write("Current Temperature :", temp)
    st.write("Current Nucleus Sampling :", top_p)
    st.write("Current Frequency Penalty :", f_pen)
# Button that triggers the grading request
if st.button("Grade"):
    st.write(generate_summarizer(token, temp, top_p, f_pen, question_type, question, answer))
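The system prompt in the script above shows the expected reply with single-quoted keys, so a reply that copies the example literally would not be valid JSON. The following is a minimal sketch, not part of the original script, of how such a reply could be parsed before display. The helper name `parse_grades` is hypothetical, and the sketch assumes the reply contains only the object, with no surrounding text.

```python
import ast
import json


def parse_grades(raw: str) -> dict:
    """Parse the model's reply into a dict, tolerating single-quoted keys.

    Hypothetical helper: not part of the original script.
    """
    text = raw.strip()
    try:
        # Strict JSON first: double-quoted keys and strings.
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to a Python literal, which accepts single quotes.
        # literal_eval only evaluates literals; it does not execute code.
        try:
            parsed = ast.literal_eval(text)
        except (ValueError, SyntaxError) as exc:
            raise ValueError(f"Could not parse grades from reply: {text!r}") from exc
    if not isinstance(parsed, dict) or "overall" not in parsed:
        raise ValueError("Reply does not contain an 'overall' grade")
    return parsed
```

With a helper like this, the button handler could pass the reply through `parse_grades` and show the resulting dict, reporting a parsing error when the model adds explanations despite the instruction.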


@@ -1,99 +0,0 @@
import streamlit as st
import openai
import os
from dotenv import load_dotenv
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
def generate_summarizer(
    max_tokens,
    temperature,
    top_p,
    frequency_penalty,
    question_type,
    question,
    answer
):
    # Ask the chat model to grade an IELTS answer and return its raw reply text.
    # Note: openai.ChatCompletion is the pre-1.0 interface of the openai package.
    res = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        max_tokens=int(max_tokens),
        temperature=float(temperature),
        top_p=float(top_p),
        frequency_penalty=float(frequency_penalty),
        messages=[
            {
                "role": "system",
                "content": "You are an IELTS examiner.",
            },
            {
                "role": "system",
                "content": f"The question you have to grade is of type {question_type} and is the following: {question}",
            },
            {
                "role": "system",
                "content": "Please provide a JSON object response with the overall grade and breakdown grades, "
                           "formatted as follows: {'overall': 7.0, 'task_response': {'Task Achievement': 8.0, "
                           "'Coherence and Cohesion': 6.5, 'Lexical Resource': 7.5, 'Grammatical Range and Accuracy': "
                           "6.0}}",
            },
            {
                "role": "system",
                "content": "Don't give explanations for the grades, just provide the json with the grades.",
            },
            {
                "role": "user",
                "content": f"Evaluate this answer according to IELTS grading system: {answer}",
            },
        ],
    )
    return res["choices"][0]["message"]["content"]
# Set the application title
st.title("GPT-3.5 IELTS Examiner")
# qt_col, q_col = st.columns(2)
# Selection box to select the question type
# with qt_col:
question_type = st.selectbox(
    "What is the question type?",
    (
        "Writing Task 2",
    ),
)
# Provide the input area for the question to be answered
# with q_col:
question = st.text_area("Enter the question:", height=100)
# Provide the input area for the answer to be graded
answer = st.text_area("Enter the answer:", height=100)
# Initiate two columns for sections to be side by side
# col1, col2 = st.columns(2)
# Sliders to control the model hyperparameters
# with col1:
token = st.slider("Token", min_value=0.0,
                  max_value=2000.0, value=1000.0, step=1.0)
temp = st.slider("Temperature", min_value=0.0,
                 max_value=1.0, value=0.7, step=0.01)
top_p = st.slider("Top_p", min_value=0.0, max_value=1.0, value=0.9, step=0.01)
f_pen = st.slider("Frequency Penalty", min_value=-1.0,
                  max_value=1.0, value=0.5, step=0.01)
# Show the current parameters used for the model
# with col2:
with st.expander("Current Parameter"):
    st.write("Current Token :", token)
    st.write("Current Temperature :", temp)
    st.write("Current Nucleus Sampling :", top_p)
    st.write("Current Frequency Penalty :", f_pen)
# Button that triggers the grading request
if st.button("Grade"):
    st.write(generate_summarizer(token, temp, top_p,
                                 f_pen, question_type, question, answer))
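Both scripts build the same message list inline inside the API call, which makes the prompt hard to check without a network request. As a rough sketch under that observation, the list could be produced by a small pure function. The name `build_messages` is hypothetical and does not appear in the original files; the message texts are copied from the script, with the "an IELTS" wording corrected.

```python
def build_messages(question_type: str, question: str, answer: str) -> list:
    """Return the chat messages used to grade one answer.

    Hypothetical helper: it only assembles dicts and makes no API call.
    """
    return [
        {"role": "system", "content": "You are an IELTS examiner."},
        {
            "role": "system",
            "content": (
                f"The question you have to grade is of type {question_type} "
                f"and is the following: {question}"
            ),
        },
        {
            "role": "system",
            "content": (
                "Please provide a JSON object response with the overall grade and "
                "breakdown grades, formatted as follows: {'overall': 7.0, "
                "'task_response': {'Task Achievement': 8.0, "
                "'Coherence and Cohesion': 6.5, 'Lexical Resource': 7.5, "
                "'Grammatical Range and Accuracy': 6.0}}"
            ),
        },
        {
            "role": "system",
            "content": "Don't give explanations for the grades, just provide the json with the grades.",
        },
        {
            "role": "user",
            "content": f"Evaluate this answer according to IELTS grading system: {answer}",
        },
    ]
```

The grading function could then pass `messages=build_messages(question_type, question, answer)` to the API call, and the prompt text could be inspected or unit-tested on its own.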